
Q: Reshape an R data frame where the only unique key is a new row

Stack Overflow user

Asked 2013-10-23 01:41:35

Answers: 1 · Views: 168 · Followers: 0 · Votes: 0

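I have an R data frame of customer reviews in which the auditor entered multiple reason codes by copying the entire review and putting each reason code on a new row. What I currently have is: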

Item    Category        Reason                 Review  
Vacuum  Performance     Bad Suction            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design          Cord is too short      I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color           Wrong Color            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size            too big                The boat was way too big, and was slow.
Boat    Performance     slow                   The boat was way too big, and was slow.
Tube    Inflation       low inflation          The tube was not inflated enough

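I would like to group this by the shared columns (Item and Review) and create numbered Category and Reason columns to hold the multiple categories and reasons. Assume up front that I do not know how many unique reasons and categories each item has, since what I am showing you is dummy data.

So, what I want is: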

Item    Category.1    Category.2   Category.3  Reason.1       Reason.2           Reason.3      Review  
Vacuum  Performance   Design       Color       Bad Suction    Cord is too short  Wrong Color   I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size          Performance    NA        too big        slow               NA            The boat was way too big, and was slow.
Tube    Inflation     NA             NA        low inflation  NA                 NA            The tube was not inflated enough

I tried the following code, without success:

reshape(data, direction = "wide", 
        idvar = c("Item", "Review" ), 
        timevar = c("Category", "Reason"))

Here is the data:

dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", 
"Boat", "Tube"), Category = c("Performance", "Design", 
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color", 
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.", 
"The tube was not inflated enough")), .Names = c("Item", "Category", 
"Reason", "Review"), class = "data.frame", row.names = c(NA, 
-6L))

reshape

1 Answer

Stack Overflow user

Accepted answer

Posted 2013-10-23 02:20:29

You just need to create a "time" variable from the "Item" column:

Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along)
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview")
names(out)
#  [1] "Item"       "Category.1" "Reason.1"   "Review.1"   "Category.2" "Reason.2"  
#  [7] "Review.2"   "Category.3" "Reason.3"   "Review.3"
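
To make the role of the "time" variable concrete, here is a minimal sketch of what the ave() call computes: a running counter that restarts for each distinct Item. The vector `item` and the name `counter` are illustrative only. Unlike the answer's call, the sketch passes seq_along(item) as the first argument so the counter is stored as integers; passing the character column itself, as above, stores the same values as character strings, which reshape() handles equally well.

```r
# Item column copied from the question's data
item <- c("Vacuum", "Vacuum", "Vacuum", "Boat", "Boat", "Tube")

# seq_along() is applied once per group of identical Item values,
# so every row receives its position within its own item.
counter <- ave(seq_along(item), item, FUN = seq_along)
print(counter)
# [1] 1 2 3 1 2 1

# The answer's variant groups the same way but keeps the type of its
# first argument, so the counter comes back as character.
counter_chr <- ave(item, item, FUN = seq_along)
```

reshape() then uses this counter as `timevar`, which is where the `.1`, `.2`, `.3` suffixes in the wide column names come from.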

Here are just the "Category" and "Reason" columns from the resulting "wide" dataset (so that it fits on the screen).

out[, grep("Item|Category|Reason", names(out))]
#     Item  Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Vacuum Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4   Boat        Size       too big Performance              slow       <NA>        <NA>
# 6   Tube   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

Also, library(reshape) does not refer to the built-in reshape function you are trying to use. Rather, it is the older version of the "reshape2" package.
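
If there is any doubt about which function is being called, one option is to qualify the call with its namespace. The built-in function lives in the stats package, so stats::reshape() always reaches it regardless of which add-on packages are attached. The tiny data frame `long` below is made up purely for illustration.

```r
# Hypothetical toy data: two records for id 1, one record for id 2
long <- data.frame(
  id    = c(1, 1, 2),
  time  = c(1, 2, 1),
  value = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Namespace-qualified call to the base (stats) reshape function
wide <- stats::reshape(long, direction = "wide",
                       idvar = "id", timevar = "time")
print(names(wide))
```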

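Re-reading your question and comments: since you can assume that the "Review" column can be treated as an ID column in its own right, just change the reshape command accordingly: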

reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview")
#     Item
# 1 Vacuum
# 4   Boat
# 6   Tube
#                                                                                        Review
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
# 4                                                     The boat was way too big, and was slow.
# 6                                                            The tube was not inflated enough
#    Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4        Size       too big Performance              slow       <NA>        <NA>
# 6   Inflation low inflation        <NA>              <NA>       <NA>        <NA>
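
The result above interleaves the Category and Reason columns, whereas the question asks for all Category columns first, then all Reason columns, with Review last. A minimal sketch of that final reordering step follows; it rebuilds the question's data so it can be run on its own, and the helper names `cat_cols` and `reason_cols` are illustrative, not part of the original answer.

```r
# Rebuild the question's data so the example is self-contained
Data <- data.frame(
  Item     = c("Vacuum", "Vacuum", "Vacuum", "Boat", "Boat", "Tube"),
  Category = c("Performance", "Design", "Color", "Size", "Performance", "Inflation"),
  Reason   = c("Bad Suction", "Cord is too short", "Wrong Color",
               "too big", "slow", "low inflation"),
  Review   = c(rep("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 3),
               rep("The boat was way too big, and was slow.", 2),
               "The tube was not inflated enough"),
  stringsAsFactors = FALSE
)

# Per-item counter, then the same wide reshape as in the answer
Data$UniqueReview <- ave(seq_along(Data$Item), Data$Item, FUN = seq_along)
wide <- reshape(Data, direction = "wide",
                idvar = c("Item", "Review"), timevar = "UniqueReview")

# Collect the numbered columns by prefix, however many were created
cat_cols    <- grep("^Category\\.", names(wide), value = TRUE)
reason_cols <- grep("^Reason\\.",   names(wide), value = TRUE)

# Item, all categories, all reasons, then the review text
wide <- wide[, c("Item", cat_cols, reason_cols, "Review")]
rownames(wide) <- NULL
print(names(wide))
```

Because the columns are selected by pattern rather than by hard-coded names, the same code should carry over to data where the maximum number of reasons per item is not known in advance.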

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/19531315
