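I have an R data frame of customer reviews in which the auditor entered multiple reason codes by duplicating the entire review and putting each reason code on a new row. What I currently have is: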
Item    Category     Reason             Review
Vacuum  Performance  Bad Suction        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design       Cord is too short  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color        Wrong Color        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         too big            The boat was way too big, and was slow.
Boat    Performance  slow               The boat was way too big, and was slow.
Tube    Inflation    low inflation      The tube was not inflated enough

I would like to group the rows by the shared columns (Item and Review) and spread the multiple categories and reasons into numbered Category and Reason columns. Assume up front that I do not know the number of unique reasons and categories per item, because what I am showing you is dummy data.
So, what I want is:
Item    Category.1   Category.2   Category.3  Reason.1       Reason.2           Reason.3     Review
Vacuum  Performance  Design       Color       Bad Suction    Cord is too short  Wrong Color  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         Performance  NA          too big        slow               NA           The boat was way too big, and was slow.
Tube    Inflation    NA           NA          low inflation  NA                 NA           The tube was not inflated enough

I tried the following code, without success:
reshape(data, direction = "wide",
idvar = c("Item", "Review" ),
timevar = c("Category", "Reason"))

Here is the data:
dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat",
"Boat", "Tube"), Category = c("Performance", "Design",
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color",
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.",
"The tube was not inflated enough")), .Names = c("Item", "Category",
"Reason", "Review"), class = "data.frame", row.names = c(NA,
-6L))

Answered on 2013-10-23 02:20:29
You just need to create a "time" variable from the "Item" column:
Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along)
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview")
names(out)
# [1] "Item"       "Category.1" "Reason.1"   "Review.1"   "Category.2" "Reason.2"
# [7] "Review.2"   "Category.3" "Reason.3"   "Review.3"

Here are just the "Category" and "Reason" columns from the resulting "wide" dataset (so that it fits on the screen).
out[, grep("Item|Category|Reason", names(out))]
#     Item  Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Vacuum Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4   Boat        Size       too big Performance              slow       <NA>        <NA>
# 6   Tube   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

Also, library(reshape) does not refer to the built-in reshape function that you were trying to use. Instead, it is the older version of the "reshape2" package.
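To make it clear which function is meant, the base function can be called with its namespace, since reshape lives in the "stats" package. The following is only a minimal sketch: the data frame `toy` and the counter column `id` are hypothetical names made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(Item = c("A", "A", "B"),
                  Reason = c("r1", "r2", "r3"),
                  stringsAsFactors = FALSE)

# Running counter within each Item: 1, 2 for "A" and 1 for "B"
toy$id <- ave(seq_along(toy$Item), toy$Item, FUN = seq_along)

# Namespace-qualified call to the base reshape function
wide <- stats::reshape(toy, direction = "wide",
                       idvar = "Item", timevar = "id")
names(wide)
# [1] "Item"     "Reason.1" "Reason.2"
```

Item "B" has only one row, so its Reason.2 value is NA in `wide`, which mirrors how the shorter groups are padded in the output above.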
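After re-reading your question and the comments: since you can assume that the "Review" column can be treated as an ID column of its own, just change the reshape command accordingly: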
reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview")
# Item
# 1 Vacuum
# 4 Boat
# 6 Tube
# Review
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
# 4 The boat was way too big, and was slow.
# 6 The tube was not inflated enough
#    Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4        Size       too big Performance              slow       <NA>        <NA>
# 6   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

https://stackoverflow.com/questions/19531315