Suppose I have a data frame:
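# Empty data frame with named character columns; each row below is assigned via c(), so Age and Weight are stored as strings.
Data <- data.frame(Name = character(), Age = character(), Weight = character(), School = character(), Book = character(), Author = character(), stringsAsFactors = FALSE)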
Data[1,] <- c("Paul", 26, 150, "Helgason U", "Intro to Smooth Manifolds", "John Lee")
Data[2,] <- c("Paul", 26, 150, "Helgason U", "A Tale of Two Cities", "Charles Dickens")
Data[3,] <- c("Paul", 26, 150, "Helgason U", "Fear and Loathing in Las Vegas", "Hunter Thompson")
Data[4,] <- c("Paul", 26, 150, "Helgason U", "Gravity's Rainbow", "Thomas Pynchon")
Data[5,] <- c("David", 35, 165, "Turing College", "Brave New World", "Aldous Huxley")
Data[6,] <- c("David", 35, 165, "Turing College", "Vashista's Yoga", "Vashista")
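Data[7,] <- c("David", 35, 165, "Turing College", "C++ For Dummies", "Anonymous")
I want to condense the data so that all rows belonging to the same person collapse into a single row, with that person's books and authors concatenated. In other words, I want my output to be: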
Name   Age  Weight  School          Books                           Authors
Paul   26   150     Helgason U      Intro to Smooth Manifolds       John Lee
                                    A Tale of Two Cities            Charles Dickens
                                    Fear and Loathing in Las Vegas  Hunter Thompson
                                    Gravity's Rainbow               Thomas Pynchon
David  35   165     Turing College  Brave New World                 Aldous Huxley
                                    Vashista's Yoga                 Vashista
                                    C++ For Dummies                 Anonymous
Ideally, I would like the books joined into a single string such as "Intro to Smooth Manifolds\nA Tale of Two Cities\nFear and Loathing in Las Vegas\nGravity's Rainbow".
I originally used a for loop, but it is far too slow because my actual data is much larger than this. To give an idea of how I was looping:
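for (i in 1:L){
  # Rows belonging to the i-th unique person
  Names = subset(Data, Data$Name == unique(Data$Name)[i])
  rows = nrow(Names)
  # Rows whose identifying columns are duplicated (scanning forwards or backwards)
  Name_Matches = which(duplicated(Names[,Cols]) | duplicated(Names[nrow(Names):1, Cols])[nrow(Names):1])
  Name_UnMtchs = setdiff(1:nrow(Names), Name_Matches)
  Books = Names$Book[Name_Matches]
  New_Books = paste(as.character(Books), collapse = "\n")
  Authors = Names$Author[Name_Matches]
  New_Authors = paste(Authors, collapse = "\n")
  New_Data[count_New, Cols] = Names[Name_Matches[1], Cols]
  # Fill only the current row, not the whole column
  New_Data$Book[count_New] = New_Books
  New_Data$Author[count_New] = New_Authors
  count_New = count_New + 1
}
Here Cols holds the column indices of the entries I know for a person (age, weight, school, name), L is the number of unique names in the data frame, count_New is a counter initialized at 1, and New_Data is an empty data frame with the same columns as Data. What function can I use to merge the data without this kind of for loop?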
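A minimal sketch of the setup described above, assuming the four identifying columns are the first four columns of Data. The small Data here is only a stand-in with the same columns as the example, and the initial values are inferred from the description rather than taken from the original script:

```r
# Small stand-in for the question's Data (same columns, fewer rows).
Data <- data.frame(
  Name   = c("Paul", "Paul", "David"),
  Age    = c("26", "26", "35"),
  Weight = c("150", "150", "165"),
  School = c("Helgason U", "Helgason U", "Turing College"),
  Book   = c("Intro to Smooth Manifolds", "A Tale of Two Cities", "Brave New World"),
  Author = c("John Lee", "Charles Dickens", "Aldous Huxley"),
  stringsAsFactors = FALSE
)

# Assumption: the identifying columns (Name, Age, Weight, School) are columns 1 to 4.
Cols <- 1:4
# Number of unique names, i.e. the number of loop iterations.
L <- length(unique(Data$Name))
# Counter for the next row to fill in New_Data.
count_New <- 1
# Zero-row data frame with the same columns (and column types) as Data.
New_Data <- Data[0, ]
```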
Posted on 2015-08-11 01:08:54
This kind of thing can be done in base R, but you are better off using a package designed specifically for data wrangling.
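For comparison, a base R version might look like the following sketch. It uses aggregate() from the stats package (attached by default) to paste each person's books and authors together. The toy Data is rebuilt in a single call so the snippet runs on its own, and the name collapsed is just an illustrative choice:

```r
# Toy data mirroring the question, built in one call so the snippet is self-contained.
Data <- data.frame(
  Name   = c(rep("Paul", 4), rep("David", 3)),
  Age    = c(rep(26, 4), rep(35, 3)),
  Weight = c(rep(150, 4), rep(165, 3)),
  School = c(rep("Helgason U", 4), rep("Turing College", 3)),
  Book   = c("Intro to Smooth Manifolds", "A Tale of Two Cities",
             "Fear and Loathing in Las Vegas", "Gravity's Rainbow",
             "Brave New World", "Vashista's Yoga", "C++ For Dummies"),
  Author = c("John Lee", "Charles Dickens", "Hunter Thompson", "Thomas Pynchon",
             "Aldous Huxley", "Vashista", "Anonymous"),
  stringsAsFactors = FALSE
)

# Group by the identifying columns and paste each group's Book and Author
# values into one newline-separated string per column.
collapsed <- aggregate(
  Data[c("Book", "Author")],
  by = Data[c("Name", "Age", "Weight", "School")],
  FUN = paste, collapse = "\n"
)

# Print Paul's books, one title per line.
cat(collapsed$Book[collapsed$Name == "Paul"], "\n")
```

Note that aggregate() orders its result by the grouping columns, so the rows may come back in a different order than they appear in Data.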
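In dplyr:
library(dplyr)
Data %>%
  group_by(Name, Age, Weight, School) %>%
  summarise(Books = paste(Book, collapse = "\n"), Authors = paste(Author, collapse = "\n"))
However, I suspect this is what you really want. Rather than pasting the book titles (and authors) into a single string per name, it keeps them as a vector of titles in a list column, which can then be used for further processing.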
Data %>%
  group_by(Name, Age, Weight, School) %>%
  summarise(Books = list(Book), Authors = list(Author))
Posted on 2015-08-11 04:43:13
Consider this base R solution (though it is neither efficient nor elegant):
# OBTAIN UNIQUE PERSONS DATAFRAME
Data2 <- unique(Data[1:4])
rownames(Data2) <- NULL
# GET VECTOR OF DISTINCT PERSONS
persons <- Data2$Name
# LOOP THROUGH DISTINCT PERSONS
for (p in persons) {
  # BOOK COLUMN (PULL THIS PERSON'S BOOKS, THEN CONCATENATE)
  booksconcat <- paste(Data$Book[Data$Name == p], collapse = "\n")
  Data2$Book[Data2$Name == p] <- booksconcat
  # AUTHOR COLUMN (PULL THIS PERSON'S AUTHORS, THEN CONCATENATE)
  authorsconcat <- paste(Data$Author[Data$Name == p], collapse = "\n")
  Data2$Author[Data2$Name == p] <- authorsconcat
}
https://stackoverflow.com/questions/31931422