I have a data frame in R whose first column stores an index that contains repeated values.
df <- data.frame("Index" = c(1,2,1),
                 "Age" = c("Jane Doe","John Doe","Jane Doe"),
                 "Address" = c("123 Fake Street","780 York Street","456 Elm Street"),
                 "Telephone" = c("xxx-xxx-xxxx","zzz-zzz-zzzz","yyy-yyy-yyyy"))
Index Name     Address         Telephone
1     Jane Doe 123 Fake Street xxx-xxx-xxxx
2     John Doe 780 York Street zzz-zzz-zzzz
1     Jane Doe 456 Elm Street  yyy-yyy-yyyy
I would like to combine the data frame above into something like this:
Index Name     Address         Telephone    Address 2      Telephone 2
1     Jane Doe 123 Fake Street xxx-xxx-xxxx 456 Elm Street yyy-yyy-yyyy
2     John Doe 780 York Street zzz-zzz-zzzz NA             NA
Can I use "merge" on the same data frame? Or is there another command in R that can accomplish this? Thanks.
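On the "merge" question, a self-merge is one possible route: number the rows within each Index, split the data frame by that number, and merge the pieces back together on Index. The following is only a minimal base R sketch, assuming no Index occurs more than twice. The occurrence column and the names first, second and wide are made up for this illustration.

```r
# Same data as above; the name column is labelled "Age" as in the original code.
df <- data.frame(Index = c(1, 2, 1),
                 Age = c("Jane Doe", "John Doe", "Jane Doe"),
                 Address = c("123 Fake Street", "780 York Street", "456 Elm Street"),
                 Telephone = c("xxx-xxx-xxxx", "zzz-zzz-zzzz", "yyy-yyy-yyyy"),
                 stringsAsFactors = FALSE)

# Hypothetical helper column: 1 for the first row of each Index, 2 for the second.
df$occurrence <- ave(seq_along(df$Index), df$Index, FUN = seq_along)

# Split into first and second occurrences (assumes at most two per Index).
first  <- df[df$occurrence == 1, c("Index", "Age", "Address", "Telephone")]
second <- df[df$occurrence == 2, c("Index", "Address", "Telephone")]

# Left join on Index; the clashing columns from `second` get the suffix "2".
wide <- merge(first, second, by = "Index", all.x = TRUE, suffixes = c("", "2"))
print(wide)
```

Here wide has one row per Index, and Address2 and Telephone2 are NA for John Doe because he has no second row. With more than two rows per Index this would need one merge per extra occurrence, which is where a reshaping approach tends to be more convenient.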
Posted on 2018-06-19 00:58:40
Using the tidyverse:
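library(tidyverse)
df %>%
  group_by(Age) %>%
  # collapse each person's values into one "|"-separated string
  summarize_at(vars(Telephone,Address),paste, collapse="|") %>%
  # split the strings back into numbered columns; missing pieces become NA
  separate(Address,into=c("Address1","Address2"),sep="\\|") %>%
  separate(Telephone,into=c("Telephone1","Telephone2"),sep="\\|")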
# # A tibble: 2 x 5
#   Age      Telephone1   Telephone2   Address1        Address2
#   <fct>    <chr>        <chr>        <chr>           <chr>
# 1 Jane Doe xxx-xxx-xxxx yyy-yyy-yyyy 123 Fake Street 456 Elm Street
# 2 John Doe zzz-zzz-zzzz <NA>         780 York Street <NA>
More generally, we can use summarize with a list to nest the values, giving the contents the right shape for unnest:
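df %>%
  group_by(Age) %>%
  # per group and column: a one-row tibble with columns named 1, 2, ..., wrapped in a list
  summarize_at(vars(Telephone,Address),
               ~lst(setNames(invoke(tibble,.),seq_along(.)))) %>%
  # .sep = "" prefixes the nested names with the outer column name (Telephone1, ...)
  unnest(.sep = "")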
# # A tibble: 2 x 5
#   Age      Telephone1   Telephone2   Address1        Address2
#   <fct>    <fct>        <fct>        <fct>           <fct>
# 1 Jane Doe xxx-xxx-xxxx yyy-yyy-yyyy 123 Fake Street 456 Elm Street
# 2 John Doe zzz-zzz-zzzz <NA>         780 York Street <NA>
The function inside summarize is a bit scary, but if you want to use it again you can wrap it under a friendlier name (I added a names parameter, just in case):
nest2row <- function(x,names = seq_along(x))
  lst(setNames(invoke(tibble,x),names[seq_along(x)]))
df %>%
  group_by(Age) %>%
  summarize_at(vars(Telephone,Address), nest2row) %>%
  unnest(.sep = "")
I suppose this would be the idiomatic way:
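df %>%
  group_by(Age) %>%
  mutate(id=row_number()) %>%              # number each person's rows: 1, 2, ...
  gather(key,value,Address,Telephone) %>%  # stack both columns into key/value pairs
  unite(key,key,id,sep="") %>%             # build names such as Address1, Telephone2
  spread(key,value)                        # one column per name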
# # A tibble: 2 x 6
# # Groups:   Age [2]
#   Index Age      Address1        Address2       Telephone1   Telephone2
#   <dbl> <fct>    <chr>           <chr>          <chr>        <chr>
# 1     1 Jane Doe 123 Fake Street 456 Elm Street xxx-xxx-xxxx yyy-yyy-yyyy
# 2     2 John Doe 780 York Street <NA>           zzz-zzz-zzzz <NA>
With my second solution you keep your factors, and you avoid the awkwardness of the idiomatic approach, which forces variables of different types into the same column.
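gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(). The same reshaping can be sketched with a single pivot_wider() call, which also avoids stacking Address and Telephone into one value column, so each column keeps its own type. This is only a minimal sketch, assuming tidyr 1.0.0 or later and dplyr are installed; the id column follows the answer above and wide is a name chosen for this illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; the name column is labelled "Age" there.
df <- data.frame(Index = c(1, 2, 1),
                 Age = c("Jane Doe", "John Doe", "Jane Doe"),
                 Address = c("123 Fake Street", "780 York Street", "456 Elm Street"),
                 Telephone = c("xxx-xxx-xxxx", "zzz-zzz-zzzz", "yyy-yyy-yyyy"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  group_by(Index, Age) %>%
  mutate(id = row_number()) %>%   # 1, 2, ... within each person
  ungroup() %>%
  pivot_wider(names_from = id,                   # the row number becomes the name suffix
              values_from = c(Address, Telephone),
              names_sep = "")                    # Address1 rather than Address_1

print(wide)
```

The result has one row per Index and Age pair, with columns Address1, Address2, Telephone1 and Telephone2, and NA where a person has no second row. Unlike the merge-style approaches, this does not depend on how many repeats an index has.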
Posted on 2018-06-19 00:48:38
Try something like this:
df <- data.frame("Index" = c(1,2,1), "Age" = c("Jane Doe","John Doe","Jane Doe"),
                 "Address" = c("123 Fake Street","780 York Street","456 Elm Street"),
                 "Telephone" = c("xxx-xxx-xxxx","zzz-zzz-zzzz","yyy-yyy-yyyy"),
                 stringsAsFactors = FALSE)
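df$unindex=paste(df$Index,df$Age)  # one key per person, e.g. "1 Jane Doe"
sapply(unique(df$unindex),function(li){ # li="1 Jane Doe"
  dft=df[li==df$unindex,3:4]  # Address and Telephone rows for this key
  # a single row is returned as is; several rows are flattened row by row
  if(nrow(dft)==1)dft else c(t(dft))
})
https://stackoverflow.com/questions/50919060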