I have a tibble that looks like this:
# A tibble: 1,000 x 3
id question answer
<chr> <chr> <chr>
1 aaa What is your favorite color? Green
2 aaa What is your favorite band? Green Day
3 aaabb What is your favorite color? Blue
4 aaabb What is your favorite band? Blue
5 ccc What is your favorite color? Blue
6 ccc What is the difference between you and me? Five bank accounts
# ... with more rows
I want to spread it into a wide data frame. I used this code:
aTibble %>% distinct() %>% spread(question, answer)
However, I end up with a data frame full of empty rows:
# A tibble: 1,000 x 3
id V1 What is your favorite color? What is your favorite band? What is the difference between you and me?
1 aaa NA NA NA
2 aaa NA NA NA
3 aaabb NA NA NA
4 aaabb NA NA NA
5 ccc NA NA NA
6 ccc NA NA NA
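# ... with more rows
In the original tibble, some rows have an id but null for both question and answer. No single id has a repeated question. That said, different ids can answer different questions; they do not all share the same set of questions.
Also, I did not create the V1 column, and it is not in my original tibble. It appears after spread().
Frustratingly, when I run this on a small dataset, it works fine. When I run it on the whole dataset (about 150K records), I get NAs.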
Posted on 2019-02-26 13:18:06
It is hard to tell why this is not working. dcast from reshape2 is a good alternative, and it can achieve the same result:
library(reshape2)
aTibble %>% distinct() %>% dcast(id ~ question, value.var = "answer")
https://stackoverflow.com/questions/54878514
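To make the dcast suggestion easier to try, here is a minimal sketch on a toy data frame modelled on the sample rows in the question. The toy data, the extra id "ddd", and the name `cleaned` are made up for illustration. The step that drops rows with a missing question is an assumption about what may help, since the question mentions ids whose question and answer are null; it is not a confirmed diagnosis of the all-NA output.

```r
library(reshape2)

# Toy data modelled on the question's tibble (an assumption, not the real 150K-row data).
# The last row imitates an id whose question and answer are both missing.
aTibble <- data.frame(
  id = c("aaa", "aaa", "aaabb", "aaabb", "ccc", "ccc", "ddd"),
  question = c(
    "What is your favorite color?", "What is your favorite band?",
    "What is your favorite color?", "What is your favorite band?",
    "What is your favorite color?", "What is the difference between you and me?",
    NA
  ),
  answer = c("Green", "Green Day", "Blue", "Blue", "Blue", "Five bank accounts", NA),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows and rows with no question,
# so that only real questions become columns (assumed cleanup step).
cleaned <- unique(aTibble[!is.na(aTibble$question), ])

# One row per id, one column per distinct question; unanswered questions are NA.
wide <- dcast(cleaned, id ~ question, value.var = "answer")
print(wide)
```

Note that an id with only missing questions, such as "ddd" here, disappears from the wide result once those rows are dropped. If such ids need to be kept, they would have to be joined back in afterwards.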