R: NA and dcast

Stack Overflow user
Asked 2021-07-04 14:37:10

I am working in R.

Suppose I have the following data:

my_data <- data.frame(

"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)

#view original data format
 my_data

  id  name points gender country
1  1  john    150   male     usa
2  1 jason    165   male     usa
3  1  jack    183   male     usa
4  1   jim    191   male     usa
5  2  john    151   male     usa
6  2 jason    166   male     usa
7  2  jack    184   male     usa
8  2   jim    192   male     usa

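Assume that for the data above, "gender" and "country" always hold the same value. The 4 names also always appear together, and each time they do, they all share the same "id" number. The only thing that changes from one iteration (i.e. from one "id") to the next is each person's number of "points".

This is what I am trying to do: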

my_data_1 <- data.frame(

"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)

#view desired data format

  my_data_1
  id john_points jason_points jack_points jim_points gender country
1  1         150          165         183        191   male     usa
2  2         151          166         184        192   male     usa

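I found an earlier Stack Overflow post, "How to reshape data from long to wide format", which shows that the "data.table" library and its "dcast" function can be used for this kind of problem.

I tried several variations of "dcast", but none of them gave the result I want: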

library(data.table)

#attempt 1 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country", "id"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1:        <NA>   <NA>   <NA>   <NA>   <NA>      1      2   <NA>   <NA>
2:        <NA>   <NA>   <NA>      1      2   <NA>   <NA>   <NA>   <NA>
3:         usa   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>      1      2
4:        <NA>      1      2   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>

#attempt 2 : not correct

 setDT(my_data)
 dcast(my_data, name ~ points, value.var = c("gender", "country"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192
1:        <NA>
2:        <NA>
3:         usa
4:        <NA>

#attempt 3 - not correct:

 setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
    name  150  151  165  166  183  184  191  192
1:  jack <NA> <NA> <NA> <NA>    1    2 <NA> <NA>
2: jason <NA> <NA>    1    2 <NA> <NA> <NA> <NA>
3:   jim <NA> <NA> <NA> <NA> <NA> <NA>    1    2
4:  john    1    2 <NA> <NA> <NA> <NA> <NA> <NA>

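Can someone show me how to fix this? Why are there so many NAs? Is it possible to get the final table I showed (i.e. my_data_1)? And is it possible to rename the variables in the format name_points (e.g. john_points)?

Thanks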


1 Answer

Stack Overflow user

Answered 2021-07-04 14:43:14

I would use tidyr, since it makes going from long to wide format very simple.

library(tidyr)
wide = my_data %>% 
  tidyr::spread(name, points)

Result

  id gender country jack jason jim john
1  1   male     usa  183   165 191  150
2  2   male     usa  184   166 192  151
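
For the dcast route the question asks about, here is a minimal sketch. It assumes the data look exactly like my_data above. The attempts in the question used the formula name ~ points, which puts every distinct "points" value in its own column. Each name only has two of the eight point values, so the remaining cells have nothing to hold and come back as NA. Putting the columns that identify a row (id, gender, country) on the left of the formula and name on the right, with points as value.var, should avoid that. The helper variable name_cols is only an illustrative name.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
my_data <- data.table(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = "male",   # constant, recycled to all rows
  country = "usa"     # constant, recycled to all rows
)

# One row per id/gender/country, one column per name, cells filled from points
wide <- dcast(my_data, id + gender + country ~ name, value.var = "points")

# Append "_points" to the columns that came from `name`
name_cols <- setdiff(names(wide), c("id", "gender", "country"))
setnames(wide, name_cols, paste0(name_cols, "_points"))

# Optional: reorder the columns to match my_data_1 from the question
setcolorder(wide, c("id", "john_points", "jason_points", "jack_points",
                    "jim_points", "gender", "country"))

wide
```

Because every id has exactly one row per name, each cell of the wide table is filled exactly once. If an id were missing a name, that cell would be NA, and if an id/name pair appeared more than once, dcast would fall back to counting rows unless fun.aggregate is supplied.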
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/68242306
