I am using the R programming language.
Suppose I have the following data:
my_data <- data.frame(
"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)
#view original data format
my_data
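  id  name points gender country
1  1  john    150   male     usa
2  1 jason    165   male     usa
3  1  jack    183   male     usa
4  1   jim    191   male     usa
5  2  john    151   male     usa
6  2 jason    166   male     usa
7  2  jack    184   male     usa
8  2   jim    192   male     usa
Let us assume the following about the data above: "gender" and "country" will always have the same values. In addition, these 4 names will always appear together, and each time they appear together they all share the same "id". The only thing that changes from one iteration (i.e., one "id") to the next is their number of "points".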
This is what I would like to obtain:
my_data_1 <- data.frame(
"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)
#view desired data format
my_data_1
  id john_points jason_points jack_points jim_points gender country
1  1         150          165         183        191   male     usa
2  2         151          166         184        192   male     usa
I found a previous Stack Overflow post, "How to reshape data from long to wide format", in which the "data.table" library and its "dcast" function are used to solve this kind of problem.
I tried different combinations of the "dcast" function, but I could not get the expected final result:
library(data.table)
#attempt 1 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country", "id")
)
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1:        <NA>   <NA>   <NA>   <NA>   <NA>      1      2   <NA>   <NA>
2:        <NA>   <NA>   <NA>      1      2   <NA>   <NA>   <NA>   <NA>
3:         usa   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>      1      2
4:        <NA>      1      2   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
#attempt 2 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192
1:        <NA>
2:        <NA>
3:         usa
4:        <NA>
#attempt 3 - not correct:
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
    name  150  151  165  166  183  184  191  192
1:  jack <NA> <NA> <NA> <NA>    1    2 <NA> <NA>
2: jason <NA> <NA>    1    2 <NA> <NA> <NA> <NA>
3:   jim <NA> <NA> <NA> <NA> <NA> <NA>    1    2
4:  john    1    2 <NA> <NA> <NA> <NA> <NA> <NA>
Can someone show me how to solve this problem? Why are there so many NA values? Is it possible to obtain the final table I showed (i.e., my_data_1)? Can the variables be renamed in the format name_points (e.g., john_points)?
Thanks
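On the NA values: all three attempts use the formula name ~ points, so each name becomes a row and every distinct "points" value becomes its own column. Each name has only two of the eight "points" values, so the remaining cells are filled with NA. Below is a minimal sketch of a dcast call that may give the desired layout, assuming that id, gender and country together identify a row and that "points" is the value to spread. The names wide_dt and name_cols are illustrative and not part of the original post.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
my_data <- data.table(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = rep("male", 8),
  country = rep("usa", 8)
)

# Left of ~ : columns that identify one output row
# Right of ~: column whose values become the new column names
wide_dt <- dcast(my_data, id + gender + country ~ name, value.var = "points")

# Append "_points" to the columns created from `name`
name_cols <- setdiff(names(wide_dt), c("id", "gender", "country"))
setnames(wide_dt, name_cols, paste0(name_cols, "_points"))

# Reorder the columns to match my_data_1
setcolorder(wide_dt, c("id", "john_points", "jason_points", "jack_points",
                       "jim_points", "gender", "country"))

wide_dt
```

Since the "points" column was created as character, the spread values stay character as well; converting with as.integer() beforehand is an option if numeric columns are preferred.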
Posted on 2021-07-04 14:43:14
I would use tidyr, since it makes converting from long to wide format very straightforward.
library(tidyr)
# spread() creates one column per value of `name`, filled with `points`
wide = my_data %>%
  tidyr::spread(name, points)
Result:
  id gender country jack jason jim john
1  1   male     usa  183   165 191  150
2  2   male     usa  184   166 192  151
https://stackoverflow.com/questions/68242306
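The name_points column names requested in the question might also be obtained with tidyr::pivot_wider(), which the tidyr documentation recommends in place of the superseded spread(). The following is a sketch only, assuming tidyr 1.0.0 or later (where pivot_wider() and its names_glue argument are available); the object name wide_named is illustrative.

```r
library(tidyr)

# Same data as in the question
my_data <- data.frame(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = rep("male", 8),
  country = rep("usa", 8)
)

wide_named <- pivot_wider(
  my_data,
  id_cols     = c(id, gender, country),  # columns that identify one output row
  names_from  = name,                    # values that become column names
  values_from = points,                  # values that fill the new columns
  names_glue  = "{name}_points"          # builds john_points, jason_points, ...
)

wide_named
```

Here the identifying columns come first and the new columns follow; they can be reordered afterwards, for example with wide_named[, c("id", "john_points", "jason_points", "jack_points", "jim_points", "gender", "country")].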