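I downloaded data on a recent election from the Oklahoma State Election Board and want to reshape it into a form that is easier to analyze. Unfortunately, the format of the data does not make that easy.
Currently, the data looks like this: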
   precinct race_description cand_party cand_tot_votes
1     10001         GOVERNOR        DEM            106
2     10001         GOVERNOR        IND              5
3     10001         GOVERNOR        LIB              7
4     10001         GOVERNOR        REP            118
5     10002         GOVERNOR        DEM             84
6     10002         GOVERNOR        IND              9
7     10002         GOVERNOR        LIB              3
8     10002         GOVERNOR        REP            151
9     10003         GOVERNOR        DEM             36
10    10003         GOVERNOR        IND              2

Is there a way to collapse this so that each precinct is a single row, with the other data spread across columns? That is, I would have one row of results for precinct 10001, with column headers along the lines of "Gov Dem", "Gov Rep", and so on.
Honestly, I have no idea how to approach this.
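Posted on 2022-11-11 18:58:55
You can pass two columns to names_from in pivot_wider and use the names_sep argument to control how their values are joined into the new column names.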
library(tidyverse)

df %>%
  pivot_wider(
    # Each race/party combination becomes its own column
    names_from = c(race_description, cand_party),
    names_sep = "_",
    values_from = cand_tot_votes
  )

Output
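  precinct GOVERNOR_DEM GOVERNOR_IND GOVERNOR_LIB GOVERNOR_REP
     <int>        <int>        <int>        <int>        <int>
1    10001          106            5            7          118
2    10002           84            9            3          151
3    10003           36            2           NA           NA

Precinct 10003 shows NA for LIB and REP only because the sample data stops after its IND row.
If you want different headers, you can clean up the source columns before pivoting: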
df %>%
  # "GOVERNOR" -> "Gov" (first three characters, title case); "DEM" -> "Dem"
  mutate(race_description = str_extract(str_to_title(race_description), "^.{3}"),
         cand_party = str_to_title(cand_party)) %>%
  pivot_wider(
    names_from = c(race_description, cand_party),
    names_sep = "_",
    values_from = cand_tot_votes
  )
#   precinct Gov_Dem Gov_Ind Gov_Lib Gov_Rep
#      <int>   <int>   <int>   <int>   <int>
# 1    10001     106       5       7     118
# 2    10002      84       9       3     151
# 3    10003      36       2      NA      NA

Data
df <- structure(list(
  precinct = c(10001L, 10001L, 10001L, 10001L, 10002L,
               10002L, 10002L, 10002L, 10003L, 10003L),
  race_description = c("GOVERNOR", "GOVERNOR", "GOVERNOR", "GOVERNOR", "GOVERNOR",
                       "GOVERNOR", "GOVERNOR", "GOVERNOR", "GOVERNOR", "GOVERNOR"),
  cand_party = c("DEM", "IND", "LIB", "REP", "DEM", "IND", "LIB", "REP", "DEM", "IND"),
  cand_tot_votes = c(106L, 5L, 7L, 118L, 84L, 9L, 3L, 151L, 36L, 2L)),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

https://stackoverflow.com/questions/74406929
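As a possible variation on the answer above, pivot_wider also accepts names_glue (a template for the new column names) and values_fill (a value for combinations that have no row). The sketch below uses a small made-up subset of the question's data, and the helper column names race and party are hypothetical. Filling with 0 assumes that a missing row really means zero votes; if rows are missing only because the data is incomplete, leaving NA is the safer choice.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical three-row subset of the question's data, small enough to check by hand.
votes <- tibble(
  precinct = c(10001L, 10001L, 10002L),
  race_description = c("GOVERNOR", "GOVERNOR", "GOVERNOR"),
  cand_party = c("DEM", "REP", "DEM"),
  cand_tot_votes = c(106L, 118L, 84L)
)

wide <- votes %>%
  mutate(
    race = str_sub(str_to_title(race_description), 1, 3),  # "GOVERNOR" -> "Gov"
    party = str_to_title(cand_party)                        # "DEM" -> "Dem"
  ) %>%
  # Drop the original label columns so they are not treated as row identifiers.
  select(precinct, race, party, cand_tot_votes) %>%
  pivot_wider(
    names_from = c(race, party),
    names_glue = "{race} {party}",  # headers with a space, e.g. "Gov Dem"
    values_from = cand_tot_votes,
    values_fill = 0L                # assumption: no row means zero votes
  )

print(wide)
```

Column names that contain a space have to be wrapped in backticks when referenced later (for example wide$`Gov Dem`), so the underscore-separated names from the answer may be more convenient for further analysis.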