If the values to be pivoted are spread over more than one column, I can pivot the data into a wider format.
library(tidyr)
us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe)
  )
# A tibble: 52 x 6
GEOID NAME income_estimate rent_estimate income_moe rent_moe
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 01 Alabama 24476 747 136 3
2 02 Alaska 32940 1200 508 13
3 04 Arizona 27517 972 148 4
4 05 Arkansas 23789 709 165 5
5 06 California 29454 1358 109 3
6 08 Colorado 32401 1125 109 5
7 09 Connecticut 35326 1123 195 5
8 10 Delaware 31560 1076 247 10
9 11 District of Columbia 43198 1424 681 17
10 12 Florida 25952 1077 70 3
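# ... with 42 more rows
In this output, I want the columns to be ordered as income_estimate, income_moe, rent_estimate, rent_moe. Setting names_sort = T does not help. Changing the order inside names_glue does not help either. I know I can reorder the columns with select and other functions, but I would like to know whether pivot_wider has an argument that does this.
Posted on 2022-02-17 16:04:34
With tidyr 1.2.0, this is now easy to do with the names_vary argument.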
library(tidyr)
us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe),
    names_vary = 'slowest'
  )
#> # A tibble: 52 x 6
#> GEOID NAME income_estimate income_moe rent_estimate rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Columbia 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
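#> # ... with 42 more rows
The explanation of names_vary given on the package help page is:
names_vary: When names_from identifies a column (or columns) with multiple unique values, and multiple values_from columns are provided, in what order should the resulting column names be combined?
"fastest" varies names_from values fastest, resulting in a column naming scheme of the form: value1_name1, value1_name2, value2_name1, value2_name2. This is the default.
"slowest" varies names_from values slowest, resulting in a column naming scheme of the form: value1_name1, value2_name1, value1_name2, value2_name2.
Posted on 2020-12-27 17:09:49
For fine-grained control, you can use pivot_wider_spec(), which lets you define a specification for the resulting data frame: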
library(tidyverse)
spec <- tibble(
  .name = c("income_estimate", "income_moe", "rent_estimate", "rent_moe"),
  .value = c("estimate", "moe", "estimate", "moe"),
  variable = c("income", "income", "rent", "rent")
)
us_rent_income %>% pivot_wider_spec(spec)
Output:
# A tibble: 52 x 6
GEOID NAME income_estimate income_moe rent_estimate rent_moe
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 01 Alabama 24476 136 747 3
2 02 Alaska 32940 508 1200 13
3 04 Arizona 27517 148 972 4
4 05 Arkansas 23789 165 709 5
5 06 California 29454 109 1358 3
6 08 Colorado 32401 109 1125 5
7 09 Connecticut 35326 195 1123 5
8 10 Delaware 31560 247 1076 10
9 11 District of Columbia 43198 681 1424 17
10 12 Florida 25952 70 1077 3
# … with 42 more rows
With a few preprocessing steps, you can avoid typing all the values in spec by hand.
field <- us_rent_income %>% distinct(variable) %>% pull()
sub_field <- colnames(us_rent_income)[4:5]
pivot_names <- map(field, ~paste(., sub_field, sep = "_")) %>% unlist()
pivot_vals <- rep(sub_field, 2)
pivot_vars <- map(field, rep, 2) %>% unlist()
spec <- tibble(.name = pivot_names, .value = pivot_vals, variable = pivot_vars)
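us_rent_income %>% pivot_wider_spec(spec)
Posted on 2020-12-27 16:48:08
After pivoting, we can select the columns in the order given by order applied to a substring of the column names.
library(dplyr)
library(tidyr)
library(stringr)
us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe)
  ) %>%
  # strip the "_estimate"/"_moe" suffix, order by the remaining prefix,
  # and shift by 2 to skip GEOID and NAME
  select(GEOID, NAME, order(str_remove(names(.)[-(1:2)], "_.*")) + 2)
Output: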
# A tibble: 52 x 6
# GEOID NAME income_estimate income_moe rent_estimate rent_moe
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 01 Alabama 24476 136 747 3
# 2 02 Alaska 32940 508 1200 13
# 3 04 Arizona 27517 148 972 4
# 4 05 Arkansas 23789 165 709 5
# 5 06 California 29454 109 1358 3
# 6 08 Colorado 32401 109 1125 5
# 7 09 Connecticut 35326 195 1123 5
# 8 10 Delaware 31560 247 1076 10
# 9 11 District of Columbia 43198 681 1424 17
#10 12 Florida 25952 70 1077 3
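# … with 42 more rows
The ordering is based on the names_from column, so names_sort has no effect on the part of the column names that comes from values_from. That is also why, in the OP's approach, changing the order inside names_glue does not change the column order. In the data, the unique values of the 'variable' column appear as income followed by rent. So with the default names_sort = FALSE, the columns follow that order. If it is changed to TRUE, they are sorted alphabetically, which again gives i and then r.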
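As a minimal sketch of that point, the following uses a hypothetical toy tibble (the names toy, key, x and y are made up for illustration and are not from the original post) in which the names_from values appear in non-alphabetical order. It assumes tidyr 1.1.0 or later, where names_sort is available. Toggling names_sort reorders only the a/b part within each values_from column; the x columns stay ahead of the y columns either way.

```r
library(tidyr)

# Hypothetical toy data: "b" appears before "a" in the names_from column
toy <- tibble(
  id  = c(1, 1),
  key = c("b", "a"),
  x   = c(10, 20),
  y   = c(30, 40)
)

# Default (names_sort = FALSE): key values keep their order of appearance
unsorted <- pivot_wider(
  toy,
  names_from = key,
  values_from = c(x, y),
  names_glue = "{key}_{.value}"
)

# names_sort = TRUE: key values are sorted, but x still precedes y
sorted <- pivot_wider(
  toy,
  names_from = key,
  values_from = c(x, y),
  names_glue = "{key}_{.value}",
  names_sort = TRUE
)

names(unsorted)  # id, b_x, a_x, b_y, a_y
names(sorted)    # id, a_x, b_x, a_y, b_y
```

In both results the grouping by values_from column is unchanged, which is why names_sort alone cannot produce the income_estimate, income_moe, rent_estimate, rent_moe layout.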
The same can be checked on us_rent_income by first reshaping to 'long' format, uniting the columns, and then applying pivot_wider:
us_rent_income %>%
  pivot_longer(cols = c(estimate, moe)) %>%
  unite(variable, variable, name) %>%
  pivot_wider(names_from = variable, values_from = value)
Output:
# A tibble: 52 x 6
# GEOID NAME income_estimate income_moe rent_estimate rent_moe
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 01 Alabama 24476 136 747 3
# 2 02 Alaska 32940 508 1200 13
# 3 04 Arizona 27517 148 972 4
# 4 05 Arkansas 23789 165 709 5
# 5 06 California 29454 109 1358 3
# 6 08 Colorado 32401 109 1125 5
# 7 09 Connecticut 35326 195 1123 5
# 8 10 Delaware 31560 247 1076 10
# 9 11 District of Columbia 43198 681 1424 17
#10 12 Florida 25952 70 1077 3
# … with 42 more rows
Now check by converting to a factor with a custom level order and specifying names_sort = TRUE; the columns then come out in exactly the order we specify.
us_rent_income %>%
  pivot_longer(cols = c(estimate, moe)) %>%
  unite(variable, variable, name) %>%
  mutate(variable = factor(variable,
    levels = c('income_estimate', 'rent_moe', 'rent_estimate', 'income_moe'))) %>%
  pivot_wider(names_from = variable, values_from = value, names_sort = TRUE)
# A tibble: 52 x 6
# GEOID NAME income_estimate rent_moe rent_estimate income_moe
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 01 Alabama 24476 3 747 136
# 2 02 Alaska 32940 13 1200 508
# 3 04 Arizona 27517 4 972 148
# 4 05 Arkansas 23789 5 709 165
# 5 06 California 29454 3 1358 109
# 6 08 Colorado 32401 5 1125 109
# 7 09 Connecticut 35326 5 1123 195
# 8 10 Delaware 31560 10 1076 247
# 9 11 District of Columbia 43198 17 1424 681
#10 12 Florida 25952 3 1077 70
# … with 42 more rows
https://stackoverflow.com/questions/65467620