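How to control the order of pivoted (wider) columns in R? Issue 839

Stack Overflow user

Asked 2020-12-27 15:57:13

3 answers · 1.9K views · 0 followers · 3 votes

I can pivot data into a wider format when the values to pivot are spread across multiple columns.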

us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe)
  )

# A tibble: 52 x 6
   GEOID NAME                 income_estimate rent_estimate income_moe rent_moe
   <chr> <chr>                          <dbl>         <dbl>      <dbl>    <dbl>
 1 01    Alabama                        24476           747        136        3
 2 02    Alaska                         32940          1200        508       13
 3 04    Arizona                        27517           972        148        4
 4 05    Arkansas                       23789           709        165        5
 5 06    California                     29454          1358        109        3
 6 08    Colorado                       32401          1125        109        5
 7 09    Connecticut                    35326          1123        195        5
 8 10    Delaware                       31560          1076        247       10
 9 11    District of Columbia           43198          1424        681       17
10 12    Florida                        25952          1077         70        3
# ... with 42 more rows

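In this output, I would like the columns to be ordered income_estimate, income_moe, rent_estimate, rent_moe. Setting names_sort = T does not help, and neither does changing the order inside names_glue. I know I can reorder the columns afterwards with select and similar functions, but I would like to know whether pivot_wider itself has an argument that does this.

Edit: this seems to be under development already; at least it has been discussed here and here.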

pivot

tidyverse

3 Answers

Stack Overflow user

Accepted answer

Posted 2022-02-17 16:04:34

Since tidyr 1.2.0, this is easy to do with the names_vary argument.

library(tidyr)
us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe),
    names_vary = 'slowest'
  )
#> # A tibble: 52 x 6
#>    GEOID NAME                 income_estimate income_moe rent_estimate rent_moe
#>    <chr> <chr>                          <dbl>      <dbl>         <dbl>    <dbl>
#>  1 01    Alabama                        24476        136           747        3
#>  2 02    Alaska                         32940        508          1200       13
#>  3 04    Arizona                        27517        148           972        4
#>  4 05    Arkansas                       23789        165           709        5
#>  5 06    California                     29454        109          1358        3
#>  6 08    Colorado                       32401        109          1125        5
#>  7 09    Connecticut                    35326        195          1123        5
#>  8 10    Delaware                       31560        247          1076       10
#>  9 11    District of Columbia           43198        681          1424       17
#> 10 12    Florida                        25952         70          1077        3
#> # ... with 42 more rows

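The package help page explains names_vary as follows:

names_vary: When names_from identifies a column (or columns) with multiple unique values, and multiple values_from columns are provided, in what order should the resulting column names be combined?

"fastest" varies names_from values fastest, resulting in a column naming scheme of the form: value1_name1, value1_name2, value2_name1, value2_name2. This is the default.
"slowest" varies names_from values slowest, resulting in a column naming scheme of the form: value1_name1, value2_name1, value1_name2, value2_name2.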
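To make the two naming schemes concrete, here is a minimal sketch on a small made-up table (the `toy` data and its values are hypothetical, not part of the original answer). It assumes tidyr 1.2.0 or later, since that is where names_vary was introduced.

```r
library(tidyr)  # names_vary requires tidyr >= 1.2.0

# Hypothetical toy data: two ids, two measures, two value columns
toy <- tibble::tibble(
  id       = c(1, 1, 2, 2),
  variable = c("income", "rent", "income", "rent"),
  estimate = c(100, 10, 200, 20),
  moe      = c(5, 1, 6, 2)
)

# Default: names_from values vary fastest within each values_from column
wide_fastest <- pivot_wider(
  toy,
  names_from  = variable,
  values_from = c(estimate, moe),
  names_glue  = "{variable}_{.value}",
  names_vary  = "fastest"
)

# Alternative: names_from values vary slowest, grouping columns by variable
wide_slowest <- pivot_wider(
  toy,
  names_from  = variable,
  values_from = c(estimate, moe),
  names_glue  = "{variable}_{.value}",
  names_vary  = "slowest"
)

names(wide_fastest)
# "id" "income_estimate" "rent_estimate" "income_moe" "rent_moe"
names(wide_slowest)
# "id" "income_estimate" "income_moe" "rent_estimate" "rent_moe"
```

With "slowest", all columns belonging to one names_from value sit next to each other, which is the layout the question asks for.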

Votes: 5

Stack Overflow user

Posted 2020-12-27 17:09:49

For fine-grained control, you can use pivot_wider_spec(), which lets you define a specification for the resulting data frame:

library(tidyverse)

spec <- tibble(
  .name = c("income_estimate", "income_moe", "rent_estimate", "rent_moe"),
  .value = c("estimate", "moe", "estimate", "moe"),
  variable = c("income", "income", "rent", "rent")
)

us_rent_income %>% pivot_wider_spec(spec)

Output:

# A tibble: 52 x 6
   GEOID NAME                 income_estimate income_moe rent_estimate rent_moe
   <chr> <chr>                          <dbl>      <dbl>         <dbl>    <dbl>
 1 01    Alabama                        24476        136           747        3
 2 02    Alaska                         32940        508          1200       13
 3 04    Arizona                        27517        148           972        4
 4 05    Arkansas                       23789        165           709        5
 5 06    California                     29454        109          1358        3
 6 08    Colorado                       32401        109          1125        5
 7 09    Connecticut                    35326        195          1123        5
 8 10    Delaware                       31560        247          1076       10
 9 11    District of Columbia           43198        681          1424       17
10 12    Florida                        25952         70          1077        3
# … with 42 more rows

With a few preprocessing steps, you can avoid typing all the values in spec by hand.

field <- us_rent_income %>% distinct(variable) %>% pull()
sub_field <- colnames(us_rent_income)[4:5]

pivot_names <- map(field, ~paste(., sub_field, sep = "_")) %>% unlist()
pivot_vals <- rep(sub_field, 2)
pivot_vars <- map(field, rep, 2) %>% unlist()

spec <- tibble(.name = pivot_names, .value = pivot_vals, variable = pivot_vars)

us_rent_income %>% pivot_wider_spec(spec)
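As a possible shortcut for the preprocessing above, tidyr can also generate the spec itself through build_wider_spec(), after which the rows of the spec can be reordered like any other tibble. The following is a sketch rather than part of the original answer; it assumes a tidyr version in which build_wider_spec() accepts names_glue, and it uses dplyr::arrange() to set the column order.

```r
library(tidyr)
library(dplyr)

# Let tidyr build the spec, then sort its rows: the row order of the spec
# determines the order of the pivoted columns.
spec <- us_rent_income %>%
  build_wider_spec(
    names_from  = variable,
    values_from = c(estimate, moe),
    names_glue  = "{variable}_{.value}"
  ) %>%
  arrange(variable, .value)

wide <- us_rent_income %>% pivot_wider_spec(spec)

names(wide)
```

Because the spec is an ordinary tibble with the columns .name, .value and variable, any reordering that can be expressed on those columns carries over to the output.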

Votes: 3

Stack Overflow user

Posted 2020-12-27 16:48:08

After pivoting, we can reorder the columns inside select by applying order to a substring of the column names.

library(dplyr)
library(tidyr)
library(stringr)
us_rent_income %>%
  pivot_wider(
    names_from = variable,
    names_glue = "{variable}_{.value}",
    values_from = c(estimate, moe)
 ) %>%
  select(GEOID, NAME, order(str_remove(names(.)[-(1:2)], "_.*")) + 2)

-output

# A tibble: 52 x 6
#   GEOID NAME                 income_estimate income_moe rent_estimate rent_moe
#   <chr> <chr>                          <dbl>      <dbl>         <dbl>    <dbl>
# 1 01    Alabama                        24476        136           747        3
# 2 02    Alaska                         32940        508          1200       13
# 3 04    Arizona                        27517        148           972        4
# 4 05    Arkansas                       23789        165           709        5
# 5 06    California                     29454        109          1358        3
# 6 08    Colorado                       32401        109          1125        5
# 7 09    Connecticut                    35326        195          1123        5
# 8 10    Delaware                       31560        247          1076       10
# 9 11    District of Columbia           43198        681          1424       17
#10 12    Florida                        25952         70          1077        3
# … with 42 more rows
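If the prefixes are known in advance, the positional arithmetic in the select call above can be replaced by tidyselect helpers. This is a sketch of an alternative, not the answerer's method; it assumes the prefixes "income" and "rent" are fixed and known when the code is written.

```r
library(dplyr)
library(tidyr)

wide <- us_rent_income %>%
  pivot_wider(
    names_from  = variable,
    names_glue  = "{variable}_{.value}",
    values_from = c(estimate, moe)
  ) %>%
  # Keep the id columns first, then group the pivoted columns by prefix
  select(GEOID, NAME, starts_with("income"), starts_with("rent"))

names(wide)
```

This version reads more directly but has to be updated by hand whenever a new value appears in the variable column, whereas the order() approach adapts automatically.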

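The ordering is based on the names_from column, so names_sort does not affect the part of the column names that comes from values_from; that is, in the OP's code, changing the order inside names_glue changes nothing. In the data, the unique values of the 'variable' column appear as income followed by rent. So with the default names_sort = FALSE, that order of appearance is used. If it is changed to TRUE, the values are sorted alphabetically, which again puts i before r.

This can be checked by first reshaping to 'long', combining the columns with unite, and then applying pivot_wider: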

us_rent_income %>%
  pivot_longer(cols = c(estimate, moe)) %>% 
  unite(variable, variable, name) %>% 
  pivot_wider(names_from = variable, values_from = value)

-output

# A tibble: 52 x 6
#   GEOID NAME                 income_estimate income_moe rent_estimate rent_moe
#   <chr> <chr>                          <dbl>      <dbl>         <dbl>    <dbl>
# 1 01    Alabama                        24476        136           747        3
# 2 02    Alaska                         32940        508          1200       13
# 3 04    Arizona                        27517        148           972        4
# 4 05    Arkansas                       23789        165           709        5
# 5 06    California                     29454        109          1358        3
# 6 08    Colorado                       32401        109          1125        5
# 7 09    Connecticut                    35326        195          1123        5
# 8 10    Delaware                       31560        247          1076       10
# 9 11    District of Columbia           43198        681          1424       17
#10 12    Florida                        25952         70          1077        3
# … with 42 more rows

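Now check with a custom order: convert the column to a factor with the desired levels and specify names_sort = TRUE, and the columns come out in exactly that order.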

us_rent_income %>%
   pivot_longer(cols = c(estimate, moe)) %>% 
   unite(variable, variable, name) %>%
   mutate(variable = factor(variable, 
    levels = c('income_estimate', 'rent_moe', 'rent_estimate', 'income_moe'))) %>%
    pivot_wider(names_from = variable, values_from = value, names_sort = TRUE)
# A tibble: 52 x 6
#   GEOID NAME                 income_estimate rent_moe rent_estimate income_moe
#   <chr> <chr>                          <dbl>    <dbl>         <dbl>      <dbl>
# 1 01    Alabama                        24476        3           747        136
# 2 02    Alaska                         32940       13          1200        508
# 3 04    Arizona                        27517        4           972        148
# 4 05    Arkansas                       23789        5           709        165
# 5 06    California                     29454        3          1358        109
# 6 08    Colorado                       32401        5          1125        109
# 7 09    Connecticut                    35326        5          1123        195
# 8 10    Delaware                       31560       10          1076        247
# 9 11    District of Columbia           43198       17          1424        681
#10 12    Florida                        25952        3          1077         70
# … with 42 more rows
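The names_sort behaviour described in this answer is hard to see on us_rent_income, because order of appearance and alphabetical order coincide there. The sketch below uses a small made-up table (hypothetical data) in which "rent" appears before "income", so the two settings should give different results; the helper pivot_cols() is introduced only for illustration.

```r
library(tidyr)

# Hypothetical toy data in which "rent" appears before "income"
toy <- tibble::tibble(
  id       = c(1, 1),
  variable = c("rent", "income"),
  estimate = c(10, 100),
  moe      = c(1, 5)
)

# Illustrative helper: pivot with a given names_sort setting and
# return only the names of the pivoted columns
pivot_cols <- function(sort_names) {
  wide <- pivot_wider(
    toy,
    names_from  = variable,
    values_from = c(estimate, moe),
    names_glue  = "{variable}_{.value}",
    names_sort  = sort_names
  )
  setdiff(names(wide), "id")
}

unsorted_cols <- pivot_cols(FALSE)  # follows order of appearance in the data
sorted_cols   <- pivot_cols(TRUE)   # follows sorted names_from values

unsorted_cols
sorted_cols
```

In both cases the estimate columns come before the moe columns: names_sort only reorders the names_from values within each values_from block, which is why it cannot produce the income_estimate, income_moe, rent_estimate, rent_moe layout on its own.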

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/65467620
