
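R problem using the pivot_longer function to reshape a data frame from wide to long

Stack Overflow user
Asked 2020-07-22 23:10:33
Answers 2 · Views 72 · Followers 0 · Votes 1

I have data with the following columns: id, group, sex, datebirth, date1, date2, date3, ctrl1, ctrl2, ctrl3, ab4v1, ab4v2, ab4v3

I want to convert this data frame into another one in long format with the following columns: id, group, sex, datebirth, version, date, ctrl, ab4

(Note: version will take the values 1, 2 or 3.)

Normally I would use the reshape function in R, but I have to use pivot_longer. How can I do this transformation?

I tried something like this: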

df %>% pivot_longer(cols = -c("id","group","sex","datebirth"), 
                    names_to = c("version",".value"), 
                    names_pattern = "([A-Za-z]+)(\\d+)")

But I don't get anything usable. Any ideas? Thanks in advance.

This is what I have:

  id group    sex  datebirth    date1      date2      date3     ctrl1 ctrl2 ctrl3 ab4v1 ab4v2 ab4v3
1  1     A   Male 1975-01-08 2010-10-10 2011-11-12 2011-12-12   183   835   139   745   584   817
2  2     B   Male 1998-05-12 2010-10-10 2011-11-12 2011-12-12   172   727   214   793   653   499
3  3     A   Male 2005-12-28 2010-10-10 2011-11-23 2011-12-23   157   667   222   664   505   924
4  4     C Female 1957-07-01 2010-10-10 2011-11-25 2011-12-25   186   123   344   584   582   653

This is what I want:

      id group   sex    datebirth   version      date      ctrl   ab4   
1     1    A    Male    1975-01-08     1      2010-10-10   183    745  
2     2    B    Male    1998-05-12     1      2010-10-10   172    793  
3     3    A    Male    2005-12-28     1      2010-10-10   157    664 
4     4    C   Female   1957-07-01     1      2010-10-10   186    584  
.........
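
2 Answers

Stack Overflow user

Accepted answer

Posted 2020-07-23 02:03:47

We need to change the order in names_to: the leading part of each column name (date, ctrl, ab4v) becomes the output column and so must map to ".value", while the trailing digit maps to "version". We can use either names_sep or names_pattern. The only difference is that names_sep points to a delimiter. Here the delimiter is the boundary between a letter ((?<=[A-Za-z])) and a digit ((?=[0-9]$)); that is, the regex looks for the position that follows a letter and precedes the final digit of the name. With names_pattern, we capture specific sets of characters in capture groups ((...)). The OP's post uses the pattern "([A-Za-z]+)(\\d+)", i.e. one or more letters as the first group and one or more digits as the second.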
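
To see where the two kinds of regex act on the column names, here is a small base R sketch that does not involve tidyr at all. It is only an illustration of the matching, not a description of how pivot_longer is implemented internally. The variable names (cols, parts, prefix, version, pos) and the capture-group pattern "^(.+)([0-9])$" are assumptions chosen for this example; the lookaround pattern is the one from the answer.

```r
cols <- c("date1", "ctrl2", "ab4v3")

# names_pattern style: two capture groups, everything up to the last
# character, then the trailing digit.
parts   <- regmatches(cols, regexec("^(.+)([0-9])$", cols))
prefix  <- vapply(parts, `[`, character(1), 2)  # first capture group
version <- vapply(parts, `[`, character(1), 3)  # second capture group
print(data.frame(column = cols, prefix = prefix, version = version))

# names_sep style: a zero-width boundary after a letter and before the
# final digit. regexpr() reports the position where that boundary sits.
pos <- regexpr("(?<=[A-Za-z])(?=[0-9]$)", cols, perl = TRUE)
left  <- substr(cols, 1, pos - 1)        # text before the boundary
right <- substr(cols, pos, nchar(cols))  # text after the boundary
print(data.frame(column = cols, left = left, right = right))
```

Both approaches split ab4v3 at the final digit rather than at the 4 in the middle, because the digit is required to sit at the end of the name.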

library(dplyr)
library(tidyr)
df %>% 
 pivot_longer(cols = date1:ab4v3, names_to = c(".value", "version"), 
         names_sep = "(?<=[A-Za-z])(?=[0-9]$)")
# A tibble: 12 x 8
#      id group sex    datebirth  version date        ctrl  ab4v
#   <int> <chr> <chr>  <chr>      <chr>   <chr>      <int> <int>
# 1     1 A     Male   1975-01-08 1       2010-10-10   183   745
# 2     1 A     Male   1975-01-08 2       2011-11-12   835   584
# 3     1 A     Male   1975-01-08 3       2011-12-12   139   817
# 4     2 B     Male   1998-05-12 1       2010-10-10   172   793
# 5     2 B     Male   1998-05-12 2       2011-11-12   727   653
# 6     2 B     Male   1998-05-12 3       2011-12-12   214   499
# 7     3 A     Male   2005-12-28 1       2010-10-10   157   664
# 8     3 A     Male   2005-12-28 2       2011-11-23   667   505
# 9     3 A     Male   2005-12-28 3       2011-12-23   222   924
#10     4 C     Female 1957-07-01 1       2010-10-10   186   584
#11     4 C     Female 1957-07-01 2       2011-11-25   123   582
#12     4 C     Female 1957-07-01 3       2011-12-25   344   653
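
The names_pattern route mentioned in this answer is not shown in code, so here is a minimal sketch of it. The pattern "^(.+)(\\d)$", the object name long, and the final mutate/rename steps are assumptions added for illustration (they convert version to integer and rename ab4v to ab4 to match the column names the question asks for). The data frame is rebuilt inline with the same values so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same values as the example data in this thread, rebuilt for a standalone run.
df <- data.frame(
  id = 1:4,
  group = c("A", "B", "A", "C"),
  sex = c("Male", "Male", "Male", "Female"),
  datebirth = c("1975-01-08", "1998-05-12", "2005-12-28", "1957-07-01"),
  date1 = rep("2010-10-10", 4),
  date2 = c("2011-11-12", "2011-11-12", "2011-11-23", "2011-11-25"),
  date3 = c("2011-12-12", "2011-12-12", "2011-12-23", "2011-12-25"),
  ctrl1 = c(183L, 172L, 157L, 186L),
  ctrl2 = c(835L, 727L, 667L, 123L),
  ctrl3 = c(139L, 214L, 222L, 344L),
  ab4v1 = c(745L, 793L, 664L, 584L),
  ab4v2 = c(584L, 653L, 505L, 582L),
  ab4v3 = c(817L, 499L, 924L, 653L)
)

long <- df %>%
  pivot_longer(
    cols = date1:ab4v3,
    names_to = c(".value", "version"),  # prefix -> column name, digit -> version
    names_pattern = "^(.+)(\\d)$"       # group 1: prefix, group 2: final digit
  ) %>%
  mutate(version = as.integer(version)) %>%  # version is character after pivoting
  rename(ab4 = ab4v)                         # match the name asked for in the question

print(long)
```

The pattern assumes every pivoted column ends in exactly one digit, which holds for this data; names with multi-digit versions would need a different pattern.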

Data

df <- structure(list(id = 1:4, group = c("A", "B", "A", "C"), sex = c("Male", 
"Male", "Male", "Female"), datebirth = c("1975-01-08", "1998-05-12", 
"2005-12-28", "1957-07-01"), date1 = c("2010-10-10", "2010-10-10", 
"2010-10-10", "2010-10-10"), date2 = c("2011-11-12", "2011-11-12", 
"2011-11-23", "2011-11-25"), date3 = c("2011-12-12", "2011-12-12", 
"2011-12-23", "2011-12-25"), ctrl1 = c(183L, 172L, 157L, 186L
), ctrl2 = c(835L, 727L, 667L, 123L), ctrl3 = c(139L, 214L, 222L, 
344L), ab4v1 = c(745L, 793L, 664L, 584L), ab4v2 = c(584L, 653L, 
505L, 582L), ab4v3 = c(817L, 499L, 924L, 653L)), class = "data.frame",
row.names = c("1", 
"2", "3", "4"))
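Votes: 2

Stack Overflow user

Posted 2020-07-22 23:59:08

The code below is ugly, but I believe it should work. It is a series of pivot_longer calls, each handling one of the wide-format variables at a time.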

library(dplyr)
library(tidyr)

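# Pivot one family of wide columns (e.g. date1, date2, date3) to long format.
# X:   wide data frame whose first four columns are the identifiers
# Var: common prefix of the columns to pivot, e.g. "date"
fun <- function(X, Var){
  Vard <- paste0(Var, "\\d")   # regex: the prefix followed by a digit
  X %>%
    select(1:4, matches(Vard)) %>%
    pivot_longer(
      cols = matches(Vard),
      names_to = "version",
      values_to = Var
    ) %>%
    mutate(version = sub(Var, "", version))   # "date1" -> "1"
}

vars <- c("date", "ctrl", "ab4v")

# Pivot each family separately, then merge the pieces on their shared
# columns (id, group, sex, datebirth, version).
# df is the data frame from the question.
Reduce(function(x, y) merge(x, y), lapply(vars, function(v) fun(df, v)))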
#   id group    sex  datebirth version       date ctrl ab4v
#1   1     A   Male 1975-01-08       1 2010-10-10  183  745
#2   1     A   Male 1975-01-08       2 2011-11-12  835  584
#3   1     A   Male 1975-01-08       3 2011-12-12  139  817
#4   2     B   Male 1998-05-12       1 2010-10-10  172  793
#5   2     B   Male 1998-05-12       2 2011-11-12  727  653
#6   2     B   Male 1998-05-12       3 2011-12-12  214  499
#7   3     A   Male 2005-12-28       1 2010-10-10  157  664
#8   3     A   Male 2005-12-28       2 2011-11-23  667  505
#9   3     A   Male 2005-12-28       3 2011-12-23  222  924
#10  4     C Female 1957-07-01       1 2010-10-10  186  584
#11  4     C Female 1957-07-01       2 2011-11-25  123  582
#12  4     C Female 1957-07-01       3 2011-12-25  344  653
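Votes: 1

The original page content is provided by Stack Overflow; translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/63037501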
