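I have some data with the following columns: id, group, sex, datebirth, date1, date2, date3, ctrl1, ctrl2, ctrl3, ab4v1, ab4v2, ab4v3.
I want to convert this data frame into another data frame in long format with these columns: id, group, sex, datebirth, version, date, ctrl, ab4.
(Note: version takes the value 1, 2, or 3.)
Normally I would use the reshape function in R, but I have to use pivot_longer. How can I do this transformation?
I tried something like this: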
df %>% pivot_longer(cols = -c("id","group","sex","datebirth"),
                    names_to = c("version",".value"),
                    names_pattern = "([A-Za-z]+)(\\d+)")
But I get nothing. Any ideas? Thanks in advance.
This is what I have:
  id group    sex  datebirth      date1      date2      date3 ctrl1 ctrl2 ctrl3 ab4v1 ab4v2 ab4v3
1  1     A   Male 1975-01-08 2010-10-10 2011-11-12 2011-12-12   183   835   139   745   584   817
2  2     B   Male 1998-05-12 2010-10-10 2011-11-12 2011-12-12   172   727   214   793   653   499
3  3     A   Male 2005-12-28 2010-10-10 2011-11-23 2011-12-23   157   667   222   664   505   924
4  4     C Female 1957-07-01 2010-10-10 2011-11-25 2011-12-25   186   123   344   584   582   653
This is what I want:
  id group    sex  datebirth version       date ctrl ab4
1  1     A   Male 1975-01-08       1 2010-10-10  183 745
2  2     B   Male 1998-05-12       1 2010-10-10  172 793
3  3     A   Male 2005-12-28       1 2010-10-10  157 664
4  4     C Female 1957-07-01       1 2010-10-10  186 584
...
Posted on 2020-07-23 02:03:47
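We need to change the order of names_to. We can use either names_sep or names_pattern; the only difference is that names_sep points to a delimiter. Here the delimiter is the boundary between a letter ((?<=[A-Za-z])) and a digit ((?=[0-9]$)), that is, the position that follows a letter and precedes the final digit of the name. With names_pattern, we instead capture specific sets of characters in capture groups ((...)). The OP's post uses "([A-Za-z]+)(\\d+)", i.e. one or more letters as the first group and digits as the second. Because the capture groups are assigned to the entries of names_to in order, names_to = c("version", ".value") sends the letters to version and turns the digits into column names, which is the reverse of what is wanted. That pattern also splits ab4v1 at its first digit (ab and 4) rather than at the trailing version number.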
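The names_pattern route mentioned above can be sketched as follows. This is not part of the original answer: the regex "^(.+?)v?(\\d)$" is an assumption chosen for these particular column names (a lazy first group, an optional trailing "v", then one final digit), and it drops the "v" so that the value column comes out as ab4, as the question asks. The data frame is the answer's sample data rebuilt with data.frame(), and the conversion of version to integer is likewise an addition.

```r
library(dplyr)
library(tidyr)

# Sample data from the answer, rebuilt with data.frame()
df <- data.frame(
  id = 1:4,
  group = c("A", "B", "A", "C"),
  sex = c("Male", "Male", "Male", "Female"),
  datebirth = c("1975-01-08", "1998-05-12", "2005-12-28", "1957-07-01"),
  date1 = c("2010-10-10", "2010-10-10", "2010-10-10", "2010-10-10"),
  date2 = c("2011-11-12", "2011-11-12", "2011-11-23", "2011-11-25"),
  date3 = c("2011-12-12", "2011-12-12", "2011-12-23", "2011-12-25"),
  ctrl1 = c(183L, 172L, 157L, 186L),
  ctrl2 = c(835L, 727L, 667L, 123L),
  ctrl3 = c(139L, 214L, 222L, 344L),
  ab4v1 = c(745L, 793L, 664L, 584L),
  ab4v2 = c(584L, 653L, 505L, 582L),
  ab4v3 = c(817L, 499L, 924L, 653L),
  stringsAsFactors = FALSE
)

long <- df %>%
  pivot_longer(
    cols = date1:ab4v3,
    # first capture group -> new column names, second -> version
    names_to = c(".value", "version"),
    # assumed pattern: name stem, optional "v", single trailing digit
    names_pattern = "^(.+?)v?(\\d)$"
  ) %>%
  # version is extracted as text; convert it to integer
  mutate(version = as.integer(version))

print(long)
```

If the "v" should stay in the column name, the names_sep version given in this answer already does that.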
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = date1:ab4v3, names_to = c(".value", "version"),
               names_sep = "(?<=[A-Za-z])(?=[0-9]$)")
# A tibble: 12 x 8
#       id group sex    datebirth  version date        ctrl  ab4v
#    <int> <chr> <chr>  <chr>      <chr>   <chr>      <int> <int>
#  1     1 A     Male   1975-01-08 1       2010-10-10   183   745
#  2     1 A     Male   1975-01-08 2       2011-11-12   835   584
#  3     1 A     Male   1975-01-08 3       2011-12-12   139   817
#  4     2 B     Male   1998-05-12 1       2010-10-10   172   793
#  5     2 B     Male   1998-05-12 2       2011-11-12   727   653
#  6     2 B     Male   1998-05-12 3       2011-12-12   214   499
#  7     3 A     Male   2005-12-28 1       2010-10-10   157   664
#  8     3 A     Male   2005-12-28 2       2011-11-23   667   505
#  9     3 A     Male   2005-12-28 3       2011-12-23   222   924
# 10     4 C     Female 1957-07-01 1       2010-10-10   186   584
# 11     4 C     Female 1957-07-01 2       2011-11-25   123   582
# 12     4 C     Female 1957-07-01 3       2011-12-25   344   653
Data
df <- structure(list(id = 1:4, group = c("A", "B", "A", "C"),
    sex = c("Male", "Male", "Male", "Female"),
    datebirth = c("1975-01-08", "1998-05-12", "2005-12-28", "1957-07-01"),
    date1 = c("2010-10-10", "2010-10-10", "2010-10-10", "2010-10-10"),
    date2 = c("2011-11-12", "2011-11-12", "2011-11-23", "2011-11-25"),
    date3 = c("2011-12-12", "2011-12-12", "2011-12-23", "2011-12-25"),
    ctrl1 = c(183L, 172L, 157L, 186L),
    ctrl2 = c(835L, 727L, 667L, 123L),
    ctrl3 = c(139L, 214L, 222L, 344L),
    ab4v1 = c(745L, 793L, 664L, 584L),
    ab4v2 = c(584L, 653L, 505L, 582L),
    ab4v3 = c(817L, 499L, 924L, 653L)),
    class = "data.frame", row.names = c("1", "2", "3", "4"))
Posted on 2020-07-22 23:59:08
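The code below is ugly, but I believe it may work. It is a series of pivot_longer calls, each handling just one of the wide-format variables; the per-variable results are then merged.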
library(dplyr)
library(tidyr)
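# Pivot one wide variable (e.g. "date") to long format,
# keeping the first four (identifier) columns.
fun <- function(X, Var){
  Vard <- paste0(Var, "\\d")   # regex matching e.g. date1, date2, date3
  X %>%
    select(1:4, matches(Vard)) %>%
    pivot_longer(
      cols = matches(Vard),
      names_to = "version",
      values_to = Var
    ) %>%
    mutate(version = sub(Var, "", version))   # strip the prefix, keep the digit
}
vars <- c("date", "ctrl", "ab4v")
# Pivot each variable separately, then merge on the shared columns
Reduce(function(x, y) merge(x, y), lapply(vars, function(v) fun(df, v)))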
#   id group    sex  datebirth version       date ctrl ab4v
#1   1     A   Male 1975-01-08       1 2010-10-10  183  745
#2   1     A   Male 1975-01-08       2 2011-11-12  835  584
#3   1     A   Male 1975-01-08       3 2011-12-12  139  817
#4   2     B   Male 1998-05-12       1 2010-10-10  172  793
#5   2     B   Male 1998-05-12       2 2011-11-12  727  653
#6   2     B   Male 1998-05-12       3 2011-12-12  214  499
#7   3     A   Male 2005-12-28       1 2010-10-10  157  664
#8   3     A   Male 2005-12-28       2 2011-11-23  667  505
#9   3     A   Male 2005-12-28       3 2011-12-23  222  924
#10  4     C Female 1957-07-01       1 2010-10-10  186  584
#11  4     C Female 1957-07-01       2 2011-11-25  123  582
#12  4     C Female 1957-07-01       3 2011-12-25  344  653
https://stackoverflow.com/questions/63037501