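I have data shaped like the toy example below. When var1, var2 and var3 all hold equal values, I want to merge those rows so that the merged row combines their data. Rows 4-6 hold different values in the same column; is there a way to put those values into a single cell, with a separator between them?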
df <- data.frame(
  var1 = c("1635", "1635", "1729", "1847", "1847", "1847"),
  var2 = c("Aa", "Aa", "Bb", "Cc", "Cc", "Cc"),
  var3 = c("28", "28", "85", "27", "27", "27"),
  var4 = c("apple", NA, "orange", "pear", NA, NA),
  var5 = c(NA, "tree", NA, NA, "ground", "desk")
)
So the output should look like this:

Answered 2022-06-21 14:25:55
In base R, you could do:
aggregate(.~var1+var2+var3, df, \(x)toString(unique(na.omit(x))), na.action = identity)
  var1 var2 var3   var4         var5
1 1847   Cc   27   pear ground, desk
2 1635   Aa   28  apple         tree
3 1729   Bb   85 orange
In the tidyverse:
library(tidyverse)
df %>%
  group_by(var1, var2, var3) %>%
  summarize(across(var4:var5, ~toString(unique(na.omit(.x)))), .groups = 'drop')
# A tibble: 3 × 5
  var1  var2  var3  var4   var5
  <chr> <chr> <chr> <chr>  <chr>
1 1635  Aa    28    apple  "tree"
2 1729  Bb    85    orange ""
3 1847  Cc    27    pear   "ground, desk"
Answered 2022-06-21 14:27:35
With dplyr, you can group_by the three columns and then use summarize to join the strings that are not NA.
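To make the per-column step easier to follow, here is a minimal base R sketch of the collapsing logic applied to a single vector. The helper name collapse_non_na and its sep argument are hypothetical, introduced only for illustration; they are not part of dplyr or of the answer.

```r
# Hypothetical helper (not from the answer): collapse the non-missing
# values of one column within a group into a single string.
collapse_non_na <- function(x, sep = ", ") {
  vals <- x[!is.na(x)]            # drop missing entries
  if (length(vals) == 0) {
    return(NA_character_)         # group had no data at all: keep it as NA
  }
  paste(vals, collapse = sep)     # join what is left with the separator
}

collapse_non_na(c("pear", NA, NA))        # "pear"
collapse_non_na(c(NA, "ground", "desk"))  # "ground, desk"
collapse_non_na(c(NA_character_, NA))     # NA
```

Returning NA for an all-missing group is a design choice; an approach based on toString(na.omit(x)) would return an empty string instead. The answer's pipeline applies the same idea to var4 and var5 through across():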
library(dplyr)
df %>%
  group_by(var1, var2, var3) %>%
  summarize(
    # Keep NA when a group has no values; otherwise join the non-NA values
    across(var4:var5, ~ifelse(all(is.na(.x)), NA, paste0(na.omit(.x), collapse = ","))),
    .groups = "drop"
  )
# A tibble: 3 × 5
  var1  var2  var3  var4   var5
  <chr> <chr> <chr> <chr>  <chr>
1 1635  Aa    28    apple  tree
2 1729  Bb    85    orange NA
3 1847  Cc    27    pear   ground,desk
Answered 2022-06-21 14:40:52
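With data.table:
library(data.table)
setDT(df)  # convert df to a data.table by reference
df[,
   # For every non-grouping column: NA if the group is all NA, else join the values
   lapply(.SD, \(x) if (all(is.na(x))) NA_character_ else paste(na.omit(x), collapse = "; ")),
   by = var1:var3]
setDF(df)  # convert df back to a plain data.frame
#      var1   var2   var3   var4         var5
#    <char> <char> <char> <char>       <char>
# 1:   1635     Aa     28  apple         tree
# 2:   1729     Bb     85 orange         <NA>
# 3:   1847     Cc     27   pear ground; desk
https://stackoverflow.com/questions/72702455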