I hope you can help me with this problem. I have the following data:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow

Each time a base colour (base_yellow, base_red) is encountered, a new group should start. The expected output, shown below, provides a new variable:
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink; blue; base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink; blue; base_yellow
3,base_yellow; blue; pink; blue; base_yellow
4,base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow

Answered 2022-03-26 10:03:22
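Here is something you may be able to adapt to your needs.
First, create a vector vec containing the row positions where colour starts with "base".
Then you can use map2_dfr from purrr, which returns the colour rows for each range of positions, from start to end, based on vec. This helps in the situation where the same colour row ends up being used in more than one output row. A grouping variable group is also created in this step.
After grouping by group, keep only the groups that contain more than one distinct colour, and use str_c to collapse the colour values of each group into one string.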
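To make the steps above concrete, here is a minimal sketch that builds the question's data and shows what vec and the start/end pairs look like. Reading the data with read.csv(text = ...) and the helper data frame ranges are assumptions made for illustration. Only the names df and vec come from the answer.

```r
# Assumption: the question's data is loaded into a data frame called df
df <- read.csv(text = "ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow", stringsAsFactors = FALSE)

# Row positions where colour starts with "base"
vec <- which(grepl("^base", df$colour))
print(vec)

# Pairing each position with the next one gives one candidate range of rows
ranges <- data.frame(start = vec[-length(vec)], end = vec[-1])
print(ranges)
```

With this data, vec holds ten positions, which gives nine candidate ranges. Three of them (rows 7 to 8, 14 to 15 and 19 to 20) run from the last row of one ID to the first row of the next, and contain only base_yellow.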
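library(tidyverse)

# df is the question's data frame with columns ID and colour.
# Row positions where colour starts with "base"
vec <- which(grepl("^base", df$colour))

# Pair each base position with the next one and pull the rows in that range
map2_dfr(
  vec[-length(vec)],
  vec[-1],
  ~df[.x:.y, ],
  .id = "group"
) %>%
  # .id is stored as character; convert it so groups sort numerically
  mutate(group = as.integer(group)) %>%
  group_by(group) %>%
  # Drop ranges with a single distinct colour; in this data these are the
  # ranges from the last base of one ID to the first base of the next
  filter(n_distinct(colour) > 1) %>%
  summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
  select(-group)

Output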
     ID colour
  <int> <chr>
1     1 base_yellow; blue; base_red
2     1 base_red; blue; pink; blue; base_yellow
3     2 base_yellow; blue; base_red
4     2 base_red; blue; pink; blue; base_yellow
5     3 base_yellow; blue; pink; blue; base_yellow
6     4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow

Answered 2022-03-26 02:36:27
Try this:
library(tidyverse)
# Read data
mydata <- tibble::tribble(
  ~ID, ~colour,
  1, "base_yellow",
  1, "blue",
  1, "base_red",
  1, "blue",
  1, "pink",
  1, "blue",
  1, "base_yellow",
  2, "base_yellow",
  2, "blue",
  2, "base_red",
  2, "blue",
  2, "pink",
  2, "blue",
  2, "base_yellow",
  3, "base_yellow",
  3, "blue",
  3, "pink",
  3, "blue",
  3, "base_yellow",
  4, "base_yellow",
  4, "blue",
  4, "green",
  4, "blue",
  4, "green",
  4, "blue",
  4, "pink",
  4, "blue",
  4, "base_yellow"
)

# Add column to group by words starting with "base_"
mydata <- mydata %>%
  mutate(base = str_starts(colour, "base_")) %>%
  mutate(base = ifelse(base, colour, NA)) %>%
  fill(base, .direction = "down")

# Group by ID and words starting with "base_" and paste words
mydata <- mydata %>%
  group_by(ID, base) %>%
  summarise(colour = paste(colour, collapse = ";")) %>%
  select(-base)

Result:
> mydata
# A tibble: 6 × 2
# Groups:   ID [4]
     ID colour
  <dbl> <chr>
1     1 base_red;blue;pink;blue
2     1 base_yellow;blue;base_yellow
3     2 base_red;blue;pink;blue
4     2 base_yellow;blue;base_yellow
5     3 base_yellow;blue;pink;blue;base_yellow
6     4 base_yellow;blue;green;blue;green;blue;pink;blue;base_yellow

https://stackoverflow.com/questions/71620536
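
Note that the result of the second answer puts the first and last base_yellow rows of IDs 1 and 2 into the same group, so it differs from the expected output in the question. As a rough sketch of one alternative (not taken from either answer), the ranges between consecutive base rows can be built separately within each ID using base R. The helper name split_at_base and the data frame name df are assumptions made for illustration.

```r
# Assumption: the question's data is loaded into a data frame called df
df <- read.csv(text = "ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow", stringsAsFactors = FALSE)

# Hypothetical helper: for one ID's colours, collapse every range that runs
# from one "base" row to the next "base" row (both ends included)
split_at_base <- function(colours) {
  pos <- which(startsWith(colours, "base"))
  if (length(pos) < 2) return(character(0))  # no complete range to build
  starts <- pos[-length(pos)]
  ends <- pos[-1]
  mapply(function(s, e) paste(colours[s:e], collapse = "; "), starts, ends)
}

# Apply the helper to each ID separately, so no range crosses an ID boundary
pieces <- lapply(split(df$colour, df$ID), split_at_base)

result <- data.frame(
  ID = rep(as.integer(names(pieces)), lengths(pieces)),
  colour = unlist(pieces, use.names = FALSE)
)
print(result)
```

Because the ranges are computed per ID, this sketch does not depend on neighbouring IDs ending and starting with the same base colour. An ID with fewer than two base rows simply contributes no rows.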