I have the following data:
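+------+-----+----------+
| from | to  | priority |
+------+-----+----------+
| 1    | 8   | 1        |
| 2    | 6   | 1        |
| 3    | 4   | 1        |
| 4    | 5   | 3        |
| 5    | 6   | 4        |
| 6    | 2   | 5        |
| 7    | 8   | 2        |
| 4    | 3   | 5        |
| 2    | 1   | 1        |
| 6    | 6   | 4        |
| 1    | 7   | 5        |
| 8    | 4   | 6        |
| 9    | 5   | 3        |
+------+-----+----------+
My goal is to group the "to" column by the "from" column, but once a value has already appeared in either of the two columns, I do not want to consider it any further. The total priority is the sum of the priorities within each group.
So the resulting data would look like this: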
+------+------+----------------+
| from | to   | Total Priority |
+------+------+----------------+
| 1    | 8, 7 | 6              |
| 2    | 6    | 1              |
| 3    | 4    | 1              |
| 9    | 5    | 3              |
+------+------+----------------+
In addition, I want the grouped result to keep the same order as the original table.
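I was able to collapse by the "from" column using the splitstackshape package, as below:
library(splitstackshape)
cSplit(df, 'to', sep = ',', direction = 'long')[
  , .(to = toString(unique(to))), by = from]
This does introduce duplicate values, so I would like to know whether there is a way to get the desired result using any other package.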
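Posted on 2020-01-30 05:50:39
Using DF, shown reproducibly in the Note at the end, sort it by from to give DF2, then iterate through its rows, removing any row that contains a duplicate. A row counts as a duplicate when its to value already appears in the from or to of an earlier remaining row, or when its from value already appears in an earlier to. A loop is needed here because each removal depends on the previous ones. Finally, summarize the result.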
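library(dplyr)
DF2 <- arrange(DF, from)
i <- 1
while (i <= nrow(DF2)) {
  ix <- seq_len(i - 1)  # indices of the rows kept so far
  # row i is a duplicate if its `to` was seen in an earlier from/to,
  # or its `from` was seen in an earlier `to`
  dup <- with(DF2, (to[i] %in% c(to[ix], from[ix])) | (from[i] %in% to[ix]))
  # after a removal the next row slides into position i, so only advance on keep
  if (dup) DF2 <- DF2[-i, ] else i <- i + 1
}
DF2 %>%
  group_by(from) %>%
  summarize(to = toString(to), priority = sum(priority)) %>%
  ungroup()
giving: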
# A tibble: 4 x 3
   from to    priority
  <int> <chr>    <int>
1     1 8, 7         6
2     2 6            1
3     3 4            1
4     9 5            3
Note
Lines <- "from | to | priority
1 | 8 | 1
2 | 6 | 1
3 | 4 | 1
4 | 5 | 3
5 | 6 | 4
6 | 2 | 5
7 | 8 | 2
4 | 3 | 5
2 | 1 | 1
6 | 6 | 4
1 | 7 | 5
8 | 4 | 6
9 | 5 | 3"
DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE)
Posted on 2020-01-30 04:51:18
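It is not clear how you are creating the groups, but this at least gets you part of the way there (it groups by from and sums the priorities, without dropping values that have already appeared):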
library(tidyverse)
df <- tribble(~from, ~to, ~priority,
              1, 8, 1,
              2, 6, 1,
              3, 4, 1,
              4, 5, 3,
              5, 6, 4,
              6, 2, 5,
              7, 8, 2,
              4, 3, 5,
              2, 1, 1,
              6, 6, 4,
              1, 7, 5,
              8, 4, 6,
              9, 5, 3)
df %>%
  group_by(from) %>%
  summarise(to = toString(to),
            `Total Priority` = sum(priority, na.rm = TRUE))
The result is:
# A tibble: 9 x 3
   from to    `Total Priority`
  <dbl> <chr>            <dbl>
1     1 8, 7                 6
2     2 6, 1                 2
3     3 4                    1
4     4 5, 3                 8
5     5 6                    4
6     6 2, 6                 9
7     7 8                    2
8     8 4                    6
9     9 5                    3
https://stackoverflow.com/questions/59979111
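The grouped summary above keeps all nine groups because nothing filters out values that have already appeared. As a rough sketch only, the same duplicate test used in the first answer can be applied in a single pass over the rows in their original order (no sorting), followed by a per-group sum. The function name drop_seen_then_sum and the variables seen_from and seen_to are hypothetical and appear in neither answer; the sketch uses base R only and assumes the columns are named from, to and priority.

```r
# Sample data copied from the question.
df <- data.frame(
  from     = c(1, 2, 3, 4, 5, 6, 7, 4, 2, 6, 1, 8, 9),
  to       = c(8, 6, 4, 5, 6, 2, 8, 3, 1, 6, 7, 4, 5),
  priority = c(1, 1, 1, 3, 4, 5, 2, 5, 1, 4, 5, 6, 3)
)

# Hypothetical helper (not part of either answer): walk the rows in their
# original order, keep a row only if it passes the duplicate test, then
# collapse the kept rows by `from`.
drop_seen_then_sum <- function(df) {
  keep <- logical(nrow(df))
  seen_from <- c()  # `from` values of the rows kept so far
  seen_to <- c()    # `to` values of the rows kept so far
  for (i in seq_len(nrow(df))) {
    from_i <- df$from[i]
    to_i <- df$to[i]
    # Same test as the while loop in the first answer: drop the row if its
    # `to` was already seen anywhere, or its `from` was already seen as a `to`.
    dup <- (to_i %in% c(seen_to, seen_from)) || (from_i %in% seen_to)
    if (!dup) {
      keep[i] <- TRUE
      seen_from <- c(seen_from, from_i)
      seen_to <- c(seen_to, to_i)
    }
  }
  kept <- df[keep, ]
  groups <- unique(kept$from)  # groups in order of first appearance
  data.frame(
    from = groups,
    to = vapply(groups, function(g) toString(kept$to[kept$from == g]),
                character(1)),
    `Total Priority` = vapply(groups,
                              function(g) sum(kept$priority[kept$from == g]),
                              numeric(1)),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

res <- drop_seen_then_sum(df)
print(res)
```

On the sample data this keeps the rows (1, 8), (2, 6), (3, 4), (1, 7) and (9, 5), which gives the four groups 1, 2, 3 and 9 from the desired output in the question. Whether scanning in original order and scanning after sorting by from always agree on other data is not established here; they happen to agree on this table.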