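Hopefully this is straightforward and I am just overthinking it. I have a mass spectrometry (MS) peak count matrix in which the peaks are rows and the sample names are columns. Each sample site has several sampling points, and I would like to sum the counts across the sampling points within each site.
For example, a sample with three replicates is identified as "S19S_0010_Sed_Field_ICR.D_p2", "S19S_0010_Sed_Field_ICR.M_p2" and "S19S_0010_Sed_Field_ICR.U_p2". These are at the same site, but downstream (D), midstream (M) and upstream (U). If the first two samples each have one count for a particular peak, I would like to merge the three samples into "S19S_0010_Sed_Field_ICR.all_p2" with a count of two for that peak. Example dataset: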
> dput(data.sed.ex)
structure(list(S19S_0004_Sed_Field_ICR.M_p15 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0), S19S_0006_Sed_Field_ICR.D_p2 = c(0, 0, 0,
0, 0, 0, 1, 1, 0, 0), S19S_0006_Sed_Field_ICR.M_p2 = c(0, 0,
0, 0, 0, 0, 1, 0, 0, 0), S19S_0006_Sed_Field_ICR.U_p2 = c(0,
0, 0, 0, 0, 0, 1, 1, 0, 0), S19S_0008_Sed_Field_ICR.M_p15 = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 0), S19S_0009_Sed_Field_ICR.M_p2 = c(0,
0, 1, 0, 0, 0, 1, 0, 0, 0), S19S_0009_Sed_Field_ICR.U_p2 = c(0,
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.D_p15 = c(0,
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.M_p15 = c(0,
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.U_p15 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c("200.002276", "200.015107",
"200.0564158", "200.0565393", "200.0578394", "200.0677581", "200.092796",
"200.1291723", "200.1292836", "200.9238455"), class = "data.frame")
TIA
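Posted on 2022-06-29 21:34:37
Pivoting to long format may help. In that format you can summarise by group (for example by sample, or by sample and location) using sum, mean, sd and so on.
Hope this helps.
Convert to long format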
## dd is the `data.sed.ex` object above
library(tidyverse)
ddLong <- dd %>%
  rownames_to_column(var = "peak") %>%
  pivot_longer(cols = matches("^S")) %>%
  mutate(sample = gsub("(.*)\\.(.*)", "\\1", name),              ## pull sample info
         location = gsub("(.*)\\.([DMU])_(.*)", "\\2", name),    ## pull D M U
         p = gsub("(.*)\\.([DMU])_(p.*)", "\\3", name),          ## get p2, p15
         peak = as.numeric(peak))                                ## coerce peak to numeric
ddLong
#> # A tibble: 100 × 6
#> peak name value sample location p
#> <dbl> <chr> <dbl> <chr> <chr> <chr>
#> 1 200. S19S_0004_Sed_Field_ICR.M_p15 0 S19S_0004_Sed_Field… M p15
#> 2 200. S19S_0006_Sed_Field_ICR.D_p2 0 S19S_0006_Sed_Field… D p2
#> 3 200. S19S_0006_Sed_Field_ICR.M_p2 0 S19S_0006_Sed_Field… M p2
#> 4 200. S19S_0006_Sed_Field_ICR.U_p2 0 S19S_0006_Sed_Field… U p2
#> 5 200. S19S_0008_Sed_Field_ICR.M_p15 0 S19S_0008_Sed_Field… M p15
#> 6 200. S19S_0009_Sed_Field_ICR.M_p2 0 S19S_0009_Sed_Field… M p2
#> 7 200. S19S_0009_Sed_Field_ICR.U_p2 0 S19S_0009_Sed_Field… U p2
#> 8 200. S19S_0010_Sed_Field_ICR.D_p15 0 S19S_0010_Sed_Field… D p15
#> 9 200. S19S_0010_Sed_Field_ICR.M_p15 0 S19S_0010_Sed_Field… M p15
#> 10 200. S19S_0010_Sed_Field_ICR.U_p15 0 S19S_0010_Sed_Field… U p15
#> # … with 90 more rows
Summarise by one or more groups
## summarise using group_by + verbs
ddLong %>%
  group_by(sample, location) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 10 × 5
#> # Groups: sample [5]
#> sample location n sum.value mean.peak
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 S19S_0004_Sed_Field_ICR M 10 0 200.
#> 2 S19S_0006_Sed_Field_ICR D 10 2 200.
#> 3 S19S_0006_Sed_Field_ICR M 10 1 200.
#> 4 S19S_0006_Sed_Field_ICR U 10 2 200.
#> 5 S19S_0008_Sed_Field_ICR M 10 1 200.
#> 6 S19S_0009_Sed_Field_ICR M 10 2 200.
#> 7 S19S_0009_Sed_Field_ICR U 10 1 200.
#> 8 S19S_0010_Sed_Field_ICR D 10 1 200.
#> 9 S19S_0010_Sed_Field_ICR M 10 1 200.
#> 10 S19S_0010_Sed_Field_ICR U 10 0 200.
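Grouping by sample and location collapses over the peaks, whereas the question asks for one row per peak with the D, M and U columns added together. A minimal sketch of that, assuming the D/M/U code always appears as `.D_`, `.M_` or `.U_` in the column name. The small `dd` below is a four-peak subset of the example data, and the object name `combined` and the `.all_` renaming are only illustrative:

```r
library(dplyr)
library(tidyr)
library(tibble)

## Four-peak subset of the example data (two sites, three positions each)
dd <- data.frame(
  S19S_0006_Sed_Field_ICR.D_p2  = c(0, 1, 1, 0),
  S19S_0006_Sed_Field_ICR.M_p2  = c(0, 1, 0, 0),
  S19S_0006_Sed_Field_ICR.U_p2  = c(0, 1, 1, 0),
  S19S_0010_Sed_Field_ICR.D_p15 = c(0, 1, 0, 0),
  S19S_0010_Sed_Field_ICR.M_p15 = c(0, 1, 0, 0),
  S19S_0010_Sed_Field_ICR.U_p15 = c(0, 0, 0, 0),
  row.names = c("200.0677581", "200.092796", "200.1291723", "200.1292836")
)

combined <- dd %>%
  rownames_to_column(var = "peak") %>%          ## keep peak as character
  pivot_longer(cols = -peak) %>%                ## columns: peak, name, value
  ## replace the D/M/U code with "all" so replicates share one name
  mutate(name = sub("\\.[DMU]_", ".all_", name)) %>%
  group_by(peak, name) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  ## back to the original layout: peaks as rows, merged samples as columns
  pivot_wider(names_from = name, values_from = value) %>%
  column_to_rownames(var = "peak")

combined
```

The peak is left as a character label here so the exact m/z values survive as row names. If a site has only one position (for example only M), its column is simply renamed to `.all_` with the counts unchanged.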
ddLong %>%
  group_by(sample, p) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 5 × 5
#> # Groups: sample [5]
#> sample p n sum.value mean.peak
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 S19S_0004_Sed_Field_ICR p15 10 0 200.
#> 2 S19S_0006_Sed_Field_ICR p2 30 5 200.
#> 3 S19S_0008_Sed_Field_ICR p15 10 1 200.
#> 4 S19S_0009_Sed_Field_ICR p2 20 3 200.
#> 5 S19S_0010_Sed_Field_ICR p15 30 2 200.
https://stackoverflow.com/questions/72807708