Average mass spec peak counts by sample column name

Stack Overflow user
Asked 2022-06-29 20:50:15
1 answer, 24 views, 0 followers, 1 vote

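Hopefully this is straightforward and I am just overthinking it. I have a matrix of mass spectrometry (MS) peak counts, where the peaks are rows and the columns are sample names. Each sample site has several sampling points, and I would like to add up the counts across the points within each site.

For example, a sample with three replicates is identified as "S19S_0010_Sed_Field_ICR.D_p2", "S19S_0010_Sed_Field_ICR.M_p2" and "S19S_0010_Sed_Field_ICR.U_p2"; these are at the same site but taken downstream (D), midstream (M) and upstream (U). The first two samples each have one count of a particular peak, so I would like to merge the three samples into "S19S_0010_Sed_Field_ICR.all_p2" with a count of two for that peak. Example dataset: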

> dput(data.sed.ex)
structure(list(S19S_0004_Sed_Field_ICR.M_p15 = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0), S19S_0006_Sed_Field_ICR.D_p2 = c(0, 0, 0, 
0, 0, 0, 1, 1, 0, 0), S19S_0006_Sed_Field_ICR.M_p2 = c(0, 0, 
0, 0, 0, 0, 1, 0, 0, 0), S19S_0006_Sed_Field_ICR.U_p2 = c(0, 
0, 0, 0, 0, 0, 1, 1, 0, 0), S19S_0008_Sed_Field_ICR.M_p15 = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0), S19S_0009_Sed_Field_ICR.M_p2 = c(0, 
0, 1, 0, 0, 0, 1, 0, 0, 0), S19S_0009_Sed_Field_ICR.U_p2 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.D_p15 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.M_p15 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.U_p15 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c("200.002276", "200.015107", 
"200.0564158", "200.0565393", "200.0578394", "200.0677581", "200.092796", 
"200.1291723", "200.1292836", "200.9238455"), class = "data.frame")

TIA

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-06-29 21:34:37

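Perhaps pivoting to long format will help. In that format you can summarise by group (e.g. sample, or sample and location) using sum, mean, sd, etc.

Hope this helps.

Convert to long format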

## dd is the `data.sed.ex` object above
dd <- data.sed.ex

library(tidyverse)

ddLong <- dd %>%
  rownames_to_column(var = "peak") %>%
  pivot_longer(cols = matches("^S")) %>%
  mutate(sample = gsub("(.*)\\.(.*)", "\\1", name),           ## pull sample info
         location = gsub("(.*)\\.([DMU])_(.*)", "\\2", name), ## pull D M U
         p = gsub("(.*)\\.([DMU])_(p.*)", "\\3", name),       ## get p2, p15
         peak = as.numeric(peak))                             ## coerce peak to numeric

ddLong
#> # A tibble: 100 × 6
#>     peak name                          value sample               location p    
#>    <dbl> <chr>                         <dbl> <chr>                <chr>    <chr>
#>  1  200. S19S_0004_Sed_Field_ICR.M_p15     0 S19S_0004_Sed_Field… M        p15  
#>  2  200. S19S_0006_Sed_Field_ICR.D_p2      0 S19S_0006_Sed_Field… D        p2   
#>  3  200. S19S_0006_Sed_Field_ICR.M_p2      0 S19S_0006_Sed_Field… M        p2   
#>  4  200. S19S_0006_Sed_Field_ICR.U_p2      0 S19S_0006_Sed_Field… U        p2   
#>  5  200. S19S_0008_Sed_Field_ICR.M_p15     0 S19S_0008_Sed_Field… M        p15  
#>  6  200. S19S_0009_Sed_Field_ICR.M_p2      0 S19S_0009_Sed_Field… M        p2   
#>  7  200. S19S_0009_Sed_Field_ICR.U_p2      0 S19S_0009_Sed_Field… U        p2   
#>  8  200. S19S_0010_Sed_Field_ICR.D_p15     0 S19S_0010_Sed_Field… D        p15  
#>  9  200. S19S_0010_Sed_Field_ICR.M_p15     0 S19S_0010_Sed_Field… M        p15  
#> 10  200. S19S_0010_Sed_Field_ICR.U_p15     0 S19S_0010_Sed_Field… U        p15  
#> # … with 90 more rows
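
To make the three gsub() calls above easier to follow, here is a minimal base R sketch that applies one anchored pattern to two of the column names. The names `pattern` and `parts` are made up for this illustration and are not part of the original answer; it assumes every column name has the form <sample>.<D|M|U>_<p...>.

```r
## Two column names taken from the example data
name <- c("S19S_0006_Sed_Field_ICR.D_p2", "S19S_0010_Sed_Field_ICR.U_p15")

## One pattern with three capture groups:
##   \\1 = everything before the dot (sample)
##   \\2 = a single letter D, M or U (location)
##   \\3 = the trailing p-code (p2, p15, ...)
pattern <- "^(.*)\\.([DMU])_(p.*)$"

parts <- data.frame(
  name     = name,
  sample   = sub(pattern, "\\1", name),
  location = sub(pattern, "\\2", name),
  p        = sub(pattern, "\\3", name)
)

print(parts)
```

Using the same pattern for all three pieces means a name that does not match is returned unchanged in every column, which makes malformed names easy to spot.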

Summarise by one or more groups

## summarise using group_by + verbs
ddLong %>%
  group_by(sample, location) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 10 × 5
#> # Groups:   sample [5]
#>    sample                  location     n sum.value mean.peak
#>    <chr>                   <chr>    <int>     <dbl>     <dbl>
#>  1 S19S_0004_Sed_Field_ICR M           10         0      200.
#>  2 S19S_0006_Sed_Field_ICR D           10         2      200.
#>  3 S19S_0006_Sed_Field_ICR M           10         1      200.
#>  4 S19S_0006_Sed_Field_ICR U           10         2      200.
#>  5 S19S_0008_Sed_Field_ICR M           10         1      200.
#>  6 S19S_0009_Sed_Field_ICR M           10         2      200.
#>  7 S19S_0009_Sed_Field_ICR U           10         1      200.
#>  8 S19S_0010_Sed_Field_ICR D           10         1      200.
#>  9 S19S_0010_Sed_Field_ICR M           10         1      200.
#> 10 S19S_0010_Sed_Field_ICR U           10         0      200.

                                                                                                    
ddLong %>%
  group_by(sample, p) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 5 × 5
#> # Groups:   sample [5]
#>   sample                  p         n sum.value mean.peak
#>   <chr>                   <chr> <int>     <dbl>     <dbl>
#> 1 S19S_0004_Sed_Field_ICR p15      10         0      200.
#> 2 S19S_0006_Sed_Field_ICR p2       30         5      200.
#> 3 S19S_0008_Sed_Field_ICR p15      10         1      200.
#> 4 S19S_0009_Sed_Field_ICR p2       20         3      200.
#> 5 S19S_0010_Sed_Field_ICR p15      30         2      200.
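
The summaries above add counts across all peaks within a group. If the goal is the one described in the question (one combined column per site, with a summed count for each peak), a possible sketch is to group by peak as well and then pivot back to wide format. This is not from the original answer: the small `dd` below is a stand-in using four columns and two peaks from the example data, the names `site` and `ddSite` are made up here, and the ".all_" naming follows the question.

```r
library(tidyverse)

## Stand-in for data.sed.ex: two peaks, one site with D/M/U points
## and one site with a single point (values copied from the example data)
dd <- data.frame(
  S19S_0010_Sed_Field_ICR.D_p15 = c(1, 0),
  S19S_0010_Sed_Field_ICR.M_p15 = c(1, 0),
  S19S_0010_Sed_Field_ICR.U_p15 = c(0, 0),
  S19S_0008_Sed_Field_ICR.M_p15 = c(0, 1),
  row.names = c("200.092796", "200.1291723")
)

ddSite <- dd %>%
  rownames_to_column(var = "peak") %>%                   # keep peak as text so row names survive
  pivot_longer(cols = -peak) %>%                         # columns: peak, name, value
  mutate(site = sub("\\.[DMU]_", ".all_", name)) %>%     # .D_/.M_/.U_ -> .all_
  group_by(peak, site) %>%
  summarise(value = sum(value), .groups = "drop") %>%    # sum the points per peak and site
  pivot_wider(names_from = site, values_from = value) %>%
  column_to_rownames(var = "peak")                       # back to peaks as row names

print(ddSite)
```

Sites with only one sampling point (such as S19S_0008 here) simply keep their single value under the new ".all_" name. Replacing sum() with mean() would give the average across points instead.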

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72807708
