Q: Efficient subsetting in R phyloseq, ignoring missing parameters

Stack Overflow user

Asked on 2019-10-25 22:06:17

1 answer, 363 views, 0 followers, 3 votes

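I use phyloseq a lot in my work. My data sets usually contain several conditions or parameters that need to be analysed in the same way (e.g. the same plot for bacteria in summer or winter, and for bacteria in Lake1 or Lake2), so I want to use functions for this. I wrote a subsetting function that lets me combine multiple parameters by looping over them. The output is stored in a list for further analysis.

However, this seems rather clumsy. So my first question is about improving the function.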

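1) Specifically, I would like to know whether

a) using multiple for loops to generate the subsets is a good idea,

b) the combination of for loops and lapply can be optimized, and

c) there is a better way to prevent an existing list from being silently extended when a new run adds the same objects again. I implemented this guard because I re-run the code many times for testing while developing it.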

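Whether for loops are generally slower than the apply family is discussed here: lapply vs for loop - Performance R

I think phyloseq calls which internally, so the solution does not have to be phyloseq-specific.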

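2) My second question is how to handle the case where not every combination of search parameters is present in the data. In the example below, the combination of "danish" and "M" would break if there were no Danish males. I would like to avoid that and, in this example, get only 3 subsets (danish x F, american x F, american x M) instead of 4. At present the function has to be adapted to each special subset, which defeats the purpose of writing it.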
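To make the situation concrete, here is a minimal sketch using a made-up metadata table (the values below are hypothetical and not taken from the enterotype data). It contrasts the combinations that two nested loops would iterate over with the combinations that actually occur:

```r
# Hypothetical sample metadata: note that there is no Danish male sample.
meta <- data.frame(
  Nationality = c("american", "american", "danish", "danish", "american"),
  Gender      = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# All theoretical combinations (what two nested for loops iterate over).
all_combos <- expand.grid(
  Nationality = c("american", "danish"),
  Gender      = c("F", "M"),
  stringsAsFactors = FALSE
)

# Combinations that are actually present in the data.
observed_combos <- unique(meta[, c("Nationality", "Gender")])

nrow(all_combos)       # 4
nrow(observed_combos)  # 3
```

Iterating over the observed combinations instead of the full grid would skip the empty "danish" x "M" case without any special handling.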

library(phyloseq)
data(enterotype)
# reduce the size of the data set
phyloseq <- filter_taxa(enterotype, function (x) {sum(x > 0.001) >= 1}, prune = TRUE)

# arguments for the subsetting function
phyloseq_object <- phyloseq
Nationality <- c("american", "danish")
Gender <- c("F", "M")

# define a function to obtain sample subsets from the phyloseq object 
# per combination of parameters
get_sample_subsets <- function(phyloseq_object, nation, gender) {
  sample_subset <- sample_data(phyloseq_object)[ which(sample_data(phyloseq_object)$Nationality == nation &
    sample_data(phyloseq_object)$Gender == gender),]
  phyloseq_subset <- merge_phyloseq(tax_table(phyloseq_object),
    otu_table(phyloseq_object),
    #refseq(phyloseq_object),
    sample_subset)
  phyloseq_subset2 <- filter_taxa(phyloseq_subset, function (x) {sum(x > 0) >= 1 }, prune = TRUE)
  return(phyloseq_subset2)
}

# here we pass the arguments for subsetting over two for loops
# to create all possible combinations of the subset parameters etc.
# the subsets are stored within a list, which has to be empty before running the loops 
sample_subset_list <- list()
if(length(sample_subset_list) == 0) {
  for (nations in Nationality) {
    for (gender in Gender) {
      tmp <- get_sample_subsets(phyloseq_object = phyloseq_object,
        nation = nations, gender = gender)
      sample_subset_list[[paste(nations, gender, sep = "_")]] <- tmp
    }
  }
  print(sample_subset_list)
} else {
  print("list is not empty, abort to prevent appending...")
}

# You could now for example use the output to calculate ordinations for each subset (this data set has too few entries per subset for that)

# create a list where the distance metrics for the sample subsets are stored
ordination_nmds <- list()
ordination_nmds <- lapply(sample_subset_list, ordinate, method = "NMDS",
  dist = "bray", try = 100, autotransform = TRUE)

Tags: subset, phyloseq, function

1 Answer

Stack Overflow user

Answered on 2019-11-14 19:08:07

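Works for S3, but not for S4 (see comments)

Since I am not familiar with S4, I may delete this answer if a better one comes along.

Based on my comment, here is something that might help you. Let me know if you need a better solution or if it does not solve your problem.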

# I switched to a built-in data set because the "phyloseq" package requires a separate installation
ex_data <- mtcars

# This line might replace your "get_sample_subsets" function and the loop that checks for empty lists.
# You can change the elements inside list(...) to get the subsets you want; it is very flexible.
sampled_data <- split(ex_data, list(ex_data$cyl, ex_data$vs), drop = TRUE) # drop = TRUE avoids "empty" elements
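Because split() works on a plain data frame but not directly on an S4 phyloseq object, one possible workaround is to split the sample names instead and then prune the object once per group. The sketch below is only an illustration under that assumption: split_sample_names() and subset_phyloseq_by() are hypothetical helper names, and the phyloseq part has not been checked against the enterotype data. It relies on sample_data(), prune_samples(), prune_taxa() and taxa_sums() from phyloseq.

```r
# Hypothetical helper: group sample names by the combinations of metadata
# columns that actually occur. `meta` is a plain data.frame whose row names
# are the sample names; `vars` names the grouping columns.
split_sample_names <- function(meta, vars) {
  # Ignore samples with a missing value in any grouping column.
  keep <- stats::complete.cases(meta[, vars, drop = FALSE])
  meta <- meta[keep, , drop = FALSE]
  # drop = TRUE discards combinations that have no samples.
  split(rownames(meta), as.list(meta[vars]), drop = TRUE, sep = "_")
}

# Hypothetical wrapper for a phyloseq object (requires phyloseq when called).
subset_phyloseq_by <- function(ps, vars) {
  meta <- methods::as(phyloseq::sample_data(ps), "data.frame")
  groups <- split_sample_names(meta, vars)
  lapply(groups, function(ids) {
    sub <- phyloseq::prune_samples(ids, ps)
    # Remove taxa that no longer occur in this subset.
    phyloseq::prune_taxa(phyloseq::taxa_sums(sub) > 0, sub)
  })
}

# Toy check of the grouping step, using made-up metadata without Danish males.
toy_meta <- data.frame(
  Nationality = c("american", "american", "danish", NA),
  Gender      = c("F", "M", "F", "M"),
  row.names   = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)
groups <- split_sample_names(toy_meta, c("Nationality", "Gender"))
names(groups)
```

With phyloseq installed, a call such as subset_phyloseq_by(enterotype, c("Nationality", "Gender")) should return a named list that can be passed to lapply() for ordination, in the same way as sample_subset_list in the question.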

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/58560139
