
Q: Performing chisq.test on a data frame for multiple pairwise comparisons

Stack Overflow user

Asked 2017-09-21 10:47:40

2 answers, 2.8K views, 0 followers, 1 vote

I have the following data:

species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- cbind(species, category, minus, plus)
df<-as.data.frame(df)

I want to run a chisq.test for each pair of categories within each species, like this:

Species a, categories h and l: p-value

Species a, categories h and m: p-value

Species a, categories l and m: p-value

Species b ... and so on

using the following chisq.test (pseudocode):

chisq.test(c(minus(cat1, cat2),plus(cat1, cat2)))$p.value
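The pseudocode can be read in two ways, and the two answers on this page take different ones. A minimal sketch for a single comparison (species a, categories h and l, counts copied from the data above), assuming the intended test is either (1) a 2x2 contingency table of minus/plus counts by category, or (2) a goodness-of-fit test on the four counts pooled into one vector. The variable names are illustrative only.

```r
# Counts for species a, copied from the question's data
minus_h <- 31; plus_h <- 2   # category h
minus_l <- 14; plus_l <- 0   # category l

# (1) 2x2 contingency table: rows = categories, columns = minus/plus.
#     Asks whether the minus/plus split differs between h and l.
tab <- matrix(c(minus_h, minus_l, plus_h, plus_l), ncol = 2,
              dimnames = list(category = c("h", "l"),
                              outcome = c("minus", "plus")))
# expected counts below 5 trigger an approximation warning; silenced here
p_contingency <- suppressWarnings(chisq.test(tab, correct = FALSE))$p.value

# (2) Goodness-of-fit on the four counts pooled into one vector.
#     Asks whether all four counts are equal, which is a different question.
p_gof <- chisq.test(c(minus_h, minus_l, plus_h, plus_l))$p.value

round(p_contingency, 4)  # 0.3465
signif(p_gof, 3)         # 3.29e-11
```

The first value matches the first row of the accepted answer and the second matches the first row of the other answer, which is why their p-values differ by many orders of magnitude.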

In the end I would like a table showing the chisq.test p-value for every comparison, like this:

Species   Category1  Category2   p-value
a         h          l           0.05
a         h          m           0.2
a         l          m           0.1
b...

where Category1 and Category2 are the two categories being compared in the chisq.test.

Is this possible with dplyr? I have tried adapting what is described here and here, but as far as I can see they do not really apply to this problem.

EDIT: I would also like to see how to do this for the following data set:

species <- c(1:11)
minus <- c(132,78,254,12,45,76,89,90,100,42,120)
plus <- c(1,2,0,0,0,3,2,5,6,4,0)

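I want to run a chisq.test comparing each species in the table with every other species in the table (pairwise comparisons between all species). I would like to end up with something like this: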

species1  species2  p-value
1         2         0.5
1         3         0.7
1         4         0.2
...
11        10        0.02

I tried changing the code above to:

species_chisq %>%
  do(data_frame(species1 = first(.$species),
                species2 = last(.$species),
                data = list(matrix(c(.$minus, .$plus), ncol = 2)))) %>%
  mutate(chi_test = map(data, chisq.test, correct = FALSE)) %>%
  mutate(p.value = map_dbl(chi_test, "p.value")) %>%
  ungroup() %>%
  select(species1, species2, p.value)

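However, this only produces a table in which each species is compared with itself rather than with the other species. I don't quite understand where, in the original code given by @ycw, the comparisons are specified.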

EDIT 2:

I managed to do this using the code I found here.
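In @ycw's code the pairs are never named explicitly: slice(c(1, 2, 1, 3, 2, 3)) duplicates each species' rows into the sequence (1, 2), (1, 3), (2, 3), and rep(1:(n()/2), each = 2) labels each consecutive pair of rows as one test. With only one row per species, as in the second data set, there is nothing to pair inside a group, so the pairs have to be generated across rows instead. A minimal sketch using base R's combn(), assuming the same 2x2 contingency test (minus/plus by species, correct = FALSE) as the accepted answer; the names df_sp and pairwise_species are hypothetical.

```r
library(dplyr)
library(purrr)

# Second data set from the question: one row per species
species <- c(1:11)
minus <- c(132,78,254,12,45,76,89,90,100,42,120)
plus <- c(1,2,0,0,0,3,2,5,6,4,0)
df_sp <- data.frame(species, minus, plus)

# All unordered pairs of row indices: choose(11, 2) = 55 pairs
idx <- combn(nrow(df_sp), 2)

pairwise_species <- tibble(row1 = idx[1, ], row2 = idx[2, ]) %>%
  mutate(
    species1 = df_sp$species[row1],
    species2 = df_sp$species[row2],
    p.value = map2_dbl(row1, row2, function(i, j) {
      # 2x2 table: rows = the two species, columns = minus/plus counts
      tab <- as.matrix(df_sp[c(i, j), c("minus", "plus")])
      # small expected counts trigger an approximation warning; silenced here
      suppressWarnings(chisq.test(tab, correct = FALSE))$p.value
    })
  ) %>%
  select(species1, species2, p.value)

head(pairwise_species)
```

Pairs in which both species have zero plus counts (for example species 3 and 4) give NaN, because that column of the table has zero expected counts.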

Tags: dataframe, chi-squared

2 answers

Stack Overflow user

Accepted answer

Posted 2017-09-21 13:01:29

A solution using dplyr and purrr. Note that I am not familiar with the chi-squared test, but I specified it the way @Vincent's post does: chisq.test(test, correct = FALSE).

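Also, you don't need cbind to create the example data frame; data.frame alone is enough (cbind on mixed vectors produces a character matrix, so minus and plus would no longer be numeric). stringsAsFactors = FALSE is important to keep the character columns from becoming factors.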
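A quick way to see the cbind problem: binding character vectors with numeric vectors yields a character matrix, so minus and plus stop being numbers before as.data.frame() ever runs (and before R 4.0, where stringsAsFactors defaulted to TRUE, they then became factors). A minimal check on a subset of the question's data; df_ok and df_bad are illustrative names.

```r
species <- c("a", "a", "a")
category <- c("h", "l", "m")
minus <- c(31, 14, 260)
plus <- c(2, 0, 5)

# cbind() on mixed types coerces everything to character
m <- cbind(species, category, minus, plus)
typeof(m)  # "character"

# data.frame() keeps each column's own type
df_ok <- data.frame(species, category, minus, plus, stringsAsFactors = FALSE)
sapply(df_ok, class)  # species/category "character", minus/plus "numeric"

# Converting the cbind() result does not bring the numbers back
df_bad <- as.data.frame(m, stringsAsFactors = FALSE)
class(df_bad$minus)  # "character"
```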

# Create example data frame
species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species, category, minus, plus, stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(purrr)

# Process the data
df2 <- df %>%
  group_by(species) %>%
  slice(c(1, 2, 1, 3, 2, 3)) %>%
  mutate(test = rep(1:(n()/2), each = 2)) %>%
  group_by(species, test) %>%
  do(data_frame(species = first(.$species),
                test = first(.$test[1]),
                category1 = first(.$category),
                category2 = last(.$category),
                data = list(matrix(c(.$minus, .$plus), ncol = 2)))) %>%
  mutate(chi_test = map(data, chisq.test, correct = FALSE)) %>%
  mutate(p.value = map_dbl(chi_test, "p.value")) %>%
  ungroup() %>%
  select(species, category1, category2, p.value)

df2
# A tibble: 25 x 4
   species category1 category2   p.value
     <chr>     <chr>     <chr>     <dbl>
 1       a         h         l 0.3465104
 2       a         h         m 0.1354680
 3       a         l         m 0.6040227
 4       b         h         l 0.2339414
 5       b         h         m 0.4798647
 6       b         l         m 0.4399181
 7       c         h         l 0.4714005
 8       c         h         m 0.6987413
 9       c         l         m 0.5729834
10       d         h         l 0.2196806
# ... with 15 more rows
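One assumption built into slice(c(1, 2, 1, 3, 2, 3)) is that every species has exactly three categories. Species g has a single row (category l); since slice() drops out-of-range indices, it ends up as one pair comparing l with l, which is why the result has 25 rows rather than 24. A sketch of a more general version, assuming the same 2x2 contingency test: generate only the category pairs that actually exist within each species with combn(), and skip species with fewer than two categories. The helper pairs_within is hypothetical.

```r
library(dplyr)
library(purrr)

species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species, category, minus, plus, stringsAsFactors = FALSE)

# Hypothetical helper: all unordered category pairs within one species,
# each tested with a 2x2 table (rows = categories, columns = minus/plus)
pairs_within <- function(d) {
  if (nrow(d) < 2) return(NULL)  # nothing to compare (e.g. species g)
  idx <- combn(nrow(d), 2)
  tibble(
    category1 = d$category[idx[1, ]],
    category2 = d$category[idx[2, ]],
    p.value = map2_dbl(idx[1, ], idx[2, ], function(i, j) {
      tab <- as.matrix(d[c(i, j), c("minus", "plus")])
      # small expected counts trigger an approximation warning; silenced here
      suppressWarnings(chisq.test(tab, correct = FALSE))$p.value
    })
  )
}

# bind_rows() drops the NULL for species g and uses list names as "species"
result <- bind_rows(lapply(split(df, df$species), pairs_within), .id = "species")

nrow(result)  # 24: three pairs for each of the eight species with three categories
```

Because combn() keeps the first index smaller than the second, each pair appears once (h-l, h-m, l-m), matching the table requested in the question. Pairs where both categories have zero plus counts (for example all of species h) still give NaN.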

Votes: 2

Stack Overflow user

Posted 2017-09-21 12:10:39

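First, you should build your data frame with data.frame() directly; otherwise (going through cbind) the minus and plus columns end up as factors instead of numbers.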

species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species=species, category=category, minus=minus, plus=plus)

Next, I'm not sure there is a pure dplyr way to do this (I'd be happy to be shown one), but here is a partly dplyr approach:

library(dplyr)

df_combinations <-
  # create a df with all interactions
  expand.grid(df$species, df$category, df$category) %>% 
  # rename columns
  `colnames<-`(c("species", "category1", "category2")) %>% 
  # 3 lines below:
  # manage to only retain within a species, category(1 and 2) columns
  # with different values
  unique %>% 
  group_by(species) %>% 
  filter(category1 != category2) %>% 
  # cosmetics
  arrange(species, category1, category2) %>%
  ungroup() %>% 
  # prepare an empty column
  mutate(p.value=NA)

# now we loop to fill your result data.frame
for (i in 1:nrow(df_combinations)){
  # filter appropriate lines
  cat1 <- filter(df,
                 species==df_combinations$species[i],
                 category==df_combinations$category1[i])
  cat2 <- filter(df,
                 species==df_combinations$species[i],
                 category==df_combinations$category2[i])
  # calculate the chisq.test and assign its p-value to the right line
  df_combinations$p.value[i] <- chisq.test(c(cat1$minus, cat2$minus,
                                             cat1$plus, cat2$plus))$p.value  

}

Let's look at the final data.frame:

head(df_combinations)
# A tibble: 6 x 4
# Groups:   species [1]
species category1 category2       p.value
<fctr>    <fctr>    <fctr>         <dbl>
1       a         h         l  3.290167e-11
2       a         h         m 1.225872e-134
3       a         l         h  3.290167e-11
4       a         l         m 5.824842e-150
5       a         m         h 1.225872e-134
6       a         m         l 5.824842e-150

Checking the first row:

chisq.test(c(31, 14, 2, 0))$p.value
[1] 3.290167e-11

Is this what you want?
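The first-row check in this answer passes a plain vector, so chisq.test() runs a goodness-of-fit test of whether the four counts are equal, not a 2x2 test of association between category and minus/plus. A by-hand sketch of the same computation with base R's pchisq(), assuming the default equal-probability null; obs, expected, stat and p are illustrative names.

```r
# First-row counts for species a: minus_h, minus_l, plus_h, plus_l
obs <- c(31, 14, 2, 0)

# With a plain vector and the default p, chisq.test() tests equal proportions,
# so every expected count is 47 / 4 = 11.75
expected <- rep(sum(obs) / length(obs), length(obs))
stat <- sum((obs - expected)^2 / expected)  # about 51.81
p <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)

all.equal(p, chisq.test(obs)$p.value)  # TRUE
```

Which test is appropriate depends on the question being asked; the accepted answer's matrix(..., ncol = 2) form is the contingency-table version.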

Votes: 1

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/46341954
