
Chi-squared test of independence for all variables

Stack Overflow user
Asked 2022-07-13 15:18:10
3 answers, 235 views, 0 follows, 3 votes

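In my dataset I have many categorical variables, and I want to find the associations between them. However, I'm struggling to figure out how to automate this so that I don't have to run a chi-squared test on every pair by hand.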

For example, suppose I have a data frame:

#Create variables
set.seed(123)
fruit <-c('apple','orange','orange','pear')
fav_number <- seq(from=1,to=4,1)
place <- c('nigeria','india','usa','mexico')
weather <- c('summer','winter','spring','summer')
car <- c('bmw','mercedes','honda','honda')

#Create dataframe
df <- as.data.frame(cbind(fruit,fav_number,place,weather,car))

#Convert all columns to factors
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], 
                                       as.factor)

So my df looks like this:

   fruit fav_number   place weather      car
1  apple          1 nigeria  summer      bmw
2 orange          2   india  winter mercedes
3 orange          3     usa  spring    honda
4   pear          4  mexico  summer    honda

I can run a chi-squared test between two variables:

chisq.test(table(df$place,df$fav_number))
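A single chisq.test() call returns an object of class htest, and its components can be pulled out individually, which is what any automated version has to do for every pair. A minimal sketch for the place/fav_number pair; res is a hypothetical name, and suppressWarnings() is used only because four observations trigger the small-expected-count warning:

```r
# Two of the question's variables, as character vectors
place      <- c("nigeria", "india", "usa", "mexico")
fav_number <- c("1", "2", "3", "4")

# Same test as above; the warning about small expected counts is silenced
res <- suppressWarnings(chisq.test(table(place, fav_number)))

res$statistic  # Pearson X-squared
res$parameter  # degrees of freedom
res$p.value    # asymptotic p-value
```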

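But I want to run the same test between every variable and every other variable. The output I'm looking for is similar to the correlation matrix you get for continuous variables.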
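One way to get a correlation-matrix-like layout is to visit each unordered pair once with combn() and write its p-value into both halves of a symmetric matrix. This is a sketch, not taken from the answers below; pmat is a hypothetical name, the diagonal is deliberately left as NA (testing a variable against itself is not informative), and with only four rows the asymptotic p-values should not be trusted.

```r
# The question's data, with every column stored as a factor
df <- data.frame(
  fruit      = c("apple", "orange", "orange", "pear"),
  fav_number = c("1", "2", "3", "4"),
  place      = c("nigeria", "india", "usa", "mexico"),
  weather    = c("summer", "winter", "spring", "summer"),
  car        = c("bmw", "mercedes", "honda", "honda"),
  stringsAsFactors = TRUE
)

vars <- names(df)
pmat <- matrix(NA_real_, length(vars), length(vars),
               dimnames = list(vars, vars))

# Test each unordered pair once and fill both (i, j) and (j, i)
for (pair in combn(vars, 2, simplify = FALSE)) {
  p <- suppressWarnings(
    chisq.test(table(df[[pair[1]]], df[[pair[2]]]))$p.value
  )
  pmat[pair[1], pair[2]] <- p
  pmat[pair[2], pair[1]] <- p
}

round(pmat, 4)
```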


3 Answers

Stack Overflow user

Answered 2022-07-13 15:50:26

#Create variables
set.seed(123)
fruit<-c('apple','orange','orange','pear')
fav_number<-seq(from=1,to=4,1)
place<-c('nigeria','india','usa','mexico')
weather<-c('summer','winter','spring','summer')
car<-c('bmw','mercedes','honda','honda')

#Create dataframe
df<-as.data.frame(cbind(fruit,fav_number,place,weather,car))

#Convert all columns to factors
df[sapply(df,is.character)]<-lapply(df[sapply(df,is.character)],as.factor)
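# All ordered pairs of column names (as strings), minus each variable paired with itself
eg<-expand.grid(names(df),names(df),stringsAsFactors=FALSE)
eg<-eg[eg$Var1!=eg$Var2,]

for(i in 1:nrow(eg)) {
  cat(strrep("#",20),"\n")
  cat(eg[i,1],eg[i,2],"\n")
  # Cross-tabulate the two columns by name and run the test
  print(chisq.test(table(df[,eg[i,1]],df[,eg[i,2]])))
}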
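The loop above prints every pair twice (once as A/B, once as B/A). A sketch that runs each unordered pair once and collects the results into a data frame instead of printing them; results and its column names are hypothetical choices, and the warnings are silenced only to keep the output readable.

```r
# The question's data, with every column stored as a factor
df <- data.frame(
  fruit      = c("apple", "orange", "orange", "pear"),
  fav_number = c("1", "2", "3", "4"),
  place      = c("nigeria", "india", "usa", "mexico"),
  weather    = c("summer", "winter", "spring", "summer"),
  car        = c("bmw", "mercedes", "honda", "honda"),
  stringsAsFactors = TRUE
)

# Each unordered pair of column names, exactly once
pairs <- combn(names(df), 2, simplify = FALSE)

# One row per pair: statistic, degrees of freedom, p-value
results <- do.call(rbind, lapply(pairs, function(p) {
  tst <- suppressWarnings(chisq.test(table(df[[p[1]]], df[[p[2]]])))
  data.frame(var1      = p[1],
             var2      = p[2],
             statistic = unname(tst$statistic),
             dof       = unname(tst$parameter),
             p.value   = tst$p.value)
}))

# Sorted so the strongest evidence of association comes first
results[order(results$p.value), ]
```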
Votes: 2

Stack Overflow user

Answered 2022-07-13 15:55:34

Using outer (the \(x, y) lambda shorthand requires R 4.1 or later):

## chi^2
outer(df, df, Vectorize(\(x, y) chisq.test(table(x, y), simulate.p.value=TRUE)$statistic))
#            fruit fav_number place weather car
# fruit          8          8     8       4   5
# fav_number     8         12    12       8   8
# place          8         12    12       8   8
# weather        4          8     8       8   5
# car            5          8     8       5   8

## p-value
outer(df, df, Vectorize(\(x, y) chisq.test(table(x, y), simulate.p.value=TRUE)$p.value))
#                fruit fav_number place   weather       car
# fruit      0.1699150          1     1 1.0000000 0.8385807
# fav_number 1.0000000          1     1 1.0000000 1.0000000
# place      1.0000000          1     1 1.0000000 1.0000000
# weather    1.0000000          1     1 0.1749125 0.8255872
# car        0.8255872          1     1 0.8430785 0.1704148

Note that we use simulate.p.value=TRUE here to get rid of the "Chi-squared approximation may be incorrect" warning. This post on Cross Validated discusses the topic in detail.

Data:

df <- structure(list(fruit = structure(c(1L, 2L, 2L, 3L), levels = c("apple", 
"orange", "pear"), class = "factor"), fav_number = structure(1:4, levels = c("1", 
"2", "3", "4"), class = "factor"), place = structure(c(3L, 1L, 
4L, 2L), levels = c("india", "mexico", "nigeria", "usa"), class = "factor"), 
    weather = structure(c(2L, 3L, 1L, 2L), levels = c("spring", 
    "summer", "winter"), class = "factor"), car = structure(c(1L, 
    3L, 2L, 2L), levels = c("bmw", "honda", "mercedes"), class = "factor")), row.names = c(NA, 
-4L), class = "data.frame")
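To see what simulate.p.value = TRUE changes, here is a sketch comparing the default asymptotic p-value (which triggers the warning for this tiny table) with a Monte Carlo p-value computed from B tables simulated with the same margins. The seed is set only to make the simulated p-value reproducible; without one, each call draws new tables, which is most likely why the simulated p-value matrix above is not exactly symmetric. The names tab, asym, mc and warned are hypothetical.

```r
tab <- table(fruit   = c("apple", "orange", "orange", "pear"),
             weather = c("summer", "winter", "spring", "summer"))

# Asymptotic test: record (and muffle) the small-expected-count warning
warned <- FALSE
asym <- withCallingHandlers(
  chisq.test(tab),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Monte Carlo test: no chi-squared approximation, so no warning
set.seed(42)
mc <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)

c(asymptotic = asym$p.value, monte_carlo = mc$p.value)
```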
Votes: 1

Stack Overflow user

Answered 2022-07-13 16:16:25

Using combn to get all combinations:

all_combos <- t(combn(names(df),2))
all_chis <- apply(all_combos, 1, \(x) chisq.test(table(df[x])))

lbls <- paste0(all_combos[,1],"_",all_combos[,2])
names(all_chis) <- lbls

The output is a list:

> all_chis
$fruit_fav_number

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$fruit_place

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$fruit_weather

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 4, df = 4, p-value = 0.406


$fruit_car

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 5, df = 4, p-value = 0.2873


$fav_number_place

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 12, df = 9, p-value = 0.2133


$fav_number_weather

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$fav_number_car

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$place_weather

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$place_car

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 8, df = 6, p-value = 0.2381


$weather_car

    Pearson's Chi-squared test

data:  table(df[x])
X-squared = 5, df = 4, p-value = 0.2873

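I think there is a way to edit the data field in the chisq.test output, but I haven't figured out how to access it yet. The workaround is to use the lbls I created, but that feels too hacky.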
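The "data:" line in the printed output comes from the data.name component of the htest object that chisq.test() returns, and that component can be overwritten like any other list element. A sketch, assuming the question's df; it swaps apply() for combn(..., simplify = FALSE) plus lapply(), and the "x and y" label format is an arbitrary choice.

```r
# The question's data, with every column stored as a factor
df <- data.frame(
  fruit      = c("apple", "orange", "orange", "pear"),
  fav_number = c("1", "2", "3", "4"),
  place      = c("nigeria", "india", "usa", "mexico"),
  weather    = c("summer", "winter", "spring", "summer"),
  car        = c("bmw", "mercedes", "honda", "honda"),
  stringsAsFactors = TRUE
)

pairs <- combn(names(df), 2, simplify = FALSE)

all_chis <- lapply(pairs, function(x) {
  res <- suppressWarnings(chisq.test(table(df[x])))
  # print() shows this string after "data:"
  res$data.name <- paste(x, collapse = " and ")
  res
})
names(all_chis) <- vapply(pairs, paste, character(1), collapse = "_")

all_chis$fruit_weather
```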

Votes: 1
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72968705
