
Using a lookup table in R converts the class of all variables to factor

Stack Overflow user
Asked on 2015-11-29 19:23:03

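I want to use a lookup table to find and replace matching values in a data frame, but when I apply the lookup table, every variable in the data frame is changed to a factor. Is there a way to apply this lookup table without changing the class of the variables?

Here is my data: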

df <- structure(list(year = c(2008, 2008, 2008, 2010, 2009, 2009, 2011, 
2007, 2011, 2009, 2007, 2008, 2010, 2006, 2009, 2010, 2009, 2006, 
2009, 2008), change_occurred = c("true", "false", "true", "false", 
"false", "true", "false", "false", "false", "false", "false", 
"false", "true", "false", "false", "true", "false", "false", 
"false", "false"), agent_01 = c("harvest", "none", "development", 
"none", "none", "agriculture", "none", "none", "none", "none", 
"none", "none", "insect_disease_defo", "none", "none", "insect_disease_defo", 
"none", "none", "none", "none"), agent_01_conc = c("harvest_60", 
"none", "development", "none", "none", "agriculture", "none", 
"none", "none", "none", "none", "none", "insect_disease_defo", 
"none", "none", "insect_disease_defo", "none", "none", "none", 
"none"), ha_affect = c(3.87, 0, 1.134, 0, 0, 1.44, 0, 0, 0, 0, 
0, 0, 1.8, 0, 0, 2.43, 0, 0, 0, 0)), .Names = c("year", "change_occurred", 
"agent_01", "agent_01_conc", "ha_affect"), row.names = c(NA, 
20L), class = "data.frame")

Structure of df:

str(df)
'data.frame':   20 obs. of  5 variables:
 $ year           : num  2008 2008 2008 2010 2009 ...
 $ change_occurred: chr  "true" "false" "true" "false" ...
 $ agent_01       : chr  "harvest" "none" "development" "none" ...
 $ agent_01_conc  : chr  "harvest_60" "none" "development" "none" ...
 $ ha_affect      : num  3.87 0 1.13 0 0 ...

Here is my lookup table:

lookup <- structure(c("harvest_0", "harvest_10", "harvest_20", "harvest_30", 
"harvest_40", "harvest_50", "harvest_60", "harvest_70", "harvest_80", 
"harvest_90", "harvest_00_20", "harvest_00_20", "harvest_00_20", 
"harvest_30_60", "harvest_30_60", "harvest_30_60", "harvest_30_60", 
"harvest_70_90", "harvest_70_90", "harvest_70_90"), .Dim = c(10L, 
2L), .Dimnames = list(NULL, c("list", "val")))

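Now I use the lookup table to find any matches in the list column (lookup[, 1]) and, where a match is found, replace the value with the corresponding entry in the val column (lookup[, 2]).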

g <- sapply(df, function(x) { 
  tmp = lookup[, 2][match(x, lookup[, 1])] 
  ifelse(is.na(tmp), x, tmp) 
})

Now I coerce it to a data frame...

g.df <- as.data.frame(g)

But now all of the variables are factors.

str(g.df)
'data.frame':   20 obs. of  5 variables:
 $ year           : Factor w/ 6 levels "2006","2007",..: 3 3 3 5 4 4 6 2 6 4 ...
 $ change_occurred: Factor w/ 2 levels "false","true": 2 1 2 1 1 2 1 1 1 1 ...
 $ agent_01       : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
 $ agent_01_conc  : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
 $ ha_affect      : Factor w/ 6 levels "0","1.134","1.44",..: 6 1 2 1 1 3 1 1 1 1 ...

Any ideas on how to prevent this from happening? -cherrytree


1 Answer

Stack Overflow user

Accepted answer

Answered on 2015-11-29 19:26:35

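We need to use lapply instead of sapply, because the latter simplifies its result to a matrix, and a matrix can hold only one type. If any column is character, all columns are converted to character. When we then call as.data.frame, those character columns are converted to factor, because the default option is stringsAsFactors=TRUE (this was the default in R versions before 4.0.0).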
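The coercion described above can be seen on a small example. This is only a sketch: `toy` is a hypothetical two-column data frame, not the asker's data, and the anonymous function simply returns each column unchanged.

```r
# Hypothetical toy data: one numeric column and one character column
toy <- data.frame(n = c(1.5, 2), s = c("a", "b"), stringsAsFactors = FALSE)

# sapply() simplifies equal-length results into a matrix. A matrix holds a
# single type, so the numeric column is coerced to character.
m <- sapply(toy, function(x) x)
is.matrix(m)
typeof(m)

# lapply() returns a list, so each element keeps its own class.
l <- lapply(toy, function(x) x)
vapply(l, class, character(1))
```

Here `typeof(m)` reports a character matrix, while the list from lapply still holds one numeric and one character vector.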

g <- lapply(df, function(x) {
  tmp <- lookup[, 2][match(x, lookup[, 1])]
  ifelse(is.na(tmp), x, tmp)
})
df2 <- data.frame(g) 
str(df2)
#'data.frame':   20 obs. of  5 variables:
# $ year           : num  2008 2008 2008 2010 2009 ...
# $ change_occurred: Factor w/ 2 levels "false","true": 2 1 2 1 1 2 1 1 1 1 ...
# $ agent_01       : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
# $ agent_01_conc  : Factor w/ 5 levels "agriculture",..: 3 5 2 5 5 1 5 5 5 5 ...
# $ ha_affect      : num  3.87 0 1.13 0 0 ...
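In the output above the numeric columns keep their class, but the character columns still come back as factors. If the goal is to keep them as character, one option is to pass stringsAsFactors = FALSE to data.frame (in R 4.0.0 and later this is already the default, so the argument only matters on older versions). The sketch below uses small stand-ins, `df_small` and `lookup_small`, which are illustrative names and values rather than the asker's full data.

```r
# Small stand-ins for the question's df and lookup (illustrative values)
df_small <- data.frame(
  year = c(2008, 2010),
  agent_01_conc = c("harvest_60", "none"),
  ha_affect = c(3.87, 0),
  stringsAsFactors = FALSE
)
lookup_small <- cbind(
  list = c("harvest_60", "harvest_70"),
  val  = c("harvest_30_60", "harvest_70_90")
)

# Same replacement logic as in the answer, applied column by column
replaced <- lapply(df_small, function(x) {
  tmp <- lookup_small[, "val"][match(x, lookup_small[, "list"])]
  ifelse(is.na(tmp), x, tmp)
})

# Build the data frame without converting character columns to factor
df_out <- data.frame(replaced, stringsAsFactors = FALSE)
vapply(df_out, class, character(1))
```

With this, year and ha_affect stay numeric and agent_01_conc stays character, with "harvest_60" replaced by "harvest_30_60".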

If we really want to use sapply, there is the option simplify=FALSE, so that it does not coerce the result to a matrix.
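A minimal sketch of the simplify=FALSE variant, again on a hypothetical toy data frame (`toy` is not from the question):

```r
# Hypothetical toy data: one numeric column and one character column
toy <- data.frame(n = c(1.5, 2), s = c("a", "b"), stringsAsFactors = FALSE)

# With simplify = FALSE, sapply() skips the matrix simplification and
# returns a named list, much like lapply().
res <- sapply(toy, function(x) x, simplify = FALSE)
is.list(res)
vapply(res, class, character(1))

# The list can then be turned back into a data frame
toy2 <- data.frame(res, stringsAsFactors = FALSE)
str(toy2)
```

Each element of `res` keeps its original class, so the rebuilt data frame has a numeric column and a character column.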

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/33986977
