Problem with only getting NA responses from a random forest model

Asked by a Stack Overflow user on 2019-11-20 03:13:49
1 answer · 75 views · 0 votes

This is my first post on Stack Overflow, so if more information is needed, please ask!

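Context: I have compiled water chemistry data for freshwater ecosystems in the Maritimes (Atlantic Canada) because I am trying to build a predictive species distribution model for an invasive species using a random forest model (RFM). Unfortunately, Atlantic Canada lacks consistent water monitoring programs, and the programs that do exist do not monitor the same parameters as one another. As a result, my datasets (both training and testing) contain a lot of NAs.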

Problem: this is the response I keep getting from the RFM:

```r
> p1 <- predict(model2, newdata=Test_Dataset,type="prob")[,2]
> p1

1 2 3 4 5 6 7 8 10 11 12 13 14 16 16 17 18 19 20 22 23 24 25 26 27 28 30 31 32 33 34 NA 35 36 37 NA
```
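A minimal way to reproduce this behaviour, assuming the `randomForest` package and the built-in `iris` data rather than the asker's data: with the formula interface, `predict()` drops any row of `newdata` that has a missing value in one of the model's predictors and returns `NA` for that row, even if the model itself was trained with `na.action = na.roughfix`.

```r
library(randomForest)

set.seed(42)
train <- iris
# Three test rows, one per species (illustrative choice)
test <- iris[c(1, 51, 101), ]
# Give the second test row a missing predictor value
test$Sepal.Length[2] <- NA

fit <- randomForest(Species ~ ., data = train)

# Class probabilities for the second class ("versicolor"), as in the question
p <- predict(fit, newdata = test, type = "prob")[, 2]
print(p)  # the row with the missing predictor comes back as NA
```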

What I've tried:

  1. I built the RFM (i.e., model2) using a variety of predictors. I did include:

```r
model2 <- randomForest(CMS ~ Lat + Lon + pH + Alkalinity + Ca + Hardness + DO + TOC + T_P + T_N +
                         Cond + Na + No_Stocking + No_Fish_Species + Dist_Hwy + No_Boat_Launches +
                         Connected_Lakes + Invasives,
                       importance = TRUE, data = TrainSet, na.action = na.roughfix)
model2
```

**Note:** the long list of variables are the predictors, and CMS is the species.

  2. I tried to match the test dataset (Test_Dataset) to the training dataset (Validation_Dataset):

```r
Test_Dataset <- rbind(Validation_Dataset1, Validation_Dataset)
Test_Dataset <- Test_Dataset[-1, ]
```

  3. I have searched for and read multiple resources (including the obvious R help pages and the references linked there).

  4. I have mutated the data as follows (I'll only show Validation_Dataset, since the mutations for both datasets are identical):

```r
library(dplyr)

# Modify the dataset to fix R reading NA cells
Validation_Dataset <- Validation_Dataset %>%
  dplyr::mutate(
    # Convert Year to a categorical variable
    Year = factor(Year),
    # Convert chlorophyll concentration from a character column to a numeric column, where appropriate
    Chlorophyll = dplyr::na_if(Chlorophyll, "NA"), Chlorophyll = factor(Chlorophyll),
    Hardness = dplyr::na_if(Hardness, "NA"), Hardness = factor(Hardness),
    Alkalinity = dplyr::na_if(Alkalinity, "NA"), Alkalinity = factor(Alkalinity),
    Ca = dplyr::na_if(Ca, "NA"), Ca = factor(Ca),
    TOC = dplyr::na_if(TOC, "NA"), TOC = factor(TOC),
    Cond = dplyr::na_if(Cond, "NA"), Cond = factor(Cond),
    Na = dplyr::na_if(Na, "NA"), Na = factor(Na),
    NH4 = dplyr::na_if(NH4, "NA"), NH4 = factor(NH4),
    NO3 = dplyr::na_if(NO3, "NA"), NO3 = factor(NO3),
    pH = dplyr::na_if(pH, "NA"), pH = factor(pH),
    T_N = dplyr::na_if(T_N, "NA"), T_N = factor(T_N),
    T_P = dplyr::na_if(T_P, "NA"), T_P = factor(T_P),
    DO = dplyr::na_if(DO, "NA"), DO = factor(DO),
    Salinity = dplyr::na_if(Salinity, "NA"), Salinity = factor(Salinity),
    No_Stocking = dplyr::na_if(No_Stocking, "NA"), No_Stocking = factor(No_Stocking),
    No_Fish_Species = dplyr::na_if(No_Fish_Species, "NA"), No_Fish_Species = factor(No_Fish_Species),
    Dist_Hwy = dplyr::na_if(Dist_Hwy, "NA"), Dist_Hwy = factor(Dist_Hwy),
    No_Boat_Launches = dplyr::na_if(No_Boat_Launches, "NA"), No_Boat_Launches = factor(No_Boat_Launches),
    Connected_Lakes = dplyr::na_if(Connected_Lakes, "NA"), Connected_Lakes = factor(Connected_Lakes),
    Invasives = dplyr::na_if(Invasives, "NA"), Invasives = factor(Invasives),
    Lat = factor(Lat),
    Lon = factor(Lon),
    CMS = factor(CMS)
  )
```
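The code comment says the concentrations are converted to numeric, but `factor()` is what gets called, so every distinct measurement becomes its own unordered level (the `str()` output shows `pH` with 35 levels and `Lat`/`Lon` with 37 levels across 37 rows). Test values that never occur in the training data then match no level. A minimal sketch of converting measured columns to numeric instead, on a hypothetical toy data frame and assuming dplyr 1.0 or later for `across()`:

```r
library(dplyr)

# Hypothetical toy data: measurements read in as character, with literal "NA" strings
raw <- data.frame(
  pH  = c("7.1", "NA", "6.8"),
  Ca  = c("12.5", "3.8", "NA"),
  CMS = c("YES", "NO", "YES"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  mutate(
    # Turn "NA" strings into real NAs, then parse the measurements as numbers
    across(c(pH, Ca), ~ as.numeric(na_if(.x, "NA"))),
    # The response stays categorical
    CMS = factor(CMS)
  )

str(clean)
```

With numeric predictors, `randomForest()` splits on thresholds, so an unseen test value is handled like any other number; the `rbind()` level-matching step would then only be needed for genuinely categorical columns.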

Question: does anyone know how to make the code work so that model2 makes predictions on Test_Dataset? I think the problem may be small, but I'm not seeing it.

Here is a glimpse of the training dataset (Validation_Dataset):

```r
> str(Validation_Dataset)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    37 obs. of  31 variables:
 $ Name            : chr  "Canard River" "Cedar Creek" "Holland River" "Speed River" ...
 $ STN #/COUNTY    : chr  "10000200202" "16001800202" "3007700202" "16018403402" ...
 $ Province        : chr  "ON" "ON" "ON" "ON" ...
 $ Lat             : Factor w/ 37 levels "42.03204214",..: 2 1 11 9 10 8 7 5 6 3 ...
 $ Lon             : Factor w/ 37 levels "-83.01879548",..: 1 2 11 8 10 6 7 9 5 4 ...
 $ Year            : Factor w/ 9 levels "2007, 2011","2010, 2015, 2011",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ Month           : chr  "4" "4" "4" "4" ...
 $ Day             : chr  "11" "12" "26" "27" ...
 $ Data Source     : chr  "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" ...
 $ pH              : Factor w/ 35 levels "6.073333","6.13",..: 18 21 28 29 25 34 30 32 19 26 ...
 $ Alkalinity      : Factor w/ 31 levels "1.8","2.8","3.933333333",..: 19 22 31 30 27 NA NA 26 NA 21 ...
 $ Hardness        : Factor w/ 13 levels "14.8","36.8",..: 7 8 11 10 9 NA NA 13 NA NA ...
 $ Ca              : Factor w/ 24 levels "3.833333333",..: 18 19 24 20 21 NA NA 22 NA NA ...
 $ Chlorophyll     : Factor w/ 15 levels "0.423601","0.453791",..: NA NA NA NA NA NA NA NA NA NA ...
 $ DO              : Factor w/ 26 levels "0.27","6.2","6.96",..: 21 24 18 16 4 25 17 14 2 7 ...
 $ TOC             : Factor w/ 3 levels "4.8","5.5","8.8": NA NA NA NA NA NA NA NA NA NA ...
 $ T_P             : Factor w/ 24 levels "0.002","0.003",..: 23 22 18 10 15 14 16 13 21 20 ...
 $ T_N             : Factor w/ 32 levels "0.006","0.13",..: 30 31 27 28 17 29 24 32 21 25 ...
 $ NO3+NO2         : num  2.173 2.292 1.092 1.695 0.426 ...
 $ NO3             : Factor w/ 32 levels "0.027","0.035",..: 30 31 26 27 11 29 24 32 22 8 ...
 $ NH4             : Factor w/ 27 levels "0.005","0.006",..: 26 25 22 17 9 11 13 19 23 27 ...
 $ Cond            : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
 $ Salinity        : Factor w/ 9 levels "0.11","0.15",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Na              : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
 $ No_Stocking     : Factor w/ 3 levels "0","1","2": 1 2 2 3 1 2 1 2 1 2 ...
 $ No_Fish_Species : Factor w/ 9 levels "0","1","2","3",..: 1 4 6 4 1 5 1 9 1 9 ...
 $ Dist_Hwy        : Factor w/ 16 levels "0.003","0.006",..: NA NA 16 NA NA NA NA 8 NA 5 ...
 $ No_Boat_Launches: Factor w/ 8 levels "0","1","2","3",..: 1 1 5 1 1 1 1 8 1 3 ...
 $ Connected_Lakes : Factor w/ 11 levels "0","1","2","3",..: 7 2 3 4 9 6 2 3 2 5 ...
 $ Invasives       : Factor w/ 3 levels "0","1","2": NA NA NA NA NA NA NA NA NA NA ...
 $ CMS             : Factor w/ 2 levels "NO","YES": 2 2 2 2 2 2 2 2 2 2 ...
```
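Because `predict()` returns `NA` for any row that is missing one of the model's predictors, it can help to count NAs per predictor in the test set before predicting. In the `str()` output, `Invasives` (one of model2's predictors) is `NA` in all ten values shown, so a check like this would flag it. A sketch on a hypothetical three-column toy test set, using base R only:

```r
# Hypothetical subset of model predictors and a toy test set
predictors <- c("pH", "Hardness", "Invasives")
test <- data.frame(
  pH        = c(7.1, 6.8, NA),
  Hardness  = c(NA, 40, 55),
  Invasives = factor(c(NA, NA, NA), levels = c("0", "1", "2"))
)

# Number of missing values in each predictor column
na_counts <- colSums(is.na(test[predictors]))
print(na_counts)

# Rows that have every predictor present; only these can get a non-NA prediction
usable <- complete.cases(test[predictors])
print(usable)
```

Here `Invasives` alone is enough to make every row unusable, which is the same pattern as a prediction vector made up entirely of `NA`.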

1 Answer

Answered by a Stack Overflow user on 2019-11-20 09:25:02

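Use na.roughfix. Passing `na.action = na.roughfix` to `randomForest()` only imputes the data used to fit the model; if you want to use it on other data, you first have to call `na.roughfix()` yourself, outside of the `randomForest()` call. I'll use the iris dataset as an example.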

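```r
library(randomForest)
iris.na <- iris; iris.na[sample(150, 20), 1] <- NA  # introduce some NAs into the first column
iris.roughfix <- na.roughfix(iris.na)
iris.narf <- randomForest(Species ~ ., iris.na, na.action = na.roughfix)
```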
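A sketch of the whole workflow, again on `iris` with artificially introduced NAs (the split, seed, and choice of `Petal.Width` are illustrative assumptions, not the asker's setup): `na.action = na.roughfix` handles the training data inside `randomForest()`, and the test set is imputed separately with `na.roughfix()` before calling `predict()`.

```r
library(randomForest)

set.seed(1)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Knock out some predictor values in both sets
train$Petal.Width[sample(nrow(train), 10)] <- NA
test$Petal.Width[sample(nrow(test), 5)] <- NA

# Training NAs are imputed by na.roughfix inside randomForest()
fit <- randomForest(Species ~ ., data = train, na.action = na.roughfix)

# Without imputing the test set, rows with a missing predictor come back as NA
p_raw <- predict(fit, newdata = test, type = "prob")[, 2]

# Impute the test set first, then predict
test_fixed <- na.roughfix(test)
p_fixed <- predict(fit, newdata = test_fixed, type = "prob")[, 2]

sum(is.na(p_raw))    # one NA per test row with a missing Petal.Width
sum(is.na(p_fixed))  # no NAs after imputation
```

`na.roughfix()` replaces numeric NAs with the column median and factor NAs with the most frequent level, computed from whatever data frame it is given, so here the test set is filled with its own medians. Filling it with the training set's medians instead is a reasonable alternative, but that needs a small helper of your own.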
Votes: 0
Page content originally from Stack Overflow.

Original link:

https://stackoverflow.com/questions/58946070
