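This is my first post on Stack Overflow, so if you need more information, please ask!
Context: I have compiled water chemistry data for freshwater ecosystems in the Maritimes (Atlantic Canada), because I am trying to build a predictive species distribution model for an invasive species using a random forest model (RFM). Unfortunately, Atlantic Canada lacks consistent water monitoring programs, and the programs that do exist do not monitor the same parameters as one another. As a result, my database (both training and testing) has a lot of NAs.
Problem: this is the response I keep getting from the RFM: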
> p1 <- predict(model2, newdata=Test_Dataset,type="prob")[,2]
> p1
1 2 3 4 5 6 7 8 10 11 12 13 14 16 16 17 18 19 20 22 23 24 25 26 27 28 30 31 32 33 34 NA 35 36 37 NA
What I have tried:
model2 <- randomForest(CMS ~ Lat + Lon + pH + Alkalinity + Ca + Hardness + DO + TOC + T_P + T_N + Cond + Na + No_Stocking + No_Fish_Species + Dist_Hwy + No_Boat_Launches + Connected_Lakes + Invasives, importance=TRUE, data=TrainSet, na.action=na.roughfix)
model2
**Note that the long list of variables are the predictors, and CMS is the species.
Test_Dataset <- rbind(Validation_Dataset1, Validation_Dataset)
Test_Dataset <- Test_Dataset[-1, ]
Modifying the dataset to fix how R reads the NA cells:
Validation_Dataset <- Validation_Dataset %>%
  dplyr::mutate(
    # Convert year to a categorical variable
    Year = factor(Year),
    # Convert chlorophyll concentration from character to numeric where appropriate
    Chlorophyll = dplyr::na_if(Chlorophyll, "NA"), Chlorophyll = factor(Chlorophyll),
    Hardness = dplyr::na_if(Hardness, "NA"), Hardness = factor(Hardness),
    Alkalinity = dplyr::na_if(Alkalinity, "NA"), Alkalinity = factor(Alkalinity),
    Ca = dplyr::na_if(Ca, "NA"), Ca = factor(Ca),
    TOC = dplyr::na_if(TOC, "NA"), TOC = factor(TOC),
    Cond = dplyr::na_if(Cond, "NA"), Cond = factor(Cond),
    Na = dplyr::na_if(Na, "NA"), Na = factor(Na),
    NH4 = dplyr::na_if(NH4, "NA"), NH4 = factor(NH4),
    NO3 = dplyr::na_if(NO3, "NA"), NO3 = factor(NO3),
    pH = dplyr::na_if(pH, "NA"), pH = factor(pH),
    T_N = dplyr::na_if(T_N, "NA"), T_N = factor(T_N),
    T_P = dplyr::na_if(T_P, "NA"), T_P = factor(T_P),
    DO = dplyr::na_if(DO, "NA"), DO = factor(DO),
    Salinity = dplyr::na_if(Salinity, "NA"), Salinity = factor(Salinity),
    No_Stocking = dplyr::na_if(No_Stocking, "NA"), No_Stocking = factor(No_Stocking),
    No_Fish_Species = dplyr::na_if(No_Fish_Species, "NA"), No_Fish_Species = factor(No_Fish_Species),
    Dist_Hwy = dplyr::na_if(Dist_Hwy, "NA"), Dist_Hwy = factor(Dist_Hwy),
    No_Boat_Launches = dplyr::na_if(No_Boat_Launches, "NA"), No_Boat_Launches = factor(No_Boat_Launches),
    Connected_Lakes = dplyr::na_if(Connected_Lakes, "NA"), Connected_Lakes = factor(Connected_Lakes),
    Invasives = dplyr::na_if(Invasives, "NA"), Invasives = factor(Invasives),
    Lat = factor(Lat),
    Lon = factor(Lon),
    CMS = factor(CMS)
  )
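For comparison, here is a minimal sketch of what wrapping a measurement column in factor() does, next to a numeric conversion. The toy data frame `raw` and its values are made up for illustration and are not taken from the dataset above; the sketch assumes the column was read in as text with literal "NA" strings.

```r
# Hypothetical toy column: numbers read in as text, with a literal "NA" string.
raw <- data.frame(pH = c("6.13", "7.2", "NA", "7.2"), stringsAsFactors = FALSE)

# Factor route: every distinct reading becomes its own category.
as_factor <- factor(ifelse(raw$pH == "NA", NA, raw$pH))
nlevels(as_factor)        # number of distinct readings, not a measurement scale

# Numeric route: mark the "NA" strings as missing, then convert.
ph_chr <- raw$pH
ph_chr[ph_chr == "NA"] <- NA
as_num <- as.numeric(ph_chr)

# An existing factor can be turned back into numbers via its labels.
# as.numeric(as_factor) alone would return the internal level codes instead.
back_to_num <- as.numeric(as.character(as_factor))
```

With the factor route, a value that appears in the test data but not in the training data is a level the model has never seen, whereas a numeric column has no such restriction. Whether continuous predictors such as pH or Lat should be numeric here is a modelling decision for the asker.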
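Question: does anyone know how to make the code work so that model2 predicts on Test_Dataset? I think the problem may be small, but I am not seeing it.
Here is a glimpse of the training dataset (Validation_Dataset):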
> str(Validation_Dataset)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 37 obs. of 31 variables:
$ Name : chr "Canard River" "Cedar Creek" "Holland River" "Speed River" ...
$ STN #/COUNTY : chr "10000200202" "16001800202" "3007700202" "16018403402" ...
$ Province : chr "ON" "ON" "ON" "ON" ...
$ Lat : Factor w/ 37 levels "42.03204214",..: 2 1 11 9 10 8 7 5 6 3 ...
$ Lon : Factor w/ 37 levels "-83.01879548",..: 1 2 11 8 10 6 7 9 5 4 ...
$ Year : Factor w/ 9 levels "2007, 2011","2010, 2015, 2011",..: 8 8 8 8 8 8 8 8 8 8 ...
$ Month : chr "4" "4" "4" "4" ...
$ Day : chr "11" "12" "26" "27" ...
$ Data Source : chr "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" ...
$ pH : Factor w/ 35 levels "6.073333","6.13",..: 18 21 28 29 25 34 30 32 19 26 ...
$ Alkalinity : Factor w/ 31 levels "1.8","2.8","3.933333333",..: 19 22 31 30 27 NA NA 26 NA 21 ...
$ Hardness : Factor w/ 13 levels "14.8","36.8",..: 7 8 11 10 9 NA NA 13 NA NA ...
$ Ca : Factor w/ 24 levels "3.833333333",..: 18 19 24 20 21 NA NA 22 NA NA ...
$ Chlorophyll : Factor w/ 15 levels "0.423601","0.453791",..: NA NA NA NA NA NA NA NA NA NA ...
$ DO : Factor w/ 26 levels "0.27","6.2","6.96",..: 21 24 18 16 4 25 17 14 2 7 ...
$ TOC : Factor w/ 3 levels "4.8","5.5","8.8": NA NA NA NA NA NA NA NA NA NA ...
$ T_P : Factor w/ 24 levels "0.002","0.003",..: 23 22 18 10 15 14 16 13 21 20 ...
$ T_N : Factor w/ 32 levels "0.006","0.13",..: 30 31 27 28 17 29 24 32 21 25 ...
$ NO3+NO2 : num 2.173 2.292 1.092 1.695 0.426 ...
$ NO3 : Factor w/ 32 levels "0.027","0.035",..: 30 31 26 27 11 29 24 32 22 8 ...
$ NH4 : Factor w/ 27 levels "0.005","0.006",..: 26 25 22 17 9 11 13 19 23 27 ...
$ Cond : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
$ Salinity : Factor w/ 9 levels "0.11","0.15",..: NA NA NA NA NA NA NA NA NA NA ...
$ Na : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
$ No_Stocking : Factor w/ 3 levels "0","1","2": 1 2 2 3 1 2 1 2 1 2 ...
$ No_Fish_Species : Factor w/ 9 levels "0","1","2","3",..: 1 4 6 4 1 5 1 9 1 9 ...
$ Dist_Hwy : Factor w/ 16 levels "0.003","0.006",..: NA NA 16 NA NA NA NA 8 NA 5 ...
$ No_Boat_Launches: Factor w/ 8 levels "0","1","2","3",..: 1 1 5 1 1 1 1 8 1 3 ...
$ Connected_Lakes : Factor w/ 11 levels "0","1","2","3",..: 7 2 3 4 9 6 2 3 2 5 ...
$ Invasives : Factor w/ 3 levels "0","1","2": NA NA NA NA NA NA NA NA NA NA ...
$ CMS : Factor w/ 2 levels "NO","YES": 2 2 2 2 2 2 2 2 2 2 ...

Posted on 2019-11-20 09:25:02
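You are using the na.roughfix argument. If you want to use it, you first have to apply it outside the randomForest function. I will use the iris dataset as an example.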
iris.roughfix <- na.roughfix(iris.na)
iris.narf <- randomForest(Species ~ ., iris.na, na.action=na.roughfix)

https://stackoverflow.com/questions/58946070
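To carry the same idea over to prediction, here is a minimal, self-contained sketch, again on iris rather than the asker's data. It assumes the randomForest package is installed; the split, the seed, and the rows set to NA are arbitrary choices made for illustration. The point it tries to show is that na.action inside randomForest() only affects the rows used for fitting, so the data passed to predict() has to be imputed separately.

```r
library(randomForest)

set.seed(111)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Knock out a few values by hand (toy data, not the asker's).
train[c(1, 5, 9), "Sepal.Width"]  <- NA
test[c(2, 4),     "Petal.Length"] <- NA

# na.action = na.roughfix only imputes the rows used for fitting.
fit <- randomForest(Species ~ ., data = train, na.action = na.roughfix)

# Rows of newdata that still contain NA get NA as their prediction.
p_raw <- predict(fit, newdata = test, type = "prob")[, 2]

# Imputing the test predictors first gives a prediction for every row.
test_fixed <- na.roughfix(test)
p_fixed    <- predict(fit, newdata = test_fixed, type = "prob")[, 2]

sum(is.na(p_raw))    # rows that received no prediction
sum(is.na(p_fixed))  # should be zero after imputation
```

Two caveats apply when adapting this. na.roughfix() accepts only numeric and factor columns, so character columns such as Name or Province would need to be dropped first. It also fills gaps with the median of the observed values in each column, so a column that is NA in every row has nothing to impute from. Imputing the test set from its own medians is a rough shortcut; using statistics from the training data may be preferable, depending on the analysis.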