好的,为了设置场景,我编写了一个函数从MySQL导入多个表(使用RODBC)并在它们上运行randomForest()。此函数在多个数据库上运行(作为单独的实例)。在一个特定的数据库和一个特定的表中,抛出"error In as.POSIXlt.character(x,tz,.):没有标准格式的字符串“错误。该函数运行在两个数据库中的大约150个表上,除了这个表之外没有任何问题。
下面是表中的头()打印:
MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00 184 24 8 24 67 147 Flat
2 2014-11-05 23:57:00 203 184 204 67 51 147 Flat
3 2014-11-06 00:40:00 179 309 49 189 75 19 Flat
4 2014-11-06 00:46:00 28 192 60 49 152 147 Flat
5 2014-11-06 01:20:00 309 48 9 11 24 19 Flat
6 2014-11-06 01:31:00 24 177 64 152 188 19 Flat以下是功能:
GenerateRF <- function(db, countstable, RFcutoff) {
'load required libraries'
library(RODBC)
library(randomForest)
library(caret)
library(ff)
library(stringi)
'connection and data preparation'
connection <- odbcConnect ('TTODBC', uid='root', pwd='password', case="nochange")
'import count table and check if RF is allowed to be built'
query.str <- paste0 ('select * from ', db, '.', countstable, ' order by RowCount asc')
row.counts <- sqlQuery (connection, query.str)
'Operate only on tables that have >= RFcutoff'
for (i in 1:nrow (row.counts)) {
table.name <- as.character (row.counts[i,1])
col.count <- as.numeric (row.counts[i,2])
row.count <- as.numeric (row.counts[i,3])
if (row.count >= 20) {
'Delete old RFs and DFs for input pattern'
if (file.exists (paste0 (table.name, '_RF.Rdata'))) {
file.remove (paste0 (table.name, '_RF.Rdata'))
}
if (file.exists (paste0 (table.name, '_DF.Rdata'))) {
file.remove (paste0 (table.name, '_DF.Rdata'))
}
'import and clean data'
query.str2 <- paste0 ('select * from ', db, '.', table.name, ' order by mqltime asc')
raw.data <- sqlQuery(connection, query.str2)
'partition data into training/test sets'
set.seed(489)
index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
data.train <- raw.data [index,]
data.test <- raw.data [-index,]
'find optimal trees to grow (without outcome and dates)
data.mtry <- as.data.frame (tuneRF (data.train [, c(-1,-col.count)], data.train$baXRC, ntreetry=100,
stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE))
best.mtry <- data.mtry [which (data.mtry[,2] == min (data.mtry[,2])), 1]
'compress df'
data.ff <- as.ffdf (data.train)
'run RF. Originally set to 1000 trees but M1 dataset is to large for laptop. Maybe train at the lab?'
data.rf <- randomForest (baXRC~., data=data.ff[,-1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
importance=TRUE, proximity=FALSE)
'generate and print variable importance plot'
varImpPlot (data.rf, main = table.name)
'predict on test data'
data.test.pred <- as.data.frame( predict (data.rf, data.test, type="prob"))
'get dates and name date column'
data.test.dates <- data.frame (data.test[,1])
colnames (data.test.dates) <- 'MQLTime'
'attach dates to prediction df'
data.test.res <- cbind (data.test.dates, data.test.pred)
'force date coercion to attempt negating unambiguous format error '
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
'delete row names, coerce to dataframe, generate row table name and export outcomes to MySQL'
rownames (data.test.res)<-NULL
data.test.res <- as.data.frame (data.test.res)
root.table <- stri_sub(table.name, 0, -5)
sqlUpdate (connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")
'save RF and test df/s for future use; save latest version of row_counts to MQL4 folder'
save (data.rf, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
save (data.test, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
write.table (row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F,
row.names = F, quote = F)
'end of conditional block'
}
'end of for loop'
}
'close all connection to MySQL'
odbcCloseAll()
'clear workspace'
rm(list=ls())
'end of function'
}在这一行:
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")我尝试过使用各种功能来胁迫MQLTime,包括:as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())
并已尝试:
"%y" vs "%Y" and "%OS" vs "%S"所有变体似乎对错误都没有影响,而且该函数仍然能够在所有其他表上运行。我已经手动检查了表(它包含近1500行),并在MySQL中查找空日期或日期,比如"0000-00-00 : 00:00:00“。
而且,如果我在R终端中逐行运行函数,这个违规的表就会被处理,而不会有任何问题让我感到困惑。
我已经用尽了我能想到的所有功能/解决方案(以及通过Google博士我能找到的所有功能/解决方案),所以我在这里请求帮助。我可能应该提到,MQLTime列在MySQL中存储为varchar()。这样做是为了解决R和MySQL之间类型转换的问题。
SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions,
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)编辑: Str()输出从MySQl导入的数据,显示MQLTime已经采用POSIXct格式:
> str(raw.data)
'data.frame': 1472 obs. of 8 variables:
$ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
$ bar5 : int 184 203 179 28 309 24 156 48 309 437 ...
$ bar4 : int 24 184 309 192 48 177 48 68 60 71 ...
$ bar3 : int 8 204 49 60 9 64 68 27 192 147 ...
$ bar2 : int 24 67 189 49 11 152 27 56 437 67 ...
$ bar1 : int 67 51 75 152 24 188 56 147 71 0 ...
$ pat1 : int 147 147 19 147 19 19 147 19 147 19 ...
$ baXRC : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...因此,我尝试在dataframe操作中声明stringsAsfactors = FALSE,但没有效果。
有趣的是,如果通过第一个' if‘块中的附加条件语句从处理中删除违规表,则该函数将在该表的前面立即停止。
如果从处理过程中删除了原始表和新表,则该函数将在它们之前的表上停止。我从来没有见过这样的行为,我真的很困惑。
在运行过程中,我观察了系统资源,但它们似乎从未达到极限。
这是否是'for‘循环的问题,而不一定是日期格式的问题?
发布于 2015-03-01 10:25:11
我脸上好像有个鸡蛋。函数停止的表后面的表有一个值为“0000-00-00:00:00”的行。我在MySQL函数中添加了另一条语句,以便在预处理表时删除这些行。感谢那些看过这个的人。
https://stackoverflow.com/questions/28789177
复制相似问题