Q: Standard unambiguous format [R] MySQL imported data

Stack Overflow user

Asked 2015-03-01 01:46:04

Answers: 1 · Views: 361 · Followers: 0 · Votes: 1

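OK, to set the scene: I have written a function that imports multiple tables from MySQL (using RODBC) and runs randomForest() on them. The function is run on multiple databases (as separate instances). In one particular database, on one particular table, it throws the error "Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format". The function runs on about 150 tables across two databases with no problems apart from this one table.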
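For context, base R raises this error when a character value is converted to a date-time without an explicit `format` and it matches none of the default layouts that R tries. The snippet below is only a minimal sketch of that behaviour; the all-zero string is a made-up value, not data taken from the actual table.

```r
# Hypothetical value: an all-zero date-time string, not taken from the real table.
bad <- "0000-00-00 00:00:00"

# Without an explicit format, as.POSIXct() tries a few standard layouts and
# raises an error when none of them parses the string.
result <- tryCatch(
  as.POSIXct(bad, tz = "UTC"),
  error = function(e) conditionMessage(e)
)
print(result)

# With an explicit format, an unparseable string yields NA instead of an error.
parsed <- as.POSIXct(bad, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(is.na(parsed))
```

Note that `format()` applied to an existing POSIXct column only controls how values are printed, so it cannot trigger or prevent this parsing error by itself.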

Here is the head() output for the table:

              MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00  184   24    8   24   67  147  Flat
2 2014-11-05 23:57:00  203  184  204   67   51  147  Flat
3 2014-11-06 00:40:00  179  309   49  189   75   19  Flat
4 2014-11-06 00:46:00   28  192   60   49  152  147  Flat
5 2014-11-06 01:20:00  309   48    9   11   24   19  Flat
6 2014-11-06 01:31:00   24  177   64  152  188   19  Flat

Here is the function:

GenerateRF <- function(db, countstable, RFcutoff) {  

'load required libraries'
  library(RODBC)
  library(randomForest)
  library(caret)
  library(ff)
  library(stringi)

'connection and data preparation'
  connection <- odbcConnect ('TTODBC', uid='root', pwd='password', case="nochange")

'import count table and check if RF is allowed to be built'
  query.str <- paste0 ('select * from ', db, '.', countstable, ' order by RowCount asc')
      row.counts <- sqlQuery (connection, query.str)

'Operate only on tables that have >= RFcutoff'
  for (i in 1:nrow (row.counts)) {
    table.name <- as.character (row.counts[i,1])
    col.count <- as.numeric (row.counts[i,2])
    row.count <- as.numeric (row.counts[i,3])

    if (row.count >= RFcutoff) {

'Delete old RFs and DFs for input pattern'
    if (file.exists (paste0 (table.name, '_RF.Rdata'))) {
          file.remove (paste0 (table.name, '_RF.Rdata'))
    }
    if (file.exists (paste0 (table.name, '_DF.Rdata'))) {
          file.remove (paste0 (table.name, '_DF.Rdata'))
    }

'import and clean data'
      query.str2 <- paste0 ('select * from ', db, '.', table.name, ' order by mqltime asc')
          raw.data <- sqlQuery(connection, query.str2) 

'partition data into training/test sets'
      set.seed(489)
          index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
              data.train <- raw.data [index,]
              data.test <- raw.data [-index,]

'find optimal trees to grow (without outcome and dates)'
      data.mtry <- as.data.frame (tuneRF (data.train [, c(-1,-col.count)], data.train$baXRC, ntreetry=100,
                 stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE)) 
          best.mtry <- data.mtry [which (data.mtry[,2] == min (data.mtry[,2])), 1]

'compress df'
      data.ff <- as.ffdf (data.train)

'run RF. Originally set to 1000 trees but M1 dataset is too large for laptop. Maybe train at the lab?'
      data.rf <- randomForest (baXRC~., data=data.ff[,-1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
               importance=TRUE, proximity=FALSE)

'generate and print variable importance plot'
      varImpPlot (data.rf, main = table.name)

'predict on test data'
      data.test.pred <- as.data.frame( predict (data.rf, data.test, type="prob"))

'get dates and name date column'
      data.test.dates <- data.frame (data.test[,1])
          colnames (data.test.dates) <- 'MQLTime'

'attach dates to prediction df'
      data.test.res <- cbind (data.test.dates, data.test.pred)

'force date coercion to attempt negating unambiguous format error '
      data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")

'delete row names, coerce to dataframe, generate row table name and export outcomes to MySQL'
      rownames (data.test.res)<-NULL
      data.test.res <- as.data.frame (data.test.res)
      root.table <- stri_sub(table.name, 0, -5)
          sqlUpdate (connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")

'save RF and test df/s for future use; save latest version of row_counts to MQL4 folder'
      save (data.rf, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
      save (data.test, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
      write.table (row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F, 
        row.names = F, quote = F)

'end of conditional block'
    }

'end of for loop'
  }

'close all connection to MySQL'
  odbcCloseAll()

'clear workspace'
  rm(list=ls())

'end of function'  
}

On this line:

data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")

I have tried coercing MQLTime with various functions, including: as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())

and have also tried:

"%y" vs "%Y" and "%OS" vs "%S"

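None of these variants has any effect on the error, and the function still runs on all the other tables. I have checked the table manually (it contains almost 1500 rows) and also searched in MySQL for NULL dates or dates such as "0000-00-00 : 00:00:00".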
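A manual check over about 1500 rows is easy to get wrong, so a programmatic scan may be more reliable. The following is a rough sketch only: `find_unparseable` is a hypothetical helper, the sample values are made up, and it assumes the timestamps are available as character strings in the "YYYY-MM-DD HH:MM:SS" layout.

```r
# Hypothetical helper: return the positions of non-missing values that do not
# parse with the expected "YYYY-MM-DD HH:MM:SS" layout.
find_unparseable <- function(x, format = "%Y-%m-%d %H:%M:%S") {
  x <- as.character(x)
  parsed <- as.POSIXct(x, format = format, tz = "UTC")
  which(!is.na(x) & is.na(parsed))
}

# Made-up sample values; the third one mimics a zero date.
times <- c("2014-11-05 23:35:00", "2014-11-05 23:57:00",
           "0000-00-00 00:00:00", NA, "2014-11-06 00:40:00")

bad_rows <- find_unparseable(times)
print(bad_rows)
print(times[bad_rows])
```

Because the helper takes a plain vector, the same check could be applied to the time column of every table, not just the one that appears to fail.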

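Also, if I run the function line by line in the R terminal, the offending table is processed without any problem, which confuses me even more.

I have exhausted every function and solution I can think of (and everything I could find through Dr. Google), so I am asking for help here. I should probably mention that the MQLTime column is stored as varchar() in MySQL. This was done to work around type-conversion problems between R and MySQL.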

SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions, 
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32


> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

EDIT: str() output for the data imported from MySQL, showing that MQLTime is already in POSIXct format:

> str(raw.data)
'data.frame':   1472 obs. of  8 variables:
 $ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
 $ bar5   : int  184 203 179 28 309 24 156 48 309 437 ...
 $ bar4   : int  24 184 309 192 48 177 48 68 60 71 ...
 $ bar3   : int  8 204 49 60 9 64 68 27 192 147 ...
 $ bar2   : int  24 67 189 49 11 152 27 56 437 67 ...
 $ bar1   : int  67 51 75 152 24 188 56 147 71 0 ...
 $ pat1   : int  147 147 19 147 19 19 147 19 147 19 ...
 $ baXRC  : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...

So I tried setting stringsAsFactors = FALSE in the data frame operations, but it had no effect.

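Interestingly, if the offending table is excluded from processing via an additional condition in the first 'if' block, the function instead stops on the table immediately before it.

If both the original table and the new offending table are excluded from processing, the function stops on the table before those. I have never seen behaviour like this and I am really stumped.

I watched system resources during the run, but they never seemed to hit their limits.

Could this be a problem with the 'for' loop, and not necessarily with the date format?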
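One way to narrow down which table actually raises the error is to wrap the per-table work in tryCatch() and record the table name whenever it fails. This is a sketch under stated assumptions: `process_table` is a hypothetical stand-in for the import, modelling, and export steps, and the table names are made up.

```r
# Hypothetical stand-in for the per-table work (import, model, export).
# It fails on purpose for one table so the error handling can be seen.
process_table <- function(table.name) {
  if (table.name == "tbl_c") {
    stop("character string is not in a standard unambiguous format")
  }
  paste("processed", table.name)
}

# Made-up table names.
tables <- c("tbl_a", "tbl_b", "tbl_c", "tbl_d")
failed <- character(0)

for (table.name in tables) {
  outcome <- tryCatch(
    process_table(table.name),
    error = function(e) {
      # Report which table raised the error, then let the loop continue.
      message("Failed on ", table.name, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(outcome)) failed <- c(failed, table.name)
}

print(failed)
```

With this pattern the loop keeps going after a failure, so the remaining tables are still processed and every failing table name is collected in `failed`.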

mysql

datetime

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-03-01 10:25:11

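It seems I have egg on my face. The table after the one the function stopped on had a row with a value of "0000-00-00:00:00". I have added another statement to my MySQL function to remove such rows when pre-processing the tables. Thanks to those who took a look at this.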
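The fix described here was made on the MySQL side. As an alternative on the R side, offending rows could be dropped right after import. The snippet below is a sketch, not the poster's method: the data frame is made up, and it assumes the time column arrives as character strings.

```r
# Made-up data frame shaped like the imported table; the second row carries a
# zero date similar to the one described above.
raw.data <- data.frame(
  MQLTime = c("2014-11-05 23:35:00", "0000-00-00 00:00:00", "2014-11-06 00:40:00"),
  bar5 = c(184L, 203L, 179L),
  stringsAsFactors = FALSE
)

# Parse with an explicit format so bad values become NA rather than errors.
parsed <- as.POSIXct(raw.data$MQLTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep only the rows whose timestamp parsed, and store the parsed values.
clean.data <- raw.data[!is.na(parsed), , drop = FALSE]
clean.data$MQLTime <- parsed[!is.na(parsed)]

print(nrow(clean.data))
```

Silently dropping rows can hide upstream data problems, so logging the number of removed rows per table is probably worthwhile.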

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/28789177
