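I want to use dplyr and RMySQL to work with a large dataset. The dplyr code itself is fine. The problem (I think) is in exporting the data from MySQL to R: my connection drops every time, even when I use n = Inf in collect. In theory the data should have more than 50K rows, but I only get back about 15K. Any suggestions are appreciated.
Method 1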
library(dplyr)
library(RMySQL)
# Connect to a database and select a table
my_db <- src_mysql(dbname='aermod_1', host = "localhost", user = "root", password = "")
my_tbl <- tbl(my_db, "db_table")
out_summary_station_raw <- select(my_tbl, -c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw)
Method 2: using pool
library(pool)
library(RMySQL)
library(dplyr)
pool <- dbPool(
drv = RMySQL::MySQL(),
dbname = "aermod_1",
host = "localhost",
username = "root",
password = ""
)
out_summary_station_raw <- src_pool(pool) %>% tbl("aermod_final") %>% select(-c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw, n = Inf)
Warning messages (both methods):
Warning messages:
1: In dbFetch(res, n) : error while fetching rows
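2: Only first 15,549 results retrieved. Use n = Inf to retrieve all.
Update:
I checked the logs, and everything looks fine on the server side. In my example, the slow log shows Query_time: 79.348351 Lock_time: 0.000000 Rows_sent: 15552 Rows_examined: 16449696, yet collect cannot retrieve the full data. I can reproduce the same behavior using MySQL Bench.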
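One way to narrow this down might be to bypass collect and fetch the result in fixed-size chunks through DBI, printing a running row count so it is visible where the fetch stops. The following is only a diagnostic sketch, not a fix: `fetch_in_chunks` and `chunk_size` are illustrative names that do not exist in DBI or dplyr, the plain `SELECT *` is an assumption that ignores the column selection above, and the code has not been run against this database.

```r
# Hypothetical helper: pull a result set chunk by chunk so a failing
# fetch can be localized. Only standard DBI calls are used.
fetch_in_chunks <- function(con, sql, chunk_size = 5000) {
  res <- DBI::dbSendQuery(con, sql)
  # Always release the result set, even if a fetch fails part-way.
  on.exit(DBI::dbClearResult(res), add = TRUE)

  chunks <- list()
  total <- 0
  while (!DBI::dbHasCompleted(res)) {
    chunk <- DBI::dbFetch(res, n = chunk_size)
    if (nrow(chunk) == 0) break  # nothing more came back; stop looping
    chunks[[length(chunks) + 1]] <- chunk
    total <- total + nrow(chunk)
    message("rows fetched so far: ", total)
  }
  do.call(rbind, chunks)
}

# Usage with the connection settings from the question (needs a live server,
# so it only runs in an interactive session).
if (interactive()) {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "aermod_1",
                        host = "localhost", user = "root", password = "")
  out <- fetch_in_chunks(con, "SELECT * FROM aermod_final")
  DBI::dbDisconnect(con)
}
```

If the running count stops near the same 15K mark, that would suggest the truncation happens in the fetch itself rather than in dplyr's collect.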
https://stackoverflow.com/questions/44350330