Factors in RSQLite

Stack Overflow user
Asked 2013-11-04 11:15:55
2 answers · 1.1K views · 0 followers · 4 votes

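I have been unable to find any documentation on how RSQLite handles factors. From a quick test (see below), they appear to be converted to character.

Question 1: Is there any way to preserve them as factors? I can think of some hacky ways (mostly involving a separate table or .Rdata file to store the factor levels), but it seems there should be a standard, and therefore more maintainable, way to do this.

Question 2: If not with RSQLite, is there another database or database-like package that can? My use case is simple: I process a batch of large data.frames (2-5 million rows × 550 columns) and append them to build one giant database, then select only the rows I need from the database into a data.table and keep working.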

library(RSQLite)
# Create
db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )
# Write test
set.seed(1)
testDat <- data.frame(x=runif(1000),y=runif(1000),g1=sample(letters[1:10],1000,replace=TRUE),g2=rep(letters[1:10],each=100),g3=factor( sample(letters[1:10],1000,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteTable( conn = db, name = "test", value = testDat, row.names=FALSE )
# Read test
testRecovery <- dbGetQuery(db, "SELECT * FROM test")
testSelection <- dbGetQuery(db, "SELECT * FROM test WHERE g3=='h' OR g3=='e' ")
# Close
dbDisconnect(db)
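
The observation that factors come back as character can be checked by comparing column classes before and after a round trip. The following is only a minimal sketch: it assumes an in-memory database (`":memory:"`) and a small made-up data.frame rather than the test data above.

```r
library(RSQLite)

# Assumption: a throwaway in-memory database instead of a file on disk
con <- dbConnect(SQLite(), ":memory:")

# Small illustrative data.frame with one factor column
df <- data.frame(x  = c(0.1, 0.2, 0.3),
                 g3 = factor(c("h", "e", "h")))

dbWriteTable(con, "test", df, row.names = FALSE)
back <- dbGetQuery(con, "SELECT * FROM test")
dbDisconnect(con)

# Compare the column classes before and after the round trip
print(sapply(df, class))
print(sapply(back, class))
```

If the classes differ for `g3`, the level information has not survived the write and has to be stored separately.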

2 Answers

Stack Overflow user

Accepted answer

Posted 2013-11-04 13:35:57

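As I see it, this is pretty simple: factors are a concept that only S and R know. Period.

So to get them into a DB and back out, you need to write mappers. Either simplify everything with as.character (and assume most DB backends will hash strings much as R does), or take a DB-centric approach and split each factor into (unsigned) integers (possibly even short integers) plus labels.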
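
The second option, splitting a factor into integer codes plus labels, can be sketched in plain R without any database. The names `codes`, `labels`, and `levelKey` are illustrative assumptions, not part of any package API.

```r
# A small example factor
g3 <- factor(c("h", "e", "h", "a"))

# The integer codes are what a main table would store
codes <- as.integer(g3)

# The labels are what a separate lookup table would store, keyed by code
labels <- data.frame(levelKey = seq_along(levels(g3)),
                     levels   = levels(g3),
                     stringsAsFactors = FALSE)

# Rebuild: look up each label by its code, then restore the level order
rebuilt <- factor(labels$levels[codes], levels = labels$levels)

stopifnot(identical(rebuilt, g3))
```

Keeping the lookup ordered by `levelKey` matters, because the integer codes only have meaning relative to the level order.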

Votes: 5

Stack Overflow user

Posted 2013-11-04 18:34:46

OK, I wrote some wrappers following @DirkEddelbuettel's suggestion. Thanks for the comment.

library(RSQLite)
library(data.table) # needed for set() and as.data.table() below

#' Write a table via RSQLite with factors stored in another table
#' Handles data.tables efficiently for large datasets
#' @param conn The connection object (created with e.g. dbConnect)
#' @param name The name of the table to write
#' @param value The data.frame to write to the database
#' @param factorName The base name of the tables that store the factor labels in the SQLite database (e.g. if factorName is "_factor_", the data.frame in value contains a factor column called "color", and name is "mytable", then dbWriteFactorTable will create a table called mytable_factor_color which stores the levels information)
#' @param \dots Options to pass along to dbWriteTable (e.g. append=TRUE)
#' @return A boolean indicating whether the table write was successful
dbWriteFactorTable <- function( conn, name, value, factorName="_factor_", ... ) {
  # Test inputs
  stopifnot(inherits(conn, "SQLiteConnection"))
  stopifnot(is.character(name), length(name) == 1)
  stopifnot(is.data.frame(value))
  stopifnot(is.character(factorName), length(factorName) == 1)
  if( grepl("[.]",factorName) ) stop("factorName must use valid characters for SQLite")
  dt <- inherits(value, "data.table") # TRUE if value is a data.table, so columns can be updated by reference; always defined
  # Find the factor columns (is.factor also handles ordered factors, whose class vector has length 2)
  factorCols <- names(value)[ vapply( value, is.factor, logical(1) ) ]
  if( length(factorCols) > 0 ) {
    for( cl in which( colnames(value) %in% factorCols ) ) {
      cn <- colnames(value)[cl]
      factorTable <- data.frame( levels=levels(value[[ cn ]]) )
      factorTable$levelKey <- seq(nrow(factorTable))
      fctNm <- paste0(name,factorName,cn)
      dbWriteTable( conn = conn, name = fctNm, value = factorTable, row.names=FALSE, overwrite=TRUE )
      if( dt )  set( x=value, j=cl, value=as.character(value[[ cn ]]) )
    }
    if( !dt )  value[factorCols] <- lapply( value[factorCols], as.character ) # base R in place of the undefined japply()
  } else {
    warning("No factor columns detected.")
  }
  dbWriteTable( conn = conn, name = name, value = value, ... )
}

#' Read a table via RSQLite with factors stored in another table
#' @param conn The connection object (created with e.g. dbConnect)
#' @param name The name of the table to read
#' @param query A character string containing SQL clauses to be appended to the query (e.g. "WHERE x==3")
#' @param dt Whether to return a data.table vs. a plain-old data.frame
#' @param factorName The base name of the tables that store the factor labels in the SQLite database (e.g. if factorName is "_factor_", the table contains a factor column called "color", and name is "mytable", then dbReadFactorTable will expect a table called mytable_factor_color which holds the levels information)
#' @param \dots Options to pass along to dbGetQuery
#' @return A data.table or data.frame
dbReadFactorTable <- function( conn, name, query="", dt=TRUE, factorName="_factor_", ... ) {
  # Test inputs
  stopifnot(inherits(conn, "SQLiteConnection"))
  stopifnot(is.character(name), length(name) == 1)
  stopifnot(is.character(factorName), length(factorName) == 1)
  if( grepl("[.]",factorName) ) stop("factorName must use valid characters for SQLite")
  # Read main table
  if( dt ) {
    value <- as.data.table( dbGetQuery( conn, paste("SELECT * FROM",name,query), ... ) )
  } else {
    value <- dbGetQuery( conn, paste("SELECT * FROM",name,query), ... )
  }
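  # Find the companion level tables and recover the factor column names from them
  # (base R prefix match, so stringr is not required and only tables starting with name+factorName count)
  prefix <- paste0(name,factorName)
  tbls <- dbListTables( conn )
  factorCols <- substring( tbls[ startsWith( tbls, prefix ) ], nchar(prefix) + 1 )
  if( length(factorCols) > 0 ) {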
    for( cn in factorCols ) {
      fctNm <- paste0(name,factorName,cn)
      factorTable <- dbGetQuery( conn, paste0("SELECT * FROM ",fctNm," ORDER BY levelKey") ) # keep the stored level order
      if( dt ) {
        cl <- which( colnames(value) %in% cn )
        set( x=value, j=cl, value=factor( value[[ cn ]], levels=factorTable$levels ) )
      } else {
        value[[ cn ]] <- factor( value[[ cn ]], levels=factorTable$levels )
      }
    }
  } else {
    warning("No factor columns detected.")
  }
  value
}

And a simple example:

db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )
set.seed(1)
n <- 1000
testDat <- data.frame(key=seq(n), x=runif(n),y=runif(n),g1=sample(letters[1:10],n,replace=TRUE),g2=rep(letters[1:10],each=n/10),g3=factor( sample(letters[1:10],n,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteFactorTable( conn = db, name = "test", value = as.data.table(testDat), row.names=FALSE )
dbReadFactorTable( conn = db, name = "test" )
dbReadFactorTable( conn = db, name = "test", query="WHERE g3=='a'" )
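
The wrappers above keep the labels as text in the main table. The integer-code variant mentioned in the accepted answer might look roughly like the sketch below. It is only an illustration under stated assumptions: an in-memory database, a made-up table `mytable` with a `color` factor, and a lookup table with hypothetical columns `levelKey` and `label`.

```r
library(RSQLite)

# Assumption: a throwaway in-memory database
con <- dbConnect(SQLite(), ":memory:")

df <- data.frame(id = 1:4,
                 color = factor(c("red", "blue", "red", "green")))

# Store the labels once in a lookup table, keyed by integer code
lookup <- data.frame(levelKey = seq_along(levels(df$color)),
                     label    = levels(df$color),
                     stringsAsFactors = FALSE)
dbWriteTable(con, "mytable_factor_color", lookup, row.names = FALSE)

# Store only the integer codes in the main table
dbWriteTable(con, "mytable",
             data.frame(id = df$id, color = as.integer(df$color)),
             row.names = FALSE)

# Filter by label on the database side via a join
res <- dbGetQuery(con, "
  SELECT m.id, f.label AS color
  FROM mytable AS m
  JOIN mytable_factor_color AS f ON m.color = f.levelKey
  WHERE f.label = 'red'
  ORDER BY m.id")

# Rebuild the factor in R with the full, ordered set of levels
lv <- dbGetQuery(con,
  "SELECT label FROM mytable_factor_color ORDER BY levelKey")$label
res$color <- factor(res$color, levels = lv)

dbDisconnect(con)
```

The trade-off is that the main table stays compact, while every query that filters or displays labels needs the join.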
Votes: 3
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/19766588