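I am trying to read a table from an SQLite database into an R data frame and have stumbled upon a problem that puzzles me. Here are the first three entries of the SQLite table I want to import: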
1|10|0|0|0|0|10|10|0|0|0|6|8|6|20000|30000|2012-02-29 21:27:07.239091|2012-02-29 21:28:24.815385|6|80.67.28.161|||||||||||||||||||||||||||||||33|13.4936||t|t|f||||||||||||||||||4|0|0|7|7|2
2|10|0|0|0|0|0|0|0|2|2|4|5|4|20000|30000|2012-02-29 22:00:30.618726|2012-02-29 22:04:09.629942|5|80.67.28.161|3|7||0|1|3|0|||4|3|4|5|5|5|5|4|5|4|4|0|0|0|0|0|9|9|9|9|9|||1|f|t|f|||||||||||||k|text|l|||-13|0|3|10||2
3|13|2|4|4|4|4|1|1|2|5|6|3|2|40000|10000|2012-03-01 09:07:52.310033|2012-03-01 09:21:13.097303|6|80.67.28.161|2|2||30|1|1|0|||4|2|1|6|8|3|5|6|6|7|6|||||||||||26|13.6336|4|f|t|f|t|f|f|f|f|||||||||some text||||10|1|1|3|2|3
I am interested in columns 53 to 60, which, to save you the trouble of counting above, look like this:
|t|t|f||||||
|f|t|f||||||
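|f|t|f|t|f|f|f|f|
As you can see, for the first two entries only the first three of these columns are non-NULL, whereas for the third entry all eight columns have a value assigned.
Here is the SQLite table info for these columns: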
sqlite> PRAGMA table_info(observations);
0|id|INTEGER|1||1
** snip **
53|understanding1|boolean|0||0
54|understanding2|boolean|0||0
55|understanding3|boolean|0||0
56|understanding4|boolean|0||0
57|understanding5|boolean|0||0
58|understanding6|boolean|0||0
59|understanding7|boolean|0||0
60|understanding8|boolean|0||0
** snip **
Now, when I try to read this into R, this is what these columns end up looking like:
> library('RSQLite')
> con <- dbConnect("SQLite", dbname = 'db.sqlite3')
> obs <- dbReadTable(con,'observations')
> obs[1:3,names(obs) %in% paste0('understanding',1:8)]
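  understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
1              t              t              f             NA             NA             NA             NA
2              f              t              f             NA             NA             NA             NA
3              f              t              f              0              0              0              0
  understanding8
1             NA
2             NA
3              0
As you can see, the first three columns contain either 't' or 'f', while the other columns are NA where the corresponding value in the SQLite table is NULL and 0 where it is not, regardless of whether the corresponding value in the SQLite table is t or f. Needless to say, this is not the behavior I expected. The problem, I think, is that these columns have the wrong type: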
> sapply(obs[1:3,names(obs) %in% paste0('understanding',1:8)], class)
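understanding1 understanding2 understanding3 understanding4 understanding5 understanding6 understanding7
   "character"    "character"    "character"      "numeric"      "numeric"      "numeric"      "numeric"
understanding8
     "numeric"
Could it be that RSQLite sets the first three columns to type character because it sees t and f as values in those columns in the first entry, but goes with numeric for the others because the first entry happens to be NULL in those columns?
If that is indeed what is happening, is there any way to work around it and convert all of these columns to character (or, even better, logical)?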
Posted on 2012-10-18 01:13:33
The following is cumbersome, but it works:
# first make a copy of the DB and work with it instead of changing
# data in the original
original_file <- "db.sqlite3"
copy_file <- "db_copy.sqlite3"
file.copy(original_file, copy_file) # duplicate the file
con <- dbConnect("SQLite", dbname = copy_file) # establish a connection to the copied DB
# put together a query to replace all NULLs by 'NA' and run it
columns <- c(paste0('understanding',1:15))
columns_query <- paste(paste0(columns,' = IfNull(',columns,",'NA')"),collapse=",")
query <- paste0("UPDATE observations SET ",columns_query)
dbSendQuery(con, query)
# Now that all columns have string values RSQLite will infer the
# column type to be `character`
df <- dbReadTable(con,'observations') # read the table
file.remove(copy_file) # delete the copy
# replace all 'NA' strings with proper NAs
df[names(df) %in% paste0('understanding',1:15)][df[names(df) %in% paste0('understanding',1:15)] == 'NA'] <- NA
# convert 't' to boolean TRUE and 'f' to boolean FALSE
df[ ,names(df) %in% paste0('understanding',1:15)] <- sapply( df[ ,names(df) %in% paste0('understanding',1:15)], function(x) {x=="t"} )
https://stackoverflow.com/questions/12824569
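A possible alternative that avoids copying and modifying the database is to let SQLite translate 't' and 'f' into 1 and 0 inside the query, so that R never has to guess a type from mixed text and NULL values. The following is only a sketch under assumptions: it uses a recent DBI/RSQLite (dbConnect(RSQLite::SQLite(), ...), dbExecute, dbGetQuery) rather than the 2012-era calls above, and the in-memory table with two understanding columns is a made-up stand-in for the real observations table.

```r
library(DBI)

# Hypothetical stand-in for the real database: an in-memory table with the
# same kind of data (text 't'/'f' or NULL in columns declared as boolean).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE observations (
                  id INTEGER PRIMARY KEY,
                  understanding1 boolean,
                  understanding4 boolean)")
dbExecute(con, "INSERT INTO observations VALUES
                  (1, 't', NULL), (2, 'f', NULL), (3, 'f', 't')")

# Columns to convert (in the real table this would be the full
# understanding* set).
cols <- paste0("understanding", c(1, 4))

# Build one CASE expression per column: 't' -> 1, 'f' -> 0, anything else
# (including NULL) -> NULL.
select_list <- paste0("CASE ", cols, " WHEN 't' THEN 1 WHEN 'f' THEN 0 END AS ",
                      cols, collapse = ", ")
obs <- dbGetQuery(con, paste("SELECT id,", select_list, "FROM observations"))

# Turn the 1/0/NA values into TRUE/FALSE/NA.
obs[cols] <- lapply(obs[cols], as.logical)

dbDisconnect(con)
```

Because the conversion happens in the SELECT, the original file stays untouched and no 'NA' placeholder strings are needed. Columns not listed in the query would have to be added to the SELECT explicitly.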