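I'm looking for a function or solution in readr or base R to "preview" the column types that read_csv guesses before actually importing the data. I'm working with several files of about 60 MB each, containing 51 columns and 160k rows, and a preview like this would make it easier to build the column specification for read_csv.
Apologies if this sounds like an obvious question. I couldn't find an answer to this specific question on the forums, and I only recently started using dplyr. Thanks.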
Posted on 2021-02-07 12:21:05
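I went into the readr source code and adapted the body of the read_csv function so that it only guesses the column specification and stops before parsing the data.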
library(readr)  # provides default_locale() and show_progress() used in the defaults

# Note: this relies on unexported readr internals (accessed with :::),
# which may change between readr versions.
getReaderSpec <- function(file, col_names = TRUE, col_types = NULL,
                          locale = default_locale(),
                          na = c("", "NA"), quoted_na = TRUE, quote = "\"",
                          comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
                          guess_max = min(1000, n_max),
                          progress = show_progress(),
                          skip_empty_rows = TRUE) {
  # Build the same CSV tokenizer that read_csv uses
  tokenizer <- readr:::tokenizer_csv(na = na, quoted_na = quoted_na,
                                     quote = quote, comment = comment,
                                     trim_ws = trim_ws,
                                     skip_empty_rows = skip_empty_rows)
  name <- readr:::source_name(file)
  file <- readr:::standardise_path(file)
  if (readr:::is.connection(file)) {
    data <- readr:::datasource_connection(file, skip, skip_empty_rows, comment)
    if (readr:::empty_file(data[[1]])) {
      return(tibble::tibble())
    }
  } else {
    if (!isTRUE(grepl("\n", file)[[1]]) && readr:::empty_file(file)) {
      return(tibble::tibble())
    }
    if (is.character(file) && identical(locale$encoding, "UTF-8")) {
      data <- enc2utf8(file)
    } else {
      data <- file
    }
  }
  # Guess the column types from the first guess_max rows; no data is parsed
  spec <- readr:::col_spec_standardise(data, skip = skip,
                                       skip_empty_rows = skip_empty_rows,
                                       comment = comment, guess_max = guess_max,
                                       col_names = col_names,
                                       col_types = col_types,
                                       tokenizer = tokenizer, locale = locale)
  readr:::show_cols_spec(spec)  # print the guessed specification
  invisible(spec)               # return it so it can be edited and reused
}

myspec <- getReaderSpec("someexample.csv")

https://stackoverflow.com/questions/66087230
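As a possibly simpler alternative to the internals-based function above, readr also exports spec_csv(), which guesses a column specification without importing the full data. The sketch below is illustrative rather than definitive: the temporary file and its three columns (id, name, score) are hypothetical sample data, and guessed_types is just a helper name chosen here.

```r
library(readr)

# Hypothetical sample file, written to a temp path purely for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,0.5", "2,b,1.5"), path)

# Guess the column specification from the first guess_max rows,
# without returning the parsed data.
guessed <- spec_csv(path, guess_max = 1000)
print(guessed)

# Summarise the guessed collector class for each column.
guessed_types <- vapply(guessed$cols, function(col) class(col)[1], character(1))
print(guessed_types)
```

The printed specification can be copied into a script, edited where the guess is wrong, and passed back through the col_types argument of read_csv.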