I have a .csv file like this (except that the real .csv file has many more columns):
library(tidyverse)
tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6),
       data2s = c(7, 8)) %>%
  write_csv("df.csv")
I only want id1, id2, data1 and data2.
I can do this:
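df <- read_csv("df.csv",
               col_names = TRUE,
               cols_only(id1 = col_character(),
                         id2 = col_character(),
                         data1 = col_integer(),
                         data2 = col_integer()))
However, as mentioned, my real dataset has many more columns, so I would like to use tidyselect helpers to read only the specified columns and make sure they are parsed with the specified types.
I tried this: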
df2 <- read_csv("df.csv",
                col_names = TRUE,
                cols_only(starts_with("id") = col_character(),
                          starts_with("data") & !ends_with("s") = col_integer()))
But the error message indicates a problem with the syntax. Is it possible to use tidyselect helpers in this way?
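Answered on 2022-09-08 08:08:19
My suggestion goes round the houses a bit, but it basically lets you customise the read specification on the basis of "rules" rather than spelling out every column explicitly.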
library(tidyverse)
tibble(id1 = c("a", "b"),
       id2 = c("c", "d"),
       data1 = c(1, 2),
       data2 = c(3, 4),
       data1s = c(5, 6),
       data2s = c(7, 8)) %>%
write_csv("df.csv")
# Read only one row to build a column spec cheaply; this is really just to get the column names
df_spec <- spec(read_csv("df.csv",
                         col_names = TRUE,
                         n_max = 1))
# Alter the spec with the base R functions startsWith() / endsWith()
df_spec$cols <- imap(df_spec$cols, ~ {
  if (startsWith(.y, "id")) {
    col_character()
  } else if (startsWith(.y, "data") && !endsWith(.y, "s")) {
    # && is the scalar operator that if() conditions expect
    col_integer()
  } else {
    col_skip()
  }
})
df <- read_csv("df.csv",
               col_types = df_spec$cols)
https://stackoverflow.com/questions/73642144
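For what it's worth, the same rule-based idea can probably be written with the tidyselect helpers themselves, by running them against a one-row read of the file and then building a cols_only() spec from the selected names. This is only a sketch under some assumptions: the temporary file `path` and the names `header`, `id_cols`, `int_cols` and `spec_by_rule` are made up for illustration, and it assumes a dplyr version (1.0 or later) whose select() accepts `&` and `!` between helpers.

```r
library(readr)
library(dplyr)

# Illustrative data written to a temporary file (the path is an assumption)
path <- tempfile(fileext = ".csv")
write_csv(tibble(id1 = c("a", "b"),
                 id2 = c("c", "d"),
                 data1 = c(1, 2),
                 data2 = c(3, 4),
                 data1s = c(5, 6),
                 data2s = c(7, 8)),
          path)

# Read a single row as character; only the column names are needed
header <- read_csv(path, n_max = 1,
                   col_types = cols(.default = col_character()))

# Let tidyselect pick the column names that match each rule
id_cols  <- names(select(header, starts_with("id")))
int_cols <- names(select(header, starts_with("data") & !ends_with("s")))

# Build a cols_only() spec: one collector per selected column name
spec_by_rule <- do.call(cols_only, c(
  setNames(rep(list(col_character()), length(id_cols)), id_cols),
  setNames(rep(list(col_integer()), length(int_cols)), int_cols)
))

# Columns not named in the spec are skipped when the file is read
df <- read_csv(path, col_types = spec_by_rule)
```

As a side note, readr 2.0 and later also has a col_select argument on read_csv() that takes tidyselect expressions, but it only chooses which columns to keep; it does not set their types, which is why the sketch above goes through cols_only() instead.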