我正在尝试拆分一个数据帧,并使用一个变量中的唯一值将其写入r中的csv文件。我是r的新手,我不能完全确定我知道自己在做什么。
## trying to subset data
library(dplyr)
library(plyr)
#set the working directory
setwd("S:/some stuff")
## load the datafile into an object called data.
data <- read.csv("S:/some stuff/Area.csv",
header = TRUE, sep = ",")
#Create subsets of data by LA
LA<-subset(data,AREA == "LA")我的数据框有2,500个观察值和20个变量。
我的数据帧被称为LA,我想将其拆分的变量称为疾病
我找到了这个How to create multiple ,csv files in R?
并相应地对其进行重新分配
从…
plyr::d_ply(iris, .(Species), function(x) write.csv(x,
file = paste(x$Species, ".csv", sep = "")))至
plyr::d_ply(LA, .(Disease), function(x) write.csv(x,
file = paste(LA$Disease, ".csv", )))然而……
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
Show Traceback
Rerun with Debug
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument 有两件事我想解决。1)子集数据帧2)写入路径
理想情况下,我希望在导入数据( Area.csv文件)时遍历它。这是有区域和疾病的。有12个地区,20种疾病。我想按区域创建每个疾病的csv文件。在本例中,面积= LA,然后是疾病。
如何使用循环为每个区域创建20个不同的文件?
mpExpenses2012 = read.csv("~/Downloads/DataDownload_2012.csv")
#mpExpenses2012 is the large dataframe containing data for each MP
#Get the list of unique MP names
for (name in levels(mpExpenses2012$MP.s.Name)){
#Subset the data by MP
tmp=subset(mpExpenses2012,MP.s.Name==name)
#Create a new filename for each MP - the folder 'mpExpenses2012' should already exist
fn=paste('mpExpenses2012/',gsub(' ','',name),sep='')
#Save the CSV file containing separate expenses data for each MP
write.csv(tmp,fn,row.names=FALSE)
}可能会有帮助,但它写的是一条让我沮丧的道路。
编辑
library(tidyr)
library(purrr)
temp_dir <- tempfile()
dir.create(temp_dir)
LA %>%
nest(-FinalDiseaseForMonthlyAnalysis) %>%
pwalk(function(FinalDiseaseForMonthlyAnalysis, data) write.csv(data, file.path(temp_dir, paste0(FinalDiseaseForMonthlyAnalysis, ".csv"))))
list.files(temp_dir)
temp_dir
unlink(temp_dir, recursive = T)这是可行的。但是现在出现了“文件在哪里?”有个问题。是:我得到临时文件,然后解除链接。但是如何保存在S:/some stuff/上的文件夹中呢?
编辑最终:解决了我读到r中的所有东西都是一个列表的问题。我找到了一种方法,将其分成两列来做我需要的事情。令人恼火的是,它链接在这里的评论中:https://blog.ouseful.info/2013/04/03/splitting-a-large-csv-file-into-separate-smaller-files-based-on-values-within-a-specific-column/
我错过了。我在使用dir.create生成目录时也遇到了问题。谁知道当你尝试做一些事情时,dir.create需要有recursive = TRUE呢?我现在知道了。
不管怎么说。下面是我所做的:
## trying to subset data
# generate data:
library(tidyr)
library(purrr)
library(dplyr)
library(write)
## set working directory
setwd("S:/somestuff")
#create the directories - pretty sure there's a way to avoid doing this long hand
dir.create("S:/somestuff/CSV source files", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA1", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA2", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA3", recursive = TRUE)
#Read in the CSV
DF = read.csv("S:/somestuff/CSV source files/ALL.csv",
header = TRUE, sep = ",")
glimpse(DF)
#This splits the dataframe generated above (DF) and calls it DF4
DF4 <- split(DF,list(DF$LA,DF$FinalDiseaseForMonthlyAnalysis))
lapply(names(DF4), function(name) write.csv(DF4[[name]], file = paste("S:/somestuff/CSV source files/",gsub('','',name),sep = ''), row.names = F))我猜如果我读入数据帧,我就可以使用dir.create从数据帧中LA中的名称创建路径。
回到问题之后。在最新版本的dplyr中,这要容易得多
ourdata<-DF4%>%
group_by(DF$LA,DF$FinalDiseaseForMonthlyAnalysis)%>%
group_walk(~ write_csv(.x, paste0(.y$LA,.y$FinalDiseaseForMonthlyAnalysis, ".csv")))发布于 2018-04-04 17:03:33
最后,我使用了:
## trying to subset data
# generate data:
library(tidyr)
library(purrr)
library(dplyr)
library(stringr)
library(plyr)
library (car)
## set working directory
setwd("S:/Somestuff/Borough profile maps/Working")
## read data in from geocoded file
geocoded<-read.csv("geocoded 2015 - 2018.csv",na.strings=c(""," ","N/A"))
str(geocoded)
str(geocoded$GENDER)
levels(geocoded$LA)
#split geocoded data by LA
x <-split(geocoded,geocoded$LA)
str(x)
#Split geocoded data by LA and Final
#split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, .)
y<-split(geocoded,list(geocoded$Final,geocoded$LA), drop = TRUE, sep = "_")
str(y)
#create dir and then write CSV files of geocoded to file locations
dir.create("S:/Somestuff/Borough profile maps/Working/TEST/",, recursive = TRUE)
dir.create("S:/Somestuff/Borough profile maps/Working/TEST/TEST2",, recursive = TRUE)
lapply(names(x), function(name) write.csv(x[[name]], file = paste('S:/Somestuff/Borough profile maps/Working/TEST/',gsub(' ','',name),sep = ''), row.names = F))
lapply(names(y),function(name) write.csv(y[[name]], file = paste('S:/Somestuff/Borough profile maps/Working/TEST/TEST2/',name,".csv")))问题在于,在我的原始代码中,您会注意到我使用的是read.csv,但输入的是一个.txt文件。我将文件更改为.csv和BANG。啊,真灵。第一次。
我意识到你并不需要我在开始时调用的所有库,但在我荒谬的尝试次数中,它们被留在了那里。
发布于 2019-10-11 17:14:12
回到问题之后。在最新版本的dplyr中,这要容易得多
DF4%>%
group_by(DF$LA,DF$FinalDiseaseForMonthlyAnalysis)%>%
group_walk(~ write_csv(.x, paste0(.y$LA,.y$FinalDiseaseForMonthlyAnalysis, ".csv")))https://stackoverflow.com/questions/49512137
复制相似问题