
R plyr: writing to CSV

Stack Overflow user
Asked 2018-03-27 19:50:20
Answers: 2 · Views: 750 · Followers: 0 · Votes: 1

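I am trying to split a data frame and write it out to csv files in R, one file per unique value of a variable. I'm new to R and not entirely sure I know what I'm doing.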

## trying to subset data
# load plyr before dplyr; plyr warns that loading it second is likely to cause problems
library(plyr)
library(dplyr)

# set the working directory
setwd("S:/some stuff")

## load the data file into an object called data
data <- read.csv("S:/some stuff/Area.csv",
                 header = TRUE, sep = ",")

# create subsets of data by LA
LA <- subset(data, AREA == "LA")

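My data frame has 2,500 observations and 20 variables.

My data frame is called LA, and the variable I want to split it by is called Disease.

I found this: How to create multiple ,csv files in R?

and adapted it accordingly

from...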

plyr::d_ply(iris, .(Species), function(x) write.csv(x, 
  file = paste(x$Species, ".csv", sep = "")))

to...
plyr::d_ply(LA, .(Disease), function(x) write.csv(x,
file = paste(LA$Disease, ".csv", )))

However...

Error in file(file, ifelse(append, "a", "w")) : 
  invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :

 Show Traceback

 Rerun with Debug
 Error in file(file, ifelse(append, "a", "w")) : 
  invalid 'description' argument 
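
The message is consistent with `file` receiving more than one string: `paste()` applied to a whole column returns one name per row, while `write.csv()` needs a single file name. A minimal sketch of one way around that in base R, assuming `iris` stands in for `LA`, `Species` for `Disease`, and a throwaway folder under `tempdir()` for the real output path:

```r
# Assumptions: iris stands in for LA, Species for Disease,
# and out_dir is a throwaway folder rather than the real target.
out_dir <- file.path(tempdir(), "by_species")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# paste0() on a whole column gives one string per row, not one file name
n_names <- length(paste0(iris$Species, ".csv"))

# Split first, then build exactly one file name per piece
pieces <- split(iris, iris$Species, drop = TRUE)
for (nm in names(pieces)) {
  write.csv(pieces[[nm]],
            file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

written <- list.files(out_dir)
```

Inside a `d_ply()` call the same idea would mean building the name from a single value of the group, for example `unique(x$Disease)`, rather than from the full column.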

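There are two things I'd like to resolve: 1) subsetting the data frame, and 2) the path the files are written to.

Ideally, I'd like to loop through the data (the Area.csv file) once it has been imported. It contains areas and diseases: 12 areas and 20 diseases. I want to create a csv file for each disease, by area. In this example, area = LA, then disease.

How can I use a loop to create 20 different files for each area?

I was thinking of something like this: https://blog.ouseful.info/2013/04/03/splitting-a-large-csv-file-into-separate-smaller-files-based-on-values-within-a-specific-column/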

mpExpenses2012 = read.csv("~/Downloads/DataDownload_2012.csv")
#mpExpenses2012 is the large dataframe containing data for each MP
#Get the list of unique MP names
for (name in levels(mpExpenses2012$MP.s.Name)){
  #Subset the data by MP
  tmp=subset(mpExpenses2012,MP.s.Name==name)
  #Create a new filename for each MP - the folder 'mpExpenses2012' should already exist
  fn=paste('mpExpenses2012/',gsub(' ','',name),sep='')
  #Save the CSV file containing separate expenses data for each MP
  write.csv(tmp,fn,row.names=FALSE)
}

That might help, but it writes to a path, and the path is what's frustrating me.

Edit

library(tidyr)
library(purrr)
temp_dir <- tempfile()
dir.create(temp_dir)

LA %>%
  nest(-FinalDiseaseForMonthlyAnalysis) %>% 
  pwalk(function(FinalDiseaseForMonthlyAnalysis, data) write.csv(data, file.path(temp_dir, paste0(FinalDiseaseForMonthlyAnalysis, ".csv"))))
list.files(temp_dir)
temp_dir
unlink(temp_dir, recursive = T)

This works. But now there is a "where are the files?" problem. I get the temporary directory and then unlink it. But how do I save to a folder on S:/some stuff/ instead?
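
One way to keep the files is to point the writer at a folder that is created once and never unlinked. The sketch below uses base R only; `write_by_group()` is a hypothetical helper (not part of any package), `toy` is made-up data, and `keep_dir` sits under `tempdir()` purely so the example runs anywhere. On the asker's machine it would be a path such as a folder under S:/some stuff/.

```r
# write_by_group() is a hypothetical helper, not a function from any package.
# It writes one CSV per value of group_col into out_dir and returns the paths.
write_by_group <- function(df, group_col, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  pieces <- split(df, df[[group_col]], drop = TRUE)
  paths <- file.path(out_dir, paste0(names(pieces), ".csv"))
  for (i in seq_along(pieces)) {
    write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Stand-in for a permanent folder; replace with a directory you want to keep.
keep_dir <- file.path(tempdir(), "some stuff", "by disease")

# Made-up data for illustration only
toy <- data.frame(Disease = c("flu", "flu", "measles"), n = c(3, 5, 1))

paths <- write_by_group(toy, "Disease", keep_dir)
file.exists(paths)
```

Because nothing calls `unlink()`, the files stay where `paths` says they are.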

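Final edit: solved. I read that everything in R is a list, and I found a way to split by two columns that does what I need. Annoyingly, it was linked in the comments here: https://blog.ouseful.info/2013/04/03/splitting-a-large-csv-file-into-separate-smaller-files-based-on-values-within-a-specific-column/

I had missed it. I also had trouble using dir.create to generate the directories. Who knew that dir.create needs recursive = TRUE when you are trying to create nested folders? I know now.

Anyway, here's what I did: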

## trying to subset data
library(tidyr)
library(purrr)
library(dplyr)  # glimpse() comes from dplyr; write.csv() is base R

## set working directory
setwd("S:/somestuff")

# create the directories - pretty sure there's a way to avoid doing this long hand
dir.create("S:/somestuff/CSV source files", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA1", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA2", recursive = TRUE)
dir.create("S:/somestuff/CSV source files/LA3", recursive = TRUE)

# read in the CSV
DF <- read.csv("S:/somestuff/CSV source files/ALL.csv",
               header = TRUE, sep = ",")
glimpse(DF)

# split DF by every LA/disease combination and call the resulting list DF4
DF4 <- split(DF, list(DF$LA, DF$FinalDiseaseForMonthlyAnalysis))
# write one file per list element, named after the element with spaces removed
lapply(names(DF4), function(name) {
  write.csv(DF4[[name]],
            file = paste("S:/somestuff/CSV source files/", gsub(" ", "", name), sep = ""),
            row.names = FALSE)
})

I guess that once I've read in the data frame, I could use dir.create to build the paths from the names in the LA column.
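
That idea can be sketched as follows, again in base R. The data frame here is made up, and `base_dir` is a stand-in under `tempdir()` for the real "CSV source files" folder; only the column names `LA` and `FinalDiseaseForMonthlyAnalysis` come from the post.

```r
# Made-up data; only the column names are taken from the post.
DF <- data.frame(
  LA = c("LA1", "LA1", "LA2", "LA3"),
  FinalDiseaseForMonthlyAnalysis = c("flu", "measles", "flu", "mumps"),
  cases = c(4, 2, 7, 1)
)

# Stand-in for the real output folder
base_dir <- file.path(tempdir(), "CSV source files")

# One folder per LA value, taken from the data instead of typed by hand
for (d in file.path(base_dir, unique(DF$LA))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# One file per LA/disease combination, written into the matching LA folder
groups <- split(DF, list(DF$LA, DF$FinalDiseaseForMonthlyAnalysis), drop = TRUE)
for (g in groups) {
  target <- file.path(base_dir, g$LA[1],
                      paste0(g$FinalDiseaseForMonthlyAnalysis[1], ".csv"))
  write.csv(g, target, row.names = FALSE)
}

created <- list.files(base_dir, recursive = TRUE)
```

`drop = TRUE` skips LA/disease combinations that have no rows, so no empty files are written.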

Coming back to this question later: with recent versions of dplyr this is much easier.

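library(readr)  # write_csv()
# group DF itself (DF4 is a list) by bare column names so that .y$LA exists
ourdata <- DF %>%
  group_by(LA, FinalDiseaseForMonthlyAnalysis) %>%
  group_walk(~ write_csv(.x, paste0(.y$LA, .y$FinalDiseaseForMonthlyAnalysis, ".csv")))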

2 Answers

Stack Overflow user

Posted 2018-04-04 17:03:33

In the end, I used:

## trying to subset data
# generate data:
library(tidyr)
library(purrr)
library(dplyr)
library(stringr)
library(plyr)
library (car)
## set working directory
setwd("S:/Somestuff/Borough profile maps/Working")

## read data in from geocoded file
geocoded<-read.csv("geocoded 2015 - 2018.csv",na.strings=c(""," ","N/A"))

str(geocoded)
str(geocoded$GENDER)
levels(geocoded$LA)

#split geocoded data by LA 
x <-split(geocoded,geocoded$LA)
str(x)

#Split geocoded data by LA and Final
#split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, .)
y<-split(geocoded,list(geocoded$Final,geocoded$LA), drop = TRUE, sep = "_")
str(y)

#create dir and then write CSV files of geocoded to file locations
dir.create("S:/Somestuff/Borough profile maps/Working/TEST/", recursive = TRUE)
dir.create("S:/Somestuff/Borough profile maps/Working/TEST/TEST2", recursive = TRUE)
lapply(names(x), function(name) write.csv(x[[name]], file = paste('S:/Somestuff/Borough profile maps/Working/TEST/',gsub(' ','',name),sep = ''), row.names = F))
# paste0() avoids the spaces that paste() would insert around name and ".csv"
lapply(names(y),function(name) write.csv(y[[name]], file = paste0('S:/Somestuff/Borough profile maps/Working/TEST/TEST2/',name,".csv")))

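The problem was that, as you'll notice in my original code, I was using read.csv but the input was a .txt file. I changed the file to .csv and BANG, it worked. First time.

I realise you don't need all the libraries I load at the start; they were left in from my ridiculous number of attempts.

Votes: 0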

Stack Overflow user

Posted 2019-10-11 17:14:12

Coming back to this question later: with recent versions of dplyr this is much easier.

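library(readr)  # write_csv()
# group DF itself (DF4 is a list) by bare column names so that .y$LA exists
DF %>%
  group_by(LA, FinalDiseaseForMonthlyAnalysis) %>%
  group_walk(~ write_csv(.x, paste0(.y$LA, .y$FinalDiseaseForMonthlyAnalysis, ".csv")))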
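
One detail worth knowing about `group_walk()`: by default the data passed as `.x` does not include the grouping columns, so the written files would lack `LA` and the disease column. A small sketch with made-up data, assuming dplyr 1.0.0 or later (where the argument is spelled `.keep`) and readr are installed; the `"_"` separator and the `out_dir` folder are choices made here for illustration, not part of the original answer:

```r
library(dplyr)
library(readr)

# Made-up data; only the column names are taken from the post.
DF <- tibble(
  LA = c("LA1", "LA1", "LA2"),
  FinalDiseaseForMonthlyAnalysis = c("flu", "flu", "mumps"),
  cases = c(4, 2, 7)
)

# Throwaway output folder for the example
out_dir <- file.path(tempdir(), "group_walk_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

DF %>%
  group_by(LA, FinalDiseaseForMonthlyAnalysis) %>%
  group_walk(
    # .y is a one-row tibble holding the keys of the current group
    ~ write_csv(.x, file.path(out_dir,
                              paste0(.y$LA, "_", .y$FinalDiseaseForMonthlyAnalysis, ".csv"))),
    .keep = TRUE  # keep the grouping columns in each written file
  )

files <- list.files(out_dir)
cols <- names(read_csv(file.path(out_dir, "LA1_flu.csv"), show_col_types = FALSE))
```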
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/49512137
