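I am new to R and am trying to create multiple subsamples of a data frame. My data are assigned to 4 strata (STRATUM = 1, 2, 3, 4), and I want to randomly retain only a specified number of rows in each stratum. To do this, I import the data, sort by the stratification value, and then assign a random number to each row. I want to keep my original random number assignments because I will need them again in future analyses, so I save a .csv with these values. Next, I subset the data by stratum and specify the number of records to keep in each stratum. Finally, I rejoin the data and save it as a new .csv. The code works, but I would like to repeat this process 100 times. In each case, I want to save the .csv with the assigned random numbers as well as the final .csv of randomly selected plots. I am not sure how to make this code repeat 100 times, or how to assign a unique file name to each iteration. Any help would be greatly appreciated.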
DataFiles <- "//Documents/flownData_JR.csv"
PlotsFlown <- read.table (file = DataFiles, header = TRUE, sep = ",")
#Sort the data by the stratification
FlownStratSort <- PlotsFlown[order(PlotsFlown$STRATUM),]
#Create a new column with a random number (no duplicates)
FlownStratSort$RAND_NUM <- sample(137, size = nrow(FlownStratSort), replace = FALSE)
#Sort by the stratum, then random number
FLOWNRAND <- FlownStratSort[order(FlownStratSort$STRATUM,FlownStratSort$RAND_NUM),]
#Save a csv file with the random numbers
write.table(FLOWNRAND, file = "//Documents/RANDNUM1_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)
#Subset the data by stratum
FLOWNRAND1 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='1'),]
FLOWNRAND2 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='2'),]
FLOWNRAND3 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='3'),]
FLOWNRAND4 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='4'),]
#Remove data from each stratum, specifying the number of records we want to retain
FLOWNRAND1 <- FLOWNRAND1[1:34, ]
FLOWNRAND2 <- FLOWNRAND2[1:21, ]
FLOWNRAND3 <- FLOWNRAND3[1:7, ]
FLOWNRAND4 <- FLOWNRAND4[1:7, ]
#Rejoin the data
FLOWNRAND_uneven <- rbind(FLOWNRAND1, FLOWNRAND2, FLOWNRAND3, FLOWNRAND4)
#Save the table with plots removed from each stratum flown in 2017
write.table(FLOWNRAND_uneven, file = "//Documents/Flown_RAND_uneven_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)
Posted on 2017-06-02 05:40:19
If you only need to know which rows end up in each set, here is a data.table solution.
library(data.table)
# Example data: 100 rows randomly assigned to 4 strata
df <- data.table(dat = runif(100),
                 stratum = sample(1:4, 100, replace = TRUE))
# Randomly draws n rows from each stratum and saves them as CSV number i
get_strata <- function(df, n, i) {
  # Subset to n randomly chosen rows within each stratum
  # (replace stratum with your variable name; each stratum needs at least n rows)
  f <- df[df[, .I[sample(.N, n)], by = stratum]$V1]
  # Save as CSV with the iteration number in the file name; replace path
  # (write.csv always writes column names, so col.names is not set)
  write.csv(f, file = paste0("path/df_", i, ".csv"), row.names = FALSE)
}
for (i in 1:100) {
  # replace 10 with number needed
  get_strata(df, 10, i)
}
https://stackoverflow.com/questions/44317198
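The question keeps a different number of rows in each stratum (34, 21, 7 and 7) and needs two files per iteration. A minimal base R sketch of that variant follows. The `plots` data frame, its `PLOT_ID` column, the stratum sizes, the output folder `out_dir` and the numbered file names are all assumptions made for illustration; only the column names `STRATUM` and `RAND_NUM` and the per-stratum counts come from the question.

```r
# Hypothetical stand-in for the question's data: 137 plots in 4 strata
set.seed(1)  # only so this sketch is reproducible
plots <- data.frame(PLOT_ID = 1:137,
                    STRATUM = rep(1:4, times = c(60, 40, 20, 17)))

# Number of rows to keep per stratum (values taken from the question)
keep_n <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)

out_dir <- tempdir()  # assumption: replace with your own output folder

for (i in 1:100) {
  # Give every row a unique random number, then sort by stratum and number
  plots$RAND_NUM <- sample(nrow(plots))
  ranked <- plots[order(plots$STRATUM, plots$RAND_NUM), ]

  # Save the full table with this iteration's random numbers
  write.csv(ranked,
            file.path(out_dir, sprintf("RANDNUM_%03d.csv", i)),
            row.names = FALSE)

  # Keep the first keep_n rows of each stratum, then rejoin the pieces
  pieces <- lapply(split(ranked, ranked$STRATUM), function(d) {
    head(d, keep_n[[as.character(d$STRATUM[1])]])
  })
  subsample <- do.call(rbind, pieces)

  # Save the subsample under a file name unique to this iteration
  write.csv(subsample,
            file.path(out_dir, sprintf("Flown_RAND_uneven_%03d.csv", i)),
            row.names = FALSE)
}
```

Building the file name with `sprintf("%03d", i)` pads the iteration number with zeros, so the 100 files sort in order. `head()` is used instead of indexing with `1:n` so that a stratum with fewer rows than requested does not produce rows of `NA`.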