I have a dataset of criminal offense histories, laid out as follows:
ID Charge   Chargedate VictimID ...
1  Robbery  2013-04-05 1
1  Theft    2013-04-06 2
1  Theft    2013-04-07 2
2  Homicide 2013-04-08 3
2  Theft    2013-04-09 3
2  Burglary 2013-04-10 3
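...
I want to reshape the dataset in two ways. First, I want each row to correspond to one unique ID value, ignoring VictimID. I also want to summarize the charges as counts. For example, instead of having 15 theft variables in the dataset, I want a single theftcount variable with the value 15.
For example: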
ID Robberycount Robberydate1 Theftcount Theftdate1 Theftdate2 ...
1 1 2013-04-05 2 2013-04-06 2013-04-07
2 0 NA 1 2013-04-09 NA
...
The other dataset I want to create is also a reshape, but with one row for each unique ID and VictimID pair, for example:
ID VictimID Robberycount Robberydate1 Theftcount Theftdate1 Theftdate2 ...
1 1 1 2013-04-05 0 NA NA
1 2 0 NA 2 2013-04-06 2013-04-07
2 3 0 NA 1 2013-04-09 NA
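...
I have tried doing this with the Melt package, but I can't seem to get the result I want. In particular, I don't know how to get functions such as dcast or melt to aggregate the charge data and also list the specific dates for each charge. Is there a way to get the result I want without manually sorting the dataset?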
Posted on 2018-01-19 15:12:06
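You need to do this in two steps, converting to wide format twice, so you have to prepare both keys first (one for the count columns, one for the date columns). The ugly part is that you end up with more rows than you want, which can be fixed with dplyr::summarise and unique. Note that unique has no na.rm argument (it would be a nice feature ;-)), so the NAs are dropped by hand. Try this: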
df <- read.table(text = "ID Charge Chargedate VictimID
1 Robbery 2013-04-05 1
1 Theft 2013-04-06 2
1 Theft 2013-04-07 2
2 Homicide 2013-04-08 3
2 Theft 2013-04-09 3
2 Burglary 2013-04-10 3
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)

# first data frame: one row per ID
df %>%
  group_by(ID, Charge) %>%
  # build the two keys: one column name per date, one per count
  mutate(key_date = paste0(Charge, "date", seq_len(n())),
         key_count = paste0(Charge, "count"),
         count = n()) %>%
  ungroup() %>%
  select(-Charge, -VictimID) %>%
  # first spread: counts to wide, absent charges get 0
  spread(key = key_count, value = count, fill = 0) %>%
  # second spread: dates to wide, absent dates stay NA
  spread(key = key_date, value = Chargedate) %>%
  group_by(ID) %>%
  mutate_at(.vars = vars(matches("count$")), sum) %>%
  # collapse the extra rows: keep the first non-NA value per column
  summarise_all(.funs = function(x) {
    x <- unique(x[!is.na(x)])
    ifelse(length(x) == 0, NA_character_, x)
  })
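
As a hedged alternative sketch, the counts and the dates can also be built as two separate wide tables and joined afterwards, which avoids the extra rows and the collapsing step. This assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, where pivot_wider() is available (the code above uses the older spread()). reshape_charges() and its keys argument are hypothetical names introduced only for illustration; they are not part of either package. It also assumes Chargedate is an ISO-formatted string, so that sorting it as text gives chronological order.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "ID Charge Chargedate VictimID
1 Robbery 2013-04-05 1
1 Theft 2013-04-06 2
1 Theft 2013-04-07 2
2 Homicide 2013-04-08 3
2 Theft 2013-04-09 3
2 Burglary 2013-04-10 3
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (not from dplyr/tidyr).
# `keys` names the columns that identify one output row.
reshape_charges <- function(data, keys = "ID") {
  # Table 1: one "<Charge>count" column per charge type, 0 where absent.
  counts <- data %>%
    group_by(across(all_of(c(keys, "Charge")))) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(Charge = paste0(Charge, "count")) %>%
    pivot_wider(names_from = Charge, values_from = n,
                values_fill = list(n = 0L))

  # Table 2: one "<Charge>date<i>" column per occurrence, NA where absent.
  dates <- data %>%
    arrange(Chargedate) %>%
    group_by(across(all_of(c(keys, "Charge")))) %>%
    mutate(name = paste0(Charge, "date", row_number())) %>%
    ungroup() %>%
    select(all_of(c(keys, "name", "Chargedate"))) %>%
    pivot_wider(names_from = name, values_from = Chargedate)

  # Join on the keys, then order the remaining columns by name so that
  # each count column sits next to its date columns.
  out <- left_join(counts, dates, by = keys)
  out[, c(keys, sort(setdiff(names(out), keys)))]
}

by_id   <- reshape_charges(df, "ID")                 # one row per ID
by_pair <- reshape_charges(df, c("ID", "VictimID"))  # one row per ID/VictimID pair
print(by_id)
print(by_pair)
```

Passing c("ID", "VictimID") as keys gives the second layout from the question. Ordering columns by name is a simplification: with ten or more dates for one charge, date10 would sort before date2.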

# second data frame you asked for: one row per ID and VictimID pair
df %>%
  group_by(ID, Charge, VictimID) %>%
  # same two keys as above, now counted within each ID/VictimID pair
  mutate(key_date = paste0(Charge, "date", seq_len(n())),
         key_count = paste0(Charge, "count"),
         count = n()) %>%
  ungroup() %>%
  select(-Charge) %>%
  spread(key = key_count, value = count, fill = 0) %>%
  spread(key = key_date, value = Chargedate) %>%
  group_by(ID, VictimID) %>%
  mutate_at(.vars = vars(matches("count$")), sum) %>%
  # collapse the extra rows: keep the first non-NA value per column
  summarise_all(.funs = function(x) {
    x <- unique(x[!is.na(x)])
    ifelse(length(x) == 0, NA_character_, x)
  })

https://stackoverflow.com/questions/48333659