我有以下数据:
ID city gender total 0-4 5-9 10-14
1 A1 city1 M 120 30 50 40
2 A1 city1 F 100 20 45 35
3 B1 city1 F 130 35 50 45
4 B1 city1 M 150 30 60 60
5 C1 city2 M 140 40 50 50
6 C1 city2 F 135 35 45 55
7 D1 city2 M 145 40 55 50
8 D1 city2 F 165 65 35 65
9 E1 city2 M 155 50 50 55
10 E1 city2 F 160 50 45 65这里0-4,5-9,10-14是年龄组。
如何以以下形式轻松更改此数据:
ID city total male female meanage male%
1 A1 city1 220 120 100 7.57 54
2 B1 city1 280 150 130 8.5 53
3 C1 city2 --- --- --- ---- ----
4 D1 city2 -- --- --- --- -----
5 E1 city2 -- ---- ---- ----- -------谢谢。
发布于 2014-05-28 10:32:22
你真的想做很多复杂的转换。我看不到一步就能完成所有这些任务的简单方法。首先,下面是一个更友好的示例数据
dd<-data.frame(
ID = paste0(rep(LETTERS[1:5],each=2),"1"),
city = rep(c("city1", "city2"), c(4,6)),
gender = c("M", "F", "F", "M", "M", "F", "M", "F", "M", "F"),
total = c(120, 100, 130, 150, 140, 135, 145, 165, 155, 160),
X0.4 = c(30, 20, 35, 30, 40, 35, 40, 65, 50, 50),
X5.9 = c(50, 45, 50, 60, 50, 45, 55, 35, 50, 45),
X10.14 = c(40, 35, 45, 60, 50, 55, 50, 65, 55, 65)
)现在我们需要计算一个总的加权年龄,然后再对数据进行整形
tt<-transform(dd,
wage=2*X0.4 + 7*X5.9 + 12*X10.14,
X0.4=NULL,
X5.9=NULL,
X10.14=NULL
)现在我们将对数据进行整形。它不太“宽”,所以我们将它融化,并将其加宽,以便我们为每个性别的总数和工资列。
library(reshape2)
mm<-melt(tt, id.vars=c("ID","city","gender"))
cc<-dcast(mm, ID+city~gender+variable)现在,我们有了每一行所需的所有数据,因此我们执行最后一组转换,以获得所需的列
transform(cc, total=F_total+M_total,
male=M_total,
female=F_total,
meanage = (F_wage+M_wage)/(M_total+F_total),
maleperc = M_total/(M_total+F_total) * 100,
F_total=NULL, M_total=NULL,
F_wage=NULL, M_wage=NULL
)它会产生
ID city total male female meanage maleperc
1 A1 city1 220 120 100 7.568182 54.54545
2 B1 city1 280 150 130 7.714286 53.57143
3 C1 city2 275 140 135 7.545455 50.90909
4 D1 city2 310 145 165 7.161290 46.77419
5 E1 city2 315 155 160 7.317460 49.20635所以我不确定我是否同意你对B1的平均年龄。如果您正在使用其他公式来计算该值,请随时更新代码。
发布于 2014-05-28 10:33:28
这或多或少会对数据的前几行执行您想要的操作。(我认为你对B1的meanage的计算是错误的,但根据年龄范围计算平均值对我来说似乎不是很合理)。
library('dplyr')
library('reshape2')
yourdata <- data.frame(ID = c("A1", "A1", "B1", "B1", "C1", "C1"),
city = c(rep("city1", 4), "city2", "city2"),
gender = c("M", "F", "F", "M", "M", "F"),
total = c(120, 100, 130, 150, 140, 135),
X0_4 = c(30, 20, 35, 30, 40, 35),
X5_9 = c(50, 45, 50, 60, 50, 45),
X10_14 = c(40, 35, 45, 60, 50, 55))
# total counts of each age group for each ID
age_counts <- yourdata %>%
group_by(ID) %>%
summarize(X0_4 = sum(X0_4), X5_9 = sum(X5_9), X10_14 = sum(X10_14))
# use dcast to get 'male' and 'female' columns
data_wide <- dcast(yourdata, ID + city ~ gender, value.var = 'total')
# merge back with age counts
data_full <- merge(data_wide, age_counts, by = 'ID')
# final table
data_full %>%
group_by(ID) %>%
mutate(total = sum(F, M),
meanage = ((X0_4 * ((0 + 4) / 2) + X5_9 * ((5 + 9) / 2) + X10_14 *
((10 + 14) / 2)) / total),
male_percent = ((M * 100) %/% total)) %>%
select(ID, city, total, M, F, meanage, male_percent)https://stackoverflow.com/questions/23900751
复制相似问题