I have data in R in the following format:
FIRM WORKER HOURS
FIRM1 A1 H1
FIRM1 A2 H2
FIRM1 A3 H3
FIRM1 B1 H4
FIRM1 B2 H5
FIRM2 A1 H6
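FIRM2 C1 H7
Some firms have workers in different education categories (A, B, C, ...). I want to transform the data so that each education category is summarized in a single column and every firm has exactly one row. That is, I need to turn the initial data into the following form: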
FIRM   HOURS_A    HOURS_B  HOURS_C
FIRM1  H1+H2+H3   H4+H5
FIRM2  H6                  H7
What is the tidiest way to do this?
Posted on 2013-09-03 17:15:47
Aggregate first, then reshape:
Data:
x <- read.table(header=TRUE, text="
FIRM WORKER HOURS
FIRM1 A1 1
FIRM1 A2 2
FIRM1 A3 3
FIRM1 B1 4
FIRM1 B2 5
FIRM2 A1 6
FIRM2 C1 7
")
Code:
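# Derive the education category (first letter of WORKER), then sum HOURS per firm and category
tmp <- aggregate(HOURS ~ FIRM + WORK, data = within(x, WORK <- substr(WORKER, 1, 1)), sum)
# Spread the categories into columns, giving one row per firm
reshape(tmp, idvar = "FIRM", timevar = "WORK", direction = "wide")
Result: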
   FIRM HOURS.A HOURS.B HOURS.C
1 FIRM1       6       9      NA
2 FIRM2       6      NA       7
Posted on 2013-09-03 17:17:49
I assume you actually want to sum the hours, and that your data.frame looks something like this:
mydf <- structure(
list(FIRM = c("FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM2", "FIRM2"),
WORKER = c("A", "A", "A", "B", "B", "A", "C"),
HOURS = c(10L, 20L, 15L, 13L, 12L, 9L, 16L)),
.Names = c("FIRM", "WORKER", "HOURS"),
class = "data.frame", row.names = c(NA, -7L))
mydf
#    FIRM WORKER HOURS
# 1 FIRM1      A    10
# 2 FIRM1      A    20
# 3 FIRM1      A    15
# 4 FIRM1      B    13
# 5 FIRM1      B    12
# 6 FIRM2      A     9
# 7 FIRM2      C    16
Then you can use xtabs:
xtabs(HOURS ~ FIRM + WORKER, mydf)
#        WORKER
# FIRM     A  B  C
#   FIRM1 45 25  0
#   FIRM2  9  0 16
Alternatively, you can melt the data set and reshape it with dcast:
library(reshape2)
dfL <- melt(mydf, id.vars=c("FIRM", "WORKER"))
dcast(dfL, FIRM ~ variable + WORKER, fun.aggregate=sum, value.var="value")
#    FIRM HOURS_A HOURS_B HOURS_C
# 1 FIRM1      45      25       0
# 2 FIRM2       9       0      16
https://stackoverflow.com/questions/18597739
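The two answers treat WORKER differently. The first derives the category with substr, and the second assumes WORKER already holds only the category letter. The sketch below combines the two approaches. It assumes, as the first answer does, that the category is the first character of each worker ID. It applies xtabs to the question's original layout and then converts the table into a data frame with one row per firm and HOURS_<category> columns. The helper column EDU and the object names tab and wide are hypothetical, chosen only for this illustration.

```r
# Sketch: sum HOURS per firm and education category directly from the
# question's original layout, where WORKER holds IDs such as "A1" or "B2".
x <- read.table(header = TRUE, text = "
FIRM WORKER HOURS
FIRM1 A1 1
FIRM1 A2 2
FIRM1 A3 3
FIRM1 B1 4
FIRM1 B2 5
FIRM2 A1 6
FIRM2 C1 7
")

# Assumption: the education category is the first character of WORKER.
# EDU is a hypothetical helper column introduced for this example.
x$EDU <- substr(as.character(x$WORKER), 1, 1)

# Table of summed HOURS: one row per firm, one column per category
tab <- xtabs(HOURS ~ FIRM + EDU, data = x)

# Turn the table into an ordinary data frame with HOURS_<category> columns
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("HOURS_", names(wide))
wide <- data.frame(FIRM = rownames(wide), wide, row.names = NULL)
print(wide)
```

Unlike the reshape route in the first answer, which leaves missing firm/category combinations as NA, xtabs fills them with 0.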