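I am trying to dcast my data so that only the Actual values are split out into a new column. However, the only way I have managed to do this is to dcast and then melt back again. I would like to know whether there is a more efficient solution.
Step 1:
I have already done some preparation on my data, and it currently looks like this: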
> test_m <- melt(test, id.vars = c("category", "Budget_year", "State"))
> test_m <- test_m[,c("Year", "Type_of_observation"):= tstrsplit(variable, " ", fixed = TRUE)]
> test_m[,variable := NULL]
> head(test_m, n = 10)
          category Budget_year State value    Year Type_of_observation
 1:  Transfer Duty     2000_01     N  1916 1998-99              Actual
 2:       Land Tax     2000_01     N   948 1998-99              Actual
 3:    Payroll Tax     2000_01     N  3605 1998-99              Actual
 4: Total Gambling     2000_01     N  1419 1998-99              Actual
 5:            GST     2000_01     N  4705 1998-99              Actual
 6:  Transfer Duty     2000_01     N  1747 1999-00              Budget
 7:       Land Tax     2000_01     N   830 1999-00              Budget
 8:    Payroll Tax     2000_01     N  3616 1999-00              Budget
 9: Total Gambling     2000_01     N  1558 1999-00              Budget
10:            GST     2000_01     N  5162 1999-00              Budget
Now I want to create a new column from the Type_of_observation column, but only for the Actual observations, leaving all other observation types in long form. My current approach is to dcast and then melt, as follows:
Step 2: desired output
> test_c <- dcast(test_m, category + Budget_year + State + Year ~ Type_of_observation)
> test_mc <- melt(test_c, id.vars = c("category", "Budget_year", "State", "Year", "Actual"), measure.vars = c("Budget", "Estimate", "Revised"))
> head(test_mc, n = 10)
    category Budget_year State    Year Actual variable value
 1:      GST     2000_01     N 1998-99   4705   Budget    NA
 2:      GST     2000_01     N 1999-00     NA   Budget  5162
 3:      GST     2000_01     N 2000-01     NA   Budget  8318
 4:      GST     2000_01     N 2001-02     NA   Budget    NA
 5:      GST     2000_01     N 2002-03     NA   Budget    NA
 6:      GST     2000_01     N 2003-04     NA   Budget    NA
 7: Land Tax     2000_01     N 1998-99    948   Budget    NA
 8: Land Tax     2000_01     N 1999-00     NA   Budget   830
 9: Land Tax     2000_01     N 2000-01     NA   Budget   921
10: Land Tax     2000_01     N 2001-02     NA   Budget    NA
Now I have a column of Actual values, and all other observation types remain in the variable column.
Is there a way to get from test_m to test_mc without doing a dcast followed by a melt? I would prefer a data.table solution, but I am open to anything.
Here is the dput for test_m:
> dput(test_m)
structure(list(category = c("Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax",
"Total Gambling", "GST"), Budget_year = c("2000_01", "2000_01",
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01",
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01",
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01",
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01",
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01",
"2000_01", "2000_01", "2000_01"), State = c("N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N"), value = c("1916", "948", "3605", "1419",
"4705", "1747", "830", "3616", "1558", "5162", "2249", "889",
"3742", "1578", "5173", "1746", "921", "3931", "1212", "8318",
"1686", "948", "4146", "1241", "9520", "1756", "971", "4258",
"1309", "9789", "1904", "991", "4503", "1374", "10006"), Year = c("1998-99",
"1998-99", "1998-99", "1998-99", "1998-99", "1999-00", "1999-00",
"1999-00", "1999-00", "1999-00", "1999-00", "1999-00", "1999-00",
"1999-00", "1999-00", "2000-01", "2000-01", "2000-01", "2000-01",
"2000-01", "2001-02", "2001-02", "2001-02", "2001-02", "2001-02",
"2002-03", "2002-03", "2002-03", "2002-03", "2002-03", "2003-04",
"2003-04", "2003-04", "2003-04", "2003-04"), Type_of_observation = c("Actual",
"Actual", "Actual", "Actual", "Actual", "Budget", "Budget", "Budget",
"Budget", "Budget", "Revised", "Revised", "Revised", "Revised",
"Revised", "Budget", "Budget", "Budget", "Budget", "Budget",
"Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate",
"Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate",
"Estimate", "Estimate", "Estimate")), .Names = c("category",
"Budget_year", "State", "value", "Year", "Type_of_observation"
), row.names = c(NA, -35L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x026c24a0>)

Answered on 2018-07-25 03:19:01
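You can first create the complete cases (every combination of the key columns with the non-Actual observation types) and then join them with your dataset.
Finally, perform an update join to look up the Actual values.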
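Before the full solution, here is a minimal sketch of those steps on a toy table. The table dt, its columns id, type and value, and the names grid and full are all hypothetical and chosen only for illustration; the calls used (CJ, a join with on=, and := inside a join) are standard data.table operations.

```r
library(data.table)

# Toy long table (hypothetical data): id "a" has an Actual and a Budget row,
# id "b" has only a Budget row
dt <- data.table(
  id    = c("a", "a", "b"),
  type  = c("Actual", "Budget", "Budget"),
  value = c(10, 12, 20)
)

# Step 1: every id crossed with the observation types to keep (not "Actual")
grid <- CJ(id = dt$id, type = c("Budget", "Revised"), unique = TRUE)

# Step 2: join so that each combination gets a row; combinations that are
# absent from dt end up with value = NA
full <- dt[grid, on = .(id, type)]

# Step 3: update join; adds an Actual column by reference, filled from the
# Actual rows that share the same id
full[dt[type == "Actual"], Actual := i.value, on = .(id)]

print(full)
```

Rows for id "a" receive the Actual value, while rows for id "b" keep NA because no Actual row exists for that id. The answer's code for test_m follows the same three steps, with more key columns.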
#create complete cases
ans <- test_m[CJ(category=category, Budget_year=Budget_year, State=State, Year=Year, Type_of_observation=c("Budget", "Estimate", "Revised"), unique=TRUE),
on=.(category, Budget_year, State, Year, Type_of_observation)][
#update join
test_m[Type_of_observation=="Actual"],
Actual := i.value,
on=.(category, Budget_year, State, Year)]
#order to match test_mc
setorder(ans, category, Budget_year, State, Year, Type_of_observation)[]

Answered on 2018-10-18 02:25:46
I think I have a simple data.table way to do this, using setkey and a join inside the brackets.
I will use a simpler data.table. The goal is to move interest_rate into its own column.
samp <- data.table(
group=c("a","a","a","b","b","b","c","c","c"),
variable=c("balance", "end_balance","interest_rate"),
value=c(1000, 940, .05, 1200, 1040, .08, 980, 970, .10)
)
setkey(samp, group)
# This will create a data.table with just our desired variable value, interest_rate, by group
samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]
# We then join this to the original data.table using the already set key and
# drop the interest_rate rows in the final data.table
samp[samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]][variable!="interest_rate"]

https://stackoverflow.com/questions/51510008
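Applied to the question's data, the same idea of pulling one observation type out and joining it back might look like the sketch below. It uses only a handful of rows copied from the dput of test_m, with value converted to numeric for brevity; the names keys, actuals and res are illustrative and do not come from either answer. One point worth flagging: in test_m the Actual rows sit in a Year (1998-99) that has no other observation types, so a join that keeps only the non-Actual rows would drop them. The sketch therefore uses a full outer merge, which keeps those rows with NA in Type_of_observation, and its result is not identical to test_mc.

```r
library(data.table)

# A few rows copied from the question's test_m (value made numeric here)
test_m <- data.table(
  category = c("GST", "GST", "GST", "Land Tax", "Land Tax", "Land Tax"),
  Budget_year = "2000_01",
  State = "N",
  value = c(4705, 5162, 5173, 948, 830, 889),
  Year = c("1998-99", "1999-00", "1999-00", "1998-99", "1999-00", "1999-00"),
  Type_of_observation = c("Actual", "Budget", "Revised",
                          "Actual", "Budget", "Revised")
)

keys <- c("category", "Budget_year", "State", "Year")

# The Actual rows only, with their value column renamed to Actual
actuals <- test_m[Type_of_observation == "Actual", c(keys, "value"), with = FALSE]
setnames(actuals, "value", "Actual")

# Full outer join of the non-Actual rows with the Actual values;
# all = TRUE keeps key combinations that appear on only one side
res <- merge(test_m[Type_of_observation != "Actual"], actuals,
             by = keys, all = TRUE)

print(res)
```

If a row for every observation type is needed for each key combination, as in test_mc, the complete-cases approach from the first answer is the closer match.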