
Cast only one variable into a new column in R

Stack Overflow user
Asked 2018-07-25 02:30:37
2 answers, 391 views, 0 followers, score 2

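I'm trying to dcast my data so that only the Actual values are split out into a new column. However, the only way I have managed to do this is to dcast and then melt back. I'd like to know whether there is a more efficient solution.

Step 1:

My data has already had some preparation, and it looks like this: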

> test_m <- melt(test, id.vars = c("category", "Budget_year", "State"))
> test_m <- test_m[,c("Year", "Type_of_observation"):= tstrsplit(variable, " ", fixed = TRUE)]
> test_m[,variable := NULL]
> head(test_m, n = 10)

          category Budget_year State value    Year Type_of_observation
 1:  Transfer Duty     2000_01     N  1916 1998-99              Actual
 2:       Land Tax     2000_01     N   948 1998-99              Actual
 3:    Payroll Tax     2000_01     N  3605 1998-99              Actual
 4: Total Gambling     2000_01     N  1419 1998-99              Actual
 5:            GST     2000_01     N  4705 1998-99              Actual
 6:  Transfer Duty     2000_01     N  1747 1999-00              Budget
 7:       Land Tax     2000_01     N   830 1999-00              Budget
 8:    Payroll Tax     2000_01     N  3616 1999-00              Budget
 9: Total Gambling     2000_01     N  1558 1999-00              Budget
10:            GST     2000_01     N  5162 1999-00              Budget
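
As a minimal sketch of the splitting step used above, the following toy example (the table `toy` and its values are hypothetical, chosen only to mimic the shape of the melted data) shows how `tstrsplit` turns one space-separated column into two new columns by reference:

```r
library(data.table)

# Hypothetical toy table; "variable" mimics the melted column in the question
toy <- data.table(
  variable = c("1998-99 Actual", "1999-00 Budget", "1999-00 Revised"),
  value    = c(1916, 1747, 2249)
)

# Split on the literal space and assign both pieces as new columns by reference
toy[, c("Year", "Type_of_observation") := tstrsplit(variable, " ", fixed = TRUE)]

# Drop the original combined column
toy[, variable := NULL]

print(toy)
```

Because `:=` modifies the table in place, reassigning the result with `<-` is not required.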

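Now I want to create a new column from the Type_of_observation column, but only for the Actual observations, leaving all other observation types as they are. My current approach is a dcast followed by a melt, as shown below:

Step 2: Desired output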

> test_c <- dcast(test_m, category + Budget_year + State + Year ~ Type_of_observation, value.var = "value")
> test_mc <- melt(test_c, id.vars = c("category", "Budget_year", "State", "Year", "Actual"), measure.vars = c("Budget", "Estimate", "Revised"))
> head(test_mc, n = 10)
    category Budget_year State    Year Actual variable value
 1:      GST     2000_01     N 1998-99   4705   Budget    NA
 2:      GST     2000_01     N 1999-00     NA   Budget  5162
 3:      GST     2000_01     N 2000-01     NA   Budget  8318
 4:      GST     2000_01     N 2001-02     NA   Budget    NA
 5:      GST     2000_01     N 2002-03     NA   Budget    NA
 6:      GST     2000_01     N 2003-04     NA   Budget    NA
 7: Land Tax     2000_01     N 1998-99    948   Budget    NA
 8: Land Tax     2000_01     N 1999-00     NA   Budget   830
 9: Land Tax     2000_01     N 2000-01     NA   Budget   921
10: Land Tax     2000_01     N 2001-02     NA   Budget    NA

Now I have a column of Actual values, and all the other observation types remain in the variable column.
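
To make the round trip easier to follow, here is a small sketch of the same dcast-then-melt pattern on a hypothetical three-row table (`toy`, `wide` and `long` are illustrative names, and the values are numeric here for simplicity):

```r
library(data.table)

# Hypothetical toy data with one Actual and two non-Actual observations
toy <- data.table(
  category            = "GST",
  Year                = c("1998-99", "1999-00", "1999-00"),
  Type_of_observation = c("Actual", "Budget", "Revised"),
  value               = c(4705, 5162, 5173)
)

# Wide: one column per observation type, NA where a combination is absent
wide <- dcast(toy, category + Year ~ Type_of_observation, value.var = "value")

# Long again, but keeping Actual as an id column so it stays in its own column
long <- melt(wide,
             id.vars      = c("category", "Year", "Actual"),
             measure.vars = c("Budget", "Revised"))

print(long)
```

The result has one row per category/Year/observation-type combination, which is why combinations that were absent from the input appear with NA.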

Is there a way to get from test_m to test_mc without doing a dcast followed by a melt? Ideally I'm looking for a data.table solution, but I'm open to anything.

Here is the dput for test_m:

> dput(test_m)
structure(list(category = c("Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST", "Transfer Duty", "Land Tax", "Payroll Tax", 
"Total Gambling", "GST"), Budget_year = c("2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01", "2000_01", "2000_01", "2000_01", 
"2000_01", "2000_01", "2000_01"), State = c("N", "N", "N", "N", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N"), value = c("1916", "948", "3605", "1419", 
"4705", "1747", "830", "3616", "1558", "5162", "2249", "889", 
"3742", "1578", "5173", "1746", "921", "3931", "1212", "8318", 
"1686", "948", "4146", "1241", "9520", "1756", "971", "4258", 
"1309", "9789", "1904", "991", "4503", "1374", "10006"), Year = c("1998-99", 
"1998-99", "1998-99", "1998-99", "1998-99", "1999-00", "1999-00", 
"1999-00", "1999-00", "1999-00", "1999-00", "1999-00", "1999-00", 
"1999-00", "1999-00", "2000-01", "2000-01", "2000-01", "2000-01", 
"2000-01", "2001-02", "2001-02", "2001-02", "2001-02", "2001-02", 
"2002-03", "2002-03", "2002-03", "2002-03", "2002-03", "2003-04", 
"2003-04", "2003-04", "2003-04", "2003-04"), Type_of_observation = c("Actual", 
"Actual", "Actual", "Actual", "Actual", "Budget", "Budget", "Budget", 
"Budget", "Budget", "Revised", "Revised", "Revised", "Revised", 
"Revised", "Budget", "Budget", "Budget", "Budget", "Budget", 
"Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate", 
"Estimate", "Estimate", "Estimate", "Estimate", "Estimate", "Estimate", 
"Estimate", "Estimate", "Estimate")), .Names = c("category", 
"Budget_year", "State", "value", "Year", "Type_of_observation"
), row.names = c(NA, -35L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x026c24a0>)

2 Answers

Stack Overflow user

Answered 2018-07-25 03:19:01

You can first build the complete set of cases and join your dataset to it.

Finally, perform an update join to look up the Actual values.

#create complete cases
ans <- test_m[CJ(category=category, Budget_year=Budget_year, State=State, Year=Year, Type_of_observation=c("Budget", "Estimate", "Revised"), unique=TRUE),
    on=.(category, Budget_year, State, Year, Type_of_observation)][
        #update join
        test_m[Type_of_observation=="Actual"], 
        Actual := i.value,
        on=.(category, Budget_year, State, Year)]

#order to match test_mc
setorder(ans, category, Budget_year, State, Year, Type_of_observation)[]
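
The same two steps can be traced on a smaller table. This is only a sketch under assumed data: `toy`, `grid` and the values are hypothetical, and the join is split into separate statements for readability rather than chained as above.

```r
library(data.table)

# Hypothetical toy data: one Actual row and two non-Actual rows
toy <- data.table(
  category            = c("GST", "GST", "GST"),
  Year                = c("1998-99", "1999-00", "1999-00"),
  Type_of_observation = c("Actual", "Budget", "Revised"),
  value               = c(4705, 5162, 5173)
)

# Complete cases: every category/Year pair crossed with the non-Actual types
grid <- CJ(category = toy$category,
           Year = toy$Year,
           Type_of_observation = c("Budget", "Revised"),
           unique = TRUE)

# Join toy onto the grid; grid rows without a match get NA in value
ans <- toy[grid, on = .(category, Year, Type_of_observation)]

# Update join: copy the Actual value into a new column, matched on category and Year
ans[toy[Type_of_observation == "Actual"],
    Actual := i.value,
    on = .(category, Year)]

setorder(ans, category, Year, Type_of_observation)
print(ans)
```

The `i.` prefix refers to the column from the table passed in `i`, so `Actual` is filled only for the category/Year pairs that have an Actual observation.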
Score: 1

Stack Overflow user

Answered 2018-10-18 02:25:46

I think I have a simple data.table way to do this, using setkey and a join inside the brackets.

I'll use a simpler data.table. The goal is to put interest_rate into its own column.

samp <- data.table(
  group=c("a","a","a","b","b","b","c","c","c"),
  variable=c("balance", "end_balance","interest_rate"),
  value=c(1000, 940, .05, 1200, 1040, .08, 980, 970, .10)
)


setkey(samp, group)

#  This will create a data.table with just our desired variable value, interest_rate, by group
samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]

#  We then join this to the original data.table using the already set key and
#  drop the interest_rate rows in the final data.table
samp[samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]][variable!="interest_rate"]
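
As a variation on the idea above, the join column can be named with `on =` instead of relying on a key set beforehand. This is a sketch, not a replacement for the code above: `rates` and `res` are illustrative names, and the recycling of `group` and `variable` is written out with `rep()` so the table construction is explicit.

```r
library(data.table)

samp <- data.table(
  group    = rep(c("a", "b", "c"), each = 3),
  variable = rep(c("balance", "end_balance", "interest_rate"), times = 3),
  value    = c(1000, 940, 0.05, 1200, 1040, 0.08, 980, 970, 0.10)
)

# One row per group holding only the interest rate
rates <- samp[variable == "interest_rate", .(group, interest_rate = value)]

# Join on group explicitly, then drop the rows that were moved into the new column
res <- samp[rates, on = "group"][variable != "interest_rate"]

print(res)
```

Using `on =` leaves the row order of `samp` untouched, whereas `setkey` sorts the table by reference.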
Score: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/51510008
