I'm building a neural network in R, and for that I have to prepare my data, which I import as a table.
For example:
time hour Money day
1: 20000616 1 9.35 5
2: 20000616 2 6.22 5
3: 20000616 3 10.65 5
4: 20000616 4 11.42 5
5: 20000616 5 10.12 5
6: 20000616 6 7.32 5

Now I need dummy variables. My final table should look like this:
time Money day 1 2 3 4 5 6
1: 20000616 9.35 5 1 0 0 0 0 0
2: 20000616 6.22 5 0 1 0 0 0 0
3: 20000616 10.65 5 0 0 1 0 0 0
4: 20000616 11.42 5 0 0 0 1 0 0
5: 20000616 10.12 5 0 0 0 0 1 0
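6: 20000616 7.32 5 0 0 0 0 0 1

Is there a simple or clever way to convert my table into this new layout, or do I have to program it myself in R? It has to be done in R, not before the import.
Thanks in advance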
Posted on 2018-02-06 03:30:33
You can easily create dummy variables with the dummies package.
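If the `dummies` package is not available to you (it has since been archived on CRAN), base R's `model.matrix()` can build the same table. A minimal sketch, assuming the six example rows from the question; the `- 1` in the formula drops the intercept so that every hour gets its own 0/1 column:

```r
df <- data.frame(
  time  = rep(20000616, 6),
  hour  = 1:6,
  Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
  day   = rep(5, 6)
)

# One 0/1 column per level of hour; "- 1" removes the intercept so no level is dropped
hour_dummies <- model.matrix(~ factor(hour) - 1, data = df)
# Rename "factor(hour)1", "factor(hour)2", ... to "1", "2", ...
colnames(hour_dummies) <- levels(factor(df$hour))

# Keep the remaining columns and append the dummies, matching the desired layout
df_dummy <- cbind(df[c("time", "Money", "day")], hour_dummies)
df_dummy
```

Using `dummies` itself: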
library(dummies)
df <- data.frame(
time = c(20000616, 20000616, 20000616, 20000616, 20000616, 20000616),
hour = c(1, 2, 3, 4, 5, 6),
Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
day = c(5, 5, 5, 5, 5, 5))
# Specify the categorical variables in the dummy.data.frame function.
df_dummy <- dummy.data.frame(df, names=c("hour"), sep="_")
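# dummy.data.frame() puts hour_1 ... hour_6 where hour was; rename them to 1 ... 6
names(df_dummy) <- c("time", 1:6, "Money", "day")
# Reorder the columns to match the desired layout
df_dummy <- df_dummy[c("time", "Money", "day", 1:6)]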
df_dummy
# time Money day 1 2 3 4 5 6
# 1 20000616 9.35 5 1 0 0 0 0 0
# 2 20000616 6.22 5 0 1 0 0 0 0
# 3 20000616 10.65 5 0 0 1 0 0 0
# 4 20000616 11.42 5 0 0 0 1 0 0
# 5 20000616 10.12 5 0 0 0 0 1 0
# 6 20000616 7.32 5 0 0 0 0 0 1

Posted on 2018-02-05 19:56:01
A possible solution using data.table (which you may well be using already):
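# Cast to one column per hour value (counting rows with fun = length), then join back onto dt by hour
dt[dcast(dt, hour ~ hour, value.var = 'hour', fun = length), on = .(hour)]

which gives: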
time hour Money day 1 2 3 4 5 6
1: 20000616 1 9.35 5 1 0 0 0 0 0
2: 20000616 2 6.22 5 0 1 0 0 0 0
3: 20000616 3 10.65 5 0 0 1 0 0 0
4: 20000616 4 11.42 5 0 0 0 1 0 0
5: 20000616 5 10.12 5 0 0 0 0 1 0
6: 20000616 6 7.32 5 0 0 0 0 0 1
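Because the cast uses `fun = length`, each new cell counts the rows that share a given `hour`. In the sample every hour occurs once, so the result is 0/1; with repeated hours (say, the same hours on several days) `hour ~ hour` would produce counts above 1. A minimal base-R sketch that skips the cast-and-join and instead compares each row's `hour` against the sorted distinct hours with `outer()`, so every row gets exactly one 1. The second day (`time = 20000617`, `day = 6`) is hypothetical data added only to show repeated hours:

```r
# Hypothetical two-day sample: hours 1-3 on day 5 and again on day 6
df <- data.frame(
  time  = rep(c(20000616, 20000617), each = 3),
  hour  = rep(1:3, times = 2),
  Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
  day   = rep(c(5, 6), each = 3)
)

hours <- sort(unique(df$hour))
# Entry [i, j] is 1 when row i's hour equals hours[j], otherwise 0
ind <- outer(df$hour, hours, "==") * 1L
colnames(ind) <- hours

res <- cbind(df[c("time", "Money", "day")], ind)
res
```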
I assume that in the real dataset `time` and `day` vary more; in that case you can adapt the code to:
dt[dcast(dt, time + day + hour ~ hour, value.var = 'hour', fun = length)
   , on = .(time, day, hour)]

Data used:
library(data.table)
dt <- fread(' time hour Money day
20000616 1 9.35 5
20000616 2 6.22 5
20000616 3 10.65 5
20000616 4 11.42 5
20000616 5 10.12 5
20000616 6 7.32 5')

Posted on 2018-02-05 20:41:57
A base R solution could be:
dat <- data.frame(time = c(20000616, 20000616, 20000616, 20000616, 20000616, 20000616),
hour = c(1, 2, 3, 4, 5, 6),
Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
day = c(5, 5, 5, 5, 5, 5) )
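# Declare all 7 possible days as levels, so each gets a column even if absent from the data
dat$dummy_day <- factor(dat$day, levels = 1:7)
# contr.SAS uses the last level (7) as the reference, so dummy_day7 is left out
model.matrix(~time + hour + Money + day + dummy_day, dat,
             contrasts = list(dummy_day = "contr.SAS"))

which returns a matrix: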
(Intercept) time hour Money day dummy_day1 dummy_day2 dummy_day3 dummy_day4 dummy_day5 dummy_day6
1 1 20000616 1 9.35 5 0 0 0 0 1 0
2 1 20000616 2 6.22 5 0 0 0 0 1 0
3 1 20000616 3 10.65 5 0 0 0 0 1 0
4 1 20000616 4 11.42 5 0 0 0 0 1 0
5 1 20000616 5 10.12 5 0 0 0 0 1 0
6 1 20000616 6 7.32 5 0 0 0 0 1 0
attr(,"assign")
[1] 0 1 2 3 4 5 5 5 5 5 5
attr(,"contrasts")
attr(,"contrasts")$dummy_day
[1] "contr.SAS"

https://stackoverflow.com/questions/48630405
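The base-R answer above dummy-codes `day`, whereas the question asks for one column per `hour`. A sketch of the same idea applied to `hour`, assuming (hypothetically) that hours can run from 1 to 24: declaring all 24 levels up front, as the answer does with `levels = 1:7` for `day`, keeps a column for every hour even if the data covers only some of them, and dropping the intercept with `- 1` means no level is absorbed as a baseline, so no `contrasts` argument is needed:

```r
dat <- data.frame(
  time  = rep(20000616, 6),
  hour  = 1:6,
  Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
  day   = rep(5, 6)
)

# Hypothetical assumption: hour can take any value from 1 to 24
dat$dummy_hour <- factor(dat$hour, levels = 1:24)

# "- 1" drops the intercept, so each of the 24 levels gets its own 0/1 column;
# levels absent from the data (7-24 here) become all-zero columns
mm <- model.matrix(~ dummy_hour - 1, dat)
dim(mm)
```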