Suppose I have a data.frame:
> ID=c("S","B","S","S","B","S","S","S","B","B","S") #"S" means Sell, "B" means Buy
> quant=c(3,2,5,1,2,10,4,4,17,6,6)
> time=c(1,2,3,3,4,4,5,5,5,6,6)
> test=data.frame(ID,quant,time)
ID quant time
1 S 3 1
2 B 2 2
3 S 5 3
4 S 1 3
5 B 2 4
6 S 10 4
7 S 4 5
8 S 4 5
9 B 17 5
10 B 6 6
11 S 6 6
I need to compute the net position at each time, so that my final output is:
ID quant time
1 S 3 1
2 B 2 2
3 S 6 3 #Sell 5+1=6
4 S 8 4 #Sell 10-2=8
5 B 9 5 #Buy 17-4-4=9
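6 B 0 6 #this row is optional because the position is 0 so I don't need it
I think I need to group the data.frame by time first and then by ID, but how do I compute the net position after that? I tried aggregate, but it seems to work only when grouping by a single column? Thanks!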
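On the aggregate point: aggregate does accept more than one grouping variable through its formula interface. The following is only a minimal sketch using the example data above; the helper column name signed and the result names totals and net are made up for illustration, and treating buys as positive and sells as negative is an assumption about the sign convention.

```r
# Rebuild the example data from the question.
test <- data.frame(
  ID    = c("S","B","S","S","B","S","S","S","B","B","S"),
  quant = c(3,2,5,1,2,10,4,4,17,6,6),
  time  = c(1,2,3,3,4,4,5,5,5,6,6)
)

# Group by two columns at once: one total per (time, ID) pair.
totals <- aggregate(quant ~ time + ID, data = test, FUN = sum)

# Net position per time: count buys as positive and sells as negative
# (assumed convention), then sum the signed quantities within each time.
test$signed <- ifelse(test$ID == "B", test$quant, -test$quant)
net <- aggregate(signed ~ time, data = test, FUN = sum)
net$ID    <- ifelse(net$signed > 0, "B", "S")
net$quant <- abs(net$signed)

# Drop times where the position is flat (net quantity of 0).
net[net$quant != 0, c("ID", "quant", "time")]
```

Summing a signed column avoids the second grouping level entirely, which is why grouping by time alone is enough for the net position.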
Answered 2014-07-23 10:59:21
Another idea, which may look a bit fragile but seems to work for this specific case:
tmp = diff(xtabs(quant ~ ID + time, test))
data.frame(ID = c(ifelse(tmp > 0, "S", "B")),
           time = colnames(tmp),
           quant = c(abs(tmp)))
# ID time quant
#1 S 1 3
#2 B 2 2
#3 S 3 6
#4 S 4 8
#5 B 5 9
#6 B 6 0
A more typical base R approach:
do.call(rbind,
        lapply(split(test, test$time),
               function(x) {
                   s = sum(x[["quant"]][x[["ID"]] == "S"])
                   b = sum(x[["quant"]][x[["ID"]] == "B"])
                   data.frame(time = x$time[1],
                              quant = abs(s - b),
                              ID = if(s > b) "S" else "B")
               }))
# time quant ID
#1 1 3 S
#2 2 2 B
#3 3 6 S
#4 4 8 S
#5 5 9 B
#6 6 0 B
Answered 2014-07-23 10:42:43
Using data.table:
library(data.table)
setDT(test)[, list(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])),
            by = time][, list(ID = ifelse(quant > 0, "B", "S"), quant = abs(quant), time)]
# ID quant time
# 1: S 3 1
# 2: B 2 2
# 3: S 6 3
# 4: S 8 4
# 5: B 9 5
# 6: S 0 6
Answered 2014-07-23 10:46:46
Here is a solution using dplyr.
library(dplyr)
result <- test %>%
  group_by(time) %>%
  summarise(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])) %>%
  mutate(ID = c("S", "B")[(quant >= 0) + 1], quant = abs(quant))
# time quant ID
# 1 1 3 S
# 2 2 2 B
# 3 3 6 S
# 4 4 8 S
# 5 5 9 B
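# 6 6 0 B
What does c("S", "B")[(quant >= 0) + 1] mean?
(quant >= 0) creates a logical vector indicating whether quant is greater than or equal to 0. Adding 1 coerces FALSE to 0 and TRUE to 1, so the expression yields a vector of 1s and 2s. That vector is then used to index c("S", "B"), which gives a character vector of "S"s and "B"s.
https://stackoverflow.com/questions/24908074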
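The indexing trick explained above can be checked on a small vector. This is just an illustrative sketch; the values in quant here are hypothetical net quantities (buys minus sells), not taken from the question's data.

```r
quant <- c(-3, 2, 0)       # hypothetical net quantities (buys minus sells)
idx   <- (quant >= 0) + 1  # FALSE + 1 gives 1, TRUE + 1 gives 2
idx                        # 1 2 2
c("S", "B")[idx]           # "S" "B" "B"
```

Note that a net quantity of 0 is labelled "B" here because the test is >= 0, whereas the data.table answer uses quant > 0 and so labels the same row "S". That is why the time 6 row differs between the two outputs.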