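I want to count the number of rows per person (hai_dispense_number) per month (month) in the data below. My overall aim is to see whether the average number of rows increased from April to September. I'm fairly sure I should be using the ave function to create a count variable, but none of my attempts have worked; see them below. Once I have the counts, I think I'll be able to use ddply to produce a monthly average summary. Below is a toy df; the column "obs" is my desired output.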
df
hai_dispense_number date_of_claim hai_atc month obs
9972511 Patient HAI0002664 2010-04-07 A10BA02 april 1
11376245 Patient HAI0002664 2010-05-04 A10BA02 may 1
12508505 Patient HAI0002664 2010-05-31 A10BA02 may 2
13480611 Patient HAI0002664 2010-06-30 A10BA02 june 1
13486327 Patient HAI0002664 2010-06-30 A10BH03 june 2
13567944 Patient HAI0002664 2010-06-08 A10BA02 june 3
15003657 Patient HAI0002664 2010-07-27 A10BA02 july 1
15003658 Patient HAI0002664 2010-07-27 A10BH03 july 2
16600413 Patient HAI0002664 2010-08-31 A10BB09 august 1
16600866 Patient HAI0002664 2010-08-23 A10BA02 august 2
16600867 Patient HAI0002664 2010-08-23 A10BH03 august 3
17537505 Patient HAI0002664 2010-08-27 A10BB09 august 4
19176349 Patient HAI0002664 2010-09-17 A10BB09 september 1
19176350 Patient HAI0002664 2010-09-17 A10BH03 september 2
19176358 Patient HAI0002664 2010-09-17 A10BA02 september 3
17765433 Patient HAI0006637 2010-09-17 A10BA02 september 4
12953451 Patient HAI0007418 2010-06-04 A10BA02 june 1
15786889 Patient HAI0007418 2010-07-28 A10BB09 july 1
15787103 Patient HAI0007418 2010-07-12 A10BB09 july 2
15787233 Patient HAI0007418 2010-07-05 A10BA02 july 3
15878776 Patient HAI0007418 2010-07-08 A10BB09 july 4
15908690 Patient HAI0007418 2010-07-23 A10BB09 july 5
17363576 Patient HAI0007418 2010-08-20 A10BB09 august 1
17554737 Patient HAI0007418 2010-08-13 A10BB09 august 2

Previous attempts:
df$obs<-with(df, ave(month, hai_dispense_number, FUN=seq_along)) ##doesn't split by month
df$obs<-with(df, ave(month, hai_dispense_number, FUN=cumsum)) ##gives all NA values, think seq_along is actually what I want
df$obs <- ave(df$month, df$month, FUN=seq_along) ##this is better than the previous two, but doesn't seem to split by person
ddply(df,~month,summarise,mean=mean(obs)) ##this works absolutely fine, just need to get the counts right first!

I would appreciate any input anyone can give me. It looks like I'm getting something fundamental wrong.
Posted on 2014-01-29 12:58:49
OK, I've trimmed your data down to:
> head(df)
patient month
9972511 HAI0002664 april
11376245 HAI0002664 may
12508505 HAI0002664 may
13480611 HAI0002664 june
13486327 HAI0002664 june
13567944 HAI0002664 june

That's all we need, since only the patient identifier and the month matter here. To get the new column you want, try the following:
library(plyr)
> ddply(df, .(patient, month), mutate, obs = 1:length(month))
patient month obs
1 HAI0002664 april 1
2 HAI0002664 august 1
3 HAI0002664 august 2
4 HAI0002664 august 3
5 HAI0002664 august 4
6 HAI0002664 july 1
7 HAI0002664 july 2
8 HAI0002664 june 1
9 HAI0002664 june 2
10 HAI0002664 june 3
11 HAI0002664 may 1
12 HAI0002664 may 2
13 HAI0002664 september 1
14 HAI0002664 september 2
15 HAI0002664 september 3
16 HAI0006637 september 1
17 HAI0007418 august 1
18 HAI0007418 august 2
19 HAI0007418 july 1
20 HAI0007418 july 2
21 HAI0007418 july 3
22 HAI0007418 july 4
23 HAI0007418 july 5
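24 HAI0007418 june 1

By the way, I'm assuming the obs = 4 for September in your example output is a typo, since the patient identifier changes from the previous three rows (2664 to 6637).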
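
Since the question asked about ave specifically, here is a minimal base-R sketch of the same idea. It assumes the trimmed column names patient and month used above, and the six-row data frame is made up purely for illustration. The attempts in the question each passed a single grouping variable; ave accepts several, and giving it a numeric first argument avoids trouble with character or factor input. The last two steps are one possible way to get the monthly average of per-person row counts, not necessarily what the asker had in mind.

```r
# Toy data (hypothetical values): one row per dispensing.
df <- data.frame(
  patient = c("HAI0002664", "HAI0002664", "HAI0002664",
              "HAI0006637", "HAI0007418", "HAI0007418"),
  month   = c("april", "may", "may", "may", "june", "june"),
  stringsAsFactors = FALSE
)

# Running count within each patient-by-month group.
# The first argument is numeric (row positions), and both grouping
# variables are supplied, so the count restarts per person AND per month.
df$obs <- ave(seq_along(df$month), df$patient, df$month, FUN = seq_along)

# Number of rows per patient per month ...
counts <- aggregate(obs ~ patient + month, data = df, FUN = length)
names(counts)[names(counts) == "obs"] <- "n_rows"

# ... and the mean of those per-person counts for each month.
monthly_mean <- aggregate(n_rows ~ month, data = counts, FUN = mean)
print(monthly_mean)
```

Note that averaging obs directly by month averages the running index rather than the per-person totals, which is why the sketch counts rows per patient first. The months come out in alphabetical order; converting month to a factor with chronological levels would be one way to change that.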
https://stackoverflow.com/questions/21431321