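This is mainly a logic question.
I am trying to identify patterns of medication use in a group of people. My first step was to find people with "continuous use" of 4 meds. I define continuous use as repeat use of all 4 meds after the initial prescription of the fourth med.
Some people had continuous use of 4 meds straight after starting their 4th med. I find the fourth med (the 4th meds of interest to me are B, Q, S and T), and then check whether the person went on to take the 4 meds in the pattern A + C + D + 4th med. This is how I did it for 4 meds (mini data set below, labelled "4 med users"):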
bys id: gen interest=0
by id: replace interest =1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& con_4_4 ==1 & count==4
by id: egen interest4=max(interest) //notes: this variable tells me if the person has a 4th drug of interest to me; drug B, Q, S or T
gen acd_4_1=0
by id: replace acd_4_1 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==1
gen acd_4_2=0
by id: replace acd_4_2 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==2
gen acd_4_3=0
by id: replace acd_4_3 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==3
by id: egen acd_4_11 =max(acd_4_1)
by id: egen acd_4_22 =max(acd_4_2)
by id: egen acd_4_33 =max(acd_4_3)
gen acd_4=1 if acd_4_11 ==1 & acd_4_22 ==1 & acd_4_33 ==1 & interest4==1 //acd_4 is a variable indicating whether people had the desired pattern after initiating their 4th agent
*notes:
*kate has acd_4 = . because she used a prohibited drug "Q" and also her 4th drug was not of interest to us (was "A" as opposed to T, Q, S or B)
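*mark has acd_4==1 because he used the correct pattern A+C+D after the prescription of his 4th drug which was S (count=4, date 5th October 2000)
Now it gets trickier. Other people may have been switching or discontinuing meds, and may not have had continuous use of 4 meds until their 5th or 6th med. For example, it may be only after the fifth med that they had repeat prescriptions for A + C + D and med 5, which in this case is the med of interest to us (again, it will be B, Q, S or T).
If they have another one of the drugs B, Q, S, T in addition to their med of interest and the pattern of interest, then I want to flag this, because I want to exclude that person's pattern from further consideration. For example, I want med5 + A + C + D and not med5 + A + C + D + S.
I came up with a way to do this (mini data set below, labelled "5 med users"), but my code is clunky and takes a long time on my large data set. Could anyone offer suggestions to 1) improve my logic, 2) improve my coding, or 3) both?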
gen interest5=0
bys id: replace interest5 =1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& con_5_5 ==1 & count==5
by id: egen interest55 = max(interest5)
drop interest5
ren interest55 interest5
by id: gen A5=1 if (agent_type =="A") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen AA55=max(A5)
drop A5
by id: gen C5=1 if (agent_type =="C") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen C55=max(C5)
drop C5
by id: gen D5=1 if (agent_type =="D") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen D55=max(D5)
drop D5
by id: gen acd_5=1 if (AA55==1 & C55==1 & D55==1) & interest5==1
*make sure patient isn't taking any of the other comparator agents
by id: gen prohib=1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1 & count!=5 //count!=5 flags any comparator agent other than the comparator agent of interest, which here is the one with count==5
by id: egen prohib55=max(prohib)
by id: gen pattern=1 if acd_5 ==1 & prohib55 !=1
*notes:
*mary has pattern = . because she used a prohibited drug "B" after the prescription of her 4th agent (here count=5, agent_type "T", starting on 29th July 05)
*Pat has pattern=1 because he used A+C+D after his 4th agent (here count=5, agent_type=="B", starting on 28th Jan 09)
*Sue has pattern=. because she used a prohibited drug "T" after the prescription of her 4th agent (here count=5, agent_type=="B", starting on 25th Feb 2011)
Data sets
4 med users
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id int rx_date str1 agent_type byte count int fourth_full byte con_4_4 int fourth_con_full
"kate" 16728 "Q" 1 . 1 16733
"kate" 16728 "C" 3 . 1 16733
"kate" 16733 "A" 4 16733 1 16733
"kate" 16758 "B" 2 16733 1 16733
"kate" 16758 "Q" 1 16733 1 16733
"kate" 16758 "C" 3 16733 1 16733
"kate" 16762 "A" 4 16733 1 16733
"kate" 16784 "C" 3 16733 1 16733
"kate" 16784 "A" 4 16733 1 16733
"kate" 16784 "Q" 1 16733 1 16733
"kate" 16784 "B" 2 16733 1 16733
"kate" 16812 "Q" 1 16733 1 16733
"kate" 16812 "B" 2 16733 1 16733
"kate" 16812 "A" 4 16733 1 16733
"kate" 16812 "C" 3 16733 1 16733
"kate" 16841 "Q" 1 16733 1 16733
"kate" 16841 "C" 3 16733 1 16733
"kate" 16841 "B" 2 16733 1 16733
"mark" 14874 "C" 2 . 1 14888
"mark" 14874 "A" 1 . 1 14888
"mark" 14888 "S" 4 14888 1 14888
"mark" 14888 "D" 3 14888 1 14888
"mark" 14930 "S" 4 14888 1 14888
"mark" 14930 "C" 2 14888 1 14888
"mark" 14930 "A" 1 14888 1 14888
"mark" 14930 "D" 3 14888 1 14888
"mark" 14965 "S" 4 14888 1 14888
"mark" 14965 "A" 1 14888 1 14888
"mark" 14965 "D" 3 14888 1 14888
"mark" 14965 "C" 2 14888 1 14888
"mark" 15028 "S" 4 14888 1 14888
"mark" 15028 "C" 2 14888 1 14888
"mark" 15028 "A" 1 14888 1 14888
"mark" 15028 "D" 3 14888 1 14888
"mark" 15097 "C" 2 14888 1 14888
"mark" 15097 "A" 1 14888 1 14888
"mark" 15097 "D" 3 14888 1 14888
"mark" 15097 "S" 4 14888 1 14888
end
format %tddd-Mon-YY rx_date
format %tddd-Mon-YY fourth_full
format %tddd-Mon-YY fourth_con_full
5 med users
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id int rx_date str1 agent_type byte count int fifth_full byte con_5_5 int fifth_con_full
"pat" 17910 "D" 1 . 1 17925
"pat" 17910 "A" 4 . 1 17925
"pat" 17910 "C" 2 . 1 17925
"pat" 17925 "B" 5 17925 1 17925
"pat" 17948 "B" 5 17925 1 17925
"pat" 17969 "C" 2 17925 1 17925
"pat" 17969 "B" 5 17925 1 17925
"pat" 17969 "D" 1 17925 1 17925
"pat" 17969 "A" 4 17925 1 17925
"pat" 18028 "D" 1 17925 1 17925
"pat" 18028 "B" 5 17925 1 17925
"pat" 18028 "C" 2 17925 1 17925
"pat" 18028 "A" 4 17925 1 17925
"pat" 18081 "D" 1 17925 1 17925
"pat" 18081 "C" 2 17925 1 17925
"mary" 16618 "C" 2 . 1 16646
"mary" 16618 "D" 3 . 1 16646
"mary" 16618 "B" 1 . 1 16646
"mary" 16646 "T" 5 16646 1 16646
"mary" 16679 "A" 4 16646 1 16646
"mary" 16679 "C" 2 16646 1 16646
"mary" 16679 "D" 3 16646 1 16646
"mary" 16679 "B" 1 16646 1 16646
"mary" 16681 "T" 5 16646 1 16646
"mary" 16737 "D" 3 16646 1 16646
"mary" 16737 "B" 1 16646 1 16646
"mary" 16737 "A" 4 16646 1 16646
"sue" 18676 "D" 3 . 1 18683
"sue" 18676 "C" 2 . 1 18683
"sue" 18676 "T" 4 . 1 18683
"sue" 18683 "B" 5 18683 1 18683
"sue" 18729 "C" 2 18683 1 18683
"sue" 18729 "B" 5 18683 1 18683
"sue" 18729 "T" 4 18683 1 18683
"sue" 18729 "D" 3 18683 1 18683
"sue" 18730 "C" 2 18683 1 18683
"sue" 18779 "C" 2 18683 1 18683
"sue" 18779 "T" 4 18683 1 18683
"sue" 18779 "D" 3 18683 1 18683
"sue" 18826 "A" 1 18683 1 18683
"sue" 18834 "C" 2 18683 1 18683
"sue" 18834 "T" 4 18683 1 18683
"sue" 18834 "D" 3 18683 1 18683
"sue" 18889 "D" 3 18683 1 18683
end
format %tddd-Mon-YY rx_date
format %tddd-Mon-YY fifth_full
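format %tddd-Mon-YY fifth_con_full
Answered 2017-11-08 18:38:42
This is not an answer, but it won't fit well into a comment. Your code is clear, but it can be condensed. For example, the first block could be reduced to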
gen interest = inlist(agent_type, "T", "Q", "S", "B") & con_4_4 ==1 & count==4
bysort id: egen interest4 = max(interest)
gen acd_4_1 = inlist(agent_type, "A", "C", "D") & count==1
gen acd_4_2 = inlist(agent_type, "A", "C", "D") & count==2
gen acd_4_3 = inlist(agent_type, "A", "C", "D") & count==3
by id: egen acd_4_11 = max(acd_4_1)
by id: egen acd_4_22 = max(acd_4_2)
by id: egen acd_4_33 = max(acd_4_3)
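gen acd_4= 1 if acd_4_11 ==1 & acd_4_22 ==1 & acd_4_33 ==1 & interest4==1
which goes from 13 statements down to 9.
That's just superficial; what matters most is that your real question is clear and gets answered.
Small tricks there include:
Omit by: when it makes no difference to the results.
A generate and replace pair that creates a 0, 1 variable boils down to one statement.
inlist() captures alternatives concisely.
Rewriting the question more concisely would make it more likely that people will attempt the real question.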
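The same tricks can be carried over to the "5 med users" block. What follows is only a minimal sketch, assuming the example data set from the question is in memory. The names in_win, has_A, has_C and has_D are made up for illustration and do not appear in the original code. It relies on egen's max() accepting an expression, so each indicator is built in one statement.

```stata
* Sketch only: condensed version of the 5 med users logic.
* in_win and has_A/has_C/has_D are hypothetical helper names.

* 1 if the prescription falls in the 180 days after fifth_con_full
gen in_win = rx_date > fifth_con_full & rx_date <= fifth_con_full + 180

* 1 for everyone whose 5th agent is a comparator of interest
bysort id: egen interest5 = max(inlist(agent_type, "T", "Q", "S", "B") & con_5_5 == 1 & count == 5)

* one indicator per background drug: used at least once in the window
foreach a in A C D {
    by id: egen has_`a' = max(agent_type == "`a'" & in_win)
}

* 1 if any other comparator agent (not the 5th agent) was used in the window
by id: egen prohib55 = max(inlist(agent_type, "T", "Q", "S", "B") & in_win & count != 5)

* pattern is 1 for A+C+D plus the 5th agent with no other comparator, missing otherwise
gen pattern = 1 if interest5 & has_A & has_C & has_D & !prohib55
```

On the example data this should agree with the notes in the question: pattern is 1 for pat and missing for mary and sue. Whether it is noticeably faster on a large data set is not something the sketch establishes; it mainly cuts the number of passes and temporary variables.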
https://stackoverflow.com/questions/47186568