I have a dataframe with the following columns.
CUI id term id term_name
C0000729 10000057 MDR LLT Abdominal cramps
C0000729 10000056 MDR LLT Abdominal cramp
C0000729 10011286 MDR LLT Cramp abdominal
C0000729 10000058 MDR LLT Abdominal crampy pains
C0000729 10093764 ICD10 PT Abdominal crampy pains
C0000800 10093765 ICD10 PT Abdominal pain
C0000800 10000058 MDR LLT Abdominal crampy pains
C0000800 10093764 ICD10AM PT Abdominal crampy pains
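C0000730 10000052 MDR LLT Abdominal cramps back

For each CUI, I want to get the rows if the CUI has both 'MDR' terms and ICD terms (ICD10, ICD10AM). However, if a CUI has only MDR terms, it should be excluded (e.g. the last row, 'C0000730').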
The expected output is as follows:
CUI id term id term_name
C0000729 10000057 MDR LLT Abdominal cramps
C0000729 10000056 MDR LLT Abdominal cramp
C0000729 10011286 MDR LLT Cramp abdominal
C0000729 10000058 MDR LLT Abdominal crampy pains
C0000729 10093764 ICD10 PT Abdominal crampy pains
C0000800 10093765 ICD10 PT Abdominal pain
C0000800 10000058 MDR LLT Abdominal crampy pains
C0000800 10093764 ICD10AM PT Abdominal crampy pains

I am using the lines of code below on the data above.
#select only those mappings where ICD and MeDRA both exists for a particular CUI id
s = set(['ICD10','ICD10CM','ICD10AM','MDR'])
dff_mapped = df_umls[df_umls.groupby('CUI')['SAB'].transform(lambda x: set(x) == s)]
dff_mapped = dff_mapped.sort_values(['CUI', 'SAB'],ascending = [True, True])
dff_mapped.to_csv('df_mapped', index = False, sep = ',')

Any help is much appreciated.
Posted on 2022-11-18 14:29:06
If I understand correctly:
nonMDR = df[df['term id'].str.startswith('ICD')] # creates a new df with ICDs
term_ids = nonMDR['CUI'].unique() # create an array of unique CUIs
df[df['CUI'].isin(term_ids)] # keep only rows whose CUI has at least one ICD term

https://stackoverflow.com/questions/74490903
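A possible variation, as a minimal sketch rather than a definitive fix: the filter above keeps every CUI that has an ICD row, whether or not it also has an MDR row, while the `set(x) == s` test in the question only matches a CUI whose sources are exactly all four values in `s`. If both an MDR row and at least one ICD row are required, two per-CUI flags can be combined. This assumes the vocabulary is stored in a column named `SAB`, as in the question's code; the other column names (`id`, `TTY`, `term_name`) are placeholders chosen for the example.

```python
import pandas as pd

# Sample rows from the question. Only 'CUI' and 'SAB' are names taken from
# the question's code; 'id', 'TTY' and 'term_name' are assumed placeholders.
rows = [
    ("C0000729", 10000057, "MDR", "LLT", "Abdominal cramps"),
    ("C0000729", 10000056, "MDR", "LLT", "Abdominal cramp"),
    ("C0000729", 10011286, "MDR", "LLT", "Cramp abdominal"),
    ("C0000729", 10000058, "MDR", "LLT", "Abdominal crampy pains"),
    ("C0000729", 10093764, "ICD10", "PT", "Abdominal crampy pains"),
    ("C0000800", 10093765, "ICD10", "PT", "Abdominal pain"),
    ("C0000800", 10000058, "MDR", "LLT", "Abdominal crampy pains"),
    ("C0000800", 10093764, "ICD10AM", "PT", "Abdominal crampy pains"),
    ("C0000730", 10000052, "MDR", "LLT", "Abdominal cramps back"),
]
df_umls = pd.DataFrame(rows, columns=["CUI", "id", "SAB", "TTY", "term_name"])

# Row-level flags: is this row an MDR term, is it any ICD variant?
is_mdr = df_umls["SAB"].eq("MDR")
is_icd = df_umls["SAB"].str.startswith("ICD")

# Broadcast "does this CUI have at least one such row?" back onto every row.
has_mdr = is_mdr.groupby(df_umls["CUI"]).transform("any")
has_icd = is_icd.groupby(df_umls["CUI"]).transform("any")

# Keep only CUIs that have both an MDR row and an ICD row.
dff_mapped = df_umls[has_mdr & has_icd].sort_values(["CUI", "SAB"])
print(dff_mapped)
```

On the sample data this keeps the rows for C0000729 and C0000800 and drops C0000730, which has only an MDR row. Matching on the `ICD` prefix means ICD10, ICD10CM and ICD10AM are all treated as ICD sources without listing each one.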