My current df:
clinical # date collected name result submitter
123 3/2/2020 flu a negative hospital
123 3/2/2020 flu b positive hospital
123 3/2/2020 flu c positive hospital
123 3/2/2020 flu d negative hospital
567 7/7/1945 flu a negative hospital
567 7/7/1945 flu b negative hospital
567 7/7/1945 flu c positive hospital
567 7/7/1945 flu d negative hospital
989 8/8/1988 flu a negative hospice
989 8/8/1988 flu b negative hospice
989 8/8/1988 flu c negative hospice
989 8/8/1988 flu d negative hospice
989 8/8/1988 flu e negative hospice
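989 8/8/1988 flu f negative hospice

My df has thousands of rows, and the number of rows keeps changing. Each person is identified by a number in the first column. For example, Jane is 123. Jane was tested for flu a, flu b, flu c and flu d, and I want to collapse Jane's information into a single row. I need the variables that vary between rows, i.e. "name" and "result". All the other information is constant for a given patient, so the repeated copies can be dropped. Some patients have more tests. Patient 989, for example, had 6 flu tests instead of the 4 that Jane had, and the same process needs to work for them. The unique values, such as the flu type and its accompanying test result, should be moved into the same row.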
The ideal dataframe should look like this:
123 3/2/2020 hospital flu a - flu b + flu c + flu d -
567 7/7/1945 hospital flu a - flu b - flu c + flu d -
989 8/8/1988 hospice flu a - flu b - flu c - flu d - flu e - flu f -

Maybe there is a better way to do this, e.g. with keys or a dictionary? I would really appreciate any working solution.
Thanks in advance for your suggestions :)
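For the keys-or-dictionary idea, here is a minimal plain-Python sketch. It assumes each row has already been read as a `(clinical #, date, test name, result, submitter)` tuple; `rows`, `SIGN` and `collapse` are hypothetical names used only for illustration. A dict keyed by the constant fields collects the varying test/result pairs, so it handles any number of tests per patient:

```python
# Hypothetical sample rows copied from the table in the question:
# (clinical #, date, test name, result, submitter)
rows = [
    (123, "3/2/2020", "flu a", "negative", "hospital"),
    (123, "3/2/2020", "flu b", "positive", "hospital"),
    (123, "3/2/2020", "flu c", "positive", "hospital"),
    (123, "3/2/2020", "flu d", "negative", "hospital"),
    (567, "7/7/1945", "flu a", "negative", "hospital"),
    (567, "7/7/1945", "flu b", "negative", "hospital"),
    (567, "7/7/1945", "flu c", "positive", "hospital"),
    (567, "7/7/1945", "flu d", "negative", "hospital"),
    (989, "8/8/1988", "flu a", "negative", "hospice"),
    (989, "8/8/1988", "flu b", "negative", "hospice"),
    (989, "8/8/1988", "flu c", "negative", "hospice"),
    (989, "8/8/1988", "flu d", "negative", "hospice"),
    (989, "8/8/1988", "flu e", "negative", "hospice"),
    (989, "8/8/1988", "flu f", "negative", "hospice"),
]

# Word -> symbol; unknown results become "?" (an assumption, not from the question).
SIGN = {"negative": "-", "positive": "+"}


def collapse(rows):
    """Group rows by the constant fields and join the varying ones."""
    groups = {}  # (clinical #, date, submitter) -> list of "flu x +/-" strings
    for clinical, date, test, result, submitter in rows:
        key = (clinical, date, submitter)
        # setdefault creates the list the first time a patient is seen; dicts keep
        # insertion order (Python 3.7+), so patients and tests stay in row order.
        groups.setdefault(key, []).append(f"{test} {SIGN.get(result, '?')}")
    return [(*key, " ".join(parts)) for key, parts in groups.items()]


for line in collapse(rows):
    print(*line)
```

Each printed line has the shape of the desired output, e.g. `123 3/2/2020 hospital flu a - flu b + flu c + flu d -`.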
Posted on 2020-10-24 08:14:10
Try this: build a concatenated result-text field, using map to convert the words into plus/minus signs, then do a groupby and aggregate with ' '.join:
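# Build a "flu a -" style string for each row: collected + name + result symbol
df['restxt'] = (df['collected'] + ' ' +
                df['name'] + ' ' +
                df['result'].map({'negative': '-', 'positive': '+'}))
# Collapse to one row per patient by joining each group's strings with spaces
df.groupby(['clinical #', 'date', 'submitter'], as_index=False)['restxt'].agg(' '.join)

Output: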
clinical # date submitter restxt
0 123 3/2/2020 hospital flu a - flu b + flu c + flu d -
1 567 7/7/1945 hospital flu a - flu b - flu c + flu d -
2 989 8/8/1988 hospice flu a - flu b - flu c - flu d - flu e - flu f -

https://stackoverflow.com/questions/64508654
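A self-contained sketch of the same approach, assuming the six-column layout the answer's code implies ('collected' holds "flu", 'name' holds the letter). One caveat: `Series.map` with a dict returns NaN for any result that is not a key (say, a hypothetical "inconclusive"), and `' '.join` then raises a TypeError on that float, so this version fills unmapped values with "?", which is an arbitrary choice:

```python
import pandas as pd

# Sample data from the question, in the column layout the answer uses.
df = pd.DataFrame({
    "clinical #": [123] * 4 + [567] * 4 + [989] * 6,
    "date": ["3/2/2020"] * 4 + ["7/7/1945"] * 4 + ["8/8/1988"] * 6,
    "collected": ["flu"] * 14,
    "name": list("abcd") + list("abcd") + list("abcdef"),
    "result": ["negative", "positive", "positive", "negative",
               "negative", "negative", "positive", "negative"] + ["negative"] * 6,
    "submitter": ["hospital"] * 8 + ["hospice"] * 6,
})

# Map words to symbols; unmapped results would be NaN, so fill them with "?"
# (assumption) to keep ' '.join from failing on a float.
sign = df["result"].map({"negative": "-", "positive": "+"}).fillna("?")
df["restxt"] = df["collected"] + " " + df["name"] + " " + sign

# One row per patient, joining the per-row strings with spaces.
out = (df.groupby(["clinical #", "date", "submitter"], as_index=False)["restxt"]
         .agg(" ".join))
print(out)
```

`groupby` sorts by the key columns by default (`sort=True`), while rows inside each group keep their original order, so the flu tests are joined in the order they appear in the data.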