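I have a dataframe with 5 columns. I want to turn two of them (Chemo and Surgery) into rows in the diagnosis sequence whenever their value is greater than 0, and carry over information such as the individual's ID and age at diagnosis into those new rows.
Here is my dataframe: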
import pandas as pd
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
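print(df)
I tried to select the rows where Chemo/Surgery is greater than 0, but when I try to add them as rows it doesn't work.
This is the final result I want: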
ID Diagnosis Age at Diagnosis
0 A-1 Birth 0
1 A-1 Lung cancer 25
2 A-1 Chemo 25
3 A-1 Surgery 25
4 A-1 Death 50
5 A-2 Birth 0
6 A-2 Brain cancer 12
7 A-2 Chemo 12
8 A-2 Skin cancer 20
9 A-2 Chemo 20
10 A-2 Surgery 20
11 A-2 Current age 23
12 A-3 Birth 0
13 A-3 Brain cancer 30
14 A-3 Surgery 30
15 A-3 Lung cancer 33
16 A-3 Chemo 33
17 A-3 Current age 35
This is one of the approaches I tried:
chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print(df)
Posted on 2019-06-05 05:31:34
We can melt the two columns Chemo and Surgery, then drop all the zeros and concat:
# melt the two columns
new_df = df[['ID', 'Chemo', 'Surgery']].melt(id_vars='ID',
value_name='Age at Diagnosis',
var_name='Diagnosis')
# filter out the zeros
new_df = new_df[new_df['Age at Diagnosis'].ne('0')]
# concat with the original dataframe, ignoring the extra columns
new_df = pd.concat((df,new_df), sort=False, join='inner')
# sort values
new_df.sort_values(['ID','Age at Diagnosis'])
Output:
ID Diagnosis Age at Diagnosis
0 A-1 Birth 0
1 A-1 Lung cancer 25
1 A-1 Chemo 25
12 A-1 Surgery 25
2 A-1 Death 50
3 A-2 Birth 0
4 A-2 Brain cancer 12
4 A-2 Chemo 12
5 A-2 Skin cancer 20
5 A-2 Chemo 20
16 A-2 Surgery 20
6 A-2 Current age 23
7 A-3 Birth 0
8 A-3 Brain cancer 30
19 A-3 Surgery 30
9 A-3 Lung cancer 33
9 A-3 Chemo 33
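10 A-3 Current age 35
Posted on 2019-06-05 05:04:35
This approach is fairly verbose and takes several steps. We can't use a simple pivot table or index/column stacking, because one column has to be modified with partial results taken from other columns. That calls for splitting and appending.
First, convert your dataframe to data types we can work with.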
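Part of what makes this fiddly is that the sample data stores every number as a string. The sketch below uses a small frame in the question's shape to show why that matters; the age 100 is a hypothetical value added only to expose the sorting problem. It assumes Python 3 with a recent pandas: comparing a string column to an integer raises TypeError, and sorting strings orders ages lexicographically ("100" before "25"). Converting with astype(int) makes both behave numerically.

```python
import pandas as pd

# Hypothetical three-row frame in the question's shape; numbers are stored as strings.
# The age "100" is invented to show the lexicographic-sort problem.
df = pd.DataFrame(
    [["A-1", "Birth", "0", "0", "0"],
     ["A-1", "Lung cancer", "25", "25", "25"],
     ["A-1", "Death", "100", "0", "0"]],
    columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"],
)

# In Python 3, comparing a string column with an int raises TypeError
try:
    df[df["Chemo"] > 0]
except TypeError as exc:
    print("comparison failed:", type(exc).__name__)

# String sort is lexicographic: "0" < "100" < "25"
string_order = df.sort_values("Age at Diagnosis")["Diagnosis"].tolist()
print(string_order)  # ['Birth', 'Death', 'Lung cancer']

# After converting to int, filtering and sorting are numeric
num_cols = ["Age at Diagnosis", "Chemo", "Surgery"]
df[num_cols] = df[num_cols].astype(int)
chemo_rows = df[df["Chemo"] > 0]["Diagnosis"].tolist()
numeric_order = df.sort_values("Age at Diagnosis")["Diagnosis"].tolist()
print(chemo_rows)     # ['Lung cancer']
print(numeric_order)  # ['Birth', 'Lung cancer', 'Death']
```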
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
df[["Age at Diagnosis", "Chemo", "Surgery"]] = df[["Age at Diagnosis", "Chemo", "Surgery"]].astype(int)
Now we split it into parts.
# I like making a copy or resetting an index so that
# pandas is not operating off a slice
df_chemo = df[df.Chemo > 0].copy()
df_surgery = df[df.Surgery > 0].copy()
# drop columns you don't need
df_chemo.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df_surgery.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df.drop(["Chemo", "Surgery"], axis=1, inplace=True)
# Set Chemo and Surgery Diagnosis
df_chemo.Diagnosis = "Chemo"
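df_surgery.Diagnosis = "Surgery"
Then append everything together. This works because the column dimensions match.
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
df_new = pd.concat([df, df_chemo, df_surgery])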
# make it look pretty
df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)
https://stackoverflow.com/questions/56450961
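For reference, here is a self-contained sketch that combines the two answers' ideas. It converts the numbers to integers first, as the second answer does, and then uses melt and concat as in the first answer. The _order helper column is an addition of this sketch and appears in neither answer: it makes the tie-break at equal ages explicit (original diagnosis, then Chemo, then Surgery) instead of relying on row order after concatenation. This assumes a reasonably recent pandas.

```python
import pandas as pd

# The question's sample data (numbers stored as strings)
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25', '25'],
        ['A-1', 'Death', '50', '0', '0'], ['A-2', 'Birth', '0', '0', '0'],
        ['A-2', 'Brain cancer', '12', '12', '0'], ['A-2', 'Skin cancer', '20', '20', '20'],
        ['A-2', 'Current age', '23', '0', '0'], ['A-3', 'Birth', '0', '0', '0'],
        ['A-3', 'Brain cancer', '30', '0', '30'], ['A-3', 'Lung cancer', '33', '33', '0'],
        ['A-3', 'Current age', '35', '0', '0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

# Numeric columns so that filtering and sorting compare numbers, not strings
num_cols = ["Age at Diagnosis", "Chemo", "Surgery"]
df[num_cols] = df[num_cols].astype(int)

# One row per (ID, treatment). Melt only the needed columns: recent pandas
# versions reject a value_name that matches an existing column of the frame.
treatments = df[["ID", "Chemo", "Surgery"]].melt(
    id_vars="ID", var_name="Diagnosis", value_name="Age at Diagnosis"
)
treatments = treatments[treatments["Age at Diagnosis"] > 0]

# Hypothetical helper column: at equal ages the diagnosis row comes first,
# then Chemo, then Surgery
base = df[["ID", "Diagnosis", "Age at Diagnosis"]].assign(_order=0)
treatments = treatments.assign(
    _order=treatments["Diagnosis"].map({"Chemo": 1, "Surgery": 2})
)

result = (
    pd.concat([base, treatments], ignore_index=True)
    .sort_values(["ID", "Age at Diagnosis", "_order"])
    .drop(columns="_order")
    .reset_index(drop=True)
)
print(result)
```

Because the ages are integers here, an age such as 100 would also sort after 25, which the string-based sort in the first answer would not guarantee.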