I need to fill a string column within each group using the group's single non-null value. An example:
ID name surname prsn_id
A john smith prsn_01
A john smith NaN
A john smith NaN
A john smith NaN
B mary jane prsn_02
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
C Barry willis prsn_03
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN

The output should be:
ID name surname prsn_id
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03

Or:
ID name surname prsn_id prsn_id_2
A john smith prsn_01 NaN
A john smith NaN prsn_01
A john smith NaN prsn_01
A john smith NaN prsn_01
B mary jane prsn_02 NaN
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
C Barry willis prsn_03 NaN
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03

I tried:
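df['prsn_id_2'] = (df
    .groupby(['ID', 'name', 'surname'])['prsn_id']
    .fillna(method='ffill'))

This probably works, but it takes a very long time, so it is not practical going forward. I need another solution that is vectorized and relatively fast.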
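For anyone who wants to reproduce the attempt, here is a minimal sketch. The way `df` is built is my own assumption, reconstructed from the sample table (one known `prsn_id` in the first row of each group). It uses `GroupBy.ffill()`, which expresses the same forward fill; as far as I know, recent pandas releases deprecate the `fillna(method='ffill')` spelling.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question (assumed layout:
# the only non-null prsn_id sits in the first row of each group).
groups = [("A", "john", "smith", "prsn_01", 4),
          ("B", "mary", "jane", "prsn_02", 7),
          ("C", "Barry", "willis", "prsn_03", 5)]
rows = []
for id_, name, surname, prsn_id, n in groups:
    rows.append((id_, name, surname, prsn_id))
    rows.extend((id_, name, surname, np.nan) for _ in range(n - 1))
df = pd.DataFrame(rows, columns=["ID", "name", "surname", "prsn_id"])

# Forward-fill within each group; same idea as fillna(method='ffill').
df["prsn_id_2"] = df.groupby(["ID", "name", "surname"])["prsn_id"].ffill()
```

Note that a forward fill only reaches rows that come after the non-null value in their group, so it relies on that value being the first row.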
Posted on 2018-07-24 10:39:40
Use dropna to remove the rows containing NaN, then do a left join with merge:
df1 = df.dropna(subset=['prsn_id'])
#if possible duplicates
#df1 = df.dropna(subset=['prsn_id']).drop_duplicates(['ID','name', 'surname'])
df = df.drop('prsn_id', axis=1).merge(df1, on=['ID','name', 'surname'], how='left')
print (df)
ID name surname prsn_id
0 A john smith prsn_01
1 A john smith prsn_01
2 A john smith prsn_01
3 A john smith prsn_01
4 B mary jane prsn_02
5 B mary jane prsn_02
6 B mary jane prsn_02
7 B mary jane prsn_02
8 B mary jane prsn_02
9 B mary jane prsn_02
10 B mary jane prsn_02
11 C Barry willis prsn_03
12 C Barry willis prsn_03
13 C Barry willis prsn_03
14 C Barry willis prsn_03
15 C Barry willis prsn_03

Details:
print (df1)
ID name surname prsn_id
0 A john smith prsn_01
4 B mary jane prsn_02
11 C Barry willis prsn_03

https://stackoverflow.com/questions/51496429
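The dropna-then-merge idea can be tried end to end with the sketch below. The small frame and the names `keys`, `lookup` and `filled` are made up for illustration; the known `prsn_id` of group B is deliberately not in the first row, to show that the join does not depend on row order inside a group.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data; the known prsn_id is
# deliberately not in the first row of group "B".
df = pd.DataFrame({
    "ID":      ["A", "A", "B", "B", "B"],
    "name":    ["john", "john", "mary", "mary", "mary"],
    "surname": ["smith", "smith", "jane", "jane", "jane"],
    "prsn_id": ["prsn_01", np.nan, np.nan, "prsn_02", np.nan],
})
keys = ["ID", "name", "surname"]

# One row per group holding the known prsn_id; drop_duplicates guards
# against groups with more than one non-null row.
lookup = df.dropna(subset=["prsn_id"]).drop_duplicates(keys)

# Left join the lookup back onto the keys so every row gets its group's value.
filled = df.drop(columns="prsn_id").merge(lookup, on=keys, how="left")
print(filled)
```

Without the `drop_duplicates` step, a group with several non-null rows would produce extra rows in the join, which is why the commented-out variant in the answer is worth keeping in mind. A group with no non-null row at all simply stays NaN after the left join.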