I have two DataFrames that relate to the same data. One of them contains all the data, like this:
Person ID  word      rt    accuracy  emotional_w
0          CHOQUE    1353  C         True
0          SILLA     434   C         False
0          BRAZO     480   C         False
0          LLUVIA    1091  C         False
1          SOLEDAD   637   C         True
1          INFIERNO  437   I         True
1          MOMENTO   754   C         False

The other one contains the mean and standard deviation of rt, followed by the numbers I'm interested in, 'desvios_mayores' and 'desvios_menores':
   Person ID  rt_stdev          rt_mean           desvios_mayores   desvios_menores
0  0          311.200383049439  655.975609756098  1278.37637585498  33.5748436572201
1  1          280.592497402182  971.416666666667  1532.60166147103  410.231671862303
2  2          325.848282375085  928.630952380953  1580.32751713112  276.934387630783

I need to check whether each person's rt for each word is greater than desvios_mayores or less than desvios_menores and, if so, replace that value with their rt_mean.
So far I have written this, but it raises the error "ValueError: Can only compare identically-labeled Series objects":
if df_outliers_total['Person ID'] == df['Person ID']:
    if df['rt'] > df_outliers_total['desvios_mayores']:
        df_outliers_total['rt_mean']
    elif df['rt'] < df_outliers_total['desvios_menores']:
        df_outliers_total['rt_mean']

What would be a better way to achieve this? Thanks.
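For context on the error quoted in the question: pandas compares two Series element by element only when their indexes are identical, and the per-trial frame and the per-person frame have different lengths. The following is a minimal sketch that reproduces the message; the frame names `trials` and `limits` and the rounded values are illustrative assumptions, not the asker's actual data.

```python
import pandas as pd

# Illustrative values only: a long per-trial frame and a short per-person frame.
trials = pd.DataFrame({"Person ID": [0, 0, 1], "rt": [1353, 434, 637]})
limits = pd.DataFrame({"Person ID": [0, 1], "desvios_mayores": [1278.4, 1532.6]})

try:
    # The indexes differ (0..2 versus 0..1), so pandas refuses to compare them.
    trials["rt"] > limits["desvios_mayores"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This is why the per-person limits have to be lined up with the trial rows first, for example by merging on 'Person ID', before any comparison is made.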
Posted on 2021-10-08 13:22:49
To compare the values you first need a left join with DataFrame.merge; then set the new values with Series.mask, chaining the two boolean masks with | for a bitwise OR:
df1 = df.merge(df_outliers_total, on='Person ID', how='left')
m = (df1['rt'] > df1['desvios_mayores']) | (df1['rt'] < df1['desvios_menores'])
df1['rt'] = df1['rt'].mask(m, df1['rt_mean'])
# keep only the original columns, in their original order
df1 = df1.reindex(df.columns, axis=1)
print(df1)
   Person ID  word      rt          accuracy  emotional_w
0  0          CHOQUE    655.97561   C         True
1  0          SILLA     434.00000   C         False
2  0          BRAZO     480.00000   C         False
3  0          LLUVIA    1091.00000  C         False
4  1          SOLEDAD   637.00000   C         True
5  1          INFIERNO  437.00000   I         True
6  1          MOMENTO   754.00000   C         False

Posted on 2021-10-08 13:45:13
Here is another way to achieve this:
original_cols = df1.columns
df1 = df1.merge(df_outliers_total, on="Person ID", how="left")
df1['rt'] = df1.apply(lambda x: x['rt_mean'] if (x['rt'] > x['desvios_mayores'] or x['rt'] < x['desvios_menores']) else x['rt'], axis=1)
print(df1[original_cols])
   Person ID  word      rt          accuracy  emotional_w
0  0          CHOQUE    655.97561   C         True
1  0          SILLA     434.00000   C         False
2  0          BRAZO     480.00000   C         False
3  0          LLUVIA    1091.00000  C         False
4  1          SOLEDAD   637.00000   C         True
5  1          INFIERNO  437.00000   I         True
6  1          MOMENTO   754.00000   C         False

https://stackoverflow.com/questions/69496470
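Both answers assume that df and df_outliers_total already exist. The following is a minimal self-contained sketch that rebuilds the sample rows shown in the question and runs the merge-then-replace idea both ways; the intermediate names `merged`, `out_of_range`, `rt_masked`, `rt_applied` and `result` are illustrative assumptions and do not appear in the answers.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Person ID": [0, 0, 0, 0, 1, 1, 1],
    "word": ["CHOQUE", "SILLA", "BRAZO", "LLUVIA", "SOLEDAD", "INFIERNO", "MOMENTO"],
    "rt": [1353, 434, 480, 1091, 637, 437, 754],
    "accuracy": ["C", "C", "C", "C", "C", "I", "C"],
    "emotional_w": [True, False, False, False, True, True, False],
})
df_outliers_total = pd.DataFrame({
    "Person ID": [0, 1, 2],
    "rt_stdev": [311.200383049439, 280.592497402182, 325.848282375085],
    "rt_mean": [655.975609756098, 971.416666666667, 928.630952380953],
    "desvios_mayores": [1278.37637585498, 1532.60166147103, 1580.32751713112],
    "desvios_menores": [33.5748436572201, 410.231671862303, 276.934387630783],
})

# Left join: every trial row receives the limits and mean of its own person.
merged = df.merge(df_outliers_total, on="Person ID", how="left")

# True where rt falls outside the person's [desvios_menores, desvios_mayores] range.
out_of_range = (merged["rt"] > merged["desvios_mayores"]) | (
    merged["rt"] < merged["desvios_menores"]
)

# Vectorized replacement, as in the first answer.
rt_masked = merged["rt"].mask(out_of_range, merged["rt_mean"])

# Row-wise replacement, as in the second answer.
rt_applied = merged.apply(
    lambda row: row["rt_mean"]
    if (row["rt"] > row["desvios_mayores"] or row["rt"] < row["desvios_menores"])
    else row["rt"],
    axis=1,
)

# Keep only the original columns, with the cleaned rt values.
result = merged.assign(rt=rt_masked)[df.columns]
print(result)
```

On this sample both variants replace only the CHOQUE row. The mask version works on whole columns at once, so it is usually the faster choice on large frames, while the apply version calls a Python function once per row.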