I have two DataFrames:
df1 (sample, has more columns):
+---+----------------+--------------+-----------+
| | Region | Placement ID | Units |
+---+----------------+--------------+-----------+
| 0 | Western Europe | 1.10872E+13 | 367628.76 |
| 1 | Western Europe | 1.10872E+13 | 367628.76 |
| 2 | Western Europe | 1.10872E+13 | 74604.63 |
+---+----------------+--------------+-----------+
df2 (sample, has more columns):
+-----------+----------------+--------------+
| Creatives | Publisher Name | Placement ID |
+-----------+----------------+--------------+
| Temenos | Quantcast | 1.10872E+13 |
| Temenos | Quantcast | 1.10872E+13 |
| Temenos | Quantcast | 1.10872E+13 |
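+-----------+----------------+--------------+
What I want to do is add an extra column to DataFrame 2, based on Placement ID, that contains the index of the matching row in DataFrame 1.
Some Placement ID fields in DataFrame 1 or 2 may be empty or contain incorrect values. If there is no match, or an error is found, I would like to fill in a placeholder such as N/A or Missing, or leave the cell blank.
Posted on 2016-11-09 18:34:24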
IIUC you need merge, but duplicates are a problem, so first remove them with drop_duplicates, then select the column to add (Units) and the column to join on (Placement ID):
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')[['Units', 'Placement ID']],
               how='left',
               on='Placement ID'))
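  Creatives Publisher Name  Placement ID      Units
0   Temenos      Quantcast  1.108720e+13  367628.76
1   Temenos      Quantcast  1.108720e+13  367628.76
2   Temenos      Quantcast  1.108720e+13  367628.76
If you need to add the index, use reset_index. The new column is called level_0 here, which is the name pandas uses when a column named index already exists; with a plain default index the new column would be called index instead: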
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')
                  .reset_index()[['level_0', 'Placement ID']],
               how='left',
               on='Placement ID'))
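  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        0
2   Temenos      Quantcast  1.108720e+13        0
Dropping duplicates is necessary because merge pairs every row that shares a join key: df2 has 3 rows with the value 1.108720e+13 and df1 also has 3, so you get 3 x 3 = 9 rows, as shown here: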
print(pd.merge(df2,
               df1.reset_index()[['level_0', 'Placement ID']],
               how='left',
               on='Placement ID'))
  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        1
2   Temenos      Quantcast  1.108720e+13        2
3   Temenos      Quantcast  1.108720e+13        0
4   Temenos      Quantcast  1.108720e+13        1
5   Temenos      Quantcast  1.108720e+13        2
6   Temenos      Quantcast  1.108720e+13        0
7   Temenos      Quantcast  1.108720e+13        1
8   Temenos      Quantcast  1.108720e+13        2
https://stackoverflow.com/questions/40504724
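The question also asks what to do with empty or unmatched Placement IDs. A minimal sketch of one way to handle that on top of the left merge is below. It is only an illustration under some assumptions: the second and third Placement ID values, the NaN, and the names lookup, df1_index and status are made up for the example and do not come from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames modelled on the question. The IDs 2.2e13 and
# 9.99999e13 and the NaN are invented so that a miss and an empty ID show up.
df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 3,
    "Placement ID": [1.10872e13, 1.10872e13, 2.2e13],
    "Units": [367628.76, 367628.76, 74604.63],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 3,
    "Publisher Name": ["Quantcast"] * 3,
    "Placement ID": [1.10872e13, 9.99999e13, np.nan],
})

# One row per Placement ID: drop empty IDs, keep the first occurrence,
# and turn the df1 index into a regular column with an explicit name.
lookup = (
    df1.dropna(subset=["Placement ID"])
       .drop_duplicates("Placement ID")
       .rename_axis("df1_index")
       .reset_index()[["df1_index", "Placement ID"]]
)

# Left merge keeps every df2 row; validate raises if lookup still has
# duplicate keys, which is what would multiply the rows.
result = df2.merge(lookup, how="left", on="Placement ID",
                   validate="many_to_one")

# Unmatched rows get a missing value; the nullable Int64 dtype keeps the
# matched indices as integers instead of floats.
result["df1_index"] = result["df1_index"].astype("Int64")
result["status"] = np.where(result["df1_index"].isna(), "Missing", "OK")

# For comparison: merging without drop_duplicates repeats df2 rows.
naive = df2.merge(df1.reset_index(), how="left", on="Placement ID")

print(result)
print(len(df2), len(result), len(naive))
```

Using rename_axis before reset_index avoids having to guess whether the new column will be called index or level_0. If a text placeholder such as N/A is preferred over a separate status column, the df1_index column would have to be converted to strings first, since a numeric column cannot hold the label.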