I'm very new to Python and have run into the following problem.
I have two dataframes. The first one looks like this:
df1
code product
10-00 apple
10-10 banana
10-20 grape
10-00 cucumber
20-00 tomato
20-10 onion
20-10 garlic
The second one looks like this:
df2
code colour
10-00 green
10-10 yellow
10-20 purple
20-00 red
20-10 white
I'd like a loop that gives me the following dataframe:
df
code product colour
10-00 apple green
10-10 banana yellow
10-20 grape purple
10-00 cucumber green
20-00 tomato red
20-10 onion white
20-10 garlic white
But I really can't work out where to start... Has anyone had a problem like this?
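Since the question asks for a loop, here is what a literal loop could look like in plain Python, for comparison with the pandas answers below. It is a minimal sketch assuming the two tables are held as lists of (code, value) tuples; the names `products`, `colours` and `colour_by_code` are made up for illustration. The key step is building a dict from df2 once, so each row of df1 needs only a single lookup.

```python
# Rows of df1 and df2 as plain (code, value) tuples; hypothetical stand-ins
# for the two dataframes in the question.
products = [
    ("10-00", "apple"),
    ("10-10", "banana"),
    ("10-20", "grape"),
    ("10-00", "cucumber"),
    ("20-00", "tomato"),
    ("20-10", "onion"),
    ("20-10", "garlic"),
]
colours = [
    ("10-00", "green"),
    ("10-10", "yellow"),
    ("10-20", "purple"),
    ("20-00", "red"),
    ("20-10", "white"),
]

# Build the lookup table once: each code maps to exactly one colour.
colour_by_code = dict(colours)

# Loop over df1's rows and attach the matching colour (None if the code is missing).
result = []
for code, product in products:
    result.append((code, product, colour_by_code.get(code)))

for row in result:
    print(*row)
```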
Posted on 2018-02-28 09:52:58
Try this (pd.DataFrame.merge):
df = pd.merge(df1, df2, on=['code'], how='left')
Example:
import pandas as pd
df1 = pd.DataFrame({
'code': ['10-00','10-10'],
'product': ['apple','banana']
})
df2 = pd.DataFrame({
'code': ['10-00','10-10'],
'colour': ['green','yellow']
})
df = pd.merge(df1,df2,on=['code'],how='left')
print(df)
Returns:
code product colour
0 10-00 apple green
1 10-10 banana yellow
Posted on 2018-02-28 09:56:54
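A sketch of the merge answer applied to the full data from the question follows. The `validate='many_to_one'` argument is an addition that is not in the original answer: it makes pandas raise a `MergeError` if a code occurred more than once in df2, which would otherwise silently duplicate rows of df1. With `how='left'`, every row of df1 is kept in its original order, and a code with no match in df2 would get NaN as its colour.

```python
import pandas as pd

df1 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '10-00', '20-00', '20-10', '20-10'],
    'product': ['apple', 'banana', 'grape', 'cucumber', 'tomato', 'onion', 'garlic'],
})
df2 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '20-00', '20-10'],
    'colour': ['green', 'yellow', 'purple', 'red', 'white'],
})

# Left merge: keep every row of df1 and look up its colour in df2.
# validate='many_to_one' raises MergeError if any code repeats in df2.
df = pd.merge(df1, df2, on='code', how='left', validate='many_to_one')
print(df)
```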
Don't use a loop. Index both dataframes by the code column and use a simple assignment!
>>> df1.set_index('code', inplace=True)
>>> df2.set_index('code',inplace=True)
>>> df1
product
code
10-00 apple
10-10 banana
10-20 grape
10-00 cucumber
20-00 tomato
20-10 onion
20-10 garlic
>>> df2
colour
code
10-00 green
10-10 yellow
10-20 purple
20-00 red
20-10 white
Then simply:
>>> df1['colour'] = df2['colour']
>>> df1
product colour
code
10-00 apple green
10-10 banana yellow
10-20 grape purple
10-00 cucumber green
20-00 tomato red
20-10 onion white
20-10 garlic white
If you don't want to index df1 by code (it gives you a duplicate index), you can always use:
>>> df1['colour'] = df2.loc[df1['code']].values
>>> df1
code product colour
0 10-00 apple green
1 10-10 banana yellow
2 10-20 grape purple
3 10-00 cucumber green
4 20-00 tomato red
5 20-10 onion white
6 20-10 garlic white
This works as long as df2 is indexed by 'code'.
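A self-contained sketch of both variants from this answer, using the question's data. Variant 1 relies on pandas aligning a Series on the index when it is assigned as a column, so repeated codes in df1 each pick up the same colour from df2 (whose index must be unique). Variant 2 is a slight modification of the `.values` line above: it selects the `colour` column inside `.loc` so the looked-up values are one-dimensional, and `.to_numpy()` strips the index so no re-alignment against df1's integer index takes place. The name `by_code` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '10-00', '20-00', '20-10', '20-10'],
    'product': ['apple', 'banana', 'grape', 'cucumber', 'tomato', 'onion', 'garlic'],
})
# df2 must be indexed by 'code' (with unique codes) for both variants.
df2 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '20-00', '20-10'],
    'colour': ['green', 'yellow', 'purple', 'red', 'white'],
}).set_index('code')

# Variant 1: index df1 by code too; column assignment aligns on the index,
# so duplicate codes in df1 simply receive the same colour.
by_code = df1.set_index('code')
by_code['colour'] = df2['colour']

# Variant 2: keep df1's default integer index and look the codes up in df2;
# .to_numpy() drops df2's index so the values are assigned by position.
df1['colour'] = df2.loc[df1['code'], 'colour'].to_numpy()

print(by_code)
print(df1)
```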
Posted on 2018-02-28 10:07:19
This is possible with set_index and join:
df1.set_index('code').join(df2.set_index('code')).reset_index()
Result:
code product colour
0 10-00 apple green
1 10-00 cucumber green
2 10-10 banana yellow
3 10-20 grape purple
4 20-00 tomato red
5 20-10 onion white
6 20-10 garlic white
Explanation
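set_index('code') is applied to both df1 and df2 so that we can join on it afterwards. join then performs a "left join" on the index. Finally, reset_index is applied to the result to turn code back into a regular column, giving the desired columns.
https://stackoverflow.com/questions/49026520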
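A runnable sketch of the set_index/join answer with the full data from the question. `how='left'` is already join's default and is only spelled out here for clarity. The row order of the result may depend on the pandas version (the output above is grouped by code, while newer releases may keep df1's original order for a left join), so if order matters, sort explicitly, for example with `sort_values('code')`.

```python
import pandas as pd

df1 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '10-00', '20-00', '20-10', '20-10'],
    'product': ['apple', 'banana', 'grape', 'cucumber', 'tomato', 'onion', 'garlic'],
})
df2 = pd.DataFrame({
    'code': ['10-00', '10-10', '10-20', '20-00', '20-10'],
    'colour': ['green', 'yellow', 'purple', 'red', 'white'],
})

# Move 'code' into the index of both frames, left-join on that index,
# then turn the index back into an ordinary 'code' column.
df = (
    df1.set_index('code')
       .join(df2.set_index('code'), how='left')
       .reset_index()
)
print(df)
```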