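Compare two DataFrames: df1 has 1000+ rows in which unique_id values repeat, while df2 has exactly one row per unique_id. For every row in df1, I want to check that its unique_id exists in df2 and, if it does, that the category and subcategory also match between df1 and df2. Output: if any of these do not match, collect that row's index into an array.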
import pandas as pd
import numpy as np
data1 = {'unique_id':
['Computer','iPhone','Printer','Desktop','Computer','iPhone','iphpne','Printer','Desktop','Computer','iPhone','Printer','Desktop'],
'category':
['movies','documentary','series','special','movies','documentary','series','special','series','special','movies','series','special'],
'subcategory':
['drama','horror','comedy','reality','drama','documentary','comedy','reality','documentary','comedy','documentary','comedy','drama']
}
df1 = pd.DataFrame(data1,columns= ['unique_id', 'category','subcategory'])
data2 = {'unique_id': ['Computer','iPhone','Printer','Desktop'],
'category': ['movies','documentary','series','special'],
'subcategory':['drama','horror','comedy','reality']
}
df2 = pd.DataFrame(data2,columns= ['unique_id', 'category','subcategory'])
Posted on 2020-12-15 14:24:18
IIUC, this is what you need:
pd.concat([df1,df2]).drop_duplicates(keep=False)
Prints:
    unique_id     category  subcategory
5      iPhone  documentary  documentary
6      iphpne       series       comedy
7     Printer      special      reality
8     Desktop       series  documentary
9    Computer      special       comedy
10     iPhone       movies  documentary
12    Desktop      special        drama
To get the index:
pd.concat([df1,df2]).drop_duplicates(keep=False).index
Prints:
Int64Index([5, 6, 7, 8, 9, 10, 12], dtype='int64')
https://stackoverflow.com/questions/65300087
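One caveat with the concat/drop_duplicates approach: it treats a mismatching row that occurs twice in df1 as a duplicate and drops both copies, and it would also report any df2 row that never appears in df1. A possible alternative, sketched here under the assumption that a "match" means all three columns are equal, is a left merge with indicator=True. The names keys, merged, and mismatch_idx are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'unique_id': ['Computer', 'iPhone', 'Printer', 'Desktop', 'Computer', 'iPhone', 'iphpne',
                  'Printer', 'Desktop', 'Computer', 'iPhone', 'Printer', 'Desktop'],
    'category': ['movies', 'documentary', 'series', 'special', 'movies', 'documentary', 'series',
                 'special', 'series', 'special', 'movies', 'series', 'special'],
    'subcategory': ['drama', 'horror', 'comedy', 'reality', 'drama', 'documentary', 'comedy',
                    'reality', 'documentary', 'comedy', 'documentary', 'comedy', 'drama'],
})
df2 = pd.DataFrame({
    'unique_id': ['Computer', 'iPhone', 'Printer', 'Desktop'],
    'category': ['movies', 'documentary', 'series', 'special'],
    'subcategory': ['drama', 'horror', 'comedy', 'reality'],
})

keys = ['unique_id', 'category', 'subcategory']

# Left merge keeps one output row per df1 row (df2 is de-duplicated first),
# in df1's original order; '_merge' records whether a match was found in df2.
merged = df1.merge(df2.drop_duplicates(), on=keys, how='left', indicator=True)

# Rows marked 'left_only' have no exact counterpart in df2.
# The mask is converted to a plain array so it is applied by position.
mask = (merged['_merge'] == 'left_only').to_numpy()
mismatch_idx = df1.index[mask].to_numpy()

print(mismatch_idx)  # [ 5  6  7  8  9 10 12]
```

On the sample data this yields the same indices as the answer above, and it returns a NumPy array as the question asks. Repeated mismatching rows in df1 would each be reported, since every df1 row is checked against df2 independently.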