我有一个数据框架:
Prop_ID Unit_ID Prop_Usage Unit_Usage
1 1 RESIDENTIAL RESIDENTIAL
1 2 RESIDENTIAL COMMERCIAL
1 3 RESIDENTIAL INDUSTRIAL
1 4 RESIDENTIAL RESIDENTIAL
2 1 COMMERCIAL RESIDENTIAL
2 2 COMMERCIAL COMMERCIAL
2 3 COMMERCIAL COMMERCIAL
3 1 INDUSTRIAL INDUSTRIAL
3 2 INDUSTRIAL COMMERCIAL
4 1 RESIDENTIAL - COMMERCIAL RESIDENTIAL
4 2 RESIDENTIAL - COMMERCIAL COMMERCIAL
4 3 RESIDENTIAL - COMMERCIAL INDUSTRIAL
5 1 COMMERCIAL / RESIDENTIAL RESIDENTIAL
5 2 COMMERCIAL / RESIDENTIAL COMMERCIAL
5 3 COMMERCIAL / RESIDENTIAL INDUSTRIAL
5 4 COMMERCIAL / RESIDENTIAL COMMERCIAL一个属性可以有多个单元。这意味着单位是属性的子类别。我想过滤Prop_Usage与Unit_Usage不匹配的行。我们在Prop_Usage列中有一个类别是RESIDENTIAL - COMMERCIAL,那么Unit_Usage可以是RESIDENTIAL或COMMERCIAL。COMMERCIAL / RESIDENTIAL也是如此。
预期输出:
Prop_ID Unit_ID Prop_Usage Unit_Usage
1 2 RESIDENTIAL COMMERCIAL
1 3 RESIDENTIAL INDUSTRIAL
2 1 COMMERCIAL RESIDENTIAL
3 2 INDUSTRIAL COMMERCIAL
4 3 RESIDENTIAL - COMMERCIAL INDUSTRIAL
5 3 COMMERCIAL / RESIDENTIAL INDUSTRIAL发布于 2020-01-11 16:25:34
在DataFrame.apply中使用in语句
df = df[~df.apply(lambda x: x['Unit_Usage'] in x['Prop_Usage'], axis=1)]或者在列表理解中使用zip:
df = df[[not a in b for a, b in zip(df['Unit_Usage'], df['Prop_Usage'])]]print (df)
Prop_ID Unit_ID Prop_Usage Unit_Usage
1 1 2 RESIDENTIAL COMMERCIAL
2 1 3 RESIDENTIAL INDUSTRIAL
4 2 1 COMMERCIAL RESIDENTIAL
8 3 2 INDUSTRIAL COMMERCIAL
11 4 3 RESIDENTIAL - COMMERCIAL INDUSTRIAL
14 5 3 COMMERCIAL / RESIDENTIAL INDUSTRIALhttps://stackoverflow.com/questions/59692738
复制相似问题