Suppose we have a df1 like this:
import numpy as np
import pandas as pd

x1 = [{'partner': "Afghanistan", 'commodity': np.nan},
{'partner': "Zambia", 'commodity': 2},
{'partner': "Germany", 'commodity': 2},
{'partner': "Afghanistan", 'commodity': np.nan},
{'partner': "Canada", 'commodity': np.nan},
{'partner': "Italy", 'commodity': 3},
{'partner': "Canada", 'commodity': np.nan},
{'partner': "USA", 'commodity': np.nan}]
df1 = pd.DataFrame(x1)

What I want is a list of the values in partner that have a NaN in commodity, but without the same partner being listed twice.
So the result I want looks like this:
commodity_nan_partners=
Afghanistan
Canada
USA

rather than:
Afghanistan
Afghanistan
Canada
Canada
USA

Answered 2018-11-25 02:25:44
loc + isnull + drop_duplicates
You can filter your series and then drop the duplicates:
res = df1.loc[df1['commodity'].isnull(), 'partner'].drop_duplicates()
print(res)
0 Afghanistan
4 Canada
7 USA
Name: partner, dtype: object

Answered 2018-11-25 02:24:00
You can use isnull to find the NaN values, then use unique or set to get the unique values.
>>> pd.Series(df1.loc[df1.commodity.isnull(),'partner'].unique())
0 Afghanistan
1 Canada
2 USA
dtype: object
# or
>>> pd.Series(list(set(df1.loc[df1.commodity.isnull(),'partner'])))
0 Canada
1 Afghanistan
2 USA
dtype: object

Answered 2018-11-25 02:34:59
Step 1
Filter to keep only the partner values on rows where commodity is NaN:
v = df1.loc[df1.commodity.isna(), 'partner']
Or,
v = df1.partner[df1.commodity.isna()]
print(v)
0 Afghanistan
3 Afghanistan
4 Canada
6 Canada
7 USA
Name: partner, dtype: object

Step 2
Drop the duplicates.
If you want a collection,
v.unique()
array(['Afghanistan', 'Canada', 'USA'], dtype=object)
Or,
set(v)
{'Afghanistan', 'Canada', 'USA'}
If you want a Series,
ser = v.drop_duplicates().reset_index(drop=True)
0 Afghanistan
1 Canada
2 USA
Name: partner, dtype: object
If you want a DataFrame,
df = ser.to_frame()

https://stackoverflow.com/questions/53464129
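Putting the answers together, here is a minimal end-to-end sketch that can be pasted and run as-is. It writes the question's NaN as np.nan, and the helper name partners_with_missing and its parameters value_col and key_col are hypothetical, introduced only for illustration; they are not part of pandas or of any answer in this thread.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN written as np.nan
x1 = [
    {"partner": "Afghanistan", "commodity": np.nan},
    {"partner": "Zambia", "commodity": 2},
    {"partner": "Germany", "commodity": 2},
    {"partner": "Afghanistan", "commodity": np.nan},
    {"partner": "Canada", "commodity": np.nan},
    {"partner": "Italy", "commodity": 3},
    {"partner": "Canada", "commodity": np.nan},
    {"partner": "USA", "commodity": np.nan},
]
df1 = pd.DataFrame(x1)


def partners_with_missing(df, value_col="commodity", key_col="partner"):
    """Hypothetical helper: unique key_col values on rows where value_col is NaN."""
    mask = df[value_col].isna()  # True where the value is missing
    # drop_duplicates keeps the first occurrence, so the original row order is preserved
    return df.loc[mask, key_col].drop_duplicates().reset_index(drop=True)


commodity_nan_partners = partners_with_missing(df1)
print(commodity_nan_partners.tolist())  # ['Afghanistan', 'Canada', 'USA']
```

drop_duplicates and unique both keep values in order of first appearance, whereas a plain set has no guaranteed order. If you use the set variant and need a stable result, wrapping it in sorted() is one option.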