I'm trying to use Pandas.drop_duplicates() while considering only a subset of columns, but I get the error KeyError: Index(['days'], dtype='object')
The columns are: id, event_description, attribute1, attribute 2, attribute 3, days, days_supply, days_equivalent
I want to ignore attribute 2 and attribute 3, so I ran the following command:
df = df.drop_duplicates(subset=['id', 'event_description', 'attribute1', 'days', 'days_supply', 'days_equivalent'])
It returns:
KeyError Traceback (most recent call last)
<ipython-input-4-3f7da32b380f> in <module>
7
8 df = df.drop_duplicates(subset=['id', 'event_description', 'attribute1', 'days',
-> 9 'days_supply', 'days_equivalent'])
10
11 print(df)
/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in drop_duplicates(self, subset, keep, inplace)
4892
4893 inplace = validate_bool_kwarg(inplace, "inplace")
-> 4894 duplicated = self.duplicated(subset, keep=keep)
4895
4896 if inplace:
/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in duplicated(self, subset, keep)
4949 diff = Index(subset).difference(self.columns)
4950 if not diff.empty:
-> 4951 raise KeyError(diff)
4952
4953 vals = (col.values for name, col in self.items() if name in subset)
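KeyError: Index(['days'], dtype='object')
Once I remove days from the subset, dropping duplicates works without any problem, but I do need to make sure days is taken into account. What do I need to fix to resolve this error?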
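Answered 2019-09-25 06:38:18
You need to double-check the column names: Days vs days. Column labels are case-sensitive, so a column named Days will not match 'days' in subset.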
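A minimal sketch of how the column names could be checked, assuming the mismatch is a matter of capitalization or stray whitespace. The sample data and the name Days are hypothetical and only stand in for the real DataFrame; the difference() call mirrors the check visible in the traceback.

```python
import pandas as pd

# Hypothetical data: the 'Days' column differs from the requested 'days' only by case.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "event_description": ["a", "a", "b"],
    "Days": [10, 10, 20],
})

subset = ["id", "event_description", "days"]

# Same check the traceback shows pandas performing: which requested names are not columns?
missing = pd.Index(subset).difference(df.columns)
print(list(missing))

# repr() makes stray spaces or unexpected capitalization in the real names visible.
print([repr(c) for c in df.columns])

# One possible fix (an assumption about what the data needs): normalize the names.
df.columns = df.columns.str.strip().str.lower()
deduped = df.drop_duplicates(subset=subset)
print(len(deduped))
```

Normalizing every column name is only one option; renaming the single offending column with df.rename(columns={...}) would work as well if the other names should stay untouched.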
Answered 2019-09-25 06:03:52
Try using
df.drop_duplicates(subset=['id', 'event_description', 'attribute1', 'days', 'days_supply', 'days_equivalent'], inplace=True)
From: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
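A small sketch of how the inplace=True form behaves compared with assigning the result, using hypothetical data. It also suggests, in line with the traceback above where duplicated() runs before the inplace branch, that a name missing from the columns raises KeyError in either form.

```python
import pandas as pd

# Hypothetical data with one duplicated row.
df = pd.DataFrame({"id": [1, 1, 2], "days": [10, 10, 20]})

# Without inplace, a new DataFrame is returned and df is left unchanged.
result = df.drop_duplicates(subset=["id", "days"])

# With inplace=True, df itself is modified and the call returns None,
# so the return value should not be assigned back to df.
returned = df.drop_duplicates(subset=["id", "days"], inplace=True)

# The subset names are validated either way: 'Days' is not a column here.
raised = False
try:
    df.drop_duplicates(subset=["id", "Days"], inplace=True)
except KeyError:
    raised = True
```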
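Perhaps your df is not in the right format. In any case, if you think the problem is related to dtype, you can use an apply function to check all the data in df['date'], like this: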
def checkType(someDate):
    # Do the verification here; this placeholder returns the value unchanged
    dateCorrected = someDate
    return dateCorrected

df['date'] = df['date'].apply(checkType)
Source: https://stackoverflow.com/questions/58088435