I want to do some cleanup on data I receive.
The code is as follows:
import pandas as pd

def cleanup(df: pd.DataFrame) -> pd.DataFrame:
    # Remove entries from the IT dept
    mask = (df['dept'] != 'IT')
    df = df[mask]
    # Rename the dept from marketing to comms for the remaining rows
    mask = df['dept'] == 'marketing'
    df.loc[mask, 'dept'] = "comms"
    # The warning occurs here...
    # Rename the dept from accounting to finance for the remaining rows
    mask = df['dept'] == 'accounting'
    df.loc[mask, 'dept'] = 'finance'
    return df

data = [[1,"marketing"],[2,"accounting"],[3,"marketing"],[4,"IT"],[5,"IT"],[6,"board"]]
df = pd.DataFrame(data, columns = ['id', 'dept'])
df = cleanup(df)

I get the following warning:
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py:480: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s

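I'm a little concerned about this warning, because the returned data is correct and the documentation does not seem to apply to this case.
Is there something wrong with my code? Or can I safely ignore this warning?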
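Posted on 2019-09-27 12:50:27
The warning says it all. After df = df[mask], your df is a slice of another frame, so the later .loc assignments write into that slice. Try updating the original frame instead of the slice: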
def cleanup(df: pd.DataFrame) -> pd.DataFrame:
    # Select the rows that are not in the IT dept (nothing is removed here)
    mask1 = (df['dept'] != 'IT')
    # Rename the dept from marketing to comms for the selected rows
    mask2 = df['dept'] == 'marketing'
    df.loc[mask1 & mask2, 'dept'] = "comms"
    # Rename the dept from accounting to finance for the selected rows
    mask2 = df['dept'] == 'accounting'
    df.loc[mask1 & mask2, 'dept'] = 'finance'
    return df

data = [[1,"marketing"],[2,"accounting"],[3,"marketing"],[4,"IT"],[5,"IT"],[6,"board"]]
df = pd.DataFrame(data, columns = ['id', 'dept'])
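df = cleanup(df)

Note that this modified function returns a df that still has the IT values in dept, and it updates the caller's DataFrame in place. If you don't actually want those records, you can copy and then update: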
def cleanup(df: pd.DataFrame) -> pd.DataFrame:
    # Remove entries from the IT dept
    mask = (df['dept'] != 'IT')
    # we copy the data frame here so it's no longer a slice
    df = df[mask].copy()
    # Rename the dept from marketing to comms for the remaining rows
    mask = df['dept'] == 'marketing'
    df.loc[mask, 'dept'] = "comms"
    # Rename the dept from accounting to finance for the remaining rows
    mask = df['dept'] == 'accounting'
    df.loc[mask, 'dept'] = 'finance'
    return df

https://stackoverflow.com/questions/58134639
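
As a variation on the copy-and-update idea, here is a minimal sketch that never assigns into a filtered frame at all. It is not part of the original answer: the function name `cleanup_no_mutation` and the `renames` mapping are hypothetical, chosen to match the sample data from the question. It relies on `DataFrame.assign` and `Series.replace`, both of which return new objects.

```python
import pandas as pd


def cleanup_no_mutation(df: pd.DataFrame) -> pd.DataFrame:
    """Drop IT rows and rename departments without writing into a slice."""
    # Hypothetical mapping, taken from the renames described in the question.
    renames = {"marketing": "comms", "accounting": "finance"}
    # Boolean indexing selects the non-IT rows.
    kept = df[df["dept"] != "IT"]
    # assign() builds a new DataFrame with the replaced column, so there is
    # no in-place assignment on the filtered frame.
    return kept.assign(dept=kept["dept"].replace(renames))


data = [[1, "marketing"], [2, "accounting"], [3, "marketing"],
        [4, "IT"], [5, "IT"], [6, "board"]]
df = pd.DataFrame(data, columns=["id", "dept"])
result = cleanup_no_mutation(df)
```

Because nothing is assigned in place, the caller's `df` keeps its original values, which may or may not be what you want; the `.copy()` version above is the closer match to the original code.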