如果某个字符串不在列表中,我希望删除dataframe中的一行,然后将该行的值拆分到其他行中。
但我很想知道怎么做。
例如:
# list for categories
print(list_cat)
['FREIGHT CHARGES', 'PETROL', 'ADDITIONAL STOP', 'OTHERS', 'TOLL', 'DANGEROUS GOODS']test = df_pivot.query('LOAD_ID_VR == "19082022121669VR00158480"')
test.head()
LOAD_ID_VR Status Status_2 Claim Amount Indicator Value categories
6 19082022121669VR00158480 To correct pass NaN Adjustment Freight Charges -625.00 FREIGHT CHARGES
3200 19082022121669VR00158480 To correct pass NaN Ajustment Petrol Diff -41.45 PETROL
6394 19082022121669VR00158480 To correct pass NaN Ajustment Manual Rate Diff 333.23 MANUAL RATE列表中没有'MANUAL RATE',因此,我试图达到的解决方案是用'MANUAL RATE'删除行,并将333.23拆分到其他2行中。
最后得到的结果是:
test = df_pivot.query('LOAD_ID_VR == "19082022121669VR00158480"')
test.head()
LOAD_ID_VR Status Status_2 Claim Amount Indicator Value categories
6 19082022121669VR00158480 To correct pass NaN Adjustment Freight Charges -458.335 FREIGHT CHARGES
3200 19082022121669VR00158480 To correct pass NaN Ajustment Petrol Diff 125.215 PETROL对于其他键和类别,这个解决方案应该是自动化的,因为这是一个很大的数据格式。有可能这样做吗?
发布于 2022-09-27 15:50:50
我首先添加所有“非分布式”,然后评估可以分发多少个“可分发”,然后再做剩下的(您肯定可以用更少的代码来完成这个任务-我只是想使它尽可能容易理解):
import pandas as pd
list_classes_remove = ["M"]
df = pd.DataFrame(
data={
"CATEGORY": ["F", "P", "M", "F", "P", "P", "M"],
"ID": ["1", "1", "1", "2", "2", "3", "3"],
"VALUE": [-625, -41.45, 333.23, 50, 100, -40, 10],
}
)
df_distribute = (
df.loc[lambda x: x["CATEGORY"].isin(list_classes_remove)]
.loc[:, ["ID", "VALUE"]]
.groupby("ID")
.sum()
.reset_index(drop=False)
.rename(columns={"VALUE": "VALUE_DISTRIBUTE"})
)
df_no_distributables = df.loc[lambda x: ~(x["CATEGORY"].isin(list_classes_remove))]
df_factor = (
df_no_distributables.loc[:, ["ID", "VALUE"]]
.groupby("ID")
.count()
.reset_index(drop=False)
.rename(columns={"VALUE": "COUNT"})
)
df_no_distributables = df_no_distributables.merge(
right=df_distribute, left_on="ID", right_on="ID"
).merge(right=df_factor, left_on="ID", right_on="ID")
df_no_distributables["VALUE"] = df_no_distributables.apply(
lambda x: x["VALUE"] + x["VALUE_DISTRIBUTE"] / x["COUNT"]
if x["COUNT"]
else x["VALUE"],
axis=1,
)https://stackoverflow.com/questions/73866768
复制相似问题