**Edit at bottom**
I have a DataFrame of inventory data that looks like this:
import pandas as pd

d = {'product': ['a', 'b', 'a', 'b', 'c'],
     'amount': [1, 2, 3, 5, 2],
     'date': ['2020-6-6', '2020-6-6', '2020-6-7', '2020-6-7', '2020-6-7']}
df = pd.DataFrame(data=d)
df
product amount date
0 a 1 2020-6-6
1 b 2 2020-6-6
2 a 3 2020-6-7
3 b 5 2020-6-7
4 c 2 2020-6-7
I would like to know the month-over-month difference in inventory. The output would look like this:
df
product diff isnew date
0 a nan nan 2020-6-6
1 b nan nan 2020-6-6
2 a 2 False 2020-6-7
3 b 3 False 2020-6-7
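4 c 2 True 2020-6-7
Sorry if the first example was unclear. I actually have many months of data, so I am not just taking the difference between one period and another. It needs to handle the general case: the difference between month n and month n-1, then between n-1 and n-2, and so on.
What is the best way to do this in pandas?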
Posted on 2020-06-08 04:11:15
You can try a groupby on the product column and call diff on the amount column to build the 'diff' column. Then use duplicated for the 'isnew' column.
df['diff'] = df.groupby('product')['amount'].diff()
df['isnew'] = ~df['product'].duplicated()
print(df)
product amount date diff isnew
0 a 1 2020-6-6 NaN True
1 b 2 2020-6-6 NaN True
2 a 3 2020-6-7 2.0 False
3 b 5 2020-6-7 3.0 False
4 c 2 2020-6-7 NaN True
Posted on 2020-06-08 04:16:36
I think the key here is working out isnew:
# rows that fall after the first period
new_prods = df['date'] != df['date'].min()
# repeat appearances of a product
duplicated = df.duplicated('product')
# rows to label: first appearance of new products
# or duplicated *old* products
valids = new_prods | duplicated
df.loc[valids, 'is_new'] = ~duplicated
# then the difference:
df['diff'] = (df.groupby('product')['amount'].diff()  # normal differences
              .fillna(df['amount'])                    # fill the first value for each product
              .where(df['is_new'].notna())             # remove the first month
              )
Output:
product amount date is_new diff
0 a 1 2020-6-6 NaN NaN
1 b 2 2020-6-6 NaN NaN
2 a 3 2020-6-7 False 2.0
3 b 5 2020-6-7 False 3.0
4 c 2 2020-6-7 True 2.0
https://stackoverflow.com/questions/62254743
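
For the many-period case raised in the question, the same groupby/diff idea can be wrapped in a function and tried on three periods. This is only a rough sketch: the name `inventory_diff` and the sample data are made up for illustration, it assumes one row per product per period, and it does not report products that disappear from a later period.

```python
import pandas as pd


def inventory_diff(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: change in amount versus each product's previous row."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Stable sort so each product's rows are in chronological order.
    out = out.sort_values("date", kind="stable")

    first_period = out["date"] == out["date"].min()
    first_seen = ~out.duplicated("product")

    # Difference from the product's previous row; a product's first row
    # is treated as an increase from zero.
    diff = out.groupby("product")["amount"].diff().fillna(out["amount"])

    # The first period has nothing to compare against, so blank it out.
    out["diff"] = diff.mask(first_period)
    out["isnew"] = first_seen.astype(object).mask(first_period)
    return out


# Made-up sample covering three periods (assumption, not from the question).
sample = pd.DataFrame({
    "product": ["a", "b", "a", "b", "c", "a", "c", "d"],
    "amount": [1, 2, 3, 5, 2, 4, 1, 7],
    "date": ["2020-06-06", "2020-06-06",
             "2020-06-07", "2020-06-07", "2020-06-07",
             "2020-06-08", "2020-06-08", "2020-06-08"],
})
result = inventory_diff(sample)
print(result)
```

Because `diff` is computed per product, each row is compared with that product's previous appearance, so period n against n-1, n-1 against n-2, and so on fall out without any extra looping, provided every product appears in every period after its first.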