Currently I have a DataFrame like the one below, and I need to reset the cumulative sum every time it crosses a multiple of 1000 (e.g. 2000, 3000, etc.).
Production ID cumsum
2017-10-19 1054 1323217 1054
2017-10-20 0 1323217 1054
2017-10-21 0 1323217 1054
2017-10-22 0 1323217 1054
2017-10-23 0 1323217 1054
For example, given the above, I need a df like the following:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1000 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4000 4
2017-10-23 0 1323217 4108 108 0
2017-10-23 500 1323218 500 500 0
The code below correctly resets the value every 1000, but I can't quite figure out how to group it by ID and round it to the 1000s.
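maxvalue = 1000
lastvalue = 0
newcum = []
for _, row in df.iterrows():
    thisvalue = row['cumsum'] + lastvalue
    # start over once the running value exceeds the limit
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append(thisvalue)
    lastvalue = thisvalue
df['newcum'] = newcum
Thanks to the answer below, I'm now able to compute the cumulative number generated, but I need to compute the incremental number generated.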
import numpy as np

df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh)
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
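The incremental count can probably also be computed without a Python loop. A minimal sketch, assuming rows are already in chronological order within each ID and Production is never negative: floor-divide the per-ID cumsum by the threshold, then difference that within each ID. The sample data (rebuilt from the desired-output table, with the date as a plain column) and the column name `numGenInc` are hypothetical.

```python
import pandas as pd

# Sample rows rebuilt from the desired-output table; the dates are kept as an
# ordinary "date" column (an assumption) so the index stays unique.
df = pd.DataFrame(
    {
        "date": ["2017-10-19", "2017-10-20", "2017-10-21",
                 "2017-10-22", "2017-10-23", "2017-10-23"],
        "Production": [1054, 0, 0, 3054, 0, 500],
        "ID": [1323217, 1323217, 1323217, 1323217, 1323217, 1323218],
    }
)

thresh = 1000

# Running total that restarts for every ID.
df["cumsum"] = df.groupby("ID")["Production"].cumsum()

# Whole multiples of thresh reached so far within each ID.
multiple = df["cumsum"] // thresh

# Incremental count: change in `multiple` versus the previous row of the same ID.
# The first row of each ID has no predecessor (diff gives NaN), so it keeps its
# full multiple instead. (numGenInc is a hypothetical column name.)
df["numGenInc"] = multiple.groupby(df["ID"]).diff().fillna(multiple).astype(int)

print(df[["date", "ID", "cumsum", "numGenInc"]])
```

On this data the 2017-10-22 row gets 3 (the multiple goes from 1 to 4), whereas the cumulative `numberGenerated` column reports 4 for that row.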
My initial thinking was to try something similar to:
df['numGen1'] = df['cumsum2'].diff()
Final edit: tested and working. Thanks for your help!
I was overthinking it; below is how I was able to do it:
import numpy as np

df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh)
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()
numgen = []
cumsum = df['cumsum']
ids = df['ID']
for i in range(len(df)):
    # .iloc gives positional access whatever the index is (e.g. dates)
    if cumsum.iloc[i] >= thresh and i > 0 and ids.iloc[i] == ids.iloc[i - 1]:
        # same ID as the previous row: count only the newly crossed multiples
        numgenv = (cumsum.iloc[i] // thresh) - (cumsum.iloc[i - 1] // thresh)
    elif cumsum.iloc[i] >= thresh:
        # first row of a new ID: every multiple reached so far is new
        numgenv = cumsum.iloc[i] // thresh
    else:
        numgenv = 0
    numgen.append(numgenv)
df['numgen2.0'] = numgen
Posted on 2019-07-19 17:09:49
IIUC, this is just an integer-division problem with a few tricks:
import numpy as np

thresh = 1000
df['cumsum'] = df['Production'].cumsum()
# how many times cumsum has passed thresh
multiple = (df['cumsum'] // thresh)
# detect where thresh is passed
mask = multiple.diff().ne(0)
# update the number generated:
df['numberGenerated'] = np.where(mask, multiple, 0)
# then the adjusted cumsum
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh
Output:
Production ID cumsum adjCumsum numberGenerated
2017-10-19 1054 1323217 1054 1054 1
2017-10-20 0 1323217 1054 54 0
2017-10-21 0 1323217 1054 54 0
2017-10-22 3054 1323217 4108 4108 4
2017-10-23 0 1323217 4108 108 0
2017-10-23 500 1323218 4608 608 0
https://stackoverflow.com/questions/57116732
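To check the answer's output table, here is a self-contained run of the answer's code on the sample rows. The DataFrame is rebuilt from the table (the "date" column and default integer index are assumptions). As in the answer, `cumsum` runs across all rows rather than per ID, which is why the last row shows 4608.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the output table; "date" is a plain column here.
df = pd.DataFrame(
    {
        "date": ["2017-10-19", "2017-10-20", "2017-10-21",
                 "2017-10-22", "2017-10-23", "2017-10-23"],
        "Production": [1054, 0, 0, 3054, 0, 500],
        "ID": [1323217, 1323217, 1323217, 1323217, 1323217, 1323218],
    }
)

thresh = 1000
# Running total over all rows (not grouped by ID, as in the answer).
df["cumsum"] = df["Production"].cumsum()
# Whole multiples of thresh reached so far.
multiple = df["cumsum"] // thresh
# True on the first row (diff is NaN, and NaN != 0) and wherever the multiple changes.
mask = multiple.diff().ne(0)
# On a crossing row record the current multiple, elsewhere 0.
df["numberGenerated"] = np.where(mask, multiple, 0)
# Crossing rows: multiple * thresh plus the remainder; other rows: just the remainder.
df["adjCumsum"] = df["numberGenerated"] * thresh + df["cumsum"] % thresh

print(df)
```

Swapping the first assignment for `df.groupby("ID")["Production"].cumsum()` restarts the total for each ID, which is what the question's follow-up code does.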