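I have a dataset (DATE_LOCATION, dates) of a product that is sold on various dates. The data covers 9 months, with a random 13 or more days from each month. I need to split the data by how many consecutive days the product was sold: runs of 1-3 days, runs of 4-7 days, runs of 8-15 days, and runs of >16 days. How can I write this in Python using pandas and other packages?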
DATE_LOCATION Sold
07-08-16 0:00 2
08-08-16 0:00 7
12-08-16 0:00 3
13-08-16 0:00 1
14-08-16 0:00 2
15-08-16 0:00 1
.
.
.
22-10-16 0:00 1
23-10-16 0:00 2
26-10-16 0:00 1
28-10-16 0:00 1
29-10-16 0:00 3
30-10-16 0:00 3
.
.
.(goes for 9 months of data)
I don't even know how to start coding this in Python. The required output is
Days Sold
1-3 20 #(7,8),(22,23),(26),(28,29,30) dates because the range is [1,3]
4-7 7 #(12,13,14,15) dates because the range is [4,7]
8-15 0
>16 0
I would be glad if someone could at least post a link to somewhere I could start. Here is what I tried:
df["DATE_LOCATION"] = pd.to_datetime(df.DATE_LOCATION)
df["DAY"] = df.DATE_LOCATION.dt.day
def flag(x):
    if 1 <= x <= 3:
        return '1-3'
    elif 4 <= x <= 7:
        return '4-7'
    elif 8 <= x <= 15:
        return '8-15'
    else:
        return '>=16'
df["Days"] = df.DAY.apply(flag)
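df.groupby("Days").Sold.sum()
This gives me the number of products sold on those days of each month. But what I need is the total number of products sold in runs of consecutive days, grouped by which of the specified ranges the run length falls into.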
Posted on 2018-06-19 07:31:47
I reproduced the input data with this code.
import numpy as np
import pandas as pd

df = pd.DataFrame({'DATE_LOCATION': ['07-08-16 0:00', '08-08-16 0:00', '12-08-16 0:00',
                                     '13-08-16 0:00', '14-08-16 0:00', '15-08-16 0:00',
                                     '22-10-16 0:00', '23-10-16 0:00', '26-10-16 0:00',
                                     '28-10-16 0:00', '29-10-16 0:00', '30-10-16 0:00'],
                   'Sold': [2, 7, 3, 1, 2, 1, 1, 2, 1, 1, 3, 3]})
# The dates are day-month-year, so parse them with dayfirst=True
df.DATE_LOCATION = pd.to_datetime(df.DATE_LOCATION, dayfirst=True)
The data now looks like this:
DATE_LOCATION Sold
0 2016-08-07 2
1 2016-08-08 7
2 2016-08-12 3
3 2016-08-13 1
4 2016-08-14 2
5 2016-08-15 1
6 2016-10-22 1
7 2016-10-23 2
8 2016-10-26 1
9 2016-10-28 1
10 2016-10-29 3
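11 2016-10-30 3
Get the interval between consecutive rows, compute the run length (the number of consecutive days), and assign a group number to each run, starting a new group whenever the run is broken. Finally, take the maximum run length and the sum of items sold for each group.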
# gap to the previous row; the first row has no predecessor, so fill with a zero timedelta
df['Day_Interval'] = df.DATE_LOCATION.diff().fillna(pd.Timedelta(0))
# calculate run length
day_intervals = (df.Day_Interval.values / np.timedelta64(1, 'D')).astype(int)
run_lengths = []
run_length = 0
groups = []
group = 0
for day_interval in day_intervals:
    if day_interval != 1:
        # the gap is not exactly one day: start a new run and a new group
        run_length = 1
        group += 1
        groups.append(group)
    else:
        # consecutive day: extend the current run
        run_length += 1
        groups.append(group)
    run_lengths.append(run_length)
df['Run_Length'] = run_lengths
df['Group'] = groups
# calculate summary statistic by group
df = df.groupby('Group')[['Sold', 'Run_Length']].agg({'Sold': 'sum', 'Run_Length': 'max'})
df['1-3'] = 0
df['4-7'] = 0
df['8-15'] = 0
df['>=16'] = 0
df.loc[(df.Run_Length >= 1) & (df.Run_Length <= 3), "1-3"] = df.Sold
df.loc[(df.Run_Length >= 4) & (df.Run_Length <= 7), "4-7"] = df.Sold
df.loc[(df.Run_Length >= 8) & (df.Run_Length <= 15), "8-15"] = df.Sold
df.loc[(df.Run_Length >= 16), ">=16"] = df.Sold
# transpose and drop the first two rows (Sold, Run_Length), keeping only the bucket rows
df = df.T.iloc[2:]
df['Sold'] = df.sum(axis=1)
df = df[['Sold']]
Output (df):
Group Sold
1-3 20
4-7 7
8-15 0
>=16 0
Hope this helps.
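The same bucketing can also be sketched without the explicit loop, using `diff`, `cumsum` and `pd.cut`. This is only a rough alternative, not the code from the answer: it assumes one row per date, and the names `new_run`, `run_id`, `runs`, `buckets` and `summary` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'DATE_LOCATION': ['07-08-16', '08-08-16', '12-08-16', '13-08-16',
                      '14-08-16', '15-08-16', '22-10-16', '23-10-16',
                      '26-10-16', '28-10-16', '29-10-16', '30-10-16'],
    'Sold': [2, 7, 3, 1, 2, 1, 1, 2, 1, 1, 3, 3],
})
df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], format='%d-%m-%y')
df = df.sort_values('DATE_LOCATION')  # assumption: one row per date

# A new run starts wherever the gap to the previous row is not exactly one day
# (the first row compares against NaT, so it always starts a run).
new_run = df['DATE_LOCATION'].diff() != pd.Timedelta(days=1)
run_id = new_run.cumsum()

# Per run: number of days in the run and total units sold.
runs = df.groupby(run_id)['Sold'].agg(days='size', sold='sum')

# Bucket the run lengths; bins are right-inclusive: (0,3], (3,7], (7,15], (15,inf).
labels = ['1-3', '4-7', '8-15', '>=16']
buckets = pd.cut(runs['days'], bins=[0, 3, 7, 15, float('inf')], labels=labels)

# observed=False keeps the empty buckets, which then sum to 0.
summary = runs.groupby(buckets, observed=False)['sold'].sum()
print(summary)
```

On the sample rows this should give 20 for 1-3 and 7 for 4-7, with the other two buckets at 0, matching the output above.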
https://stackoverflow.com/questions/50636307