I have a DataFrame df:
hour calls received appointment
6:48 4 2 2
4:02 21 3 2
12:52 31 7 4
2:14 32 5 2
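6:45 13 3 2

The hour column is a string.
I want to sum the values and group them by hour, with labels in a format like 1-2, 2-3.
What I tried:
df[['hour','calls','received','appointment']].groupby('hour').sum()

In addition, I want to check every hour and, if there is no data for an hour, fill it with zeros.
The output I want looks like this: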
hour calls received appointment
0-1 0 0 0
1-2 0 0 0
2-3 32 5 2
3-4 0 0 0
4-5 21 3 2
5-6 0 0 0
6-7 17 5 4
...

Answered on 2022-07-24 05:14:22
You can use pandas resample on an hourly basis, then compute the sum over ['calls','received','appointment'], and finally rename the datetimes to the desired format.
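import pandas as pd

# Parse the 'hour' strings as datetimes (the date part is filled in automatically)
df['time'] = pd.to_datetime(df['hour'])
# Sum per hourly bin; newer pandas versions prefer the lowercase alias 'h'
df = df.set_index('time').resample('H')[['calls','received','appointment']].sum().reset_index()
# rename 2022-07-24 02:00:00 -> (2-3)
df['time'] = df['time'].apply(lambda x: f"{x.hour}-{x.hour+1}")
print(df)

Output:
time calls received appointment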
0 2-3 32 5 2
1 3-4 0 0 0
2 4-5 21 3 2
3 5-6 0 0 0
4 6-7 17 5 4
5 7-8 0 0 0
6 8-9 0 0 0
7 9-10 0 0 0
8 10-11 0 0 0
9 11-12 0 0 0
10 12-13 31 7 4

Answered on 2022-07-24 05:56:05
You can use pd.cut on the hour column:
# Create labels: 0-1, 1-2, 2-3, ...
labels = [f"{i}-{i+1}" for i in range(24)]
# Extract the hour part and convert it as int
hours = df['hour'].str.split(':').str[0].astype(int)
# Classify your data. The output is a Series with a 'category' dtype
df['hour'] = pd.cut(hours, range(25), labels=labels, right=False)
# Group by range and sum [...]
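# observed=False is passed explicitly because newer pandas versions warn when relying on the default
out = df.groupby('hour', as_index=False, observed=False).sum()

Because the grouper is categorical, all categories are shown, including the ones with no data :)
From the groupby documentation:
observed: bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.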
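To make the effect of observed concrete, here is a minimal sketch on hypothetical toy data (the names cat, toy, all_bins and seen_bins are made up for illustration and do not come from the answer). It assumes a pandas version that accepts the observed keyword in groupby.

```python
import pandas as pd

# Hypothetical toy data: three possible categories, only two of them occur
cat = pd.Categorical(["2-3", "6-7", "6-7"], categories=["0-1", "2-3", "6-7"])
toy = pd.DataFrame({"hour": cat, "calls": [32, 4, 13]})

# observed=False keeps every category; categories without data sum to 0
all_bins = toy.groupby("hour", observed=False)["calls"].sum()

# observed=True keeps only the categories that actually occur in the data
seen_bins = toy.groupby("hour", observed=True)["calls"].sum()

print(all_bins)
print(seen_bins)
```

With observed=False the "0-1" bin appears with a sum of 0, which is the zero-filling behaviour the question asks for; with observed=True that row is dropped.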
Output for the question's data:
>>> out
hour calls received appointment
0 0-1 0 0 0
1 1-2 0 0 0
2 2-3 32 5 2
3 3-4 0 0 0
4 4-5 21 3 2
5 5-6 0 0 0
6 6-7 17 5 4
7 7-8 0 0 0
8 8-9 0 0 0
9 9-10 0 0 0
10 10-11 0 0 0
11 11-12 0 0 0
12 12-13 31 7 4
13 13-14 0 0 0
14 14-15 0 0 0
15 15-16 0 0 0
16 16-17 0 0 0
17 17-18 0 0 0
18 18-19 0 0 0
19 19-20 0 0 0
20 20-21 0 0 0
21 21-22 0 0 0
22 22-23 0 0 0
23 23-24 0 0 0

Answered on 2022-07-24 05:13:44
# Split into separate columns:
df[['hour', 'minute']] = df.hour.str.split(':', expand=True).astype(int)
# Pivot the table, summing hours together:
cols = ['calls', 'received', 'appointment']
df = df.pivot_table(index='hour', values=cols, aggfunc='sum')
# Create a new interval range DataFrame:
new_index = pd.DataFrame(pd.interval_range(0, 13), columns=['hour'])
# Join the interval range and the dataframe:
df = new_index.join(df, how='outer')
# Fill NaN and turn back to integers:
df[cols] = df[cols].fillna(0).astype(int)
# Printing to match your col order:
print(df[['hour'] + cols])

Output:
hour calls received appointment
0 (0, 1] 0 0 0
1 (1, 2] 0 0 0
2 (2, 3] 32 5 2
3 (3, 4] 0 0 0
4 (4, 5] 21 3 2
5 (5, 6] 0 0 0
6 (6, 7] 17 5 4
7 (7, 8] 0 0 0
8 (8, 9] 0 0 0
9 (9, 10] 0 0 0
10 (10, 11] 0 0 0
11 (11, 12] 0 0 0
12 (12, 13] 31 7 4

Source: https://stackoverflow.com/questions/73095800
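Two of the outputs in this thread stop at 12-13 rather than covering the whole day. As a rough sketch only (this is not taken from any of the answers), one could group by the integer hour and then reindex over range(24), so that every hour from 0-1 to 23-24 appears with zeros where there is no data. The DataFrame is rebuilt from the question's sample rows; hour_num and out are illustrative names.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "hour": ["6:48", "4:02", "12:52", "2:14", "6:45"],
    "calls": [4, 21, 31, 32, 13],
    "received": [2, 3, 7, 5, 3],
    "appointment": [2, 2, 4, 2, 2],
})
cols = ["calls", "received", "appointment"]

# Integer hour taken from the part before the colon
hour_num = df["hour"].str.split(":").str[0].astype(int)

# Sum per hour, then reindex so that missing hours appear as zero rows
out = df[cols].groupby(hour_num).sum().reindex(range(24), fill_value=0)

# Turn the integer index 0..23 into labels such as "0-1", "1-2", ...
out.index = [f"{h}-{h + 1}" for h in out.index]
out = out.rename_axis("hour").reset_index()
print(out)
```

Using reindex with fill_value=0 keeps the columns as integers, so no fillna and astype round trip is needed.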