I'm trying to label each hour with a day number and an hour of day, like this:
df['day-hour'] = ('Day' + (df['hour'] // 24).add(1).astype(str) +
                  ' - ' + (df['hour'] % 24).astype(str))
So the result is:
customer_id  hour  day-hour
1            10    Day1 - 10
1            123   Day6 - 3
1            489   Day21 - 9
2            230   Day9 - 14
Then I tried grouping and counting with:
df.groupby(['customer_id','day-hour']).size().unstack(fill_value=0)
The result is:
day-hour     Day1 - 10  Day6 - 3  Day21 - 9  Day9 - 14
customer_id
1                    1         1          1          0
2                    0         0          0          1
The output I expect has the columns sorted by the actual day number, like this:
day-hour     Day1 - 10  Day6 - 3  Day9 - 14  Day21 - 9
customer_id
1                    1         1          0          1
2                    0         0          1          0
What should I change in my code?
Posted on 2018-06-28 10:43:00
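There are two possible solutions. Pad the numbers with zeros, as @Zero pointed out in the comments, so that the labels sort correctly as strings:
df['day-hour'] = ('Day' + (df['hour'] // 24).add(1).astype(str).str.zfill(2) +
                  ' - ' + (df['hour'] % 24).astype(str).str.zfill(2))
Or reorder the columns of the unstacked result with a custom key function on the two fields:
df = df[sorted(df.columns, key=lambda x: (int(x.split(' - ')[0][3:]), int(x.split(' - ')[1])))]
More readable: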
def f(x):
    # 'Day21 - 9' -> (21, 9), so labels compare as numbers, not as text
    a = x.split(' - ')
    return (int(a[0][3:]), int(a[1]))
df = df[sorted(df.columns, key=f)]
print(df)
   Day1 - 10  Day6 - 3  Day9 - 14  Day21 - 9
1          1         1          0          1
2          0         0          1          0
https://stackoverflow.com/questions/51080695
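
To try both approaches side by side, here is a minimal self-contained sketch. It rebuilds the unstacked table by hand from the labels shown in the question instead of rerunning the groupby. The names `counts`, `day_hour_key` and `pad` are illustrative and do not appear in the original post. The padding variant assumes day numbers stay below 100, since it pads to two digits.

```python
import pandas as pd

# Hand-built copy of the unstacked table from the question (assumption:
# same labels and counts, in the unsorted column order shown there).
counts = pd.DataFrame(
    [[1, 1, 1, 0], [0, 0, 0, 1]],
    index=pd.Index([1, 2], name="customer_id"),
    columns=pd.Index(
        ["Day1 - 10", "Day6 - 3", "Day21 - 9", "Day9 - 14"], name="day-hour"
    ),
)


def day_hour_key(label):
    """Turn 'Day21 - 9' into (21, 9) so labels compare numerically."""
    day, hour = label.split(" - ")
    return int(day[3:]), int(hour)


def pad(label):
    """Turn 'Day6 - 3' into 'Day06 - 03' so string order matches numeric order."""
    day, hour = day_hour_key(label)
    return f"Day{day:02d} - {hour:02d}"


# Custom-key approach: keep the labels, reorder the columns.
by_key = counts[sorted(counts.columns, key=day_hour_key)]

# Zero-padding approach: change the labels, then use an ordinary sort.
padded = counts.rename(columns=pad).sort_index(axis=1)

print(by_key)
print(padded)
```

The custom key keeps the original labels but has to be applied wherever the columns are reordered. Zero-padding changes the labels once, after which any ordinary string sort puts them in day order.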