数据格式的df1如下所示
SALE_CTRY BUILT_CITY CATEGORY MONTH_YEAR BUCKET RATE
CA Mumbai rtyu 3/1/2004 3 5
CA Mumbai rtyu 5/1/2004 4 6
US Mumbai rtyu 8/1/2004 3 6
IN Delhi aghaj 4/1/2005 5 4
RU Delhi aghaj 8/1/2005 3 3
RU Delhi aghaj 10/1/2005 2 5尝试添加缺少的桶以及MONTH_YEAR

当我只有4列,如类别,MONTH_YEAR,桶,速率,然后我填入如下代码
df1['MONTH_YEAR'] = pd.to_datetime(df1['MONTH_YEAR'])
rng = pd.date_range(df1[‘MONTH_YEAR’].min(),df1['MONTH_YEAR'].max(), freq='MS')
df2 = pd.DataFrame({ 'MONTH_YEAR': rng})
df1['Bucket'] = df1['Bucket'].astype(int)
Bucket=np.arange(0,(df11['Bucket'].max()+2),1)
# repeating MONTH_YEAR as many times as bucket
df2=pd.concat([AVS2]*len(Bucket))
df2['Bucket']=np.repeat(Bucket, len(rng))
df3 = pd.merge(left=df2, right=df1, how='left',on=['MONTH_YEAR','Bucket']).fillna(0)与其他专栏一起,我如何扩展它?
发布于 2021-03-17 04:32:04
agg和merge
cols = ['SALE_CTRY', 'BUILT_CITY', 'CATEGORY']
func = {'MONTH_YEAR': ['min', 'max'], 'BUCKET': ['max']}
aggs = df.groupby(cols).agg(func)
_df = pd.DataFrame([
(s, b, c, m, bucket)
for (s, b, c), mn, mx, bucket_mx in aggs.itertuples()
for m in pd.date_range(mn, mx, freq='MS')
for bucket in range(bucket_mx + 1)
], columns=[*cols, 'MONTH_YEAR', 'BUCKET'])
df.merge(_df, 'right').fillna(0, downcast='infer') SALE_CTRY BUILT_CITY CATEGORY MONTH_YEAR BUCKET RATE
0 CA Mumbai rtyu 2004-03-01 0 0
1 CA Mumbai rtyu 2004-03-01 1 0
2 CA Mumbai rtyu 2004-03-01 2 0
3 CA Mumbai rtyu 2004-03-01 3 5
4 CA Mumbai rtyu 2004-03-01 4 0
5 CA Mumbai rtyu 2004-04-01 0 0
6 CA Mumbai rtyu 2004-04-01 1 0
7 CA Mumbai rtyu 2004-04-01 2 0
8 CA Mumbai rtyu 2004-04-01 3 0
9 CA Mumbai rtyu 2004-04-01 4 0
10 CA Mumbai rtyu 2004-05-01 0 0
11 CA Mumbai rtyu 2004-05-01 1 0
12 CA Mumbai rtyu 2004-05-01 2 0
13 CA Mumbai rtyu 2004-05-01 3 0
14 CA Mumbai rtyu 2004-05-01 4 6
15 IN Delhi aghaj 2005-04-01 0 0
16 IN Delhi aghaj 2005-04-01 1 0
17 IN Delhi aghaj 2005-04-01 2 0
18 IN Delhi aghaj 2005-04-01 3 0
19 IN Delhi aghaj 2005-04-01 4 0
20 IN Delhi aghaj 2005-04-01 5 4
21 RU Delhi aghaj 2005-08-01 0 0
22 RU Delhi aghaj 2005-08-01 1 0
23 RU Delhi aghaj 2005-08-01 2 0
24 RU Delhi aghaj 2005-08-01 3 3
25 RU Delhi aghaj 2005-09-01 0 0
26 RU Delhi aghaj 2005-09-01 1 0
27 RU Delhi aghaj 2005-09-01 2 0
28 RU Delhi aghaj 2005-09-01 3 0
29 RU Delhi aghaj 2005-10-01 0 0
30 RU Delhi aghaj 2005-10-01 1 0
31 RU Delhi aghaj 2005-10-01 2 5
32 RU Delhi aghaj 2005-10-01 3 0
33 US Mumbai rtyu 2004-08-01 0 0
34 US Mumbai rtyu 2004-08-01 1 0
35 US Mumbai rtyu 2004-08-01 2 0
36 US Mumbai rtyu 2004-08-01 3 6发布于 2021-10-12 10:18:50
您可以使用完成函数将进程从化脓者中抽象出来,以公开丢失的行,并填充fillna --这些行将通过SALE_CTRY、BUILT_CITY和CATEGORY组合按组公开。
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn# within the complete function the years and bucket will contain all rows,
# including the missing rows; fillna helps to replace the null values with 0.
expand = {'MONTH_YEAR': lambda year: pd.date_range(year.min(),
year.max(),
freq='MS'),
'BUCKET': lambda bucket: range(bucket.max() + 1)}
(df.complete(expand,
by = ['SALE_CTRY', 'BUILT_CITY', 'CATEGORY'],
sort = True)
.fillna(0, downcast='infer')
)
SALE_CTRY BUILT_CITY CATEGORY MONTH_YEAR BUCKET RATE
0 CA Mumbai rtyu 2004-03-01 00:00:00 0 0
1 CA Mumbai rtyu 2004-03-01 00:00:00 1 0
2 CA Mumbai rtyu 2004-03-01 00:00:00 2 0
3 CA Mumbai rtyu 2004-03-01 00:00:00 3 5
4 CA Mumbai rtyu 2004-03-01 00:00:00 4 0
5 CA Mumbai rtyu 2004-04-01 00:00:00 0 0
6 CA Mumbai rtyu 2004-04-01 00:00:00 1 0
7 CA Mumbai rtyu 2004-04-01 00:00:00 2 0
8 CA Mumbai rtyu 2004-04-01 00:00:00 3 0
9 CA Mumbai rtyu 2004-04-01 00:00:00 4 0
10 CA Mumbai rtyu 2004-05-01 00:00:00 0 0
11 CA Mumbai rtyu 2004-05-01 00:00:00 1 0
12 CA Mumbai rtyu 2004-05-01 00:00:00 2 0
13 CA Mumbai rtyu 2004-05-01 00:00:00 3 0
14 CA Mumbai rtyu 2004-05-01 00:00:00 4 6
15 IN Delhi aghaj 2005-04-01 00:00:00 0 0
16 IN Delhi aghaj 2005-04-01 00:00:00 1 0
17 IN Delhi aghaj 2005-04-01 00:00:00 2 0
18 IN Delhi aghaj 2005-04-01 00:00:00 3 0
19 IN Delhi aghaj 2005-04-01 00:00:00 4 0
20 IN Delhi aghaj 2005-04-01 00:00:00 5 4
21 RU Delhi aghaj 2005-08-01 00:00:00 0 0
22 RU Delhi aghaj 2005-08-01 00:00:00 1 0
23 RU Delhi aghaj 2005-08-01 00:00:00 2 0
24 RU Delhi aghaj 2005-08-01 00:00:00 3 3
25 RU Delhi aghaj 2005-09-01 00:00:00 0 0
26 RU Delhi aghaj 2005-09-01 00:00:00 1 0
27 RU Delhi aghaj 2005-09-01 00:00:00 2 0
28 RU Delhi aghaj 2005-09-01 00:00:00 3 0
29 RU Delhi aghaj 2005-10-01 00:00:00 0 0
30 RU Delhi aghaj 2005-10-01 00:00:00 1 0
31 RU Delhi aghaj 2005-10-01 00:00:00 2 5
32 RU Delhi aghaj 2005-10-01 00:00:00 3 0
33 US Mumbai rtyu 2004-08-01 00:00:00 0 0
34 US Mumbai rtyu 2004-08-01 00:00:00 1 0
35 US Mumbai rtyu 2004-08-01 00:00:00 2 0
36 US Mumbai rtyu 2004-08-01 00:00:00 3 6https://stackoverflow.com/questions/66666912
复制相似问题