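For context, I'm trying to use regression to measure whether the presence of a competitor's ad affects an ad's metrics. I can't figure out how to consolidate the weeks, that is, how to assign a boolean (1 or 0) to each row based on whether a brand is present in the same week, given that the brand appears in different rows.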
import pandas as pd
df = pd.DataFrame({'week': ['2019-11-11', '2019-11-11', '2019-11-18', '2019-11-25', '2019-11-11', '2019-11-18', '2019-11-11'],
'brand':['X', 'X-2', 'X', 'X', 'Y', 'Y', 'Z'],
'score': [.34, .25, .54, .23, .22, .34, .44]})
Expected result:
df = pd.DataFrame({'week': ['2019-11-11', '2019-11-11', '2019-11-18', '2019-11-25', '2019-11-11', '2019-11-18', '2019-11-11'],
'brand':['X', 'X-2', 'X', 'X', 'Y', 'Y', 'Z'],
'score': [.34, .25, .54, .23, .22, .34, .44],
'presence_dummy_Y': [1, 1, 1, 0, 1, 1, 1],
'presence_dummy_Z': [1, 1, 0, 0, 1, 0, 1]})
Posted on 2022-08-12 15:57:29
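You can use get_dummies, keep only the competitors with loc/filter, and then take GroupBy.max per week, which gives 1 if the brand has at least one 1 in that week.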
df.join(
    pd.get_dummies(df['brand'], dtype=int)  # transform to 0/1 dummies (recent pandas defaults to bool)
    .filter(regex='^(?!X)')  # keep only brands not starting with X
    .groupby(pd.to_datetime(df['week']).dt.to_period('W'))  # group by week
    .transform('max')  # 1 if the brand appears at least once in the week
    .add_prefix('dummy_')  # rename columns
)
Output:
         week brand  score  dummy_Y  dummy_Z
0  2019-11-11     X   0.34        1        1
1  2019-11-11   X-2   0.25        1        1
2  2019-11-18     X   0.54        1        0
3  2019-11-25     X   0.23        0        0
4  2019-11-11     Y   0.22        1        1
5  2019-11-18     Y   0.34        1        0
6  2019-11-11     Z   0.44        1        1
Posted on 2022-08-12 15:57:02
Let's try
out = df.join(df['brand'].str.get_dummies()
              .groupby(df['week']).transform('any').astype(int)
              .pipe(lambda df: df.filter(regex='Y|Z'))
              .add_prefix('presence_dummy_'))
print(out)
         week brand  score  presence_dummy_Y  presence_dummy_Z
0  2019-11-11     X   0.34                 1                 1
1  2019-11-11   X-2   0.25                 1                 1
2  2019-11-18     X   0.54                 1                 0
3  2019-11-25     X   0.23                 0                 0
4  2019-11-11     Y   0.22                 1                 1
5  2019-11-18     Y   0.34                 1                 0
6  2019-11-11     Z   0.44                 1                 1
https://stackoverflow.com/questions/73336756
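For readers who want the weekly-presence logic spelled out one competitor at a time, here is a minimal sketch of the same idea without get_dummies. The helper name `add_presence_dummies` and its parameters are hypothetical and not part of either answer. It assumes the `week` column already identifies the week (as in the sample data, where every date is a Monday) and that competitors are matched by exact brand name.

```python
import pandas as pd

df = pd.DataFrame({
    'week': ['2019-11-11', '2019-11-11', '2019-11-18', '2019-11-25',
             '2019-11-11', '2019-11-18', '2019-11-11'],
    'brand': ['X', 'X-2', 'X', 'X', 'Y', 'Y', 'Z'],
    'score': [.34, .25, .54, .23, .22, .34, .44],
})


def add_presence_dummies(df, competitors, week_col='week', brand_col='brand',
                         prefix='presence_dummy_'):
    """Hypothetical helper: add one 0/1 column per competitor brand.

    A row gets 1 if the competitor appears in any row of the same week.
    """
    out = df.copy()
    for brand in competitors:
        # True on the rows that belong to this competitor
        is_brand = out[brand_col].eq(brand)
        # Broadcast "appears at least once this week" back to every row
        present = is_brand.groupby(out[week_col]).transform('any')
        out[prefix + brand] = present.astype(int)
    return out


out = add_presence_dummies(df, ['Y', 'Z'])
print(out)
```

Passing the competitor list explicitly avoids relying on a regex over brand names, which matters if a competitor's name happens to start with the same letters as the brand under study.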