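Hello everyone. I have data on regions, customers, and their deliveries, along with prices. The purchaseType column records the purchase type: each customer's first and last purchases are labeled FIRST and LAST, and there are sometimes intermediate deliveries labeled DELIVERY. I need to flag every customer and region whose first and last delivery prices are the same as True, in a new column of the expected output. The whole dataset must still be shown.
I have already solved this using merge, but I would like to know whether there is a way to do it without merging, since it does not look very efficient. Sorry for taking up your time.
Sample data: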
import pandas as pd
data = [['NY', 'A','FIRST', 25], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 15],
['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price'])
# print dataframe.
print(df)
region customer purchaseType price
0 NY A FIRST 25
1 NY A DELIVERY 20
2 NY A DELIVERY 30
3 NY A LAST 25
4 NY B FIRST 15
5 NY B DELIVERY 10
6 NY B LAST 20
7 FL A FIRST 15
8 FL A DELIVERY 10
9 FL A DELIVERY 12
10 FL A DELIVERY 25
11 FL A LAST 15
12 FL C FIRST 15
13 FL C LAST 10
14 FL D FIRST 10
15 FL D DELIVERY 20
16 FL D LAST 30
17 FL E FIRST 20
18 FL E LAST 20
Expected output:
region customer purchaseType price firstLastEqual
0 NY A FIRST 25 True
1 NY A DELIVERY 20 True
2 NY A DELIVERY 30 True
3 NY A LAST 25 True
4 NY B FIRST 15 False
5 NY B DELIVERY 10 False
6 NY B LAST 20 False
7 FL A FIRST 15 True
8 FL A DELIVERY 10 True
9 FL A DELIVERY 12 True
10 FL A DELIVERY 25 True
11 FL A LAST 15 True
12 FL C FIRST 15 False
13 FL C LAST 10 False
14 FL D FIRST 10 False
15 FL D DELIVERY 20 False
16 FL D LAST 30 False
17 FL E FIRST 20 True
18 FL E LAST 20 True
Posted on 2020-11-29 23:46:09
Solution using merge:
df_first = df[df['purchaseType'] == 'FIRST']
df_last = df[df['purchaseType'] == 'LAST']
df_compare = df_first.merge(df_last, how='inner', left_on=['region','customer'], right_on=['region','customer'])
df_compare = df_compare[df_compare['price_x'] == df_compare['price_y']]
df_compare['firstLastEqual'] = True
df = df.merge(df_compare, how='left', left_on=['region','customer'], right_on=['region','customer'])
df['firstLastEqual'] = df['firstLastEqual'].fillna(False)
df = df.drop(['purchaseType_x', 'price_x', 'purchaseType_y', 'price_y'], axis=1)
print(df)
I would like to know whether this can be done without a merge.
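One possible merge-free approach, offered as a sketch rather than a definitive answer: mask the price so that it survives only on FIRST (or LAST) rows, then use groupby with transform to broadcast that value to every row of the same region/customer group. It assumes each group has at most one FIRST and one LAST row, as in the sample data; the names first_price, last_price, and group_keys are illustrative and do not come from the original post.

```python
import pandas as pd

data = [['NY', 'A', 'FIRST', 25], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'DELIVERY', 30], ['NY', 'A', 'LAST', 25],
        ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY', 10], ['NY', 'B', 'LAST', 20],
        ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY', 10], ['FL', 'A', 'DELIVERY', 12], ['FL', 'A', 'DELIVERY', 25], ['FL', 'A', 'LAST', 15],
        ['FL', 'C', 'FIRST', 15], ['FL', 'C', 'LAST', 10],
        ['FL', 'D', 'FIRST', 10], ['FL', 'D', 'DELIVERY', 20], ['FL', 'D', 'LAST', 30],
        ['FL', 'E', 'FIRST', 20], ['FL', 'E', 'LAST', 20]]
df = pd.DataFrame(data, columns=['region', 'customer', 'purchaseType', 'price'])

# Keep the price only on FIRST (resp. LAST) rows; all other rows become NaN.
first_price = df['price'].where(df['purchaseType'] == 'FIRST')
last_price = df['price'].where(df['purchaseType'] == 'LAST')

# Broadcast each group's FIRST/LAST price to every row of that group.
# The groupby 'first' aggregation skips NaN, so it returns the one
# non-null value per group (assumption: at most one FIRST/LAST per group).
group_keys = [df['region'], df['customer']]
first_per_row = first_price.groupby(group_keys).transform('first')
last_per_row = last_price.groupby(group_keys).transform('first')

# Row-wise comparison; a group missing FIRST or LAST compares NaN and yields False.
df['firstLastEqual'] = first_per_row.eq(last_per_row)
print(df)
```

This keeps the original row order and index, adds no temporary columns to df, and avoids both merges and the later drop of the _x/_y columns. Whether it is actually faster than the merge version depends on the data size and is worth timing on real data.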
https://stackoverflow.com/questions/65066448