这是我的数据
import pandas as pd
data=pd.DataFrame({'vehicle':['car','car','car','car','car','car','bus','bus','bus','bus','bus','bus','car','car','car','car','car','car','bus','bus','bus','bus','bus','bus'],
'expecteddate':['2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022'],'range':[240,240,240,240,240,240,300,300,300,300,300,300,240,240,240,240,240,240,300,300,300,300,300,300],'color':['blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red'],'discount':[70,80,90,60,40,50,120,110,130,140,80,90,60,40,50,30,70,45,130,100,140,120,90,30],'date':['2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022']})
print(data)数据中的数据:
vehicle expecteddate range color discount date
0 car 2/24/2022 240 blue 70 2/18/2022
1 car 2/24/2022 240 red 80 2/18/2022
2 car 3/15/2022 240 blue 90 2/18/2022
3 car 3/15/2022 240 red 60 2/18/2022
4 car 4/20/2022 240 blue 40 2/18/2022
5 car 4/20/2022 240 red 50 2/18/2022
6 bus 2/24/2022 300 blue 120 2/18/2022
7 bus 2/24/2022 300 red 110 2/18/2022
8 bus 3/15/2022 300 blue 130 2/18/2022
9 bus 3/15/2022 300 red 140 2/18/2022
10 bus 4/20/2022 300 blue 80 2/18/2022
11 bus 4/20/2022 300 red 90 2/18/2022
12 car 2/24/2022 240 blue 60 2/17/2022
13 car 2/24/2022 240 red 40 2/17/2022
14 car 3/15/2022 240 blue 50 2/17/2022
15 car 3/15/2022 240 red 30 2/17/2022
16 car 4/20/2022 240 blue 70 2/17/2022
17 car 4/20/2022 240 red 45 2/17/2022
18 bus 2/24/2022 300 blue 130 2/17/2022
19 bus 2/24/2022 300 red 100 2/17/2022
20 bus 3/15/2022 300 blue 140 2/17/2022
21 bus 3/15/2022 300 red 120 2/17/2022
22 bus 4/20/2022 300 blue 90 2/17/2022
23 bus 4/20/2022 300 red 30 2/17/2022根据这一数据,我们有两辆车,三个预期日期,范围,两种颜色,折扣和日期。我们必须在折扣栏中找到最小值和在折扣栏中得到最小值的日期,将新列分别为mindisc列和mindate,并根据最新日期保存在新列中,这需要根据车辆、预期日期、范围、颜色和日期进行过滤。
我们必须根据相同的颜色、范围、预期日期和车辆在折扣栏中的两个日期(所有日期,因为我们有许多日期不限于两个日期) 2/18/2022,2/17/2022中找到最小值。
最后,此min添加到mindisc列的最新日期和相应的日期,其中min日期显示为mindate列的最新日期行。
输出应该如下所示
country expecteddate range color discount date mindisc mindate
0 car 2/24/2022 240 blue 70 2/18/2022 60 2/17/2022
1 car 2/24/2022 240 red 80 2/18/2022 40 2/17/2022
2 car 3/15/2022 240 blue 90 2/18/2022 50 2/17/2022
3 car 3/15/2022 240 red 60 2/18/2022 30 2/17/2022
4 car 4/20/2022 240 blue 40 2/18/2022 40 2/18/2022
5 car 4/20/2022 240 red 50 2/18/2022 45 2/17/2022
6 bus 2/24/2022 300 blue 120 2/18/2022 120 2/18/2022
7 bus 2/24/2022 300 red 110 2/18/2022 100 2/17/2022
8 bus 3/15/2022 300 blue 130 2/18/2022 130 2/18/2022
9 bus 3/15/2022 300 red 140 2/18/2022 120 2/17/2022
10 bus 4/20/2022 300 blue 80 2/18/2022 80 2/18/2022
11 bus 4/20/2022 300 red 90 2/18/2022 30 2/17/2022
12 car 2/24/2022 240 blue 60 2/17/2022
13 car 2/24/2022 240 red 40 2/17/2022
14 car 3/15/2022 240 blue 50 2/17/2022
15 car 3/15/2022 240 red 30 2/17/2022
16 car 4/20/2022 240 blue 70 2/17/2022
17 car 4/20/2022 240 red 45 2/17/2022
18 bus 2/24/2022 300 blue 130 2/17/2022
19 bus 2/24/2022 300 red 100 2/17/2022
20 bus 3/15/2022 300 blue 140 2/17/2022
21 bus 3/15/2022 300 red 120 2/17/2022
22 bus 4/20/2022 300 blue 90 2/17/2022
23 bus 4/20/2022 300 red 30 2/17/2022车辆不限于汽车和公共汽车,它有许多车辆,而且数据在车辆和范围列中并非总是相等行,日期不限于两个日期。
发布于 2022-02-20 19:22:53
这是我的方法。让我们首先将"date"列作为datetime对象来处理:
df["date"] = pd.to_datetime(df["date"])现在,我们可以按照您的描述对数据进行分组,以找到达到最低折扣的行:
common_groupby = df.groupby(["color", "range", "expecteddate", "vehicle"])现在,让我们找出每组的最小折扣和最大日期发生的行:
source_idx = common_groupby["discount"].idxmin()
target_idx = common_groupby["date"].idxmax()
# Use df.loc[idx] to see the rows where the minimum discount is reached现在,我们可以使用source_idx将最小折扣值和相应日期(从target_src行)添加到正确的行中。
df.loc[target_idx, "mindisc"] = df.loc[source_idx, "discount"].values
df.loc[target_idx, "mindate"] = df.loc[source_idx, "date"].values正如您所看到的,我们只更改达到最小折扣的行中的值(idx)。这是这些操作的输出:
vehicle expecteddate range color discount date mindisc mindate
0 car 2/24/2022 240 blue 70 2022-02-18 60.0 2022-02-17
1 car 2/24/2022 240 red 80 2022-02-18 40.0 2022-02-17
2 car 3/15/2022 240 blue 90 2022-02-18 50.0 2022-02-17
3 car 3/15/2022 240 red 60 2022-02-18 30.0 2022-02-17
4 car 4/20/2022 240 blue 40 2022-02-18 40.0 2022-02-18
5 car 4/20/2022 240 red 50 2022-02-18 45.0 2022-02-17
6 bus 2/24/2022 300 blue 120 2022-02-18 120.0 2022-02-18
7 bus 2/24/2022 300 red 110 2022-02-18 100.0 2022-02-17
8 bus 3/15/2022 300 blue 130 2022-02-18 130.0 2022-02-18
9 bus 3/15/2022 300 red 140 2022-02-18 120.0 2022-02-17
10 bus 4/20/2022 300 blue 80 2022-02-18 80.0 2022-02-18
11 bus 4/20/2022 300 red 90 2022-02-18 30.0 2022-02-17
12 car 2/24/2022 240 blue 60 2022-02-17 NaN NaT
13 car 2/24/2022 240 red 40 2022-02-17 NaN NaT
14 car 3/15/2022 240 blue 50 2022-02-17 NaN NaT
15 car 3/15/2022 240 red 30 2022-02-17 NaN NaT
16 car 4/20/2022 240 blue 70 2022-02-17 NaN NaT
17 car 4/20/2022 240 red 45 2022-02-17 NaN NaT
18 bus 2/24/2022 300 blue 130 2022-02-17 NaN NaT
19 bus 2/24/2022 300 red 100 2022-02-17 NaN NaT
20 bus 3/15/2022 300 blue 140 2022-02-17 NaN NaT
21 bus 3/15/2022 300 red 120 2022-02-17 NaN NaT
22 bus 4/20/2022 300 blue 90 2022-02-17 NaN NaT
23 bus 4/20/2022 300 red 30 2022-02-17 NaN NaT另一种可能的解决方案是通过排序,然后两次下垂复制(受这个问题的启发:Select the max row per group - pandas performance issue),来改善暂存性:
# Min discount rows
source_df = df.sort_values(by=["discount"], ascending=True, kind='mergesort').drop_duplicates(["color", "range", "expecteddate", "vehicle"])
source_df = source_df.rename(columns={"date": "mindate", "discount": "mindisc"})
# Max date rows
target_df = df.reset_index().sort_values(by=["date"], ascending=False, kind="mergesort").drop_duplicates(["color", "range", "expecteddate", "vehicle"])
# Put min discount values into max date rows
df.loc[target_df["index"], ["mindisc", "mindate"]] = source_df[["mindisc", "mindate"]].values发布于 2022-02-20 18:50:45
我没有很好地理解这个问题,但你可以尝试这样的方法:
new_df = df[(df['date'] == '2/18/2022') & (df['color'] == 'blue') & (df['vehicle'] == 'car')]
然后:
new_df['discount'].min()
https://stackoverflow.com/questions/71197112
复制相似问题