我的目标是大幅提高我的代码速度,我认为这可以使用np.select完成,尽管我不知道如何做到这一点。
下面是我的代码执行时的当前输出:
date starting_temp average_high average_low limit_temp observation_date Date_Limit_reached
2019-12-03 22:30:00 NaN 13.0 14.8 NaN nan
2019-12-03 23:00:00 NaN 14.7 14.9 NaN nan
2019-12-03 23:30:00 NaN 13.0 13.9 NaN nan
2019-12-04 00:00:00 13.2 13.0 14.7 NaN 2019-12-04 10:00:00
2019-12-04 00:30:00 NaN 14.0 13.8 NaN nan
2019-12-04 01:00:00 NaN 13.9 13.8 NaN nan
2019-12-04 01:30:00 NaN 13.6 14.8 NaN nan
2019-12-04 02:00:00 NaN 13.1 14.5 NaN nan
2019-12-04 02:30:00 NaN 14.9 13.7 NaN nan
2019-12-04 03:00:00 NaN 14.2 14.1 NaN nan
2019-12-04 03:30:00 NaN 13.4 14.1 NaN nan
2019-12-04 04:00:00 NaN 14.3 13.0 NaN nan
2019-12-04 04:30:00 NaN 13.5 14.1 NaN nan
2019-12-04 05:00:00 NaN 13.6 13.4 NaN nan
2019-12-04 05:30:00 NaN 14.5 13.9 NaN nan
2019-12-04 06:00:00 NaN 14.4 14.5 NaN nan
2019-12-04 06:30:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:00:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:30:00 NaN 13.2 14.4 NaN nan
2019-12-04 08:00:00 NaN 13.9 13.1 NaN nan
2019-12-04 08:30:00 NaN 13.9 14.4 NaN nan
2019-12-04 09:00:00 NaN 14.4 13.9 NaN nan
2019-12-04 09:30:00 NaN 14.4 13.8 NaN nan
2019-12-04 10:00:00 NaN 15.0 14.0 NaN nan
2019-12-04 10:30:00 NaN 13.2 13.2 NaN nan
2019-12-04 11:00:00 NaN 14.0 13.3 NaN nan
2019-12-04 11:30:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:00:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:30:00 NaN 13.7 13.6 NaN nan
2019-12-04 13:00:00 NaN 14.1 13.3 NaN nan
2019-12-04 13:30:00 NaN 13.1 14.1 NaN nan
2019-12-04 14:00:00 NaN 13.2 14.3 NaN nan
2019-12-04 14:30:00 NaN 13.7 13.8 NaN nan 生成最终的df‘’Date_Limit_reached‘列的代码太慢了,我在下面添加了它。如果可能的话,我想把它的结构改为np.select:
new_col = []
df_size = len(df)
# Loop the dataframe
for ind in df.index:
if not math.isnan(df['starting_temp'][ind]):
entry_price_val = df['starting_temp'][ind]
count = 0
hasValue = False
while count < df_size:
if df['starting_temp'][ind] > df['limit_temp'][ind] and df['limit_temp'][ind] >= df['asklow'][count] and df['date'][count] >= df['observation_date'][ind] :
new_col.append(df['date'][count])
hasValue = True
break # Break the loop if matching value meets
count += 1
elif df['starting_temp'][ind] < df['limit_temp'][ind] and df['limit_temp'][ind] <= df['average_high'][count] and df['date'][count] >= df['observation_date'][ind] :
new_col.append(df['date'][count])
hasValue = True
break # Break the loop if matching value meets
count += 1
# If matching value not meets, then append nan value to the column
if not hasValue:
new_col.append(float('nan'))
else:
new_col.append(float('nan'))
df['Date_Limit_reached'] = new_col发布于 2020-09-02 22:23:56
由于缺少df,我不能运行代码,所以我的建议是:
df['starting_temp'][ind] == df['limit_temp'][ind]条目,你会遇到问题,因为你的案例都不会触发。也许这就是慢代码的问题所在。while
解决问题。
不使用entry_price_val的
对于进一步的改进,
以下是我建议的代码
new_col = []
df_size = len(df)
for ind in df.index:
val = float('nan') # use data instead of flags
if not math.isnan(df['starting_temp'][ind]):
count = 0
if df['starting_temp'][ind] > df['limit_temp'][ind]:
while count < df_size:
if df['limit_temp'][ind] >= df['asklow'][count] and df['date'][count] >= df['observation_date'][ind] :
val=df['date'][count]
break # Break the loop if matching value meets
count += 1
elif df['starting_temp'][ind] < df['limit_temp'][ind]
while count < df_size:
if df['limit_temp'][ind] <= df['average_high'][count] and df['date'][count] >= df['observation_date'][ind] :
val = df['date'][count]
break # Break the loop if matching value meets
count += 1
new_col.append(val)
df['Date_Limit_reached'] = new_col未测试代码片段,需要测试正确性,可能会有进一步的改进(应请求提供提示)。
https://stackoverflow.com/questions/63521865
复制相似问题