I have written the code below. There are two Pandas DataFrames: df contains the columns timestamp_milli and pressure, and df2 contains the columns timestamp_milli and acceleration_z. Both DataFrames have around 100,000 rows. In the code shown below, for each timestamp in every row of df, I search for the row of df2 whose time difference is within a range and minimal.

Unfortunately, the code is very slow. In addition, I get the following message from the line df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer, col_indexer] = value instead

How can I speed up the code and resolve the warning?
```python
acceleration = []
pressure = []

for index, row in df.iterrows():
    mask = (df2["timestamp_milli"] >= (row["timestamp_milli"] - 5)) & (df2["timestamp_milli"] <= (row["timestamp_milli"] + 5))
    df_temp = df2[mask]
    # Select closest point
    if len(df_temp) > 0:
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"].abs()
        df_temp = df_temp.loc[df_temp["timestamp_milli"] == df_temp["timestamp_milli"].min()]
        for index2, row2 in df_temp.iterrows():
            pressure.append(row["pressure"])
            acc = row2["acceleration_z"]
            acceleration.append(acc)
```

Posted on 2018-05-29 15:07:03
I ran into a similar problem; iterating with itertuples instead of iterrows cut the time significantly. See why iterrows has issues. Hope this helps.
https://stackoverflow.com/questions/50587318
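As the answer suggests, itertuples is usually a drop-in replacement for iterrows and avoids constructing a Series object per row. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

df = pd.DataFrame({"timestamp_milli": [10, 20], "pressure": [1.0, 2.0]})

# itertuples yields lightweight namedtuples; columns are accessed as
# attributes (row.timestamp_milli) rather than by label lookup.
total = 0
for row in df.itertuples(index=False):
    total += row.timestamp_milli
```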