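I have a DataFrame with a column that contains both positive and negative values. I want to find the index positions of the negative values. I have two methods, and I am trying to work out which one is better and faster. My code is: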
import pandas as pd
import time
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})
# Method-1
start1 = time.time()
neg_index1 = df[(df["Current"]<0)].index.tolist()
print(neg_index1)
end1 = time.time()
print("Method-1 time is = ",end1 - start1)
# Method-2
start2 = time.time()
neg_index2 = df.iloc[df["Current"].lt(0).values].index.tolist()
print(neg_index2)
end2 = time.time()
print("Method-2 time is = ",end2 - start2)

Here is the output of the first run, where Method-2 is faster:
[2, 4, 6]
Method-1 time is = 0.002000093460083008
[2, 4, 6]
Method-2 time is = 0.0009999275207519531

Output of the second run, where, interestingly, both methods clock exactly the same time:
[2, 4, 6]
Method-1 time is = 0.0009999275207519531
[2, 4, 6]
Method-2 time is = 0.0009999275207519531

Output of the fourth run, where, surprisingly, Method-1 is faster:
[2, 4, 6]
Method-1 time is = 0.0009999275207519531
[2, 4, 6]
Method-2 time is = 0.0019998550415039062

Can someone explain these results and help me understand which method is actually faster?
Posted on 2018-09-10 05:48:11
I prefer using np.where:
np.where(df['Current']<0)[0].tolist()

Also, do not use time.time for this; use timeit instead:
import pandas as pd, numpy as np
import timeit
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})
# Method-1
neg_index1 = df[(df["Current"]<0)].index.tolist()
print(neg_index1)
print("Method-1 time is = ",timeit.timeit(lambda: df[(df["Current"]<0)].index.tolist(),number=10))
# Method-2
neg_index2 = df.iloc[df["Current"].lt(0).values].index.tolist()
print(neg_index2)
print("Method-2 time is = ",timeit.timeit(lambda: df.iloc[df["Current"].lt(0).values].index.tolist(),number=10))
# Method-3
neg_index3 = np.where(df['Current']<0)[0].tolist()
print(neg_index3)
print("Method-3 time is = ",timeit.timeit(lambda: np.where(df['Current']<0)[0].tolist(),number=10))

Output:
[2, 4, 6]
Method-1 time is = 0.0211404744016608
[2, 4, 6]
Method-2 time is = 0.02377961247025239
[2, 4, 6]
Method-3 time is = 0.007515077367731743

So np.where wins by a wide margin!
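One caveat about the comparison above: np.where returns integer positions, while Method-1 and Method-2 return index labels. With the default RangeIndex used in the question the two coincide, which is why all three methods print [2, 4, 6]. The sketch below uses a string index (an assumption made only for illustration, it is not in the original question) to show where they differ.

```python
import numpy as np
import pandas as pd

# Hypothetical string index, chosen only to make labels and positions differ.
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]},
                  index=list('abcdefg'))

mask = df['Current'] < 0

# Index labels of the negative rows (what Method-1 and Method-2 return).
labels = df.index[mask.to_numpy()].tolist()

# Integer positions of the negative rows (what np.where returns).
positions = np.where(mask)[0].tolist()

print(labels)     # ['c', 'e', 'g']
print(positions)  # [2, 4, 6]
```

If you need labels, index df.index with the mask; if you need positions, np.where is the direct route.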
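Posted on 2018-09-10 05:53:01
When you measure the time a single execution takes, other processes may be consuming resources at the same moment. The garbage collector may also kick in at an arbitrary point and skew the result. For these reasons, never use time.time() to compare performance.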
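The timings in the question are all close to multiples of 0.001 s, which is consistent with a coarse wall clock rather than a real difference between the methods (that reading is my assumption; the question does not state the platform). A small sketch for inspecting the clocks Python offers on your machine:

```python
import time

# Collect the properties Python reports for two of its clocks.
clocks = {name: time.get_clock_info(name) for name in ('time', 'perf_counter')}

for name, info in clocks.items():
    # resolution is in seconds; monotonic says whether the clock can jump backwards.
    print(f"{name}: resolution={info.resolution}, monotonic={info.monotonic}")
```

time.perf_counter is the timer that timeit uses by default, and it is intended for measuring short durations.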
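Use timeit.timeit to measure performance. It runs the code repeatedly (the number argument sets how many times) and returns the total time for all runs, so dividing by number gives the average time per run, which is far more reliable than a single measurement. By default it also disables garbage collection while timing.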
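As a rough sketch of how such a comparison could be set up, the snippet below times the three approaches from this thread with timeit.repeat and reports the best per-call time. The 100,000-row random frame, the seed, and the number and repeat values are all assumptions picked for illustration; results will vary with data size and machine, so treat the ranking as something to measure yourself rather than a fixed fact.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: a larger frame than the 7-row one in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({'Current': rng.integers(-10, 10, size=100_000)})

candidates = {
    'boolean mask': lambda: df[df['Current'] < 0].index.tolist(),
    'iloc + lt': lambda: df.iloc[df['Current'].lt(0).values].index.tolist(),
    'np.where': lambda: np.where(df['Current'] < 0)[0].tolist(),
}

number = 20  # calls per measurement
results = {}
for name, func in candidates.items():
    # Each entry in totals is the total time for `number` calls.
    totals = timeit.repeat(func, number=number, repeat=5)
    # The minimum is the measurement least disturbed by other processes.
    results[name] = min(totals) / number
    print(f"{name}: {results[name]:.6f} s per call")
```

Taking the minimum over several repeats, rather than the mean, reduces the influence of background activity on the comparison.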
https://stackoverflow.com/questions/52251680