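Since Pandas uses NumPy under the hood, I'm curious why, in the example below, the plain NumPy code (509 ms) is about 12 times faster than the same operation done through the DataFrame (6.38 s).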
import numpy as np
import pandas as pd

# function with numpy arrays
def f_np(freq, asd):
    for f in np.arange(21., 2000., 1.):
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        # df is the module-level DataFrame defined below
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

# function with dataframe
def f_df(df):
    for f in np.arange(21., 2000., 1.):
        df['fi'] = df.Freq / f
        df['gi'] = (1 + df.fi**2) / ((1 - df.fi**2)**2 + df.fi**2) * df.ASD
        # process each df ...

freq = np.arange(20., 2000., .1)
asd = np.ones(len(freq))
df = pd.DataFrame({'Freq': freq, 'ASD': asd})

%timeit f_np(freq, asd)
%timeit f_df(df)
509 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.38 s ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Posted on 2020-06-03 01:41:29
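Are you sure the speed difference in this particular case comes from "some DataFrame operations"? I think it arises because the first example computes fi and gi as local variables and then assigns them to the columns, while the second example does not: it reads df.fi and df.ASD back from the DataFrame for every term of the expression. When I assign to local variables in both functions, the timings are similar.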
import numpy as np
import pandas as pd

# function with numpy arrays
def f_np(freq, asd):
    for f in np.arange(21., 2000., 1.):
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

# function with dataframe
def f_df(df):
    for f in np.arange(21., 2000., 1.):
        # note: freq and asd here are the module-level NumPy arrays,
        # not df.Freq / df.ASD
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

freq = np.arange(20., 2000., .1)
asd = np.ones(len(freq))
df = pd.DataFrame({'Freq': freq, 'ASD': asd})

%timeit f_np(freq, asd)
%timeit f_df(df)
#562 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#569 ms ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

https://stackoverflow.com/questions/62157633
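The point of the answer above can also be applied when the data lives only in the DataFrame. The following is a minimal sketch, not code from the original posts: the names `response` and `f_df_arrays` are hypothetical, and it assumes the columns are plain float columns. It pulls the columns out once with `Series.to_numpy()`, does the arithmetic on NumPy arrays, and only touches the DataFrame when assigning the results. No timing is claimed here; measure it with `%timeit` on your own machine.

```python
import numpy as np
import pandas as pd


def response(freq, asd, f):
    """Evaluate fi and gi for a single value of f on plain NumPy arrays."""
    fi = freq / f
    gi = (1 + fi**2) / ((1 - fi**2) ** 2 + fi**2) * asd
    return fi, gi


def f_df_arrays(df, f_values):
    """Hypothetical variant of f_df: extract columns once, compute on arrays."""
    # Extract the underlying arrays a single time, outside the loop,
    # so the arithmetic below never goes through Series objects.
    freq = df["Freq"].to_numpy()
    asd = df["ASD"].to_numpy()
    for f in f_values:
        fi, gi = response(freq, asd, f)
        df["fi"] = fi
        df["gi"] = gi
        # process each df ...
    return df


freq = np.arange(20.0, 2000.0, 0.1)
df = pd.DataFrame({"Freq": freq, "ASD": np.ones(len(freq))})
out = f_df_arrays(df, np.arange(21.0, 2000.0, 1.0))
```

The design choice is simply to keep the DataFrame as the container and do the per-iteration math on arrays, which is what both functions in the answer effectively do.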