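I am trying to calculate a "special" moving average. The code I am working from is based on TradeStation EasyLanguage and computes Keltner bands, but with a twist. The core of the code computes the average true range of the price, as follows: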

sum = sum + TrueRange
if (CurrentBar >= 20) then
    MAverage = sum/20
    sum = sum * (19/20)
else
    MAverage = sum

I can replicate this with the Python code below; however, the execution time is astronomical.

# MAverage
samples = len(df)
tr = df['TR']
df['trsum'] = float(0)
trsum = df['trsum']
df['Avg Range'] = float(0)
ma = df['Avg Range']
trsum[1] = tr[1]
for ii in range(2,samples):
    trsum[ii] = trsum[ii-1] + tr[ii]
    if ii > 19:
        ma[ii] = trsum[ii]/20
        trsum[ii] = trsum[ii] * 19/20

I have also tried a plain old EWA (exponential moving average), but the numbers are a bit further off than I would like.
Any help would be greatly appreciated.

Answered 2020-06-24 03:57:57
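Using NumPy and Numba

As you have already found, Pandas is not very fast when we iterate over rows. Ideally we would use a vectorized method such as df2['trsum'] = df2['TR'].cumsum(). Unfortunately, I could not find a fast way to do this with Pandas either, so I used NumPy only. I also tried Numba to speed up execution.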
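
One observation that may help here, derived from the loop in the question rather than stated in it: write S_i for the running sum just before it is scaled down and m_i = S_i / 20 for the 'Avg Range' value (the symbols S_i and m_i are my own notation). The update then reads:

```latex
\begin{aligned}
S_{20} &= \sum_{k=1}^{20} TR_k, &\qquad m_{20} &= \frac{S_{20}}{20},\\
S_i &= \tfrac{19}{20}\,S_{i-1} + TR_i, &\qquad
m_i &= \frac{S_i}{20} = \tfrac{19}{20}\,m_{i-1} + \tfrac{1}{20}\,TR_i \quad (i > 20).
\end{aligned}
```

So, if I read the loop correctly, after the first 20 bars this is an exponential moving average with smoothing factor 1/20, seeded with a 20-bar simple mean. A plain cumsum() cannot express it because of the 19/20 decay, and an EWA parameterized by span=20 would use a smoothing factor of 2/21 instead, which may be why the EWA numbers came out a bit off.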
The code below has 3 functions:
sma(df): # This is the code from the question
sma_numpy(df): # This converts the Dataframe to a Numpy Array
sma_numba(df): # This converts the Dataframe to a Numpy Array and uses Numba to JIT compile the function

Timing results
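def sma() Pandas: 35.831744300000004s for 100000 rows
def sma_numpy() Numpy: 2.0248809000000065s for 1000000 rows
def sma_numba() Numpy + Numba: 0.05904679999999729s for 1000000 rows

As you can see, the Numba function is roughly 6000 times faster than the Pandas version once the Pandas timing is scaled to the same number of rows! I could only run the Pandas version on 100000 rows.
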
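A possible Pandas-only alternative, offered as a sketch rather than something I timed alongside the functions above: from row 21 onward the loop amounts to ma[i] = (19/20) * ma[i-1] + tr[i] / 20, seeded at row 20 with the mean of rows 1 to 20, and that is what ewm(alpha=1/20, adjust=False) computes. The names sma_ewm and loop_reference are hypothetical helpers introduced here for illustration; only the 'Avg Range' values are reproduced, not the 'trsum' column, and n is assumed to be at least 2.

```python
import numpy as np
import pandas as pd


def loop_reference(tr, n=20):
    """Plain-Python copy of the loop from the question, used only as a cross-check."""
    trsum = np.zeros(len(tr))
    ma = np.zeros(len(tr))
    if len(tr) > 1:
        trsum[1] = tr[1]
    for ii in range(2, len(tr)):
        trsum[ii] = trsum[ii - 1] + tr[ii]
        if ii > n - 1:
            ma[ii] = trsum[ii] / n
            trsum[ii] *= (n - 1) / n
    return ma


def sma_ewm(tr, n=20):
    """Hypothetical vectorized variant: seed with an n-row mean, then ewm (assumes n >= 2)."""
    tr = np.asarray(tr, dtype=float)
    ma = np.zeros(len(tr))
    if len(tr) <= n:
        return ma                  # the loop never writes an average in this case
    x = tr[n:].copy()              # rows n, n+1, ...
    x[0] = tr[1:n + 1].mean()      # seed: ma[n] is the mean of rows 1..n (row 0 is skipped)
    # adjust=False gives y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
    ma[n:] = pd.Series(x).ewm(alpha=1 / n, adjust=False).mean().to_numpy()
    return ma


rng = np.random.default_rng(1)
tr = rng.integers(0, 100, size=1_000).astype(float)
# Compare the vectorized result with the reference loop
print(np.allclose(sma_ewm(tr), loop_reference(tr)))
```

With a DataFrame like the one used below, this could be applied as df['Avg Range'] = sma_ewm(df['TR'].to_numpy()). The results agree with the loop only up to floating-point rounding, so compare with np.allclose rather than ==.
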
import numpy as np
import pandas as pd
import timeit
from numba import jit
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,100,size=(1000000, 1)), columns=['TR'])

def sma(df):
    # code copied from the question
    samples = len(df)
    # MAverage
    tr = df['TR']
    df['trsum'] = float(0)
    trsum = df['trsum']
    df['Avg Range'] = float(0)
    ma = df['Avg Range']
    trsum[1] = tr[1]
    for ii in range(2, samples):
        trsum[ii] = trsum[ii - 1] + tr[ii]
        if ii > 19:
            ma[ii] = trsum[ii] / 20
            trsum[ii] = trsum[ii] * 19 / 20
    return df

def sma_numpy(df):
    tr = 0
    trsum = 1
    ma = 2
    samples = len(df)
    df['trsum'] = float(0)
    df['Avg Range'] = float(0)
    npa = df.to_numpy()
    npa[1,1] = npa[1,0]
    for ii in range(2, samples):
        npa[ii,trsum] = npa[ii-1,trsum] + npa[ii,tr]
        if ii > 19:
            npa[ii,ma] = npa[ii,trsum] / 20
            npa[ii, trsum] *= 19 / 20
    return pd.DataFrame(data=npa, columns=df.columns)

@jit(nopython=True)
def sma_numba_loop(npa):
    tr = 0
    trsum = 1
    ma = 2
    samples = len(npa)
    for ii in range(2, samples):
        npa[ii, trsum] = npa[ii - 1, trsum] + npa[ii, tr]
        if ii > 19:
            npa[ii, ma] = npa[ii, trsum] / 20
            npa[ii, trsum] *= 19 / 20

def sma_numba(df):
    df['trsum'] = float(0)
    df['Avg Range'] = float(0)
    npa = df.to_numpy()
    npa[1, 1] = npa[1, 0]
    sma_numba_loop(npa)
    return pd.DataFrame(data=npa, columns=df.columns)

df_small = df[0:100_000].copy()
print(sma_numba(df[0:30].copy())) # warm-up call: JIT compile now so compilation is not counted in the timing below
print("def sma() Pandas: ", timeit.Timer(lambda: sma(df_small.copy())).timeit(number=1), f's for {len(df_small)} rows', sep='')
print("def sma_numpy() Numpy: ", timeit.Timer(lambda: sma_numpy(df.copy())).timeit(number=1), f's for {len(df)} rows', sep='')
print("def sma_numba() Numpy + Numba: ", timeit.Timer(lambda: sma_numba(df.copy())).timeit(number=1), f's for {len(df)} rows', sep='')
'''
Check a sample to make sure they all return the same values
print(sma(df_small.copy())[10000:10010])
print(sma_numpy(df.copy())[10000:10010])
print(sma_numba(df.copy())[10000:10010])
'''

https://stackoverflow.com/questions/62529197