I have a large file containing minute-level currency price data from 2001-2017. I want to build a simple nearest-neighbour implementation to see whether the % price changes from 5, 25, and 50 minutes ago have any explanatory power (I suspect they don't, but this is just for learning). The way I create the 'dif_X' columns takes an extremely long time (around 5 hours). I'm very new to Python and don't even know where to start looking for a solution, but I know there must be a way to make this run faster. Here is the code:
import numpy as np
import pandas as pd
def findNNDistances(df_):
    # Euclidean distance from the sample row (row 10) to every row
    samp = df_.iloc[10]  # df_[10] would look up a *column* named 10, not row 10
    count = 0
    df_['dist'] = [None]*len(df_)
    while count < len(df_):
        print("Count: " + str(count))
        df_.loc[count, 'dist'] = np.sqrt((samp['dif_5'] - df_['dif_5'][count])**2 +
                                         (samp['dif_25'] - df_['dif_25'][count])**2 +
                                         (samp['dif_50'] - df_['dif_50'][count])**2)
        count += 1  # without this the loop never terminates
df = pd.read_csv("Downloads/AUDUSD/AUDUSD.txt") # this is a csv
df['dif_5'] = [None]*len(df)
df['dif_25'] = [None]*len(df)
df['dif_50'] = [None]*len(df)
df['index'] = [None]*len(df)
count = 99
while count < len(df) - 1:
    print("countA: " + str(count))
    # .loc avoids chained-assignment writes that can silently miss the frame
    df.loc[count, 'dif_5'] = (df['close'][count] - df['close'][count - 5])/df['close'][count - 5]
    df.loc[count, 'dif_25'] = (df['close'][count] - df['close'][count - 25])/df['close'][count - 25]
    df.loc[count, 'dif_50'] = (df['close'][count] - df['close'][count - 50])/df['close'][count - 50]
    df.loc[count, 'index'] = count - 99
    count += 1
half_size = int(np.round(len(df)/2))
train = df[99:half_size] # not used yet
test = df[half_size + 1: len(df) - 1] # not used yet
findNNDistances(df)  # df.apply(findNNDistances) would pass each *column* in as a Series
print(df['dist'].head(20))

How can I make this run faster? I would also appreciate some general tips on speeding up code like this in Python. Thanks.
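For reference, both slow loops can be replaced by vectorized pandas/NumPy operations. The sketch below makes some illustrative assumptions: a toy `close` series stands in for the AUDUSD CSV, and row 100 stands in for the sample row (the original uses row 10, which would still be inside the 50-minute warm-up window).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the AUDUSD data: only a 'close' column is assumed,
# as in the original CSV; the values themselves are illustrative.
df = pd.DataFrame({'close': np.linspace(1.0, 2.0, 200)})

# Vectorized replacement for the while loop that builds the dif_X columns:
# pct_change(n) computes (close[t] - close[t-n]) / close[t-n] for every
# row in a single call, with NaN for the first n rows.
for n in (5, 25, 50):
    df[f'dif_{n}'] = df['close'].pct_change(periods=n)

# Vectorized replacement for findNNDistances: Euclidean distance from one
# sample row to every row at once, via NumPy broadcasting.
feats = df[['dif_5', 'dif_25', 'dif_50']].to_numpy()
samp = feats[100]  # any row past the 50-row warm-up
df['dist'] = np.sqrt(((feats - samp) ** 2).sum(axis=1))
```

Rows inside the first 50 minutes have NaN differences and therefore NaN distances, so the warm-up handles itself. For a full nearest-neighbour search over many query points, `scipy.spatial.distance.cdist` or `sklearn.neighbors.NearestNeighbors` avoid the Python loop entirely.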
https://stackoverflow.com/questions/44534082