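I am trying to run the function mibian.BS by iterating over a DataFrame named df1 and assigning the result to a new column named "Implied_Vola". How can I speed up the whole process? Processing the original data, which has 3 million rows, would take 9000 minutes on my machine, which is far too long. mibian.BS does not accept vector input, so it has to be applied to each row of the DataFrame.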
import mibian
import numpy
import time
start = time.time()
mask = (df1['ask'] > 0) & (df1['bid'] > 0) & (df1['call put'] == 'C') & (df1['Restlaufzeit'] > 0)
for index, row in df1.loc[mask].iterrows():
    # Select the row being processed; the index alone is not unique, so the
    # column values are compared as well. Built before the try block so that
    # the except branch always refers to the current row.
    mask2 = ((df1.index == index) & (df1['unadjusted stock price'] == row['unadjusted stock price']) & (df1['strike'] == row['strike']) & (df1['Zins'] == row['Zins']) & (df1['Restlaufzeit'] == row['Restlaufzeit']) & (df1['mean'] == row['mean']))
    try:
        c = mibian.BS([row['unadjusted stock price'], row['strike'], row['Zins'], row['Restlaufzeit']], callPrice=row['mean'])
        df1.loc[mask2, 'Implied_Vola'] = c.impliedVolatility
    except ZeroDivisionError:
        df1.loc[mask2, 'Implied_Vola'] = numpy.nan
end = time.time()
elapsed = (end - start) / 60  # named 'elapsed' so the time module is not shadowed
print('%.2f minutes' % elapsed)
df1.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2002-05-16 00:00:00 to 2002-05-16 00:00:00
Data columns (total 13 columns):
adjusted stock close price 2 non-null float64
expiration 2 non-null datetime64[ns]
strike 2 non-null int64
call put 2 non-null object
ask 2 non-null float64
bid 2 non-null float64
volume 2 non-null int64
open interest 2 non-null int64
unadjusted stock price 2 non-null float64
Restlaufzeit 2 non-null int32
Zins 2 non-null float64
mean 2 non-null float64
Implied_Vola 2 non-null float64
dtypes: datetime64[ns](1), float64(7), int32(1), int64(3), object(1)
memory usage: 216.0+ bytes
I rewrote the loop without dataframe.iterrows():
import mibian
import numpy
import time
df2=df1.copy()
start = time.time()
mask=(df2['ask'] > 0) & (df2['bid'] > 0) & (df2['call put'] == 'C') & (df2['Restlaufzeit']>0)
vola=[]
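for row in df2.loc[mask].values:
    try:
        # Column positions: 8 = unadjusted stock price, 2 = strike,
        # 10 = Zins, 9 = Restlaufzeit, 11 = mean
        c = mibian.BS([row[8], row[2], row[10], row[9]], callPrice=row[11])
        vola.append(c.impliedVolatility)
    except ZeroDivisionError:
        vola.append(numpy.nan)
df2.loc[mask, 'vola'] = vola
end = time.time()
elapsed = (end - start) / 60  # named 'elapsed' so the time module is not shadowed
print('%.2f minutes' % elapsed)
However, this did not speed things up. Should this be approached differently?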
Posted on 2014-12-16 05:45:28
Looping over an ndarray is much faster than using df.iterrows().
Instead of
for index, row in df1.loc[mask].iterrows():
    # DO STUFF with row Series
try using
for index, row in enumerate(df1.loc[mask].values):
    # DO STUFF with row array
You have to fall back to integer indexing, but it is much faster.
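As a minimal sketch of that pattern, the snippet below loops over the `.values` array, looks up the column positions once with `df.columns.get_loc` rather than hard-coding integers, and writes all results back in a single assignment. The function `fake_implied_vol`, the column names, and the sample numbers are hypothetical stand-ins for illustration only; `fake_implied_vol` is not mibian's API and does not compute a real implied volatility.

```python
import numpy as np
import pandas as pd


def fake_implied_vol(spot, strike, rate, days, price):
    """Hypothetical stand-in for a scalar-only function such as mibian.BS."""
    if days == 0:
        raise ZeroDivisionError("zero time to expiry")
    return 100.0 * price / (spot * np.sqrt(days / 365.0))


# Made-up sample data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "spot": [100.0, 100.0, 100.0],
    "strike": [95, 100, 105],
    "rate": [1.0, 1.0, 1.0],
    "days": [30, 0, 30],
    "price": [6.0, 2.0, 1.0],
})
mask = df["price"] > 1.5

# Resolve column positions once, so the loop can use integer indexing
# without relying on hand-counted offsets.
cols = ["spot", "strike", "rate", "days", "price"]
pos = [df.columns.get_loc(c) for c in cols]

rows = df.loc[mask].values            # 2-D ndarray, one row per selected record
result = np.full(len(rows), np.nan)   # NaN marks rows where the call fails
for i, row in enumerate(rows):
    try:
        result[i] = fake_implied_vol(*(row[p] for p in pos))
    except ZeroDivisionError:
        pass  # keep NaN for this row

# One assignment for all selected rows instead of one .loc write per row.
df.loc[mask, "vola"] = result
```

Rows excluded by `mask` end up as NaN in the new column, the same as rows where the stand-in function raises.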
https://stackoverflow.com/questions/27486059