Q: Speeding up data iteration

Stack Overflow user

Asked 2014-12-15 14:17:30

1 answer · 126 views · 0 followers · 0 votes

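I am trying to run the function mibian.BS by iterating over a DataFrame named df1 and assigning the result to a new column named "Implied_Vola". How can I speed up the whole process? Processing the original data, which has 3 million rows, would take my machine 9000 minutes, which is far too long. mibian.BS does not accept vector input, so it has to be applied to each row of the DataFrame.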

import mibian
import numpy
import time

start = time.time()
mask = (df1['ask'] > 0) & (df1['bid'] > 0) & (df1['call put'] == 'C') & (df1['Restlaufzeit'] > 0)

for index, row in df1.loc[mask].iterrows():
    # Build the row selector before the try block so that the except
    # branch always refers to the current row.
    mask2 = ((df1.index == index) & (df1['unadjusted stock price'] == row['unadjusted stock price']) & (df1['strike'] == row['strike']) & (df1['Zins'] == row['Zins']) & (df1['Restlaufzeit'] == row['Restlaufzeit']) & (df1['mean'] == row['mean']))
    try:
        c = mibian.BS([row['unadjusted stock price'], row['strike'], row['Zins'], row['Restlaufzeit']], callPrice=row['mean'])
        df1.loc[mask2, 'Implied_Vola'] = c.impliedVolatility
    except ZeroDivisionError:
        df1.loc[mask2, 'Implied_Vola'] = numpy.nan

end = time.time()
elapsed = (end - start) / 60  # minutes; a separate name avoids shadowing the time module
print('%s minutes' % elapsed)

df1.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2002-05-16 00:00:00 to 2002-05-16 00:00:00
Data columns (total 13 columns):
adjusted stock close price    2 non-null float64
expiration                    2 non-null datetime64[ns]
strike                        2 non-null int64
call put                      2 non-null object
ask                           2 non-null float64
bid                           2 non-null float64
volume                        2 non-null int64
open interest                 2 non-null int64
unadjusted stock price        2 non-null float64
Restlaufzeit                  2 non-null int32
Zins                          2 non-null float64
mean                          2 non-null float64
Implied_Vola                  2 non-null float64
dtypes: datetime64[ns](1), float64(7), int32(1), int64(3), object(1)
memory usage: 216.0+ bytes

I rewrote the loop without dataframe.iterrows():

import mibian
import numpy
import time

df2 = df1.copy()
start = time.time()
mask = (df2['ask'] > 0) & (df2['bid'] > 0) & (df2['call put'] == 'C') & (df2['Restlaufzeit'] > 0)
vola = []
# Positions follow the column order shown by df1.info():
# 2 = strike, 8 = unadjusted stock price, 9 = Restlaufzeit, 10 = Zins, 11 = mean
for row in df2.loc[mask].values:
    try:
        c = mibian.BS([row[8], row[2], row[10], row[9]], callPrice=row[11])
        vola.append(c.impliedVolatility)
    except ZeroDivisionError:
        vola.append(numpy.nan)
df2.loc[mask, 'vola'] = vola
end = time.time()
elapsed = (end - start) / 60  # minutes; a separate name avoids shadowing the time module
print('%s minutes' % elapsed)

However, this did not speed things up. Should this be approached differently?

python-2.7

pandas

1 Answer

Stack Overflow user

Accepted answer

Posted 2014-12-16 05:45:28

Looping over an ndarray is much faster than using df.iterrows().
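A rough way to check this on your own machine is to time both loops on a synthetic frame. The sketch below is only an illustration: the frame size and values are made up, only the column names are borrowed from the question, and the actual timings will depend on your pandas version and hardware.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration, not the asker's data).
n = 20000
df = pd.DataFrame(
    {'strike': np.arange(n, dtype='float64'), 'Zins': np.ones(n)},
    columns=['strike', 'Zins'],
)

# Loop 1: iterrows() builds a pandas Series for every row.
start = time.time()
total_iterrows = 0.0
for _, row in df.iterrows():
    total_iterrows += row['strike'] + row['Zins']
t_iterrows = time.time() - start

# Loop 2: .values yields plain 1-D NumPy arrays, indexed by position.
start = time.time()
total_values = 0.0
for row in df.values:
    total_values += row[0] + row[1]
t_values = time.time() - start

print('iterrows: %.3f s, values: %.3f s' % (t_iterrows, t_values))
```

Both loops compute the same total, so any difference in the printed times comes from how the rows are produced.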

Instead of

for index, row in df1.loc[mask].iterrows() :
    # DO STUFF with row Series

try using

for index, row in enumerate(df1.loc[mask].values):
    # DO STUFF with row, a NumPy array indexed by integer position

You will have to fall back on integer (positional) indexing, but it is much faster.
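One way to keep positional indexing readable is to look the column positions up by name once, before the loop, and collect the results in a list that is assigned in a single step. The following is a minimal sketch under stated assumptions: price_row is a hypothetical stand-in for a scalar-only call such as mibian.BS, the 'ratio' column is invented for the example, and the three-row frame is made-up data.

```python
import numpy as np
import pandas as pd


def price_row(spot, strike):
    """Hypothetical stand-in for a scalar-only function such as mibian.BS."""
    # Python float division raises ZeroDivisionError when strike is 0.
    return float(spot) / float(strike)


# Made-up data; only the column names are borrowed from the question.
df = pd.DataFrame(
    {
        'strike': [100.0, 0.0, 50.0],
        'call put': ['C', 'C', 'P'],
        'unadjusted stock price': [110.0, 120.0, 55.0],
    },
    columns=['strike', 'call put', 'unadjusted stock price'],
)
mask = df['call put'] == 'C'

# Resolve positions once instead of hard-coding numbers like row[8].
i_spot = df.columns.get_loc('unadjusted stock price')
i_strike = df.columns.get_loc('strike')

results = []
for row in df.loc[mask].values:
    try:
        results.append(price_row(row[i_spot], row[i_strike]))
    except ZeroDivisionError:
        results.append(np.nan)

# One assignment for all masked rows instead of one .loc write per row.
df['ratio'] = np.nan
df.loc[mask, 'ratio'] = results
print(df)
```

Looking positions up with get_loc means the loop keeps working if the column order changes, which hard-coded indices would not.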

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/27486059
