Q: A recursive formula in a loop is slow. Is there a way to make this code run faster?

Stack Overflow user

Asked 2019-11-24 18:09:38

1 answer · 186 views · 0 followers · 4 votes

I have the following dataset:

The formula for calculating the hazard rate is:

For Year = 1: Hazard_rate(Year) = PD(Year)

For Year > 1: Hazard_rate(Year) = (PD(Year) + Hazard_rate(Year - 1) * (Year - 1)) / (Year)
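To make the recurrence concrete, here is a minimal sketch that applies it to a plain list. The function name `hazard_rates` and the PD values are hypothetical (they are not taken from the dataset in the question), and the sketch assumes the years run 1, 2, 3, ... with no gaps.

```python
def hazard_rates(pds):
    """Apply the recurrence to PD values for years 1, 2, ..., len(pds)."""
    rates = []
    for year, pd_value in enumerate(pds, start=1):
        if year == 1:
            # Year 1: the hazard rate is simply the PD
            rates.append(pd_value)
        else:
            # Year > 1: combine this year's PD with the previous hazard rate
            rates.append((pd_value + rates[-1] * (year - 1)) / year)
    return rates


# Hypothetical PD values for years 1..3 (illustration only)
example = hazard_rates([0.05, 0.03, 0.04])
print(example)
```

Each step only needs the previous result, which is why the list lookup `rates[-1]` is enough here.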

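Assumption: within each customer_ID, Year is monotonic and strictly > 0.

Because the formula is recursive and needs the previous year's hazard rate, the code below is slow and becomes unmanageable on large datasets. Is there a way to vectorize this operation, or at least make the loop faster?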

# Calculate the hazard rates.
# Initialise a list to collect the hazard rate from each calculation;
# the recursive formula reads earlier results back out of it.
hr = []

# Loop through the dataframe, executing the hazard rate formula.
# If time_period (Year) = 1 then the hazard rate is equal to the PD.
for index, row in df.iterrows():
    if row["Year"] == 1:
        hr.append(row["PD"])
    elif row["Year"] > 1:
        #Create a row_num variable to indicate what the index is for each unique customer ID
        row_num = int(row["Year"])
        hr.append((row["PD"] + hr[row_num - 2] * (row["Year"] - 1)) / (row["Year"]))
    else:
        raise ValueError("Index contains negative or zero values")

#Attach the hazard_rates array to the dataframe
df["hazard_rate"] = hr

python

pandas

loops

for-loop

vectorization

1 Answer

Stack Overflow user

Answered 2019-11-24 19:21:18

This function computes the n-th hazard rate:

computed = {1: 0.05}
def func(n, computed = computed):
    '''
    Parameters:
        @n: int, year number
        @computed: dictionary with hazard rate already computed
    Returns:
        computed[n]: n-th hazard rate
    '''

    if n not in computed:
        computed[n] = (df.loc[n,'PD'] + func(n-1, computed)*(n-1))/n

    return computed[n]

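Now let's compute the hazard rate for every year:

df.set_index('Year', inplace=True)
df['Hazard_rate'] = [func(i) for i in df.index]

Note that the function does not care whether the dataframe is sorted by Year, but I assume the dataframe is indexed by Year.

If you want the column back, just reset the index: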

df.reset_index(inplace=True)
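As a side note, the recurrence can be unrolled by hand. The derivation below is a sketch that is not part of the original answer; it writes h_n for Hazard_rate(n) and p_n for PD(n), and it assumes every year from 1 to n is present.

```latex
\begin{aligned}
n\,h_n &= p_n + (n-1)\,h_{n-1}
  && \text{(the recurrence multiplied by } n\text{)} \\
S_n &:= n\,h_n \;\Rightarrow\; S_n = S_{n-1} + p_n, \quad S_1 = p_1 \\
S_n &= \sum_{k=1}^{n} p_k
  \;\Rightarrow\;
  h_n = \frac{1}{n}\sum_{k=1}^{n} p_k
\end{aligned}
```

Under that assumption the hazard rate for year n is the mean of the PD values for years 1 through n, so no recursion is strictly required.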

With Customer_ID introduced, the process becomes more involved:

# The function now takes the (per-customer) dataframe as an argument
def func(df, n, computed):

    if n not in computed:
        # Pass df through to the recursive call as well
        computed[n] = (df.loc[n, 'PD'] + func(df, n - 1, computed) * (n - 1)) / n

    return computed[n]

# Set index
df.set_index('Year', inplace=True)

# Initialize Hazard_rate column as float
df['Hazard_rate'] = 0.0

# Iterate over each distinct customer (not over every row)
for c in df['Customer_ID'].unique():

    # Create a customer mask and the matching sub-frame
    c_mask = (df['Customer_ID'] == c)
    c_df = df.loc[c_mask]

    # Initialize computed dictionary for the given customer
    c_computed = {1: c_df.loc[1, 'PD']}

    # Assign with a single .loc call; chained indexing would write to a copy
    df.loc[c_mask, 'Hazard_rate'] = [func(c_df, i, c_computed) for i in c_df.index]
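A vectorized alternative is sketched below. It is not the method used in this answer, and it rests on an assumption the question does not state outright: each customer's years are exactly 1, 2, ..., n with no gaps or duplicates. In that case Year * Hazard_rate(Year) is a running sum of PD, so the hazard rate is a per-customer cumulative sum divided by Year. The function name `add_hazard_rate` and the demo data are hypothetical.

```python
import pandas as pd


def add_hazard_rate(df):
    """Return a sorted copy of df with a vectorized 'Hazard_rate' column.

    Assumes columns 'Customer_ID', 'Year' and 'PD', and that each
    customer's years are exactly 1, 2, ..., n (no gaps, no duplicates).
    """
    out = df.sort_values(["Customer_ID", "Year"]).copy()

    # Check the no-gap assumption: within a customer, Year must equal its rank
    expected_year = out.groupby("Customer_ID").cumcount() + 1
    if not (out["Year"] == expected_year).all():
        raise ValueError("Years must run 1..n without gaps for every customer")

    # Running sum of PD per customer, divided by Year, gives the hazard rate
    out["Hazard_rate"] = out.groupby("Customer_ID")["PD"].cumsum() / out["Year"]
    return out


# Hypothetical data (not the dataset from the question)
demo = pd.DataFrame({
    "Customer_ID": [1, 1, 1, 2, 2],
    "Year": [1, 2, 3, 1, 2],
    "PD": [0.05, 0.03, 0.04, 0.10, 0.20],
})
result = add_hazard_rate(demo)
print(result)
```

The explicit check raises an error rather than silently producing wrong numbers when a customer has a missing year, since the running-sum shortcut no longer matches the recurrence in that case.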

0 votes

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/59016561
