文章/答案/技术大牛

发布

社区首页 >问答首页 >避免pandas数据帧中循环的替代方法

问避免pandas数据帧中循环的替代方法
EN

Stack Overflow用户

提问于 2017-04-28 05:38:05

回答 1查看 169关注 0票数 0

我有以下数据帧：

table2 = pd.DataFrame({
        'Product Type': ['A', 'B', 'C', 'D'],
        'State_1_Value': [10, 11, 12, 13],
    'State_2_Value': [20, 21, 22, 23],
    'State_3_Value': [30, 31, 32, 33],
    'State_4_Value': [40, 41, 42, 43],
    'State_5_Value': [50, 51, 52, 53],
    'State_6_Value': [60, 61, 62, 63],
    'Lower_Bound': [-1, 1, .5, 5],
    'Upper_Bound': [1, 2, .625, 15],
    'sim_1': [0, 0, .61, 7],
    'sim_2': [1, 1.5, .7, 9],
    })

>>> table2
   Lower_Bound Product Type  State_1_Value  State_2_Value  State_3_Value  \
0         -1.0            A             10             20             30   
1          1.0            B             11             21             31   
2          0.5            C             12             22             32   
3          5.0            D             13             23             33   

   State_4_Value  State_5_Value  State_6_Value  Upper_Bound  sim_1  sim_2  
0             40             50             60        1.000    0.0    1.0  
1             41             51             61        2.000    0.0    1.5  
2             42             52             62        0.625    0.61    0.7  
3             43             53             63       15.000    7.0    9.0

我写了下面的代码来生成一个新的DataFrame，每个“sim”的输出都经过修改。

for i in range(1,3):
    table2['Bucket%s'%i] = 5 * (table2['sim_%s'%i] - table2['Lower_Bound']) / (table2['Upper_Bound'] - table2['Lower_Bound']) + 1
    table2['lv'] = table2['Bucket%s'%i].map(int)
    table2['hv'] = table2['Bucket%s'%i].map(int) + 1
    table2.ix[table2['lv'] < 1 , 'lv'] = 1
    table2.ix[table2['lv'] > 5 , 'lv'] = 5
    table2.ix[table2['hv'] > 6 , 'hv'] = 6
    table2.ix[table2['hv'] < 2 , 'hv'] = 2
    table2['nLower'] = table2.apply(lambda row: row['State_%s_Value'%row['lv']],axis=1)
    table2['nHigher'] = table2.apply(lambda row: row['State_%s_Value'%row['hv']],axis=1)
    table2['Final_value_%s'%i] = (table2['nHigher'] - table2['nLower'])*(table2['Bucket%s'%i]-table2['lv']) + table2['nLower']
df = table2.filter(regex="sim|Type")

输出：

>>> df
  Product Type  sim_1  sim_2
0            A   35.0   60.0
1            B  -39.0   36.0
2            C   56.0   92.0
3            D   23.0   33.0

我想在10,000个sims上运行这个程序，目前每个循环大约需要.25秒。有没有办法修改这段代码，以避免循环并提高时间效率？

编辑:如果你对这段代码想要完成什么感到好奇，你可以在这里看到我自己回答的有点混乱的问题：Pandas DataFrame: Complex linear interpolation

python

performance

python-2.7

loops

pandas

回答 1

Stack Overflow用户

发布于 2017-04-29 06:00:31

使用以下代码，我可以在没有循环的情况下完成此操作：

因此，在我的10kx200工作台上，它在3分钟内运行，而不是之前的2小时。

不幸的是，现在我需要在10kx4k的表上运行它，我在这个问题上点击了MemoryError，但这可能超出了这个问题的范围。

df= pd.DataFrame({
            'Product Type': ['A', 'B', 'C', 'D'],
            'State_1_Value': [10, 11, 12, 13],
        'State_2_Value': [20, 21, 22, 23],
        'State_3_Value': [30, 31, 32, 33],
        'State_4_Value': [40, 41, 42, 43],
        'State_5_Value': [50, 51, 52, 53],
        'State_6_Value': [60, 61, 62, 63],
        'Lower_Bound': [-1, 1, .5, 5],
        'Upper_Bound': [1, 2, .625, 15],
        'sim_1': [0, 0, .61, 7],
        'sim_2': [1, 1.5, .7, 9],
        })


buckets = df.ix[:,-2:].sub(df['Lower_Bound'],axis=0).div(df['Upper_Bound'].sub(df['Lower_Bound'],axis=0),axis=0) * 5 + 1
low = buckets.applymap(int)
high = buckets.applymap(int) + 1
low = low.applymap(lambda x: 1 if x < 1 else x)
low = low.applymap(lambda x: 5 if x > 5 else x)
high = high.applymap(lambda x: 6 if x > 6 else x)
high = high.applymap(lambda x: 2 if x < 2 else x)
low_value = pd.DataFrame(df.filter(regex="State|Type").values[np.arange(low.shape[0])[:,None], low])
high_value = pd.DataFrame(df.filter(regex="State|Type").values[np.arange(high.shape[0])[:,None], high])
df1 = (high_value - low_value).mul((buckets - low).values) + low_value
df1['Product Type'] = df['Product Type']

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/43668346

复制

相似问题

问避免pandas数据帧中循环的替代方法
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问避免pandas数据帧中循环的替代方法EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问避免pandas数据帧中循环的替代方法
EN