Python - Improving speed on value_counts

Stack Overflow user
Asked on 2021-07-08 03:15:08
Answers: 1 · Views: 55 · Followers: 0 · Votes: 0

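I am wrangling some raw data and want to count the instances of certain metrics (column 'Stat') within a chain (identified by a unique identifier 'c' taken from the 'chain_id' column), save the counts to a dict, and then map that dict to a new column.

However, I would like to:

  1. Improve the speed of the loop. I have got it to 10 it/s over 34k rows, up from ~3 initially.
  2. Improve the structure of the try/except statements. Not every chain has a 'Kick' or 'Mark' etc. in its value_counts() output, so a missing stat needs to become 0.

I have searched SO for other approaches, but none of the existing answers fit. Please ignore the indentation of the for loop; the editor would not let me correct it.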

import pandas as pd
from tqdm.notebook import tqdm

s = ['Hitout', 'Kick', 'Disposal', 'Centre Clearance', 'Tackle', 'Hitout', 
     'Hitout To Advantage', 'Free Against', 'Contested Possession', 'Free For', 
     'Handball', 'Disposal', 'Effective Disposal', 'Stoppage Clearance', 
     'Uncontested Possession', 'Kick', 'Effective Kick', 'Disposal', 'Effective Disposal', 
     'Mark', 'Uncontested Possession', 'F 50 Mark', 'Mark On Lead', 'Kick', 'Disposal', 
     'Shot At Goal', 'Behind', 'Kick In', 'One Percenter', 'Kick', 'Effective Kick', 
     'Disposal', 'Effective Disposal', 'Rebound 50', 'Spoil', 'One Percenter']

x = ['Hitout', 'RI-1', 'RI-1', 'RI-1', 'RI-1', 'Hitout', 'Hitout', 'RI-7', 'RI-7', 
     'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 
     'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'RI-7', 'CA-27', 'CA-27', 
     'CA-27', 'CA-27', 'CA-27', 'CA-27', 'CA-27', 'CA-27', 'CA-27']

df = pd.DataFrame({'chain_id':x,'Stat':s})
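
# Not shown in the original question; assumed definitions so the loop can run.
chains = df['chain_id'].unique()  # unique chain identifiers to loop over

# One dict per derived column, keyed by chain_id
chain_count, hb_count, ki_count, m_count, goal_count = {}, {}, {}, {}, {}
behind_count, cp_count, up_count, t_count, chain_time = {}, {}, {}, {}, {}

for c in tqdm(chains):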
    if c == 'Hitout':
        chain_count[c] = 0
        hb_count[c] = 0
        ki_count[c] = 0
        m_count[c] = 0
        goal_count[c] = 0
        behind_count[c] = 0
        cp_count[c] = 0
        up_count[c] = 0
        t_count[c] = 0
        chain_time[c] = 0
    else:
        temp = df[df['chain_id']==c]['Stat'].value_counts()
        try:
            chain_count[c] = temp['Disposal']
        except:
            chain_count[c] = 0
        try:
            ki_count[c] = temp['Kick']
        except:
            ki_count[c] = 0
        try:
            hb_count[c] = temp['Handball']
        except:
            hb_count[c] = 0
        try:
            m_count[c] = temp['Mark']
        except:
            m_count[c] = 0
        try:
            goal_count[c] = temp['Goal']
        except:
            goal_count[c] = 0
        try:
            behind_count[c] = temp['Behind']
        except:
            behind_count[c] = 0 
        try:
            cp_count[c] = temp['Contested Possession']
        except:
            cp_count[c] = 0
        try:
            up_count[c] = temp['Uncontested Possession']
        except:
            up_count[c] = 0
        try:
            t_count[c] = temp['Tackle']
        except:
            t_count[c] = 0
        chain_time[c] = time(c)  # time() is the asker's own helper; its definition is not shown

df['chain_length'] = df['chain_id'].map(chain_count)
df['chain_hb'] = df['chain_id'].map(hb_count)
df['chain_ki'] = df['chain_id'].map(ki_count)
df['chain_m'] = df['chain_id'].map(m_count)
df['chain_goal'] = df['chain_id'].map(goal_count)
df['chain_behind'] = df['chain_id'].map(behind_count)
df['chain_cp'] = df['chain_id'].map(cp_count)
df['chain_up'] = df['chain_id'].map(up_count)
df['chain_t'] = df['chain_id'].map(t_count)
df['chain_time'] = df['chain_id'].map(chain_time)
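
The second point above (a missing stat should become 0) can be handled without try/except. The following is only a minimal sketch on made-up toy data, not the asker's code: `Series.get` takes a default for a missing label, and `reindex` with `fill_value=0` does the same for a whole list of labels at once. The names `stats`, `counts`, `wanted` and `filled` are illustrative.

```python
import pandas as pd

# Toy data standing in for the 'Stat' values of a single chain (illustrative only).
stats = pd.Series(['Kick', 'Disposal', 'Kick', 'Mark'])
counts = stats.value_counts()

# Series.get returns the default instead of raising KeyError for a missing label.
kicks = counts.get('Kick', 0)
tackles = counts.get('Tackle', 0)

# reindex fills every label that is absent from the counts with 0 in one step.
wanted = ['Disposal', 'Kick', 'Handball', 'Mark', 'Tackle']
filled = counts.reindex(wanted, fill_value=0)
print(filled)
```

In the loop this would turn each try/except pair into a single line such as `ki_count[c] = temp.get('Kick', 0)`.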

Edit: included an example; its output is as follows.

1 Answer

Stack Overflow user

Accepted answer

Answered on 2021-07-08 05:28:34

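Create a status column that isolates Disposal, Handball, and everything else.

import numpy as np

# Note: as written, Handball rows are labelled 'chain_length' and Disposal rows 'chain_hb',
# the reverse of the question's naming; the result table below reflects this labelling.
df['status'] = np.select([df['Stat'].eq('Handball'), df['Stat'].eq('Disposal')], ['chain_length', 'chain_hb'], 'Notimportant')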
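
To make the `np.select` step easier to follow, here is a small sketch on a four-row toy frame. The labels `is_handball`, `is_disposal` and `other` are hypothetical names chosen for this illustration, not the ones used in the answer.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative only).
demo = pd.DataFrame({'Stat': ['Kick', 'Handball', 'Disposal', 'Mark']})

# Each condition is paired with the label at the same position;
# rows matching no condition receive the default.
conditions = [demo['Stat'].eq('Handball'), demo['Stat'].eq('Disposal')]
labels = ['is_handball', 'is_disposal']  # hypothetical label names
demo['status'] = np.select(conditions, labels, default='other')
print(demo)
```

Conditions are checked in order, so a row that satisfied several of them would take the label of the first match.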

Find out how often each one occurs.

s=df.join(df.groupby(['chain_id','Stat']).apply(lambda x: pd.get_dummies(x['status'])).fillna(0)).drop(columns=['status','Notimportant'])
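
As a hedged aside, the same kind of 0/1 indicator columns can be produced by calling `pd.get_dummies` on the status column directly, without the `groupby().apply` step. This sketch uses toy data and hypothetical label names and is not the answer's exact method.

```python
import pandas as pd

# Toy status column (labels are hypothetical).
demo = pd.DataFrame({'status': ['other', 'is_handball', 'is_disposal', 'other']})

# One indicator column per distinct label; dtype=int yields 0/1 rather than booleans.
dummies = pd.get_dummies(demo['status'], dtype=int)
print(dummies)
```

The indicator columns keep the original row index, so they can be joined back onto the frame with `demo.join(dummies)`.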

Use transform to sum and broadcast the totals to every row of each chain.

s[['chain_hb','chain_length']] = s.groupby('chain_id')[['chain_hb','chain_length']].transform('sum')
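
A short sketch of what `transform('sum')` does, on a toy frame with a hypothetical indicator column `is_disposal`: unlike a plain aggregation, the result has one value per original row, so it can be assigned straight back as a column.

```python
import pandas as pd

# Toy frame: two chains, with a 0/1 indicator per row (illustrative only).
demo = pd.DataFrame({
    'chain_id': ['A', 'A', 'A', 'B', 'B'],
    'is_disposal': [1, 0, 1, 0, 1],
})

# The per-chain total is repeated on every row belonging to that chain.
demo['chain_length'] = demo.groupby('chain_id')['is_disposal'].transform('sum')
print(demo)
```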

Result

   chain_id                    Stat  chain_hb  chain_length
0    Hitout                  Hitout       0.0           0.0
1      RI-1                    Kick       1.0           0.0
2      RI-1                Disposal       1.0           0.0
3      RI-1        Centre Clearance       1.0           0.0
4      RI-1                  Tackle       1.0           0.0
5    Hitout                  Hitout       0.0           0.0
6    Hitout     Hitout To Advantage       0.0           0.0
7      RI-7            Free Against       3.0           1.0
8      RI-7    Contested Possession       3.0           1.0
9      RI-7                Free For       3.0           1.0
10     RI-7                Handball       3.0           1.0
11     RI-7                Disposal       3.0           1.0
12     RI-7      Effective Disposal       3.0           1.0
13     RI-7      Stoppage Clearance       3.0           1.0
14     RI-7  Uncontested Possession       3.0           1.0
15     RI-7                    Kick       3.0           1.0
16     RI-7          Effective Kick       3.0           1.0
17     RI-7                Disposal       3.0           1.0
18     RI-7      Effective Disposal       3.0           1.0
19     RI-7                    Mark       3.0           1.0
20     RI-7  Uncontested Possession       3.0           1.0
21     RI-7               F 50 Mark       3.0           1.0
22     RI-7            Mark On Lead       3.0           1.0
23     RI-7                    Kick       3.0           1.0
24     RI-7                Disposal       3.0           1.0
25     RI-7            Shot At Goal       3.0           1.0
26     RI-7                  Behind       3.0           1.0
27    CA-27                 Kick In       1.0           0.0
28    CA-27           One Percenter       1.0           0.0
29    CA-27                    Kick       1.0           0.0
30    CA-27          Effective Kick       1.0           0.0
31    CA-27                Disposal       1.0           0.0
32    CA-27      Effective Disposal       1.0           0.0
33    CA-27              Rebound 50       1.0           0.0
34    CA-27                   Spoil       1.0           0.0
35    CA-27           One Percenter       1.0           0.0
Votes: 1
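
For comparison, one possible loop-free way to get all of the question's count columns in a single pass is `pd.crosstab` followed by `reindex`. This is a sketch under assumptions, not the accepted answer's method: the `wanted` mapping is built from the stat names and `chain_*` column names in the question, the 'Hitout' chain is zeroed because the question's loop does so, and `chain_time` is left out because the `time` helper is not shown.

```python
import pandas as pd

# Sample data from the question.
s = ['Hitout', 'Kick', 'Disposal', 'Centre Clearance', 'Tackle', 'Hitout',
     'Hitout To Advantage', 'Free Against', 'Contested Possession', 'Free For',
     'Handball', 'Disposal', 'Effective Disposal', 'Stoppage Clearance',
     'Uncontested Possession', 'Kick', 'Effective Kick', 'Disposal', 'Effective Disposal',
     'Mark', 'Uncontested Possession', 'F 50 Mark', 'Mark On Lead', 'Kick', 'Disposal',
     'Shot At Goal', 'Behind', 'Kick In', 'One Percenter', 'Kick', 'Effective Kick',
     'Disposal', 'Effective Disposal', 'Rebound 50', 'Spoil', 'One Percenter']
x = ['Hitout'] + ['RI-1'] * 4 + ['Hitout'] * 2 + ['RI-7'] * 20 + ['CA-27'] * 9
df = pd.DataFrame({'chain_id': x, 'Stat': s})

# Stat name -> output column name, following the question's naming.
wanted = {
    'Disposal': 'chain_length', 'Handball': 'chain_hb', 'Kick': 'chain_ki',
    'Mark': 'chain_m', 'Goal': 'chain_goal', 'Behind': 'chain_behind',
    'Contested Possession': 'chain_cp', 'Uncontested Possession': 'chain_up',
    'Tackle': 'chain_t',
}

# Count every (chain_id, Stat) pair at once; stats that never occur become 0 columns.
counts = (pd.crosstab(df['chain_id'], df['Stat'])
            .reindex(columns=list(wanted), fill_value=0)
            .rename(columns=wanted))
counts.loc['Hitout'] = 0  # the question's loop sets every count to 0 for 'Hitout'

# Attach the per-chain counts to every row of that chain.
out = df.join(counts, on='chain_id')
print(out.head(10))
```

This replaces the per-chain filter `df[df['chain_id']==c]` and the try/except blocks with one grouped count and one join; whether it is faster on the full 34k rows would need to be measured.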
Page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68295163
