
How to call a function on each group?

Stack Overflow user
Asked 2021-03-23 21:58:46
2 answers · 57 views · 0 followers · 0 votes

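Hi, I have a DataFrame in which one column holds a list of skills in each row. I wrote a function called find_freq that finds the frequency of the elements in these lists, but I need to call it on each group. Does anyone know how to call the function on each of the groups G1, G2 and G3?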

Data:

   group                             skill
    G1                              [A,B,C,D]
    G1                              [B,C,K]
    G2                              [A,N,V,B]
    G3                              [B,H,A,D]
    G3                              [A,B,C]
    G3                              [B,C]

Expected output:

   skill               G1      G2        G3
    A                  1       1         2
    B                  2       1         3
    C                  2       0         2
    D                  1       0         1
    H                  0       0         1
    K                  1       0         0
    N                  0       1         0
    V                  0       1         0
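The question does not show find_freq itself, so the following is only a sketch: it assumes the skill column holds real Python lists, and its find_freq is a hypothetical stand-in built on collections.Counter. groupby(...).apply calls the function once per group, and unstack then turns the groups into columns, filling skills that never occur in a group with 0:

```python
from collections import Counter

import pandas as pd

# The question's sample data, assuming `skill` holds real Python lists (not strings).
data = pd.DataFrame({
    "group": ["G1", "G1", "G2", "G3", "G3", "G3"],
    "skill": [["A", "B", "C", "D"], ["B", "C", "K"], ["A", "N", "V", "B"],
              ["B", "H", "A", "D"], ["A", "B", "C"], ["B", "C"]],
})


def find_freq(lists):
    """Hypothetical stand-in for the asker's find_freq: count items across all lists."""
    return pd.Series(Counter(item for sub in lists for item in sub))


# Call find_freq once per group; the result is a Series indexed by (group, skill).
freq = data.groupby("group")["skill"].apply(find_freq)

# Move the groups into columns; skills missing from a group become 0.
result = freq.unstack("group", fill_value=0).rename_axis("skill")
print(result)
```

Any function that takes one group's lists and returns a Series of counts can be dropped in for find_freq here.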

Real data:

{
 'skill_name': {0: "['Planning', 'Patient Care', 'Design', 'Digital Literacy', 'Business Planning', 'Data Mining', 'Statistical Analysis', 'Data Risk Analysis', 'SQL', 'Creativity', 'Interpersonal', 'Machine Learning', 'Data Science', 'Web Design', 'Business Analysis', 'WordPress', 'Business Strategy', 'Statistics', 'Big Data', 'Legal Documentation', 'Business Data Analytics', 'Business Intelligence', 'Problem Solving', 'Customer Analysis', 'Business Process Analysis', 'Market Analysis', None, 'Online Sales Management', 'Legal Writing', 'Customer Needs Analysis', 'Predictive Modelling', 'Marketing Data Analytics', 'Product Marketing', None]",
  1: "['Financial Management', 'Leadership', 'Reporting', 'Design', 'Statistical Analysis', 'Data Science', 'Business Analysis', 'Data Processing', 'Problem Solving', 'Business Operations', 'Business Process Analysis', 'Executive Management', 'Process Design', 'Framework Design', None, None, None]",
  2: "['Hiring', 'Training', 'Teaching', 'Data Entry', 'Mathematical Modelling', 'Work Collaboratively', None, 'SQL', 'Python', 'Analytical skills', 'Machine Learning', 'Data Science', 'Online Teaching', 'Responsibility', 'Data Research', 'Research', 'Friendly']"},
 'dates': {0: Period('2019-01', 'M'),
  1: Period('2019-01', 'M'),
  2: Period('2019-01', 'M')}}

2 Answers

Stack Overflow user

Answered 2021-03-23 22:02:55

Use DataFrame.explode together with crosstab:

import pandas as pd
from pandas import Period

d = {
 'skill_name': {0: "['Planning', 'Patient Care', 'Design', 'Digital Literacy', 'Business Planning', 'Data Mining', 'Statistical Analysis', 'Data Risk Analysis', 'SQL', 'Creativity', 'Interpersonal', 'Machine Learning', 'Data Science', 'Web Design', 'Business Analysis', 'WordPress', 'Business Strategy', 'Statistics', 'Big Data', 'Legal Documentation', 'Business Data Analytics', 'Business Intelligence', 'Problem Solving', 'Customer Analysis', 'Business Process Analysis', 'Market Analysis', None, 'Online Sales Management', 'Legal Writing', 'Customer Needs Analysis', 'Predictive Modelling', 'Marketing Data Analytics', 'Product Marketing', None]",
  1: "['Financial Management', 'Leadership', 'Reporting', 'Design', 'Statistical Analysis', 'Data Science', 'Business Analysis', 'Data Processing', 'Problem Solving', 'Business Operations', 'Business Process Analysis', 'Executive Management', 'Process Design', 'Framework Design', None, None, None]",
  2: "['Hiring', 'Training', 'Teaching', 'Data Entry', 'Mathematical Modelling', 'Work Collaboratively', None, 'SQL', 'Python', 'Analytical skills', 'Machine Learning', 'Data Science', 'Online Teaching', 'Responsibility', 'Data Research', 'Research', 'Friendly']"},
 'dates': {0: Period('2019-01', 'M'),
  1: Period('2019-01', 'M'),
  2: Period('2019-01', 'M')}}

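import ast

data = pd.DataFrame(d)

# The lists are stored as strings, so parse each one with ast.literal_eval,
# then explode to get one row per skill
df = (data.assign(skill_name=data['skill_name'].astype(str)
                                               .apply(ast.literal_eval))
          .explode('skill_name'))

print(df.tail(12))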
             skill_name    dates
2  Work Collaboratively  2019-01
2                  None  2019-01
2                   SQL  2019-01
2                Python  2019-01
2     Analytical skills  2019-01
2      Machine Learning  2019-01
2          Data Science  2019-01
2       Online Teaching  2019-01
2        Responsibility  2019-01
2         Data Research  2019-01
2              Research  2019-01
2              Friendly  2019-01

df1 = pd.crosstab(df['skill_name'], df['dates'])
print (df1.head(10))
dates                      2019-01
skill_name                        
Analytical skills                1
Big Data                         1
Business Analysis                2
Business Data Analytics          1
Business Intelligence            1
Business Operations              1
Business Planning                1
Business Process Analysis        2
Business Strategy                1
Creativity                       1

For the sample data in the question:

print (df1)
group  G1  G2  G3
skill            
A       1   1   2
B       2   1   3
C       2   0   2
D       1   0   1
H       0   0   1
K       1   0   0
N       0   1   0
V       0   1   0

Finally, some data cleaning:

df1 = df1.rename_axis(None, axis=1).reset_index()
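Putting these steps together for the question's sample frame, here is a minimal sketch; it assumes skill already holds Python lists rather than strings. explode gives one row per (group, skill) pair, crosstab counts them, and the last step is the cleanup described above:

```python
import pandas as pd

# The question's sample data, assuming `skill` holds real Python lists.
data = pd.DataFrame({
    "group": ["G1", "G1", "G2", "G3", "G3", "G3"],
    "skill": [["A", "B", "C", "D"], ["B", "C", "K"], ["A", "N", "V", "B"],
              ["B", "H", "A", "D"], ["A", "B", "C"], ["B", "C"]],
})

# One row per (group, skill) pair
exploded = data.explode("skill")

# Rows = skills, columns = groups, cells = counts (0 where a skill is absent)
df1 = pd.crosstab(exploded["skill"], exploded["group"])

# Drop the "group" label on the column axis and turn the "skill" index into a column
df1 = df1.rename_axis(None, axis=1).reset_index()
print(df1)
```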
Votes: 2

Stack Overflow user

Answered 2021-03-23 22:01:36

You can try:

(data.explode('skill')
     .groupby('skill')
     ['group'].value_counts()
     .unstack('group', fill_value=0)
)

As mentioned in the comments, the skill column is of string type. Try:

(data.assign(skill=data['skill'].str[1:-1].str.split(','))
     .explode('skill')
     .groupby('skill')
     ['group'].value_counts()
     .unstack('group', fill_value=0)
)
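Stripping the brackets and splitting on commas works for unquoted strings like "[A,B,C,D]", but on the real data ("['Planning', 'Patient Care', ...]") each item would keep its quotes and leading space, and None entries would become strings. A sketch, assuming the strings are valid Python list literals as in the real data, parses them with ast.literal_eval (the same idea as the first answer) before this answer's groupby/value_counts step. The group column below is hypothetical, since the real data has a dates column instead:

```python
import ast

import pandas as pd

# Hypothetical frame: string-encoded lists like the real data, plus a made-up `group` column.
data = pd.DataFrame({
    "group": ["G1", "G1", "G2"],
    "skill": ["['SQL', 'Python', None]", "['SQL', 'Data Science']", "['Python']"],
})

out = (data.assign(skill=data["skill"].apply(ast.literal_eval))  # str -> list
           .explode("skill")                                      # one row per skill
           .dropna(subset=["skill"])                              # drop the None placeholders
           .groupby("skill")["group"].value_counts()              # counts per (skill, group)
           .unstack("group", fill_value=0))                       # groups as columns
print(out)
```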
Votes: 0
Page content originally provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/66764656
