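Hi, I have a DataFrame in which one column holds a list of titles. I wrote a function called find_freq to find the frequency of the elements in those lists, but I need to call it on each group. Any idea how to call the function on each of the groups G1, G2 and G3?
Data: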
group skill
G1 [A,B,C,D]
G1 [B,C,K]
G2 [A,N,V,B]
G3 [B,H,A,D]
G3 [A,B,C]
G3 [B,C]
Output:
skill G1 G2 G3
A 1 2 1
B 2 2 2
C 2 1 2
D 1 1 0
K 1 0 0
N 0 1 0
Real data:
{
'skill_name': {0: "['Planning', 'Patient Care', 'Design', 'Digital Literacy', 'Business Planning', 'Data Mining', 'Statistical Analysis', 'Data Risk Analysis', 'SQL', 'Creativity', 'Interpersonal', 'Machine Learning', 'Data Science', 'Web Design', 'Business Analysis', 'WordPress', 'Business Strategy', 'Statistics', 'Big Data', 'Legal Documentation', 'Business Data Analytics', 'Business Intelligence', 'Problem Solving', 'Customer Analysis', 'Business Process Analysis', 'Market Analysis', None, 'Online Sales Management', 'Legal Writing', 'Customer Needs Analysis', 'Predictive Modelling', 'Marketing Data Analytics', 'Product Marketing', None]",
1: "['Financial Management', 'Leadership', 'Reporting', 'Design', 'Statistical Analysis', 'Data Science', 'Business Analysis', 'Data Processing', 'Problem Solving', 'Business Operations', 'Business Process Analysis', 'Executive Management', 'Process Design', 'Framework Design', None, None, None]",
2: "['Hiring', 'Training', 'Teaching', 'Data Entry', 'Mathematical Modelling', 'Work Collaboratively', None, 'SQL', 'Python', 'Analytical skills', 'Machine Learning', 'Data Science', 'Online Teaching', 'Responsibility', 'Data Research', 'Research', 'Friendly']"},
'dates': {0: Period('2019-01', 'M'),
1: Period('2019-01', 'M'),
2: Period('2019-01', 'M')}}
Posted on 2021-03-23 22:02:55
Use DataFrame.explode together with crosstab:
import pandas as pd
from pandas import Period
d = {
'skill_name': {0: "['Planning', 'Patient Care', 'Design', 'Digital Literacy', 'Business Planning', 'Data Mining', 'Statistical Analysis', 'Data Risk Analysis', 'SQL', 'Creativity', 'Interpersonal', 'Machine Learning', 'Data Science', 'Web Design', 'Business Analysis', 'WordPress', 'Business Strategy', 'Statistics', 'Big Data', 'Legal Documentation', 'Business Data Analytics', 'Business Intelligence', 'Problem Solving', 'Customer Analysis', 'Business Process Analysis', 'Market Analysis', None, 'Online Sales Management', 'Legal Writing', 'Customer Needs Analysis', 'Predictive Modelling', 'Marketing Data Analytics', 'Product Marketing', None]",
1: "['Financial Management', 'Leadership', 'Reporting', 'Design', 'Statistical Analysis', 'Data Science', 'Business Analysis', 'Data Processing', 'Problem Solving', 'Business Operations', 'Business Process Analysis', 'Executive Management', 'Process Design', 'Framework Design', None, None, None]",
2: "['Hiring', 'Training', 'Teaching', 'Data Entry', 'Mathematical Modelling', 'Work Collaboratively', None, 'SQL', 'Python', 'Analytical skills', 'Machine Learning', 'Data Science', 'Online Teaching', 'Responsibility', 'Data Research', 'Research', 'Friendly']"},
'dates': {0: Period('2019-01', 'M'),
1: Period('2019-01', 'M'),
2: Period('2019-01', 'M')}}
data = pd.DataFrame(d)
import ast
# Parse each string into a real Python list, then give every skill its own row
df = (data.assign(skill_name=data['skill_name'].astype(str)
                                               .apply(ast.literal_eval))
          .explode('skill_name'))
print (df.tail(12))
skill_name dates
2 Work Collaboratively 2019-01
2 None 2019-01
2 SQL 2019-01
2 Python 2019-01
2 Analytical skills 2019-01
2 Machine Learning 2019-01
2 Data Science 2019-01
2 Online Teaching 2019-01
2 Responsibility 2019-01
2 Data Research 2019-01
2 Research 2019-01
2 Friendly 2019-01
df1 = pd.crosstab(df['skill_name'], df['dates'])
print (df1.head(10))
dates 2019-01
skill_name
Analytical skills 1
Big Data 1
Business Analysis 2
Business Data Analytics 1
Business Intelligence 1
Business Operations 1
Business Planning 1
Business Process Analysis 2
Business Strategy 1
Creativity 1
Applied to the sample data from the question (columns skill and group), the same approach gives:
print (df1)
group G1 G2 G3
skill
A 1 1 2
B 2 1 3
C 2 0 2
D 1 0 1
H 0 0 1
K 1 0 0
N 0 1 0
V 0 1 0
Finally, some data cleanup:
df = df1.rename_axis(None, axis=1).reset_index()
Posted on 2021-03-23 22:01:36
You can try:
(data.explode('skill')
     .groupby('skill')
     ['group'].value_counts()
     .unstack('group', fill_value=0)
)
Per the comment, the skill column is of string type. Try:
(data.assign(skill=data['skill'].str[1:-1].str.split(','))
     .explode('skill')
     .groupby('skill')
     ['group'].value_counts()
     .unstack('group', fill_value=0)
)
https://stackoverflow.com/questions/66764656
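As a minimal sketch (not part of either answer), both approaches from this thread can be run end to end on the sample data from the question. It assumes the skill column holds strings such as "[A,B,C,D]", as the second answer's follow-up does. The names long, by_crosstab and by_groupby are illustrative only.

```python
import pandas as pd

# Sample data from the question. Assumption: the lists are stored as
# plain strings, not as real Python lists.
data = pd.DataFrame({
    "group": ["G1", "G1", "G2", "G3", "G3", "G3"],
    "skill": ["[A,B,C,D]", "[B,C,K]", "[A,N,V,B]",
              "[B,H,A,D]", "[A,B,C]", "[B,C]"],
})

# Strip the surrounding brackets, split on commas, then give every
# skill its own row (one row per group/skill pair).
long = (data.assign(skill=data["skill"].str[1:-1].str.split(","))
            .explode("skill"))

# Approach 1: cross-tabulate skill against group.
by_crosstab = pd.crosstab(long["skill"], long["group"])

# Approach 2: count groups per skill, then pivot the group level
# into columns, filling missing combinations with 0.
by_groupby = (long.groupby("skill")["group"]
                  .value_counts()
                  .unstack("group", fill_value=0))

print(by_crosstab)
print(by_groupby)
```

Both tables should agree with the group/skill output shown in the first answer. For the real data, where the strings contain quoted names and None, parsing with ast.literal_eval as in the first answer is likely the safer choice, since a plain split on commas would keep the quotes and leading spaces in each skill name.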