
How to use an apply function / for loop on the contents of a DataFrame column in Python

Stack Overflow user
Asked 2018-11-30 11:26:34
2 answers, 1.8K views, 0 followers, 0 votes

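For background, I'm looking at a dataset of data scientist positions and job descriptions, and I'm trying to determine how often each degree level is referenced in those descriptions.

I was able to get the code working on one specific job description, but now I need a "for loop" or an equivalent to iterate over the "description" column and cumulatively count how many times each education level is referenced.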

sentence = set(data_scientist_filtered.description.iloc[30].split())
degree_level = {'level_1':{'bachelors','bachelor','ba'},
    'level_2':{'masters','ms','m.s',"master's",'master of science'},
    'level_3':{'phd','p.h.d'}}
results = {}
for key, words in degree_level.items():
    results[key] = len(words.intersection(sentence))
results
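
One possible way to extend the single-row logic above to the whole column is `Series.apply`, summing the per-row counts afterwards. This is only a minimal sketch: the three-row `data_scientist_filtered` frame below is a hypothetical stand-in for the real data, and `count_levels` is a helper name introduced here for illustration. Because it still relies on `split()`, multi-word keywords such as 'master of science', or tokens with trailing punctuation, will not match.

```python
import pandas as pd

# Hypothetical stand-in for data_scientist_filtered; only 'description' matters here.
data_scientist_filtered = pd.DataFrame({
    'description': [
        'bs in computer science required. ms or phd preferred',
        'bachelor degree in statistics; masters a plus',
        'no degree requirement listed',
    ]
})

degree_level = {'level_1': {'bachelors', 'bachelor', 'ba'},
                'level_2': {'masters', 'ms', 'm.s', "master's", 'master of science'},
                'level_3': {'phd', 'p.h.d'}}

def count_levels(description):
    """Return, per level, how many of its keywords appear as whitespace-separated tokens."""
    tokens = set(description.split())
    return pd.Series({key: len(words & tokens) for key, words in degree_level.items()})

# apply() yields one row of counts per description; sum() adds them up per level.
per_row = data_scientist_filtered['description'].apply(count_levels)
totals = per_row.sum().to_dict()
print(totals)
```

`per_row` keeps the per-description breakdown, which can help when checking why a given posting was or was not counted.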

An example string looks like this: data_scientist_filtered.description.iloc[30] =

 'the team: the data science team is a newly formed applied research team within s&p global ratings that will be responsible for building and executing a bold vision around using machine learning, natural language processing, data science, knowledge engineering, and human computer interfaces for augmenting various business processes.\n\nthe impact: this role will have a significant impact on the success of our data science projects ranging from choosing which projects should be undertaken, to delivering highest quality solution, ultimately enabling our business processes and products with ai and data science solutions.\n\nwhat’s in it for you: this is a high visibility team with an opportunity to make a very meaningful impact on the future direction of the company. you will work with senior leaders in the organization to help define, build, and transform our business. you will work closely with other senior scientists to create state of the art augmented intelligence, data science and machine learning solutions.\n\nresponsibilities: as a data scientist you will be responsible for building ai and data science models. you will need to rapidly prototype various algorithmic implementations and test their efficacy using appropriate experimental design and hypothesis validation.\n\nbasic qualifications: bs in computer science, computational linguistics, artificial intelligence, statistics, or related field with 5+ years of relevant industry experience.\n\npreferred qualifications:\nms in computer science, statistics, computational linguistics, artificial intelligence or related field with 3+ years of relevant industry experience.\nexperience with financial data sets, or s&p’s credit ratings process is highly preferred.

Example DataFrame:

position        company         description             location
data scientist  Xpert Staffing  this job is for..       Atlanta, GA
data scientist  Cotiviti        great opportunity of..  Atlanta, GA

2 Answers

Stack Overflow user

Answered 2018-11-30 11:59:52

I'd suggest using the isin() method here and then taking the sum.

import pandas as pd

data = [['John', "ba"], ['Harry', "ms"], ['Bill', "phd"], ['Mary', 'bachelors']]
df = pd.DataFrame(data, columns=['name', 'description'])

degree_level = {
    'level_1': {'bachelors', 'bachelor', 'ba'},
    'level_2': {'masters', 'ms', 'm.s', "master's", 'master of science'},
    'level_3': {'phd', 'p.h.d'}
}

results = {}
# .items() is required to iterate over (key, value) pairs of the dict
for level, values in degree_level.items():
    # isin() matches whole cell values; int() turns the numpy count into a plain int
    results[level] = int(df['description'].isin(values).sum())

print(results)
# {'level_1': 2, 'level_2': 1, 'level_3': 1}

Edit: FYI, the loop can be replaced with a comprehension.

def num_of_degrees(degrees):
    # count the rows whose description is exactly one of the given degree names
    return df['description'].isin(degrees).sum()

results = {level: num_of_degrees(values) for level, values in degree_level.items()}

Edit 2

Now that you've shown what the df looks like, I see where the problem is. You need to filter the df and then get the count.

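import re

# just cleaning some unnecessary values from degree_level
degree_level = {
    'level_1': {'bachelor', ' ba '},
    'level_2': {'masters', ' ms ', ' m.s ', "master's"},
    'level_3': {'phd', 'p.h.d'}}

results = {}

for level, values in degree_level.items():
    # one pattern per level; re.escape keeps '.' literal in values such as 'p.h.d'
    pattern = '|'.join(re.escape(value) for value in values)
    # filter the rows whose description contains any of the values, then count them
    matches = df[df['description'].str.contains(pattern, case=False, na=False)]
    results[level] = len(matches)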

Something like this should work.
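
Padding keywords with spaces (' ba ', ' ms ') misses a keyword that sits next to a newline or punctuation, as in the sample description where "ms" directly follows "\n". A hedged alternative is to wrap each keyword in "not a word character" lookarounds instead. The sketch below is not the answerer's code: the sample `df`, the unpadded keyword sets, and the added 'bachelors' entry are assumptions made for illustration.

```python
import re
import pandas as pd

# Hypothetical sample rows; the second one puts "ms" right after a newline.
df = pd.DataFrame({'description': [
    'basic qualifications: bs or ba in statistics',
    'preferred qualifications:\nms in computer science',
    'phd preferred; this role manages html forms',
]})

# Keywords without padding spaces; 'bachelors' is listed explicitly because
# the boundary check below no longer lets 'bachelor' match inside 'bachelors'.
degree_level = {
    'level_1': {'bachelor', 'bachelors', 'ba'},
    'level_2': {'masters', 'ms', 'm.s', "master's"},
    'level_3': {'phd', 'p.h.d'},
}

results = {}
for level, values in degree_level.items():
    alternatives = '|'.join(re.escape(v) for v in values)
    # (?<!\w) and (?!\w): the keyword must not be glued to other word characters,
    # so 'ms' matches after a newline but not inside 'forms'.
    pattern = rf'(?<!\w)(?:{alternatives})(?!\w)'
    mask = df['description'].str.contains(pattern, case=False, regex=True, na=False)
    results[level] = int(mask.sum())

print(results)
```

Each level counts the number of descriptions that mention it at least once, which is the same quantity the filter-then-count approach produces.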

Votes: 0

Stack Overflow user

Answered 2018-12-06 15:00:06

A simple way to break the text up is to compare n-grams of the text, column by column.
Create a list of possible values to look for in each of position, company, and location.
Then compare each list against the text column by column and save the matches in a data frame; the results can be combined at the end.

from nltk.util import ngrams  # n-gram helper from NLTK

text1 = "Growing company located in the Atlanta, GA area is currently looking to add a Data Scientist to their team. The Data Scientist will analyze business level data to produce actionable insights utilizing analytics tools"

text2 = "Data scientist data analyst"

n = 2  # bigrams
# ngrams() returns a generator, so materialize it before repeated membership tests
bigrams1 = list(ngrams(text1.lower().split(), n))  # For description
bigrams2 = list(ngrams(text2.lower().split(), n))  # For position dictionary

def compare(bigrams1, bigrams2):
    common=[]
    for grams in bigrams2:
       if grams in bigrams1:
         common.append(grams)
    return common

compare(bigrams1, bigrams2)

Output:
compare(bigrams1, bigrams2)
Out[140]: [('data', 'scientist')]
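
To apply the same comparison across a DataFrame column rather than a single string, something along these lines may work. It is a sketch under assumptions: the two-row `df`, the `position_matches` column, and the helper `common_ngrams` are hypothetical names, and a plain-Python `ngrams` built on `zip` is used here in place of NLTK so the example has no extra dependency.

```python
import pandas as pd

def ngrams(tokens, n):
    """Plain-Python n-grams: tuples of n consecutive tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))

def common_ngrams(text, vocabulary, n=2):
    """Return the vocabulary n-grams that also occur in the text."""
    text_grams = set(ngrams(text.lower().split(), n))
    return [g for g in ngrams(vocabulary.lower().split(), n) if g in text_grams]

# Hypothetical descriptions standing in for the real column.
df = pd.DataFrame({'description': [
    'Currently looking to add a Data Scientist to their team',
    'Seeking a data analyst with SQL experience',
]})

positions = 'Data scientist data analyst'  # position dictionary, as in text2 above
df['position_matches'] = df['description'].apply(
    lambda text: common_ngrams(text, positions))

print(df['position_matches'].tolist())
```

The same helper could be reused with a company or location vocabulary to fill further columns.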
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/53550800
