Q: I want to extract a specific section from a résumé (CV)

Stack Overflow user

Asked 2021-06-12 11:39:19

Answers: 1, Views: 416, Followers: 0, Votes: 0

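I want to extract a specific section, such as Education or Experience, from a résumé or CV. The code below does this, but it fails when Education or another section is the last one in the résumé.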

    import fitz  # PyMuPDF
    import nltk  # word_tokenize requires NLTK's tokenizer data to be downloaded


    def extract_experience(ex_cl):
        """Return the text of the experience section of the CV at path ex_cl."""
        doc = fitz.open(ex_cl)               # open the PDF file
        text = ""
        for page in doc:
            text += page.get_text()          # append each page's text
        words = nltk.word_tokenize(text)     # split the whole CV into words

        # Manually built list of every section heading a CV might contain,
        # excluding the experience headings (German and English).
        exp_list = ["FÄHIGKEITEN", "Fähigkeiten", "KENNTNISSE", "Kenntnisse",
                    "AUSBILDUNG", "Ausbildung", "BILDUNG", "Bildung",
                    "HOBBIES", "Hobbies", "Persönliche", "Soziales",
                    "EHRENAMTLICHES", "Ehrenamtliches", "ENGAGEMENT", "Engagement",
                    "SPRACHEN", "Sprachen", "SPRACHKURSE", "Sprachkurse",
                    "EDUCATION", "Education", "HOCHSCHUL", "Hochschul",
                    "STUDIUM", "Studium", "COMPUTERKENNTNISSE", "Computerkenntnisse",
                    "AWARDS", "Awards", "PERSONAL", "Personal",
                    "INFORMATION", "Information",
                    "SKILLS", "Skills", "SKILL", "Skill"]

        # Manually built list of experience headings and their synonyms.
        exp = ["ERFAHRUNG", "Erfahrung", "ERFAHRUNGEN", "Erfahrungen",
               "LAUFBAHN", "Laufbahn", "PRAKTISCHE", "Praktische",
               "PRAKTIKA", "Praktika", "BERUFSERFAHRUNG", "Berufserfahrung",
               "EXPERIENCE", "Experience"]

        # Find the experience heading; the section starts at the next word.
        start = 0
        for vari in words:
            if vari in exp:
                start = words.index(vari) + 1

        # Advance until a word from exp_list marks the start of the next section.
        # If experience is the last section, no such word follows, so i runs
        # past the end of words and words[i] raises IndexError.
        i = start
        for _ in words:
            if words[i] not in exp_list:
                i += 1
        end = i

        # Join the words between the two headings back into one string.
        return " ".join(words[start:end])


    extract_experience("020.pdf")

python

nlp

1 Answer

Stack Overflow user

Answered 2022-02-08 11:56:08

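You can first extract the text from the PDF using Apache. It helps get the text into the proper order. Extracting the sections themselves, however, will take some dirty code.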

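I would try to extract sections based on multiple line breaks (\n\n or more). A dirtier approach is to rely on heuristics such as:

Create a list of possible section headings
Iterate over the text
If a line is a section heading:
- start collecting from that index until you reach another section heading written in uppercase or capitalized form (this helps avoid confusing a heading such as SKILLS or Skills with the ordinary word "skills"). Like I said: dirty.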
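
The two ideas above (splitting on blank lines, and scanning for known headings) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the heading sets and the names `split_blocks`, `is_heading`, and `extract_section` are hypothetical, the text is assumed to be already extracted with one heading per line, and a heading is assumed to be a short line of at most three words.

```python
import re

# Hypothetical heading vocabularies (lowercase); extend them for your own CVs.
EXPERIENCE_HEADINGS = {"experience", "work experience", "berufserfahrung", "praktika"}
OTHER_HEADINGS = {"education", "ausbildung", "skills", "kenntnisse", "sprachen", "hobbies"}


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def is_heading(line, headings):
    """Treat a line as a heading if it is short, starts with an uppercase
    letter (so UPPERCASE or Capitalized), and matches a known heading."""
    stripped = line.strip().rstrip(":")
    if not stripped or len(stripped.split()) > 3:
        return False
    # A lowercase "skills" inside body text is deliberately not a heading.
    return stripped[0].isupper() and stripped.lower() in headings


def extract_section(text, wanted, others):
    """Return the lines between a wanted heading and the next known heading.

    If the wanted section is the last one, everything up to the end of the
    text is returned instead of running past the end.
    """
    lines = text.splitlines()
    start = None
    for idx, line in enumerate(lines):
        if start is None:
            if is_heading(line, wanted):
                start = idx + 1          # section body begins after the heading
        elif is_heading(line, wanted | others):
            return "\n".join(lines[start:idx]).strip()
    if start is None:
        return ""                        # wanted heading never appeared
    return "\n".join(lines[start:]).strip()
```

Calling `extract_section(cv_text, EXPERIENCE_HEADINGS, OTHER_HEADINGS)` on the extracted text returns the experience block even when it is the final section, because the end index defaults to the end of the text rather than to the next heading.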

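A possibly cleaner approach is to use NER (named entity recognition) in spaCy. However, you would have to build a dataset of CVs and manually label each section.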
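
To give a feel for what the manual labelling involves, here is a minimal sketch that turns hand-marked section texts into character-offset spans, the general shape of annotation that NER training tools commonly work with. The helper `to_span_annotations` and the label names are hypothetical, and converting the result into spaCy's own training format is left out; check the spaCy documentation for that step.

```python
def to_span_annotations(cv_text, labeled_sections):
    """Return (cv_text, {"entities": [(start, end, label), ...]}).

    labeled_sections maps a label (hypothetical names such as "EXPERIENCE")
    to the exact section text that a human annotator marked in cv_text.
    """
    entities = []
    for label, section_text in labeled_sections.items():
        start = cv_text.find(section_text)   # first occurrence only
        if start == -1:
            raise ValueError(f"section for {label!r} not found in CV text")
        entities.append((start, start + len(section_text), label))
    entities.sort()                          # order spans by position
    return cv_text, {"entities": entities}
```

Each CV in the dataset would need one such record, which is where most of the manual effort goes.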

See this repo, which uses NER to extract the required information.

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67948506
