Q: How do I extract keywords from text in Python?

Stack Overflow user

Asked 2021-09-08 13:09:25

3 answers · 1.2K views · 0 followers · 1 vote

I want to extract some keywords from a text and print them. How can I do that?

This is the sample text I want to extract from.

text = "Merhaba bugun bir miktar bas agrisi var, genellikle sonbahar gunlerinde baslayan bu bas agrisi insanin canini sikmakta. Bu durumdan kurtulmak icin neler yapmali."

These are examples of the keywords to extract from the text.

keywords = ('bas agrisi', 'kurtulmak')

I want to detect these keywords and print something like:

bas agrisi
kurtulmak

How can I do this in Python?

python

3 Answers

Stack Overflow user

Accepted answer

Posted 2021-09-08 13:12:29

Try this:

string = "Merhaba bugun bir miktar bas agrisi var, genellikle sonbahar gunlerinde baslayan bu bas agrisi insanin canini sikmakta. Bu durumdan kurtulmak icin neler yapmali."

keywords = ('bas agrisi', 'kurtulmak')

print(*[key for key in keywords if key in string], sep='\n')

Output:

bas agrisi
kurtulmak
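
One caveat with the `in` test above: it matches substrings, so a short keyword such as 'gun' would be reported because it occurs inside 'bugun'. The following is a minimal sketch of a whole-word variant, not part of the original answer; the helper name `find_keywords` and the extra keyword 'gun' are made up for illustration.

```python
import re

text = ("Merhaba bugun bir miktar bas agrisi var, genellikle sonbahar "
        "gunlerinde baslayan bu bas agrisi insanin canini sikmakta. "
        "Bu durumdan kurtulmak icin neler yapmali.")


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words or phrases.

    Hypothetical helper, written only for this illustration.
    """
    found = []
    for key in keywords:
        # \b anchors the match at word boundaries; re.escape keeps any
        # regex metacharacters in the keyword from being interpreted.
        pattern = r"\b" + re.escape(key) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(key)
    return found


# 'gun' is only a substring of 'bugun' and 'gunlerinde', so it is skipped.
print(*find_keywords(text, ('bas agrisi', 'kurtulmak', 'gun')), sep='\n')
```

Whether substring or whole-word matching is wanted depends on the data. For an agglutinative language like Turkish, substring matching may actually be the more useful behaviour.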

2 votes

Stack Overflow user

Posted 2021-09-08 13:18:50

Use the re library to find every occurrence of the keywords.

import re

text = "Merhaba bugun bir miktar bas agrisi var, genellikle sonbahar gunlerinde baslayan bu bas agrisi insanin canini sikmakta. Bu durumdan kurtulmak icin neler yapmali."
keywords = ('bas agrisi', 'kurtulmak')

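# re.escape keeps any regex metacharacters in a keyword from being interpreted
result = re.findall('|'.join(map(re.escape, keywords)), text)
for key in result:
    print(key)

Output:
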
bas agrisi
bas agrisi
kurtulmak
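
Because `re.findall` returns every match, 'bas agrisi' is printed twice. If a count per keyword is more useful than a flat list, the matches can be tallied. This is a small sketch that goes beyond the original answer, using `collections.Counter` from the standard library:

```python
import re
from collections import Counter

text = ("Merhaba bugun bir miktar bas agrisi var, genellikle sonbahar "
        "gunlerinde baslayan bu bas agrisi insanin canini sikmakta. "
        "Bu durumdan kurtulmak icin neler yapmali.")
keywords = ('bas agrisi', 'kurtulmak')

# Build one alternation pattern; escape each keyword so it is matched literally.
pattern = '|'.join(re.escape(key) for key in keywords)

# Tally how often each keyword was found.
counts = Counter(re.findall(pattern, text))

for key, n in counts.items():
    print(key, n)
```

Iterating over `counts` yields each keyword once, in order of first appearance, so it also works as a de-duplicated result.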

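1 vote

Stack Overflow user

Posted 2021-09-08 14:28:03

Do you want Python to understand what the keywords are, or do you want to treat the words in a specific text as tokens? For the first, you would probably need to build a machine learning mechanism or a neural network that understands the text and extracts keywords from it. For the second, you can tokenize the words in a few very simple steps.

For example,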

import nltk

# Download the required resources (only needed once).
# Depending on the NLTK version, 'punkt_tab' may be needed as well.
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Example text
text = ("I wonder if I have been changed in the night. Let me think. Was "
        "I the same when I got up this morning? I almost can remember feeling a "
        "little different. But if I am not the same, the next question is 'Who "
        "in the world am I?' Ah, that is the great puzzle!")

# Tokenize. Punctuation is not removed; it is kept as separate tokens.
tokens = nltk.word_tokenize(text)
print(tokens)
# Output will look like the following:
# ['I', 'wonder', 'if', 'I', 'have', 'been', 'changed', 'in', 'the', 'night', '.',
#  'Let', 'me', 'think', '.', 'Was', 'I', 'the', 'same', 'when', 'I', ...]

# First, clean the text: lowercase it and keep only alphabetic tokens.
tokens2 = [word.lower() for word in tokens if word.isalpha()]

# Second, remove the stop words. Stop-word lists are available for various
# languages, but admittedly the English one is the best.
from nltk.corpus import stopwords
stop_words = stopwords.words("english")

# Filter the previously created tokens2 (this is a list comprehension).
tokens3 = [word for word in tokens2 if word not in stop_words]

# Tokenization is a preparatory step for lemmatization, which maps inflected
# forms of a word to a common base form (the lemma).
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize('stripes', pos='v')  # pos: 'n' is for noun, 'v' is for verb
print(lemmatizer.lemmatize("stripes", 'n'))
# Output is "stripe", because the base form of the noun "stripes" is "stripe".

# The following is an example of stemming.
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
print([stemmer.stem(word) for word in tokens3])
# Output will be:
# ['wonder', 'chang', 'night', 'let', 'think', 'got', 'morn', 'almost', 'rememb',
#  'feel', 'littl', 'differ', 'next', 'question', 'world', 'ah', 'great', 'puzzl']
# Stop words such as "I", "have" and "been" were eliminated, and the words
# were reduced to their stems.

# One last thing, to see how the lemmatizer works on the same tokens:
tokens4 = [lemmatizer.lemmatize(word, pos='n') for word in tokens3]
tokens4 = [lemmatizer.lemmatize(word, pos='v') for word in tokens4]
print(tokens4)
# Output will be:
# ['wonder', 'change', 'night', 'let', 'think', 'get', 'morning',
#  'almost', 'remember', 'feel', 'little', 'different', 'next',
#  'question', 'world', 'ah', 'great', 'puzzle']
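
The steps above stop at a cleaned token list. One simple way to turn such a list into keyword candidates, offered here as an assumption and not as something the answer proposes, is to rank the remaining words by frequency. The sketch below uses only the standard library so that it runs without NLTK; `STOP_WORDS` is a deliberately tiny, made-up list and `top_keywords` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list, for illustration only.
# In practice a fuller list (for example NLTK's) would be used.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to"}


def top_keywords(text, n=3):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


sample = "headache again today, a headache every autumn, the autumn headache"
print(top_keywords(sample, n=2))
```

Raw frequency is a crude signal. It works on short, focused texts, and it finds single words only, not phrases such as 'bas agrisi'.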

I hope I explained this clearly. Also, if you want to go further and build a neural network or a similar mechanism, you can use one-hot encoding.
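
For readers unfamiliar with the term, one-hot encoding represents each token as a vector with a single 1 at the position of that token in the vocabulary. A minimal pure-Python sketch follows; the function name `one_hot_encode` is hypothetical, and a real pipeline would more likely rely on a library encoder.

```python
def one_hot_encode(tokens):
    """Return (vocab, vectors): a sorted vocabulary and one vector per token."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for word in tokens:
        vec = [0] * len(vocab)   # all zeros ...
        vec[index[word]] = 1     # ... except at the token's own position
        vectors.append(vec)
    return vocab, vectors


vocab, vectors = one_hot_encode(['wonder', 'change', 'night', 'change'])
print(vocab)
for vec in vectors:
    print(vec)
```

Each vector is as long as the vocabulary, so this representation grows quickly with large texts.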

1 vote

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69103712
