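I imported all the books from NLTK (via nltk.book), and I'm just trying to figure out how to pick a corpus and then print a sentence from it.
For example, I want to print sentence 1 of text3, then sentence 2 of text4.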
import nltk
from nltk.book import *
print(???)
print(???)
I tried the following combinations, but none of them work:
print(text3.sent1)
print(text4.sent2)
print(sent1.text3)
print(sent2.text4)
print(text3(sent1))
print(text4(sent2))
I'm new to Python, so this may be a basic question, but I can't seem to find a solution anywhere else.
Thanks a lot in advance!
Posted on 2017-06-30 03:15:52
You need to split the text into sentences first.
If you already have text3 and text4:
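from nltk.tokenize import sent_tokenize
# text3 and text4 are nltk.Text objects (lists of tokens), not strings,
# so join the tokens into one string before calling sent_tokenize
sents = sent_tokenize(" ".join(text3.tokens))
print(sents[0]) # the first sentence in the list is at position 0
sents = sent_tokenize(" ".join(text4.tokens))
print(sents[1]) # the second sentence in the list is at position 1
print(text3[0]) # prints the first word of text3
You seem to need both an NLTK tutorial and a Python tutorial. Fortunately, the NLTK book is both.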
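To illustrate why a splitting step is needed at all: an object like text3 behaves like one flat list of tokens, so sentence boundaries have to be reconstructed before a sentence can be picked by position. The following is only a rough sketch of that idea in plain Python. The helper `split_into_sentences` and the sample token list are hypothetical stand-ins, not part of NLTK, and ending a sentence at every ".", "!" or "?" is a simplifying assumption that real text (abbreviations, quotations) will break.

```python
def split_into_sentences(tokens, terminators=(".", "!", "?")):
    """Group a flat list of tokens into sentences (each a list of tokens).

    Hypothetical helper for illustration only; it is not an NLTK function.
    """
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in terminators:
            # A terminator closes the sentence collected so far
            sentences.append(current)
            current = []
    if current:
        # Keep trailing tokens that have no closing terminator
        sentences.append(current)
    return sentences


# Made-up stand-in for a token list such as text3 (assumption, not real corpus data)
tokens = ["In", "the", "beginning", ".", "And", "the", "earth", "was",
          "void", ".", "Let", "there", "be", "light"]

sentences = split_into_sentences(tokens)
print(" ".join(sentences[0]))  # sentence 1 is at index 0
print(" ".join(sentences[1]))  # sentence 2 is at index 1
```

Note the off-by-one between "sentence 1" in the question and index 0 in the list: Python lists are zero-based, so sentence n is at index n - 1.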
Posted on 2017-06-30 10:25:26
A simple example:
from nltk.tokenize import sent_tokenize
# A string containing several sentences
sentences = "This is first sentence. This is second sentence. Let's try to tokenize the sentences. how are you? I am doing good"
# define function
def sentence_tokenizer(sentences):
    # split the string into a list of sentence strings
    sentence_tokenize_list = sent_tokenize(sentences)
    print("tokenized sentences are = ", sentence_tokenize_list)
    return sentence_tokenize_list
# call function
tokenized_sentences = sentence_tokenizer(sentences)
# print first sentence
print(tokenized_sentences[0])
Hope this helps.
https://stackoverflow.com/questions/44837638