Q: Extracting chunks in Python using NLTK

Stack Overflow user

Asked 2012-01-26 00:00:58

1 answer, 451 views, 0 followers, 0 votes

Suppose I have a tagged corpus (like the Brown corpus) and I want to extract only the words tagged '/nn'. For example:

            Daniel/np termed/vbd ``/`` extremely/rb conservative/jj ''/'' his/pp$    estimate/nn.....

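This is part of the tagged Brown corpus. What I want to do is extract words such as estimate (because it is tagged /nn) and add them to a list. But most of the examples I have found are about tagging a corpus, and they leave me quite confused. Can anyone help by providing an example or tutorial on extracting words from an already tagged corpus?

Thanks in advance.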

python

nlp

nltk

1 Answer

Stack Overflow user

Accepted answer

Answered 2012-01-26 00:24:53

See: http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html

>>> import nltk
>>> sent = '''
... The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN
... other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC
... Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS
... said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB
... accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT
... interest/NN of/IN both/ABX governments/NNS ''/'' ./.
... '''
>>> [nltk.tag.str2tuple(t) for t in sent.split()]
[('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'),
('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]

If you only want the words tagged NN, you can do:

>>> [nltk.tag.str2tuple(t) for t in sent.split() if t.split('/')[1] == 'NN']
[('jury', 'NN'), ('number', 'NN'), ('interest', 'NN')]
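
Note that the comparison above is case-sensitive, while the sample in the question uses lowercase tags such as `/nn`. The following is a minimal sketch in plain Python (no NLTK required) that collects just the words into a list. It assumes tokens are whitespace-separated `word/tag` pairs; the helper name `words_with_tag` is hypothetical and not part of NLTK. Like `nltk.tag.str2tuple`, it splits on the last `/`, so a word that itself contains a slash stays intact.

```python
def words_with_tag(tagged_text, tag, sep="/"):
    """Return the words in tagged_text whose tag equals tag (case-insensitive)."""
    wanted = tag.upper()
    words = []
    for token in tagged_text.split():
        # Split on the LAST separator so words containing "/" are not broken up.
        word, found_sep, token_tag = token.rpartition(sep)
        # Skip tokens with no separator; compare tags ignoring case.
        if found_sep and token_tag.upper() == wanted:
            words.append(word)
    return words


# Sample line taken from the question (lowercase Brown-style tags).
sample = "Daniel/np termed/vbd ``/`` extremely/rb conservative/jj ''/'' his/pp$ estimate/nn"
print(words_with_tag(sample, "nn"))  # ['estimate']
```

Because the match is exact after uppercasing, compound tags such as `NN-tl` or `NNS` are not returned for `"nn"`; whether they should be depends on what you want to count as a noun.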

Edit:

Here is `sent` as a plain string, without the `...` continuation prompts.

sent = """The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT interest/NN of/IN both/ABX governments/NNS ''/'' ./."""

3 votes

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/9005805
