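How is the grammar defined that nltk.grammar.is_terminal() uses? Whatever object I pass to this function, I get True as the return value. What I actually want is to check whether a list called wordlist contains items that are defined by the context-free grammar in grammar.cfg.
Posted on 2014-11-14 06:00:31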
Look at the code at https://github.com/nltk/nltk/blob/develop/nltk/grammar.py

def is_nonterminal(item):
    """
    :return: True if the item is a ``Nonterminal``.
    :rtype: bool
    """
    return isinstance(item, Nonterminal)

def is_terminal(item):
    """
    Return True if the item is a terminal, which currently is
    if it is hashable and not a ``Nonterminal``.
    :rtype: bool
    """
    return hasattr(item, '__hash__') and not isinstance(item, Nonterminal)

I'm not sure how these functions are meant to be used, but for any string input is_terminal() will always return True.
First, every string has a __hash__ attribute, which is the method that computes the string's hash; see https://docs.python.org/2/reference/datamodel.html#object.

>>> astring = 'foo bar'
>>> astring.__hash__
<method-wrapper '__hash__' of str object at 0x7f06bb0cbcc0>
>>> astring.__hash__()
8194924035431162904

Second, in NLTK a string is never a Nonterminal object, because the Nonterminal class is:
class Nonterminal(object):
    """
    A non-terminal symbol for a context free grammar. ``Nonterminal``
    is a wrapper class for node values; it is used by ``Production``
    objects to distinguish node values from leaf values.
    The node value that is wrapped by a ``Nonterminal`` is known as its
    "symbol". Symbols are typically strings representing phrasal
    categories (such as ``"NP"`` or ``"VP"``). However, more complex
    symbol types are sometimes used (e.g., for lexicalized grammars).
    Since symbols are node values, they must be immutable and
    hashable. Two ``Nonterminals`` are considered equal if their
    symbols are equal.
    :see: ``CFG``, ``Production``
    :type _symbol: any
    :ivar _symbol: The node value corresponding to this
        ``Nonterminal``. This value must be immutable and hashable.
    """

So every string meets both conditions that is_terminal() tests: (1) it has a __hash__ attribute and (2) it is not a Nonterminal object. That is why nltk.grammar.is_terminal() always returns True for a string.
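To make the two conditions concrete, here is a minimal sketch that runs without NLTK installed. The Nonterminal class in it is a stripped-down stand-in I wrote for illustration, not NLTK's real class, and is_terminal() simply repeats the test quoted above.

```python
class Nonterminal(object):
    """Minimal stand-in for nltk.grammar.Nonterminal (assumption: it only wraps a symbol)."""

    def __init__(self, symbol):
        self._symbol = symbol


def is_terminal(item):
    # Same test as the NLTK function quoted above.
    return hasattr(item, '__hash__') and not isinstance(item, Nonterminal)


# Every plain string passes both conditions, even one that looks like a category name.
results = {word: is_terminal(word) for word in ['foo bar', 'NP', '']}

# Only an object wrapped as a Nonterminal fails the second condition.
wrapped_result = is_terminal(Nonterminal('NP'))

# A list sets __hash__ to None, so hasattr() still finds the attribute.
list_result = is_terminal(['foo', 'bar'])

print(results, wrapped_result, list_result)
```

Note that even a list passes the hasattr() check, which would explain why the question reports True for every object tried.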
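So the only way I can see to make it return False is to load a grammar and then read the Nonterminal objects from it, that is, only when an object has been explicitly created as, or cast to, a Nonterminal; see for example http://www.nltk.org/_modules/nltk/parse/pchart.html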
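For the actual goal in the question, checking wordlist against the grammar, is_terminal() is the wrong tool; what matters is which terminals appear on the right-hand sides of the grammar's productions. The following is only a sketch of that idea under assumptions: the Nonterminal stand-in, the toy productions, and the helper names grammar_terminals and uncovered_words are all hypothetical and written for illustration. With NLTK itself, a loaded CFG exposes productions(), and as far as I know it also has a check_coverage() method intended for this kind of check.

```python
class Nonterminal(object):
    """Minimal stand-in for nltk.grammar.Nonterminal (assumption, not the real class)."""

    def __init__(self, symbol):
        self._symbol = symbol


# Hypothetical toy grammar: each production is (left-hand side, right-hand side).
S, NP, VP = Nonterminal('S'), Nonterminal('NP'), Nonterminal('VP')
productions = [
    (S, (NP, VP)),
    (NP, ('dogs',)),
    (NP, ('cats',)),
    (VP, ('sleep',)),
]


def grammar_terminals(productions):
    """Collect every right-hand-side item that is not a Nonterminal."""
    return {item
            for _lhs, rhs in productions
            for item in rhs
            if not isinstance(item, Nonterminal)}


def uncovered_words(wordlist, productions):
    """Return the words in wordlist that never occur as terminals in the grammar."""
    terminals = grammar_terminals(productions)
    return [word for word in wordlist if word not in terminals]


wordlist = ['dogs', 'sleep', 'loudly']
missing = uncovered_words(wordlist, productions)
print(missing)
```

An empty result means every word in wordlist is a terminal of the grammar; it does not by itself mean the word sequence can be parsed.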
https://stackoverflow.com/questions/26917726