首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用python和语法中的列表进行文本文件解析

使用python和语法中的列表进行文本文件解析
EN

Stack Overflow用户
提问于 2017-08-31 12:21:03
回答 1查看 845关注 0票数 1

我必须做一个解析:目标是创建一个将应用于语料库的语法规则。我有一个问题:有可能在语法中列出一个列表吗?

示例:

代码语言:javascript
复制
1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
   grammar("""
   S -> NP VP
   NP -> DET N
   VP -> V N
   DET -> list_det.txt
   N -> list_n.txt
   V -> list.txt""")
3) Print the result with the entries that obey this grammar

有可能吗?

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-09-01 08:15:49

下面是您的语法的一个快速概念原型,使用pyparsing。我无法从您的问题中知道NVDET列表的内容可以是什么,所以我只是随意地选择了由'n's和'v's以及字面'det‘组成的单词。您可以用语法的正确表达式替换<<=赋值,但是这个解析器和示例字符串应该表明您的语法至少是可行的。(如果您编辑您的问题以显示什么是NVDET列表,我可以用较少的任意表达式和示例更新这个答案。另外,包含要解析的示例字符串也是有用的。)

我还添加了一些分组,以便您可以看到语法的结构是如何反映在结果的结构中的。您可以将其留在或移除它,解析器仍将工作。

代码语言:javascript
复制
import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))

VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP

# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')

sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'

parsed = S.parseString(sample)
print(parsed.asList())

指纹:

代码语言:javascript
复制
[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']], 
 [['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]

编辑:

我猜"NP“和"VP”是“名词短语”和“动词短语”,但我不知道"DET“是什么。尽管如此,我还是提出了一个不那么抽象的例子。我还扩大了列表,以接受更多的语法形式的名词和动词列表,连接‘’和‘’和逗号。

代码语言:javascript
复制
import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr, expr, expr, and expr
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
            pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)

V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')

VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= pp.Optional(pp.oneOf('the a my your our his her their')) + pp.oneOf("dog cat horse rabbit squirrel food water")
det <<= pp.Optional(pp.oneOf('why how when where')) +pp.oneOf( 'do does did')

samples = '''
    when does the dog eat the food
    does the dog like the cat
    do the horse, cat, and dog like or hate their food
    do the horse and dog love the cat
    why did the dog chase the squirrel
'''
S.runTests(samples)

指纹:

代码语言:javascript
复制
when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']


does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']


do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']


do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']


why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/45981339

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档