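I've looked at pattern.en's conjugate, but it only conjugates into a few forms, and I'd rather not sit down and program all the exceptions to the rules that would let me do the conjugation myself, for example
nltk has stemming, but it doesn't seem to offer the inverse operation, at least judging from a search of StackOverflow. This seems like a very basic NLP task, yet I can't find any modern tool in Python that does it. Any general conjugation tool would be great, although as far as I know the progressive form in English has no irregularities.
I'd also like to know whether there are exceptions to the following rule, which could serve as a fallback function: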

def present_to_progressive(x):
    vowels = set(['a','e','i','o','u'])
    size = len(x)
    if size == 2:
        # two-letter verbs: be -> being, go -> going
        return x + 'ing'
    elif x[size - 2:] == 'ie':
        # lie -> lying
        return x[:(size-2)] + 'ying'
    elif x[size - 1] not in vowels and x[size - 2] not in vowels:
        # ends in two consonants: walk -> walking
        return x + 'ing'
    elif x[size - 1] == 'e' and x[size-2] not in vowels:
        # consonant + silent e: make -> making
        return x[0:(size-1)] + 'ing'
    elif x[size - 1] not in vowels and x[size-2] in vowels:
        if x[size - 3] not in vowels:
            # consonant-vowel-consonant: double the final consonant
            return x + x[size-1] + 'ing'
        else:
            return x + 'ing'
    else:
        return x + 'ing'

Edit: added the case for verbs ending in "ie".
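Posted on 2017-03-18 19:38:31
There is a complete library for exactly this kind of transformation. It is called pattern.en.
You can find it here: pattern.en
It's a very good resource.
Here is an excerpt from the conjugation tutorial on that site: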

conjugate(verb,
    tense = PRESENT,        # INFINITIVE, PRESENT, PAST, FUTURE
    person = 3,             # 1, 2, 3 or None
    number = SINGULAR,      # SG, PL
    mood = INDICATIVE,      # INDICATIVE, IMPERATIVE, CONDITIONAL, SUBJUNCTIVE
    aspect = IMPERFECTIVE,  # IMPERFECTIVE, PERFECTIVE, PROGRESSIVE
    negated = False,        # True or False
    parse = True)

It is very useful and very extensive!
Posted on 2017-03-18 19:43:33
I think your code covers most cases. I checked it against a list of 620 irregular verbs taken from this site, and it gets about 84 of them wrong.

with open('/tmp/Verblist.vrb', 'rt') as f:
    err = 0
    for l in f:
        # verb entries in the list start with '>'
        if l.startswith('>'):
            forms = l[1:].split(' ')
            guess = present_to_progressive(forms[0])
            # forms[4] is the -ing form given by the list
            if forms[4].lower() != guess.lower():
                print('CHECK: {} {} {}'.format(forms[0], forms[4], guess))
                err += 1
print(err)

Just adding 'w' and 'y' to the vowel list brings the possible errors down to 18 cases:
CHECK: Aby/Abey Abying/Abeying Aby/Abeying -- Correct
CHECK: Eat Eating Eatting
CHECK: Fordo/Foredo Fordoing Fordo/Foredoing -- Correct in one of the 2 variants
CHECK: Forget Foregetting Forgetting -- Correct, the list has a typo
CHECK: Lie Lying Lieing -- Fixed in your second version
CHECK: Mischoose Mischoosins Mischoosing -- Correct, the list has a typo
CHECK: Miswed Miswedding Misweding
CHECK: Outswim Outswimming Outswiming
CHECK: Overlie Overlying Overlieing -- Fixed in your second version
CHECK: Quit Quitting Quiting
CHECK: Relearn Relearn Relearning
CHECK: Rewed Rewedding Reweding
CHECK: Rewet Rewetting Reweting
CHECK: Rewin Rewinning Rewining
CHECK: Swim Swimming Swiming
CHECK: Underlie Underlying Underlieing -- Fixed in your second version
CHECK: Vex Vexing Vexxing
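CHECK: Zinc Zincking Zincing

The most important fixes are adding the special case for "lie" and improving the rule for doubling the final consonant. I think you can safely decide to ignore some of the very uncommon verbs.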
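
For what it's worth, those two fixes could be sketched roughly as follows. This is not the code that produced the list above, just a minimal sketch under my own assumptions: the helper doubles_final_consonant, the NO_DOUBLE set (final w, x, y are never doubled) and the handling of "qu" as a single consonant are all mine, and the input is lowercased so that capitalised entries like "Eat" are not mistaken for a consonant-initial word. Spelling alone cannot see stress, so verbs with an unstressed final syllable (open, visit) would still be doubled wrongly, and oddities like zinc -> zincking are not handled.

```python
VOWELS = set("aeiou")
# Assumption: a final w, x or y is never doubled (show -> showing, vex -> vexing).
NO_DOUBLE = set("wxy")


def doubles_final_consonant(word):
    """Return True if word ends consonant-vowel-consonant and the last letter may double."""
    if len(word) < 3:
        return False
    if word[-1] in VOWELS or word[-1] in NO_DOUBLE:
        return False
    if word[-2] not in VOWELS:
        return False
    if word[-3] not in VOWELS:
        return True
    # Treat "qu" as a single consonant so that quit -> quitting.
    return word[-4:-2] == "qu"


def present_to_progressive(verb):
    """Guess the -ing form from spelling alone (hypothetical fallback, not exhaustive)."""
    word = verb.lower()
    if word.endswith("ie"):
        # lie -> lying, underlie -> underlying
        return word[:-2] + "ying"
    if len(word) > 2 and word.endswith("e") and word[-2] not in VOWELS:
        # consonant + silent e: make -> making (but see -> seeing, be -> being)
        return word[:-1] + "ing"
    if doubles_final_consonant(word):
        # swim -> swimming, rewet -> rewetting
        return word + word[-1] + "ing"
    return word + "ing"


for verb in ["lie", "swim", "quit", "eat", "vex", "rewet", "relearn"]:
    print(verb, "->", present_to_progressive(verb))
```

Returning lowercase output is a design choice of the sketch; if the original capitalisation matters, it would have to be restored by the caller.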
https://stackoverflow.com/questions/42878419