Q: Overriding the default env settings for TokensRegex in Stanford CoreNLP

Stack Overflow user

Asked 2019-01-17 02:40:02

Answers: 1, Views: 56, Followers: 0, Votes: 0

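When adding TokensRegex rules, what is the setting to switch matching from edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation to edu.stanford.nlp.ling.CoreAnnotations$OriginalTextAnnotation?

Example scenario:

In Stanford CoreNLP, #123456 is tagged as MONEY, so to override the NER behaviour I wrote a rule that tags 123456 as NUMBER instead of MONEY. As a side effect, the amount in £20.49 is now tagged as NUMBER as well.

I debugged the code and realised that edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation is used for matching when the pattern is applied. So when the input is £20.49, £ is the value of edu.stanford.nlp.ling.CoreAnnotations$OriginalTextAnnotation, while # is the value of edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation.

Is there an env setting to change this behaviour?

Sample rules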

# make all patterns case-sensitive
ENV.defaultStringMatchFlags = 0
ENV.defaultStringPatternFlags = 0

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

normalizedValue = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NormalizedNamedEntityTagAnnotation" }

{ ruleType: "tokens", pattern: (([{word:"#"}]) ([{ner:"MONEY"}])), action: (Annotate($1, ner, "IGNORE"), Annotate($2, ner, "NUMBER"), Annotate($0, normalizedValue, "TOKENS_REGEX")), result: "NUMBER" }

stanford-nlp

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-01-18 02:21:28

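You should use the latest version on GitHub or version 3.9.2. Currency is no longer normalized, so the pound sign is no longer converted to "#" by default.

You should be able to do something like this: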

originalWord = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$OriginalTextAnnotation" }

Then you can replace word with originalWord in your rule.

Votes: 0
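Applied to the sample rule from the question, the suggestion might look like the sketch below. It is untested and assumes that a key bound to OriginalTextAnnotation can be used in a token pattern the same way ner is used, and that the tokenizer populates that annotation in your CoreNLP version. The layout across several lines is only for readability.

```
# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
normalizedValue = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NormalizedNamedEntityTagAnnotation" }

# key bound to the text as it appeared in the input, before any tokenizer normalization
originalWord = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$OriginalTextAnnotation" }

# match a literal "#" in the original text only, so an input "£" (normalized to "#") is not matched
{ ruleType: "tokens",
  pattern: (([{originalWord:"#"}]) ([{ner:"MONEY"}])),
  action: (Annotate($1, ner, "IGNORE"), Annotate($2, ner, "NUMBER"), Annotate($0, normalizedValue, "TOKENS_REGEX")),
  result: "NUMBER" }
```

With this pattern, #123456 should still be retagged as NUMBER, while £20.49 keeps whatever tag the NER step assigned, because its original text starts with £ rather than #.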

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54228221
