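From reading the documentation and working with the API, it looks like CoreNLP will tell me the NER tag for each token, but it won't help me extract full names from a sentence. For example: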
Input: John Wayne and Mary have coffee
CoreNLP Output: (John,PERSON) (Wayne,PERSON) (and,O) (Mary,PERSON) (have,O) (coffee,O)
Desired Result: list of PERSON ==> [John Wayne, Mary]
Unless I'm missing some flag, I believe that to do this I need to walk the tokens and glue together consecutive tokens that are tagged PERSON.
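For concreteness, the gluing I have in mind would look roughly like the sketch below. It is plain Java with no CoreNLP dependency; the class name PersonSpanMerger and the method mergePersons are made up for illustration and are not part of CoreNLP. It assumes the tokens and their NER tags arrive as two parallel lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a CoreNLP class): merges runs of consecutive
// PERSON-tagged tokens into full names.
final class PersonSpanMerger {

    // tokens and nerTags are assumed to be parallel lists of equal length.
    static List<String> mergePersons(List<String> tokens, List<String> nerTags) {
        if (tokens.size() != nerTags.size()) {
            throw new IllegalArgumentException("tokens and nerTags must have the same length");
        }
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if ("PERSON".equals(nerTags.get(i))) {
                // Extend the name that is currently being built.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            } else if (current.length() > 0) {
                // A non-PERSON token closes the current run.
                names.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a name that ends at the last token.
        if (current.length() > 0) {
            names.add(current.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("John", "Wayne", "and", "Mary", "have", "coffee");
        List<String> tags = List.of("PERSON", "PERSON", "O", "PERSON", "O", "O");
        System.out.println(mergePersons(tokens, tags)); // prints [John Wayne, Mary]
    }
}
```

One known weakness of this approach: plain tags carry no begin/inside markers, so two different people mentioned back to back with nothing in between would be merged into a single name.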
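Can someone confirm that this is indeed what I need to do? I mainly want to know whether CoreNLP has a flag or utility that does this for me. Bonus points if someone has a utility (ideally in Java, since I'm using the Java API) that does this and is willing to share :)
Thanks!
PS: There is a very similar question here, which seems to suggest that the answer is "roll your own", but it was never confirmed by anyone else.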
Posted on 2019-08-21 00:57:45
You are probably looking for entity mentions instead of, or in addition to, NER tags. For example, using the Simple API:
new Sentence("Jimi Hendrix was the greatest").nerTags()
[PERSON, PERSON, O, O, O]
new Sentence("Jimi Hendrix was the greatest").mentions()
[Jimi Hendrix]
The link above has an example using the traditional, non-Simple API with the good old StanfordCoreNLP pipeline.
Posted on 2019-08-21 03:05:13
This is shown in the basic Java API example at this link:
https://stanfordnlp.github.io/CoreNLP/api.html
Here is the full Java API example, which includes a section on entity mentions:
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

    public static String text = "Joe Smith was born in California. " +
        "In 2017, he went to Paris, France in the summer. " +
        "His flight left at 3:00pm on July 10th, 2017. " +
        "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
        "He sent a postcard to his sister Jane Smith. " +
        "After hearing about Joe's trip, Jane decided she might go to France one day.";

    public static void main(String[] args) {
        // set up pipeline properties
        Properties props = new Properties();
        // set the list of annotators to run
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
        // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
        props.setProperty("coref.algorithm", "neural");
        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // create a document object
        CoreDocument document = new CoreDocument(text);
        // annotate the document
        pipeline.annotate(document);

        // examples

        // token at index 10 of the document (indices are zero-based)
        CoreLabel token = document.tokens().get(10);
        System.out.println("Example: token");
        System.out.println(token);
        System.out.println();

        // text of the first sentence
        String sentenceText = document.sentences().get(0).text();
        System.out.println("Example: sentence");
        System.out.println(sentenceText);
        System.out.println();

        // second sentence
        CoreSentence sentence = document.sentences().get(1);

        // list of the part-of-speech tags for the second sentence
        List<String> posTags = sentence.posTags();
        System.out.println("Example: pos tags");
        System.out.println(posTags);
        System.out.println();

        // list of the ner tags for the second sentence
        List<String> nerTags = sentence.nerTags();
        System.out.println("Example: ner tags");
        System.out.println(nerTags);
        System.out.println();

        // constituency parse for the second sentence
        Tree constituencyParse = sentence.constituencyParse();
        System.out.println("Example: constituency parse");
        System.out.println(constituencyParse);
        System.out.println();

        // dependency parse for the second sentence
        SemanticGraph dependencyParse = sentence.dependencyParse();
        System.out.println("Example: dependency parse");
        System.out.println(dependencyParse);
        System.out.println();

        // kbp relations found in fifth sentence
        List<RelationTriple> relations =
            document.sentences().get(4).relations();
        System.out.println("Example: relation");
        System.out.println(relations.get(0));
        System.out.println();

        // entity mentions in the second sentence
        List<CoreEntityMention> entityMentions = sentence.entityMentions();
        System.out.println("Example: entity mentions");
        System.out.println(entityMentions);
        System.out.println();

        // coreference between entity mentions
        CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
        System.out.println("Example: original entity mention");
        System.out.println(originalEntityMention);
        System.out.println("Example: canonical entity mention");
        System.out.println(originalEntityMention.canonicalEntityMention().get());
        System.out.println();

        // get document wide coref info
        Map<Integer, CorefChain> corefChains = document.corefChains();
        System.out.println("Example: coref chains for document");
        System.out.println(corefChains);
        System.out.println();

        // get quotes in document
        List<CoreQuote> quotes = document.quotes();
        CoreQuote quote = quotes.get(0);
        System.out.println("Example: quote");
        System.out.println(quote);
        System.out.println();

        // original speaker of quote
        // note that quote.speaker() returns an Optional
        System.out.println("Example: original speaker of quote");
        System.out.println(quote.speaker().get());
        System.out.println();

        // canonical speaker of quote
        System.out.println("Example: canonical speaker of quote");
        System.out.println(quote.canonicalSpeaker().get());
        System.out.println();
    }
}

https://stackoverflow.com/questions/57576640