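I'm looking for a way to stem strings in Java. At first I wanted to do this with Lucene, but all the examples I found online are deprecated (SnowballAnalyzer, PorterStemmer, ...). I just want to stem whole sentences.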
public static String stemSentence(String sentence) {
    ...
    return stemmedSentence;
}

How can I do this?
Posted on 2014-06-07 16:41:09
Do it like this (the code targets Lucene 4.7, as the Version.LUCENE_47 constant indicates):
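import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public static String stem(String string) throws IOException {
    // Build the analysis chain: tokenize, normalize, lower-case, then Porter-stem.
    TokenStream tokenizer = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    tokenizer = new StandardFilter(Version.LUCENE_47, tokenizer);
    tokenizer = new LowerCaseFilter(Version.LUCENE_47, tokenizer);
    tokenizer = new PorterStemFilter(tokenizer);
    // The attribute instance is reused; it holds the current term after each incrementToken().
    CharTermAttribute token = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset(); // must be called before the first incrementToken()
    StringBuilder stringBuilder = new StringBuilder();
    while (tokenizer.incrementToken()) {
        // Separate stemmed tokens with a single space.
        if (stringBuilder.length() > 0) {
            stringBuilder.append(" ");
        }
        stringBuilder.append(token.toString());
    }
    tokenizer.end();
    tokenizer.close();
    return stringBuilder.toString();
}

https://stackoverflow.com/questions/24096227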
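
For readers who want to see the overall shape of a sentence-level stemming method without Lucene on the classpath, here is a minimal plain-Java sketch. Everything in it is an assumption made for illustration: the class name `SentenceStemSketch`, the `stemmer` parameter, and the `toyStem` helper are hypothetical and not part of Lucene. `toyStem` is a crude suffix stripper, not the Porter algorithm, so its output will differ from `PorterStemFilter` on many words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class SentenceStemSketch {

    // Hypothetical placeholder stemmer: strips the first matching suffix
    // from a short fixed list. This is NOT the Porter algorithm.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            // Require at least three characters to remain after stripping.
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Splits the sentence on runs of non-letter, non-digit characters,
    // lower-cases each token, applies the given stemmer, and joins the
    // results with single spaces.
    static String stemSentence(String sentence, UnaryOperator<String> stemmer) {
        List<String> stems = new ArrayList<>();
        for (String token : sentence.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                stems.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
            }
        }
        return String.join(" ", stems);
    }

    public static void main(String[] args) {
        // prints "the cat were jump over fenc"
        System.out.println(stemSentence("The cats were jumping over fences.",
                SentenceStemSketch::toyStem));
    }
}
```

Because the stemmer is passed in as a function, a real implementation can replace `toyStem` without touching the tokenize, lower-case, and join steps.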