我需要使用Lucene MoreLikeThis来查找给定一段文本的类似文档。我是Lucene的新手,使用的代码是here
我已经为目录"C:\Users\lucene_index_files\v2“中的文档编制了索引。
我使用的是“他们是计算机工程师,他们喜欢开发自己的工具。程序用Java、CPP等语言编写。”作为我想要找到的类似文档的文档。
public class LuceneSearcher2 {
public static void main(String[] args) throws IOException {
LuceneSearcher2 m = new LuceneSearcher2();
System.out.println("1");
m.start();
System.out.println("2");
//m.writerEntries();
m.findSilimar("They are computer engineers and they like to develop their own tools. The program in languages like Java, CPP.");
System.out.println("3");
}
private Directory indexDir;
private StandardAnalyzer analyzer;
private IndexWriterConfig config;
public void start() throws IOException{
//analyzer = new StandardAnalyzer(Version.LUCENE_42);
//config = new IndexWriterConfig(Version.LUCENE_42, analyzer);
analyzer = new StandardAnalyzer();
config = new IndexWriterConfig(analyzer);
config.setOpenMode(OpenMode.CREATE_OR_APPEND);
indexDir = new RAMDirectory(); //don't write on disk
//https://stackoverflow.com/questions/36542551/lucene-in-java-method-not-found?rq=1
indexDir = FSDirectory.open(FileSystems.getDefault().getPath("C:\\Users\\lucene_index_files\\v2")); //write on disk
//System.out.println(indexDir);
}
private void findSilimar(String searchForSimilar) throws IOException {
IndexReader reader = DirectoryReader.open(indexDir);
IndexSearcher indexSearcher = new IndexSearcher(reader);
System.out.println("2a");
MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setMinTermFreq(0);
mlt.setMinDocFreq(0);
mlt.setFieldNames(new String[]{"title", "content"});
mlt.setAnalyzer(analyzer);
System.out.println("2b");
StringReader sReader = new StringReader(searchForSimilar);
//Query query = mlt.like(sReader, null);
//Throws error - The method like(String, Reader...) in the type MoreLikeThis is not applicable for the arguments (StringReader, null)
Query query = mlt.like("computer");
System.out.println("2c");
System.out.println(query.toString());
TopDocs topDocs = indexSearcher.search(query,10);
for ( ScoreDoc scoreDoc : topDocs.scoreDocs ) {
Document aSimilar = indexSearcher.doc( scoreDoc.doc );
String similarTitle = aSimilar.get("title");
String similarContent = aSimilar.get("content");
System.out.println("====similar finded====");
System.out.println("title: "+ similarTitle);
System.out.println("content: "+ similarContent);
}
System.out.println("2d");
}}我不确定是什么原因导致系统无法生成输出/
发布于 2018-07-28 20:16:26
你的输出是什么?我假设你没有找到类似的文档。原因可能是您正在创建的查询为空。
首先,以一种有意义的方式运行您的代码
Query query = mlt.like(sReader, null); 需要一个包含字段名的String[]作为参数,因此它的工作方式如下所示
Query query = mlt.like(sReader, new String[]{"title", "content"}); 现在,为了在Lucene中使用MoreLikeThis,您的存储字段必须在创建字段时设置存储术语向量的选项"setStoreTermVectors( true );“true,例如:
FieldType fieldType = new FieldType();
fieldType.setStored(true);
fieldType.setStoreTermVectors(true);
fieldType.setTokenized(true);
Field contentField = new Field("contents", this.getBlurb(), fieldType);
doc.add(contentField);如果不这样做,可能会导致查询字符串为空,从而导致查询没有结果
https://stackoverflow.com/questions/48885085
复制相似问题