Q: Lucene: converting from RAMDirectory to FSDirectory, "content" field missing

Stack Overflow user

Asked 2014-10-27 11:46:31

Answers: 1, Views: 381, Followers: 0, Votes: 0

I'm a Lucene beginner, and I ran into a problem while switching from a RAMDirectory to an FSDirectory.

First, my code:

    private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
            new StandardAnalyzer(Version.LUCENE_43));
    Directory DIR = FSDirectory.open(new File(INDEXLOC)); //INDEXLOC = "path/to/dir/"
    // RAMDirectory DIR = new RAMDirectory();

    // Index some made up content      
    IndexWriter writer =
            new IndexWriter(DIR, iwc);


    // Store both position and offset information
    FieldType type = new FieldType();
    type.setStored(true);
    type.setStoreTermVectors(true);
    type.setStoreTermVectorOffsets(true);
    type.setStoreTermVectorPositions(true);
    type.setIndexed(true);
    type.setTokenized(true);

    IDocumentParser p = DocumentParserFactory.getParser(f);
    ArrayList<ParserDocument> DOCS = p.getParsedDocuments();

    for (int i = 0; i < DOCS.size(); i++) {
        Document doc = new Document();
        Field id = new StringField("id", "doc_" + i, Field.Store.YES);
        doc.add(id);
        Field text = new Field("content", DOCS.get(i).getContent(), type);
        doc.add(text);
        writer.addDocument(doc);
    }
    writer.close();
    // Get a searcher
    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(DIR));
    // Do a search using SpanQuery
    SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "zahl"));
    TopDocs results = searcher.search(fleeceQ, 10);
    for (int i = 0; i < results.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = results.scoreDocs[i];
        System.out.println("Score Doc: " + scoreDoc);
    }
    IndexReader reader = searcher.getIndexReader();

    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.maxDoc()), termContexts);
    int window = 2;// get the words within two of the match
    while (spans.next()) {
        Map<Integer, String> entries = new TreeMap<Integer, String>();
        System.out.println("Doc: " + spans.doc() + " Start: " + spans.start() + " End: " + spans.end());
        int start = spans.start() - window;
        int end = spans.end() + window;
        Terms content = reader.getTermVector(spans.doc(), "content");
        TermsEnum termsEnum = content.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // could store the BytesRef here, but String is easier for this
            // example
            String s = term.utf8ToString();
            DocsAndPositionsEnum positionsEnum = termsEnum.docsAndPositions(null, null);
            if (positionsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                int i = 0;
                int position = -1;
                while (i < positionsEnum.freq() && (position = positionsEnum.nextPosition()) != -1) {
                    if (position >= start && position <= end) {
                        entries.put(position, s);
                    }
                    i++;
                }
            }
        }
        System.out.println("Entries:" + entries);
    }

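This is just some code I found on a great website and wanted to try out. Everything works fine with the RAMDirectory, but if I change it to the FSDirectory, I get a NullPointerException like this:

    Exception in thread "main" java.lang.NullPointerException
        at com.org.test.TextDB.myMethod(TextDB.java:184)
        at com.org.test.Main.main(Main.java:31)

The statement `Terms content = reader.getTermVector(spans.doc(), "content");` seems to find nothing and returns null, hence the exception. But why? With my RAMDirectory everything works fine.

It looks as if the IndexWriter or the reader (I really don't know which) is not writing or reading the "content" field correctly. But why would the field be written properly with a RAMDirectory and not with an FSDirectory?

Any ideas?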

java

lucene

ramdirectory

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-10-27 21:17:08

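I gave this a quick test run, and I can't reproduce your problem.

I think the most likely cause is old documents in the index. As written, every run adds more documents to the index. Documents from earlier runs are neither deleted nor overwritten; they simply stay there. So if you ran this against the same directory earlier, for example before you added the line `type.setStoreTermVectors(true);`, some of your hits may be those old documents without term vectors, and `reader.getTermVector(...)` returns null when a document does not store term vectors.

Anything indexed in a RAMDirectory, on the other hand, is discarded as soon as execution finishes, so the problem cannot occur in that case.

The simple fix is to delete the index directory and run it again.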
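If you would rather do that deletion from Java than by hand, here is a minimal sketch using only `java.nio.file`. The class name `IndexDirCleaner` and the method `deleteRecursively` are made up for illustration and are not part of Lucene; the sketch also assumes no IndexWriter or reader currently has the directory open.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper (not a Lucene class): wipes an on-disk index directory.
final class IndexDirCleaner {
    private IndexDirCleaner() {}

    /** Deletes dir and everything beneath it; does nothing if dir does not exist. */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Files are visited before their parent directory is closed.
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // The directory is empty by now, so it can be removed.
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

You would call something like `IndexDirCleaner.deleteRecursively(Paths.get(INDEXLOC));` once, before `FSDirectory.open(...)`, so the next run starts from an empty directory.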

If you want to start with a fresh index on every run, you can set that up through the IndexWriterConfig:

    private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
            new StandardAnalyzer(Version.LUCENE_43));
    // Create a new index on open, replacing any existing one instead of appending to it
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

Of course, this is only a guess, but it seems consistent with the behavior you describe.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/26586867
