Generating a Lucene segments_N file

Stack Overflow user
Asked 2016-02-08 15:36:43
3 answers · 1.4K views · 0 followers · 3 votes

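While moving Lucene index files from one server to another, I forgot to move the segments_N file (because I used the pattern *.*).

Unfortunately, I have already deleted the original folder, and these are the only files left in my directory: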

_1rpt.fdt
_1rpt.fdx
_1rpt.fnm
_1rpt.nvd
_1rpt.nvm
_1rpt.si
_1rpt_Lucene50_0.doc
_1rpt_Lucene50_0.dvd
_1rpt_Lucene50_0.dvm
_1rpt_Lucene50_0.pos
_1rpt_Lucene50_0.tim
_1rpt_Lucene50_0.tip
write.lock

I lost the segments_42u file, and without it I cannot even run org.apache.lucene.index.CheckIndex:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047: files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd, _1rpt.nvm, _1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd, _1rpt_Lucene50_0.dvm, _1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim, _1rpt_Lucene50_0.tip]
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)
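
The file list in the exception shows what went wrong: a pattern such as *.* only matches names that contain a dot, and segments_42u has none. The following is a minimal sketch, using only the JDK, of how that can be checked before the source folder is deleted. The class and method names (`GlobCheck`, `matchesStarDotStar`, `hasCommitPoint`) are made up for illustration and are not part of Lucene.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class GlobCheck {
    // Java's "glob:*.*" requires a literal dot in the file name.
    private static final PathMatcher STAR_DOT_STAR =
            FileSystems.getDefault().getPathMatcher("glob:*.*");

    // True if a copy driven by the pattern *.* would pick up this file name.
    static boolean matchesStarDotStar(String fileName) {
        return STAR_DOT_STAR.matches(Paths.get(fileName));
    }

    // True if the listing contains a commit file named segments_N.
    static boolean hasCommitPoint(List<String> fileNames) {
        for (String name : fileNames) {
            if (name.startsWith("segments_")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("segments_42u", "write.lock", "_1rpt.si", "_1rpt.fdt");
        for (String name : source) {
            System.out.println(name + " copied by *.*: " + matchesStarDotStar(name));
        }
        // Simulate the destination after a copy that used *.*
        List<String> copied = Arrays.asList("write.lock", "_1rpt.si", "_1rpt.fdt");
        System.out.println("destination has segments_N: " + hasCommitPoint(copied));
    }
}
```

Running a check like `hasCommitPoint` on the destination listing before removing the source would have flagged the missing commit file.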

The index is huge (> 800 to) and rebuilding it would take weeks.

Is there a way to generate this missing segment info file?

Thanks a lot for your help.


3 Answers

Stack Overflow user

Accepted answer

Posted 2016-02-10 11:19:57

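As ameertawfik suggested, I asked this question on the Lucene mailing list, and they helped me solve the problem.

Here is my solution, in case it helps someone else (add lucene-core-x.x.x.jar to the classpath):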

package org.apache.lucene.index;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.SimpleFSDirectory;

public class GenSegmentInfo {
    public static void main(String[] args) throws IOException {
        Codec codec = Codec.getDefault();
        Path myPath = Paths.get("/tmp/index");
        Directory directory = new SimpleFSDirectory(myPath);

        // The first run can use an arbitrary segmentID value.
        // Then, in a Java debugger, put a breakpoint on
        // CodecUtil#checkIndexHeaderID(...) to read the segment ID the file expects.
        byte[] segmentID = {88, 55, 58, 78, -21, -55, 102, 99, 123, 34, 85, -38, -70, -120, 102, -67};

        SegmentInfo info = codec.segmentInfoFormat().read(directory, "_1rpt",
                segmentID, IOContext.READ);
        info.setCodec(codec);
        SegmentInfos infos = new SegmentInfos();
        SegmentCommitInfo commit = new SegmentCommitInfo(info, 1, -1, -1, -1);
        infos.add(commit);
        infos.commit(directory);
    }
}
Votes: 2

Stack Overflow user

Posted 2017-11-24 21:46:40

This version finds the segmentID automatically, without a debugger, for Lucene62:

package org.apache.lucene.index;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat;
import org.apache.lucene.store.BufferedChecksumIndexInput;
import org.apache.lucene.store.ChecksumIndexInput;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.StringHelper;

public class GenSegmentInfo {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            help();
            System.exit(1);
        }
        Codec codec = Codec.getDefault();
        Directory directory = new SimpleFSDirectory(Paths.get(args[0]));

        SegmentInfos infos = new SegmentInfos();
        for (int i = 1; i < args.length; i++) {
            infos.add(getSegmentCommitInfo(codec, directory, args[i]));
        }
        infos.commit(directory);
    }

    private static SegmentCommitInfo getSegmentCommitInfo(Codec codec, Directory directory, String segmentName) throws IOException {
        byte[] segmentID = new byte[StringHelper.ID_LENGTH];
        final String fileName = IndexFileNames.segmentFileName(segmentName, "", Lucene62SegmentInfoFormat.SI_EXTENSION);
        // Read the start of the .si header by hand and close the file afterwards.
        try (ChecksumIndexInput input = directory.openChecksumInput(fileName, IOContext.READ)) {
            DataInput in = new BufferedChecksumIndexInput(input);

            in.readInt();    // header magic
            in.readString(); // codec name
            in.readInt();    // format version
            in.readBytes(segmentID, 0, segmentID.length); // the segment ID follows
        }

        SegmentInfo info = codec.segmentInfoFormat().read(directory, segmentName, segmentID, IOContext.READ);

        info.setCodec(codec);
        return new SegmentCommitInfo(info, 1, -1, -1, -1);
    }

    private static void help() {
        System.out.println("Not enough arguments");
        System.out.println("Usage: java -cp lucene-core-6.6.0.jar:. org.apache.lucene.index.GenSegmentInfo <path to index> <segment1> [segment2 ...]");
    }
}
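
The manual header read in getSegmentCommitInfo can also be tried without Lucene on the classpath. The sketch below assumes the header layout that the reads above imply (a big-endian int magic, a string stored as a variable-length int followed by UTF-8 bytes, an int version, then the 16-byte ID) and applies to Lucene versions that write such an ID. `SiHeaderSketch`, `sampleHeader`, and `readSegmentId` are hypothetical names, and the codec name and version in `main` are illustrative values only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SiHeaderSketch {
    static final int CODEC_MAGIC = 0x3fd76c17; // same value as CodecUtil.CODEC_MAGIC
    static final int ID_LENGTH = 16;           // same value as StringHelper.ID_LENGTH

    // Variable-length int: 7 bits per byte, low-order bits first, high bit set means "more follows".
    static int readVInt(DataInputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IOException("malformed VInt");
    }

    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Builds header bytes in the assumed layout, for demonstration only.
    static byte[] sampleHeader(String codecName, int version, byte[] id) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(CODEC_MAGIC);
            byte[] name = codecName.getBytes(StandardCharsets.UTF_8);
            writeVInt(out, name.length);
            out.write(name);
            out.writeInt(version);
            out.write(id);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Skips magic, codec name, and version, then returns the 16-byte segment ID.
    static byte[] readSegmentId(byte[] siBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(siBytes))) {
            if (in.readInt() != CODEC_MAGIC) {
                throw new IllegalArgumentException("not a codec header");
            }
            byte[] name = new byte[readVInt(in)];
            in.readFully(name);
            in.readInt(); // format version, not needed here
            byte[] id = new byte[ID_LENGTH];
            in.readFully(id);
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] id = {88, 55, 58, 78, -21, -55, 102, 99, 123, 34, 85, -38, -70, -120, 102, -67};
        byte[] header = sampleHeader("Lucene50SegmentInfo", 0, id);
        System.out.println(Arrays.toString(readSegmentId(header)));
    }
}
```

Against a real index, the bytes would come from the .si file itself, for example via `Files.readAllBytes`, and the printed array could then be pasted into the segmentID literal of the accepted answer instead of using a debugger.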

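To make it work with the Lucene 4.10 library, the following parts of the code have to be adjusted, because that library works differently:

  • SimpleFSDirectory takes a File rather than a Path.
  • There is no checkIndexHeaderID function, and no segmentID is needed.
  • codec.segmentInfoFormat().getSegmentInfoReader().read(directory, segmentName, IOContext.READ) returns the SegmentInfo.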
Votes: 4

Stack Overflow user

Posted 2019-05-08 17:40:30

For those using Lucene.NET, this is how to rebuild the segments file:

public static void Main(string[] args)
{
    string dirPath = "path here";
    string filePrefix = "prefix here"; // ex: it's the _1 of _1.fdt, _1.fdx, etc.
    int numberOfFiles = 8;//this is how many files start with the given prefix

    SimpleFSDirectory directory = new SimpleFSDirectory(dirPath);
    SegmentInfos infos = new SegmentInfos();

    SegmentInfo si = new SegmentInfo(filePrefix, numberOfFiles, directory);
    infos.Add(si);
    infos.Commit(directory);
}
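
Both this snippet (filePrefix) and the Java tool in the previous answer (segment names on the command line) need the segment prefix of each file. As a rough sketch, the prefix can be derived from a directory listing by cutting each name at the first dot or at the second underscore, whichever comes first. `SegmentNames` and its methods are hypothetical helpers written with the JDK only, and the cutting rule is an assumption based on the file names shown in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SegmentNames {
    // Returns the segment prefix of an index file name,
    // or null for files such as write.lock or segments_N.
    static String segmentNameOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        if (dot >= 0) {
            end = dot;
        }
        // Per-format files look like _1rpt_Lucene50_0.doc: cut at the second underscore.
        int underscore = fileName.indexOf('_', 1);
        if (underscore >= 0 && underscore < end) {
            end = underscore;
        }
        return fileName.substring(0, end);
    }

    // Collects the distinct segment prefixes found in a directory listing.
    static Set<String> segmentNames(List<String> fileNames) {
        Set<String> names = new TreeSet<>();
        for (String fileName : fileNames) {
            String segment = segmentNameOf(fileName);
            if (segment != null) {
                names.add(segment);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "_1rpt.fdt", "_1rpt.si", "_1rpt_Lucene50_0.doc", "_1rpt_Lucene50_0.tim", "write.lock");
        System.out.println(segmentNames(listing));
    }
}
```

For the listing in the question this yields a single segment, which matches the single name passed to the tools above.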
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/35273381
