Question: Lucene: wildcard does not match a digit that follows a dot

Stack Overflow user

Asked on 2018-11-08 10:39:03

1 answer, 798 views, 0 followers, 2 votes

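I recently upgraded from Lucene 3 to Lucene 6, and in v6 I found that the wildcard ? no longer matches a digit that follows a dot. Here is an example:

String to match: a.1a

Query: a.?a

In this example, the query matches the string in Lucene 3 but not in Lucene 6. On the other hand, the query a* matches in both Lucene 3 and 6. Further testing showed that this difference in behavior occurs only when the dot is followed by a digit. By the way, I am using StandardAnalyzer in both Lucene 3 and 6.

Does anyone know what is going on here? How can I restore the Lucene 3 behavior, or adjust my Lucene 6 query so that it is equivalent to the Lucene 3 query?

Update

Lucene 6.6 code snippet, as requested.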

public List<ResultDocument> search(String queryString)
        throws SearchException, CheckedOutOfMemoryError {
    stopped = false;

    QueryWrapper queryWrapper = createQuery(queryString);
    Query query = queryWrapper.query;
    boolean isPhraseQuery = queryWrapper.isPhraseQuery;

    readLock.lock();
    try {
        checkIndexesExist();

        DelegatingCollector collector= new DelegatingCollector(){
            @Override
            public void collect(int doc) throws IOException {
                leafDelegate.collect(doc);
                if(stopped){
                    throw new StoppedSearcherException();
                }
            }
        };
        collector.setDelegate(TopScoreDocCollector.create(MAX_RESULTS, null));
        try {
            luceneSearcher.search(query, collector);
        }
        catch (StoppedSearcherException e) {
            // Search was stopped early; continue with the hits collected so far.
        }
        ScoreDoc[] scoreDocs = ((TopScoreDocCollector)collector.getDelegate()).topDocs().scoreDocs;

        ResultDocument[] results = new ResultDocument[scoreDocs.length];
        for (int i = 0; i < scoreDocs.length; i++) {
            Document doc = luceneSearcher.doc(scoreDocs[i].doc);
            float score = scoreDocs[i].score;
            LuceneIndex index = indexes.get(((DecoratedMultiReader) luceneSearcher.getIndexReader()).decoratedReaderIndex(i));
            IndexingConfig config = index.getConfig();
            results[i] = new ResultDocument(
                doc, score, query, isPhraseQuery, config, fileFactory,
                outlookMailFactory);
        }
        return Arrays.asList(results);
    }
    catch (IllegalArgumentException e) {
        throw wrapEmptyIndexException(e);
    }
    catch (IOException e) {
        throw new SearchException(e.getMessage());
    }
    catch (OutOfMemoryError e) {
        throw new CheckedOutOfMemoryError(e);
    }
    finally {
        readLock.unlock();
    }
}

More code:

private static QueryWrapper createQuery(String queryString)
        throws SearchException {
    PhraseDetectingQueryParser queryParser = new PhraseDetectingQueryParser(
        Fields.CONTENT.key(), IndexRegistry.getAnalyzer());
    queryParser.setAllowLeadingWildcard(true);
    RewriteMethod rewriteMethod = MultiTermQuery.SCORING_BOOLEAN_REWRITE;
    queryParser.setMultiTermRewriteMethod(rewriteMethod);

    try {
        Query query = queryParser.parse(queryString);
        boolean isPhraseQuery = queryParser.isPhraseQuery();
        return new QueryWrapper(query, isPhraseQuery);
    }
    catch (IllegalArgumentException e) {
        throw new SearchException(e.getMessage());
    }
    catch (ParseException e) {
        throw new SearchException(e.getMessage());
    }
}

private static final class QueryWrapper {
    public final Query query;
    public final boolean isPhraseQuery;

    private QueryWrapper(Query query, boolean isPhraseQuery) {
        this.query = query;
        this.isPhraseQuery = isPhraseQuery;
    }
}

More code:

public final class PhraseDetectingQueryParser extends QueryParser {

    /*
     * This class is used for determining whether the parsed query is supported
     * by the fast-vector highlighter. The latter only supports queries that are
     * a combination of TermQuery, PhraseQuery and/or BooleanQuery.
     */

    private boolean isPhraseQuery = true;

    public PhraseDetectingQueryParser(  String defaultField,
                                        Analyzer analyzer) {
        super(defaultField, analyzer);
    }

    public boolean isPhraseQuery() {
        return isPhraseQuery;
    }

    protected Query newFuzzyQuery(  Term term,
                                    float minimumSimilarity,
                                    int prefixLength) {
        isPhraseQuery = false;
        return super.newFuzzyQuery(term, minimumSimilarity, prefixLength);
    }

    protected Query newMatchAllDocsQuery() {
        isPhraseQuery = false;
        return super.newMatchAllDocsQuery();
    }

    protected Query newPrefixQuery(Term prefix) {
        isPhraseQuery = false;
        return super.newPrefixQuery(prefix);
    }

    protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
        isPhraseQuery = false;
        return super.newWildcardQuery(t);
    }

}

java

lucene

1 Answer

Stack Overflow user

Answered on 2019-01-22 20:58:53

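StandardAnalyzer splits the input into terms at a period, unless the period has a letter on both sides or a digit on both sides. In a.1a the period sits between a letter and a digit, so the text is split into two terms: a and 1a.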
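To make the splitting rule concrete, here is a minimal stand-alone sketch. It is only a simplified model of the rule as described in this answer, not Lucene's actual tokenizer, and it ignores everything else an analyzer may do (stop-word filtering, for example). The class `DotSplitSketch` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; NOT part of Lucene.
final class DotSplitSketch {

    // Simplified model: a '.' stays inside a term only when it sits
    // between two letters or between two digits. Any other
    // non-alphanumeric character ends the current term.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (c == '.' && i > 0 && i + 1 < text.length()
                    && sameClass(text.charAt(i - 1), text.charAt(i + 1))) {
                current.append(c);
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    // True when both neighbours are letters, or both are digits.
    static boolean sameClass(char a, char b) {
        return (Character.isLetter(a) && Character.isLetter(b))
            || (Character.isDigit(a) && Character.isDigit(b));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a.1a")); // [a, 1a]
        System.out.println(tokenize("a.ba")); // [a.ba]
        System.out.println(tokenize("1.5"));  // [1.5]
    }
}
```

Under this model the split happens only for the letter-dot-digit (or digit-dot-letter) case, which is consistent with the observation in the question that the difference shows up only when the dot is followed by a digit.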

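Since you are using a wildcard query, no analysis is performed on the query side, so the query text is not tokenized, and there is no term in the index that matches the query as a whole. If you search for "1a", without wildcards or anything else, you should find the document.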
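The effect on the wildcard query can be sketched in plain Java as well. This assumes the index holds exactly the two terms named above (a and 1a) and that a wildcard pattern is compared against each indexed term as a whole. The class `WildcardTermSketch` and its methods are hypothetical and only imitate term-level wildcard matching; they are not Lucene's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; NOT part of Lucene.
final class WildcardTermSketch {

    // Assumption: the terms the analyzer produced for the text "a.1a".
    static final List<String> INDEXED_TERMS = Arrays.asList("a", "1a");

    // '?' matches exactly one character, '*' matches any run of characters;
    // the pattern has to cover the entire term.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // A document is a hit when at least one of its terms matches.
    static boolean anyTermMatches(String pattern, List<String> terms) {
        return terms.stream().anyMatch(t -> wildcardMatches(pattern, t));
    }

    public static void main(String[] args) {
        System.out.println(anyTermMatches("a.?a", INDEXED_TERMS)); // false
        System.out.println(anyTermMatches("a*", INDEXED_TERMS));   // true
        System.out.println(anyTermMatches("1a", INDEXED_TERMS));   // true
        System.out.println(anyTermMatches("?a", INDEXED_TERMS));   // true
    }
}
```

The pattern a.?a needs a single four-character term containing the dot, which no longer exists once the text has been split, while a* still matches the term a. This mirrors the behavior reported in the question.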

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/53205997
