Special characters (e.g. *) used in synonyms.txt do not work in Solr

Stack Overflow user
Asked 2017-04-17 21:00:34
2 answers · 400 views · 0 followers · 2 votes

I am using Solr 4.10.4. I created a synonyms.txt file and applied SynonymFilterFactory as follows:

<fieldType name="text_general" class="solr.TextField">  
     <analyzer>     
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1" stemEnglishPossessive="0"/>
     </analyzer>
     <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
</fieldType>

The content of synonyms.txt is as follows:

holland* => holland

holland, netherland, netherlands, niederlande

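Under certain conditions my application generates terms such as: holland*

In that case I would like to get the same results as when I enter the terms holland, netherland, netherlands, niederlande.

Currently, however, the term holland* does not give the matching results. The results for holland* do include the documents that match "holland" or its synonyms, but they appear at the bottom of the list. Can these results be boosted?

Does anyone know how I can achieve this?

Here are more details:

For holland I get some results, and when I debug the query it shows: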

"debug": {
    "rawquerystring": "holland",
    "querystring": "holland",
    "parsedquery": "(name:holland name:netherland name:netherlands name:niederlande)/no_coord",
    "parsedquery_toString": "name:holland name:netherland name:netherlands name:niederlande",
    "explain": {
      "country-NLD-de": "\n7.42217 = (MATCH) sum of:\n  7.42217 = (MATCH) weight(name:niederlande in 1775593) [DefaultSimilarity], result of:\n    7.42217 = score(doc=1775593,freq=1.0), product of:\n      0.5213204 = queryWeight, product of:\n        14.237252 = idf(docFreq=14, maxDocs=8413113)\n        0.036616646 = queryNorm\n      14.237252 = fieldWeight in 1775593, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        14.237252 = idf(docFreq=14, maxDocs=8413113)\n        1.0 = fieldNorm(doc=1775593)\n",
      "country-NLD-en": "\n7.3550315 = (MATCH) sum of:\n  7.3550315 = (MATCH) weight(name:netherlands in 230095) [DefaultSimilarity], result of:\n    7.3550315 = score(doc=230095,freq=1.0), product of:\n      0.5189572 = queryWeight, product of:\n        14.172713 = idf(docFreq=15, maxDocs=8413113)\n        0.036616646 = queryNorm\n      14.172713 = fieldWeight in 230095, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        14.172713 = idf(docFreq=15, maxDocs=8413113)\n        1.0 = fieldNorm(doc=230095)\n",
      "place-49218-de": "\n5.0385056 = (MATCH) sum of:\n  5.0385056 = (MATCH) weight(name:holland in 385574) [DefaultSimilarity], result of:\n    5.0385056 = score(doc=385574,freq=1.0), product of:\n      0.4295267 = queryWeight, product of:\n        11.730367 = idf(docFreq=183, maxDocs=8413113)\n        0.036616646 = queryNorm\n      11.730367 = fieldWeight in 385574, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        11.730367 = idf(docFreq=183, maxDocs=8413113)\n        1.0 = fieldNorm(doc=385574)\n",

For holland*, the results contain some records for holland, but the debug section looks like this:

"debug": {
    "rawquerystring": "holland*",
    "querystring": "holland*",
    "parsedquery": "name:holland*",
    "parsedquery_toString": "name:holland*",
    "explain": {
      "place-51432-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "place-49196-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "place-49207-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",

Comparing the "parsedquery" values in the debug sections above, the output differs between holland and holland*. So I think the special character * does not work with SynonymFilterFactory.

2 Answers

Stack Overflow user

Answered 2021-02-15 20:11:32

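As far as I know, the synonyms file does not support wildcards.

In your case you may also run into another problem, because wildcards in a query are normally used to search for results that are not exact matches. How they are handled depends on the query parser used for the query.

In other words, the query "holland*" searches for all documents containing a term that starts with "holland".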
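One possible application-side workaround, sketched in Python under stated assumptions: the debug output in the question shows that `name:holland*` is not expanded with synonyms and is scored as a constant, so the application could send the bare term next to the wildcard term and boost it. The helper name `build_query`, the default field `name`, and the boost value 10 are hypothetical choices for illustration; the sketch relies only on the `^` boost and `OR` syntax of the standard query parser.

```python
def build_query(term: str, field: str = "name", boost: float = 10.0) -> str:
    """Pair a trailing-wildcard term with a boosted clause for the bare term.

    The bare term goes through the query analyzer (and so through the
    synonym filter); the wildcard clause is kept to preserve prefix matches.
    """
    if term.endswith("*") and len(term) > 1:
        base = term[:-1]  # "holland*" -> "holland"
        return f"{field}:{base}^{boost:g} OR {field}:{term}"
    # No trailing wildcard: send the term unchanged.
    return f"{field}:{term}"


print(build_query("holland*"))  # name:holland^10 OR name:holland*
```

Whether the resulting ranking is what you want depends on your data, so the boost value would need tuning against real queries.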

If you want Solr to treat the wildcard as a plain character, you should escape it.
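A minimal sketch of such escaping on the client side, in Python. The function name `escape_solr_term` is hypothetical, and the character list is my reading of the metacharacters of the Lucene/Solr standard query parser, so check it against the parser you actually use. Note also that, as far as I can tell, StandardTokenizer discards a literal `*` during analysis, so an escaped `holland\*` would most likely end up analyzed as plain `holland`.

```python
import re

# Characters assumed to be special in the standard query parser.
_SPECIAL_CHARS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr_term(term: str) -> str:
    """Prefix each query-parser metacharacter with a backslash."""
    return _SPECIAL_CHARS.sub(r"\\\1", term)


print(escape_solr_term("holland*"))  # holland\*
```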

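Another mistake I see is in your field definition: you should define the analyzer type for both cases (index and query).

If you give the field type a single plain analyzer definition with no type, like the first analyzer in your example, it is used for both indexing and querying.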

Votes: 0

Stack Overflow user

Answered 2021-02-15 18:32:59

Please try:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Votes: -1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/43452014
