I'm using Solr 6.1 and tuning the scoring, but I have a problem with the scores.
I'm just searching for GCS with qf set to: title^100 content^70 text^50.
All three fields are of type text_general.
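The first result I get scores 1050.8486 and the other 853.08655.
However, the first document's content field is very short, while the other document's content field is very long,
so I don't understand why the first one scores so high.
Here is the debug output for the two results: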
```
1002.8741 = sum of:
  1002.8741 = max of:
    1002.8741 = weight(title:GCS in 1275) [], result of:
      1002.8741 = score(doc=1275,freq=1.0 = termFreq=1.0), product of:
        100.0 = boost
        8.513557 = idf(docFreq=27,docCount=137000)
        1.177973 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          6.3423285 = avgFieldLength
          4.0 = fieldLength
    928.3479 = weight(content:GCS in 1275) [], result of:
      928.3479 = score(doc=1275,freq=2.0 = termFreq=2.0), product of:
        70.0 = boost
        7.1785564 = idf(docFreq=104,docCount=137000)
        1.8474623 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.75 = parameter b
          176.37256 = avgFieldLength
          16.0 = fieldLength
```
```
811.1335 = sum of:
  811.1335 = max of:
    127.21202 = weight(text:GCS in 9400) [], result of:
      127.21202 = score(doc=9400,freq=1.0 = termFreq=1.0), product of:
        50.0 = boost
        7.464645 = idf(docFreq=78,docCount=137000)
        … = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          44.69738 = avgFieldLength
          256.0 = fieldLength
    811.1335 = weight(title:GCS in 9400) [], result of:
      811.1335 = score(doc=9400,freq=1.0 = termFreq=1.0), product of:
        100.0 = boost
        8.513557 = idf(docFreq=27,docCount=137000)
        0.9527551 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          6.3423285 = avgFieldLength
          7.111111 = fieldLength
    174.06395 = weight(content:GCS in 9400) [], result of:
      174.06395 = score(doc=9400,freq=7.0 = termFreq=7.0), product of:
        70.0 = boost
        7.1785564 = idf(docFreq=104,docCount=137000)
        0.34639663 = tfNorm, computed from:
          7.0 = termFreq=7.0
          1.2 = parameter k1
          0.75 = parameter b
          176.37256 = avgFieldLength
          7281.778 = fieldLength
```
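For anyone reproducing this, explain output like the above comes from adding debugQuery=true to a normal /select request. The sketch below only builds such a URL; the host and collection name are hypothetical placeholders, and defType=dismax is an assumption based on the "max of" structure in the output.

```python
from urllib.parse import urlencode

# Hypothetical Solr URL and collection name; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_debug_query(q: str, qf: str, def_type: str = "dismax") -> str:
    """Build a /select URL whose response includes per-document score explanations."""
    params = {
        "q": q,
        "defType": def_type,   # assumed query parser (dismax or edismax)
        "qf": qf,              # fields to search, each with its boost
        "fl": "id,score",      # return the score so it can be compared with the explain text
        "debugQuery": "true",  # adds a "debug" section with an "explain" entry per document
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_debug_query("GCS", "title^100 content^70 text^50")
print(url)
```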
===========================================================================
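I have another question: when I use shards, does omitNorms not work? Why? I find that short content still scores higher than long content, even though the schema is the same.
The first result is short content from collection A, the other is long content from collection B: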
```
1158.9161 = sum of:
  1158.9161 = max of:
    1158.9161 = weight(title:Boeing in 52601) [], result of:
      1158.9161 = score(doc=52601,freq=1.0 = termFreq=1.0), product of:
        100.0 = boost
        11.589161 = idf(docFreq=5,docCount=593568)
        … = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    1085.6042 = weight(content:Boeing in 52601) [], result of:
      1085.6042 = score(doc=52601,freq=2.0 = termFreq=2.0), product of:
        70.0 = boost
        11.279006 = idf(docFreq=7,docCount=593568)
        … = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
```
```
1060.8777 = sum of:
  1060.8777 = max of:
    433.1234 = weight(text:Boeing in 39406) [], result of:
      433.1234 = score(doc=39406,freq=1.0 = termFreq=1.0), product of:
        50.0 = boost
        8.662468 = idf(docFreq=112,docCount=650450)
        … = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    884.746 = weight(title:Boeing in 39406) [], result of:
      884.746 = score(doc=39406,freq=1.0 = termFreq=1.0), product of:
        100.0 = boost
        8.84746 = idf(docFreq=93,docCount=650450)
        … = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    1060.8777 = weight(content:Boeing in 39406) [], result of:
      1060.8777 = score(doc=39406,freq=7.0 = termFreq=7.0), product of:
        70.0 = boost
        8.069756 = idf(docFreq=203,docCount=650450)
        … = tfNorm, computed from:
          7.0 = termFreq=7.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
```
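The clauses in these two outputs can be recomputed by hand. This is a minimal sketch, assuming Lucene 6's BM25 formulas with b = 0, which is what the output reports as "norms omitted for field". With b = 0, tfNorm reduces to tf * (k1 + 1) / (tf + k1), so field length plays no part; what is left is boost, termFreq and idf, and idf uses the docFreq and docCount of the collection that holds the document (593568 docs in A, 650450 in B). All inputs are copied from the explain output.

```python
import math

K1 = 1.2  # "parameter k1" in the explain output


def idf(doc_freq: int, doc_count: int) -> float:
    # BM25 idf as in Lucene 6; docFreq/docCount come from the collection holding the doc
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm_b0(tf: float) -> float:
    # With b = 0 ("norms omitted for field") the field-length term drops out
    return tf * (K1 + 1) / (tf + K1)


# Collection A, doc 52601
title_a = 100.0 * idf(5, 593568) * tf_norm_b0(1.0)
content_a = 70.0 * idf(7, 593568) * tf_norm_b0(2.0)
# Collection B, doc 39406
title_b = 100.0 * idf(93, 650450) * tf_norm_b0(1.0)
content_b = 70.0 * idf(203, 650450) * tf_norm_b0(7.0)

for name, value in [("A title", title_a), ("A content", content_a),
                    ("B title", title_b), ("B content", content_b)]:
    print(f"{name}: {value:.4f}")
```

With these numbers the best clause of doc 52601 is its title, whose very low docFreq (5) in collection A gives it a higher idf than any clause of doc 39406.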
Posted on 2017-10-05 10:33:12
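The underlying similarity used by Solr 6.1 is BM25.
This means the length of a field value compared with the average field length matters a lot. More specifically, you are using dismax, which only takes the maximum clause score into account. So let's look at the maxima: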
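For reference, here is how the quantities in the explain output combine, assuming Lucene 6's BM25Similarity and a single query term. The formulas are not part of the answer, but they reproduce the logged values, for example 100 × 8.513557 × 1.177973 ≈ 1002.874 for the title of doc 1275. The last line is the dismax combination; with tie = 0 the other clauses drop out and only the maximum remains.

```latex
\begin{aligned}
\mathrm{weight}_f &= \mathrm{boost}_f \cdot \mathrm{idf}_f \cdot \mathrm{tfNorm}_f \\
\mathrm{idf}_f &= \ln\!\left(1 + \frac{\mathrm{docCount} - \mathrm{docFreq} + 0.5}{\mathrm{docFreq} + 0.5}\right) \\
\mathrm{tfNorm}_f &= \frac{\mathrm{tf}\,(k_1 + 1)}{\mathrm{tf} + k_1\left(1 - b + b\,\dfrac{\mathrm{fieldLength}}{\mathrm{avgFieldLength}}\right)} \\
\mathrm{score} &= \max_f \mathrm{weight}_f + \mathrm{tie}\cdot\Big(\sum_f \mathrm{weight}_f - \max_f \mathrm{weight}_f\Big)
\end{aligned}
```

A title of 4 tokens against an average of about 6.34 makes the length factor smaller than 1, which pushes tfNorm above 1.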
Max for the first document:
```
1002.8741 = weight(title:GCS in 1275) [], result of:
  1002.8741 = score(doc=1275,freq=1.0 = termFreq=1.0), product of:
    100.0 = boost
    8.513557 = idf(docFreq=27,docCount=137000)
    1.177973 = tfNorm, computed from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      6.3423285 = avgFieldLength
      4.0 = fieldLength
```
Max for the second document:
```
811.1335 = weight(title:GCS in 9400) [], result of:
  811.1335 = score(doc=9400,freq=1.0 = termFreq=1.0), product of:
    100.0 = boost
    8.513557 = idf(docFreq=27,docCount=137000)
    0.9527551 = tfNorm, computed from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      6.3423285 = avgFieldLength
      7.111111 = fieldLength
```
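The two maxima above can be reproduced with a short script. This is a sketch assuming the Lucene 6 BM25 formulas (idf = ln(1 + (N - n + 0.5) / (n + 0.5)), tfNorm with k1 and b); every input is copied from the explain output. The only difference between the two title clauses is fieldLength (4 versus about 7.1), and that alone separates 1002.87 from 811.13. The long content field of doc 9400 also gets a small weight, but with dismax it never counts.

```python
import math

K1 = 1.2   # "parameter k1" in the explain output
B = 0.75   # "parameter b" in the explain output


def idf(doc_freq: int, doc_count: int) -> float:
    # BM25 idf as used by Lucene 6: ln(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(tf: float, field_len: float, avg_field_len: float) -> float:
    # Saturating tf, scaled by how long this field is compared with the average
    return tf * (K1 + 1) / (tf + K1 * (1 - B + B * field_len / avg_field_len))


def clause_weight(boost, doc_freq, doc_count, tf, field_len, avg_field_len):
    # weight = boost * idf * tfNorm, matching the "product of:" lines
    return boost * idf(doc_freq, doc_count) * tf_norm(tf, field_len, avg_field_len)


# title:GCS clauses for both documents
title_1275 = clause_weight(100.0, 27, 137000, 1.0, 4.0, 6.3423285)
title_9400 = clause_weight(100.0, 27, 137000, 1.0, 7.111111, 6.3423285)
# content:GCS for doc 9400: termFreq 7, but the field is about 41x the average length
content_9400 = clause_weight(70.0, 104, 137000, 7.0, 7281.778, 176.37256)

print(f"title 1275:   {title_1275:.4f}")
print(f"title 9400:   {title_9400:.4f}")
print(f"content 9400: {content_9400:.4f}")
```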
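So the first document, with its shorter title, is the winner. You can use the tie parameter of dismax/edismax to take the other fields into account as well, not only the maximum.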
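A sketch of how tie changes the combination, assuming the usual DisjunctionMaxQuery rule: the best clause plus tie times the sum of the others (tie defaults to 0.0, giving the plain "max of" above). The clause weights are copied from the question's explain output; the tie values 0.1 and 1.0 are arbitrary illustrations.

```python
def dismax_score(clause_scores, tie=0.0):
    # Best clause, plus tie times the sum of all the other clauses
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


# Per-field clause weights copied from the explain output in the question
doc_1275 = [1002.8741, 928.3479]             # title, content
doc_9400 = [127.21202, 811.1335, 174.06395]  # text, title, content

for tie in (0.0, 0.1, 1.0):
    print(f"tie={tie}: doc 1275 = {dismax_score(doc_1275, tie):.4f}, "
          f"doc 9400 = {dismax_score(doc_9400, tie):.4f}")
```

With these particular numbers the order does not change even at tie=1.0 (a plain sum), because doc 1275 also has a high content clause (928.3479 from a 16-token field). tie matters most when documents have similar maxima but differ on the other fields.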
Regards
https://stackoverflow.com/questions/46581017