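We recently upgraded Elasticsearch from 2.4 to 5.4.
Since moving to 5.x we have run into a problem with the More Like This query.
The query below finds documents whose text is similar to that of a given document.
Input query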
POST /test/_search
{
"size": 10000,
"stored_fields": [
"docid"
],
"_source": false,
"query": {
"more_like_this": {
"fields": [
"textcontent"
],
"like": [
{
"_index": "test",
"_type": "object",
"_id": "AV0c9jvZXF-b5U5aNAWB"
}
],
"max_query_terms": 5000,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}
Output from Elasticsearch 2.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 1.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": 0.5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": 0.381224,
"fields": {
"docid": [
"4"
]
}
}
]
}
}
Output from Elasticsearch 5.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 168.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": 164.5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": 132.381224,
"fields": {
"docid": [
"4"
]
}
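}
]
}
}
Apart from the document scores, the output of the two versions is identical. The scores in 5.4 are much higher than in 2.4 (168.5381224 versus 1.5381224 for the top hit). Our application depends on these scores, so the change is a problem for us. How can we get the 2.4 scoring behavior back in 5.4?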
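Posted on 2017-07-14 04:15:29
I found the solution. In version 5.0 the default similarity algorithm was changed from classic to BM25, and that is the cause of the different scores. When creating an index, just set the similarity type to classic. If the index already exists, run the following request to update the settings of all indices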
PUT /_all/_settings?preserve_existing=true
{
"index.similarity.default.type": "classic"
}
https://stackoverflow.com/questions/45056827
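The answer mentions setting the similarity when the index is created but does not show it. A minimal sketch of that case could look like the request below. It reuses the index name `test` from the question and assumes the index does not exist yet; mappings for `docid` and `textcontent` are omitted and would need to be added for real use.

```
# Sketch only: index name "test" is taken from the question, mappings are omitted
PUT /test
{
  "settings": {
    "index": {
      "similarity": {
        "default": {
          "type": "classic"
        }
      }
    }
  }
}
```

For an index that already exists, the similarity setting may be rejected while the index is open. As far as I know, the Elasticsearch reference describes closing the index first (`POST /test/_close`), applying the setting, and then reopening it (`POST /test/_open`). `GET /test/_settings` should show whether the change took effect. Even with classic similarity, scores are not guaranteed to match 2.4 exactly, so it is worth comparing a few known queries after the change.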