我在elasticsearch中有以下数据
{
"_index": "media",
"_type": "information",
"_id": "6838",
"_source": {
"demographics_countries": {
"AE": 0.17543859649122806,
"CA": 0.013157894736842105,
"FR": 0.017543859649122806,
"GB": 0.043859649122807015,
"IT": 0.02631578947368421,
"LB": 0.013157894736842105,
"SA": 0.49122807017543857,
"TR": 0.017543859649122806,
"US": 0.09210526315789472
}
}
},
{
"_index": "media",
"_type": "information",
"_id": "57696",
"_source": {
"demographics_countries": {
"TN": 0.8125,
"MA": 0.034375,
"DZ": 0.032812,
"FR": 0.0125,
"EG": 0.0125,
"IN": 0.009375,
"SA": 0.009375
}
}
]预期结果:
找出一份有特定国家SA (沙特阿拉伯)的文件在demographics_countries中排名前三
例如,:
"_id":"6838“(第一个文档)匹配,因为在上述示例文档中,SA (沙特阿拉伯)位于demographics_countries的前3位。
尝试过?:我尝试过使用top_hits进行过滤,但是它并没有像预期的那样工作。
任何建议都将不胜感激。
发布于 2020-11-27 13:06:34
使用当前的数据模型,很难做到这一点。我建议的可能不是最简单的方式,但它肯定是最快的查询最终。
我建议把你的文件重新设计成已经包括了顶级国家:
[
{
"_index": "media",
"_type": "information",
"_id": "6838",
"_source": {
"top_demographics_countries": ["TN", "MA", "DZ"],
"demographics_countries": {
"AE": 0.17543859649122806,
"CA": 0.013157894736842105,
"FR": 0.017543859649122806,
"GB": 0.043859649122807015,
"IT": 0.02631578947368421,
"LB": 0.013157894736842105,
"SA": 0.49122807017543857,
"TR": 0.017543859649122806,
"US": 0.09210526315789472
}
}
},
{
"_index": "media",
"_type": "information",
"_id": "57696",
"_source": {
"top_demographics_countries": ["TN", "MA", "DZ"],
"demographics_countries": {
"TN": 0.8125,
"MA": 0.034375,
"DZ": 0.032812,
"FR": 0.0125,
"EG": 0.0125,
"IN": 0.009375,
"SA": 0.009375
}
}
}
]忽略我为top_demographics_countries选择的值。使用这种方法,您始终可以预先计算top,然后可以使用一个简单的术语查询来检查文档是否包含该值:
{
"query": {
"bool": {
"filter": {
"term": {
"top_demographics_countries": "SA"
}
}
}
}
}与动态构建该子句相比,在保存期间计算它们一次会更便宜。
发布于 2020-11-27 13:27:37
@Evaldas是对的--最好提前提取前3名。
但是,如果您无法控制自己,并且感到不得不使用java/无痛苦,下面有一种方法:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "demographics_countries.SA"
}
},
{
"script": {
"script": {
"source": """
def tuple_list = new ArrayList();
for (def c : params.all_countries) {
def key = 'demographics_countries.'+c;
if (!doc.containsKey(key) || doc[key].size() == 0) {
continue;
}
def val = doc[key].value;
tuple_list.add([c, val]);
}
// sort tuple list by the country values
Collections.sort(tuple_list, (arr1, arr2) -> arr1[1] < arr2[1] ? 1 : -1);
// slice & take only the top 3
def top_3_countries = tuple_list.subList(0, 3).stream().map(arr -> arr[0]).collect(Collectors.toList());
return top_3_countries.size() >=3 && top_3_countries.contains(params.country_of_interest);
""",
"params": {
"country_of_interest": "SA",
"all_countries": [
"AE",
"CA",
"FR",
"GB",
"IT",
"LB",
"SA",
"TR",
"US",
"TN",
"MA",
"DZ",
"EG",
"IN"
]
}
}
}
}
]
}
}
}https://stackoverflow.com/questions/65037694
复制相似问题