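In Elasticsearch, my index has the following field definitions:
"my_id": {
  "type": "keyword"
},
"titles": {
  "type": "keyword",
  "fields": {
    "fulltext": {
      "type": "text"
    }
  }
}
Each document stores multiple titles (essentially an array of strings).
Suppose I have indexed a document with the following content:
I want to return a significant_terms aggregation for each document ID. For example.
I know how to run significant_terms as an aggregation across documents. However, I cannot get it to work as a sub-aggregation within a single document.
I tried nesting one bucket aggregation inside another: the outer one partitions on the ID and the inner one should return the significant terms. The significant_terms aggregation returns an empty array.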
{
  "aggs": {
    "titles": {
      "terms": {
        "field": "my_id"
      },
      "aggs": {
        "my_common_terms": {
          "significant_terms": {
            "field": "titles"
          }
        }
      }
    }
  }
}
Posted on 2020-04-08 18:08:28
The significant_terms aggregation does the opposite of what you expect: it looks for unusual occurrences, not for the most common terms!
Example:
PUT stars
{"mappings":{"properties":{"my_id":{"type":"keyword"},"titles":{"type":"keyword","fields":{"fulltext":{"type":"text"}}}}}}
Then index a few documents with similar IDs
POST stars/_doc
{
  "my_id": "MH123",
  "titles": [
    "Star Wars: A New Hope",
    "Star Wars: Return of the Jedi",
    "Star Wars: \"Empire Strikes Back\""
  ]
}
POST stars/_doc
{
  "my_id": "MH124",
  "titles": [
    "Star Wars: A New Hope",
    "Star Wars: Return of the Jedi",
    "Star Wars: \"Empire Strikes Back\""
  ]
}
Note that in the next document the string uncommon terms appears in titles
POST stars/_doc
{
  "my_id": "MH125",
  "titles": [
    "uncommon terms",
    "Star Wars: A New Hope",
    "Star Wars: Return of the Jedi",
    "Star Wars: \"Empire Strikes Back\""
  ]
}
Now reduce min_doc_count from its default of 3 to 1
GET stars/_search
{
  "size": 0,
  "aggs": {
    "titles": {
      "terms": {
        "field": "my_id"
      },
      "aggs": {
        "my_common_terms": {
          "significant_terms": {
            "field": "titles",
            "min_doc_count": 1
          }
        }
      }
    }
  }
}
which yields
"aggregations" : {
  "titles" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "MH123",
        "doc_count" : 1,
        "my_common_terms" : {
          "doc_count" : 1,
          "bg_count" : 3,
          "buckets" : [ ]
        }
      },
      {
        "key" : "MH124",
        "doc_count" : 1,
        "my_common_terms" : {
          "doc_count" : 1,
          "bg_count" : 3,
          "buckets" : [ ]
        }
      },
      {
        "key" : "MH125",
        "doc_count" : 1,
        "my_common_terms" : {
          "doc_count" : 1,
          "bg_count" : 3,
          "buckets" : [
            {
              "key" : "uncommon terms",
              "doc_count" : 1,
              "score" : 2.0,
              "bg_count" : 1
            }
          ]
        }
      }
    ]
  }
}
There are other ways to tune this, but this is how significant terms are meant to be used.
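To see why only "uncommon terms" comes back, here is a rough Python sketch of the scoring, assuming the aggregation uses its default JLH heuristic (absolute change in frequency multiplied by relative change). The function name jlh_score and its parameter names are made up for illustration and are not part of any Elasticsearch client; the counts are taken from the MH125 bucket in the response above.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Sketch of the JLH heuristic: (fg% - bg%) * (fg% / bg%).

    subset_*   : term count and doc count inside the bucket (foreground)
    superset_* : term count and doc count in the whole index (background)
    """
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size        # share of bucket docs with the term
    bg = superset_freq / superset_size    # share of all docs with the term
    absolute_change = fg - bg
    if absolute_change <= 0:
        # The term is no more frequent in the bucket than in the background,
        # so it is not significant.
        return 0.0
    return absolute_change * (fg / bg)


# MH125 bucket: doc_count = 1, background bg_count = 3
uncommon = jlh_score(1, 1, 1, 3)  # "uncommon terms": 1 of 1 in bucket, 1 of 3 overall
common = jlh_score(1, 1, 3, 3)    # "Star Wars: A New Hope": present in every doc

print(uncommon, common)
```

The first value comes out at about 2.0, in line with the score in the response, while a title that appears in every document scores 0. That is why the MH123 and MH124 buckets are empty: nothing in them is more frequent than in the index as a whole.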
What you are probably looking for is a shingle filter, which is a good place to start.
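As a rough illustration of what shingling produces, here is a small Python sketch. It is not the Elasticsearch implementation: the helper shingles is hypothetical, it lowercases and splits on whitespace instead of running a real analyzer, and it lists unigrams first rather than interleaving tokens by position. The parameter names only mirror the filter's min_shingle_size, max_shingle_size and output_unigrams options.

```python
def shingles(text, min_size=2, max_size=2, output_unigrams=True):
    """Return word n-grams (shingles) of the given sizes for a piece of text."""
    tokens = text.lower().split()  # simplification of a real tokenizer
    out = list(tokens) if output_unigrams else []
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


print(shingles("Return of the Jedi"))
print(shingles("Return of the Jedi", output_unigrams=False))
```

With shingles indexed on the titles.fulltext sub-field, frequently repeated word pairs within titles become terms that an aggregation can count, instead of each whole title being a single keyword.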
https://stackoverflow.com/questions/61105835