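The text I'm analyzing in Elasticsearch 6 contains numbers I'm not interested in, and I can't figure out how to remove them. My searches on the tokens bring back zip codes, times, and years. There are only a few distinct years, so I could add those to the stopwords, but there are far too many other numbers to filter out that way.
I did try writing a custom filter: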
"char_filter": {
  "number_filter": {
    "type": "pattern_replace",
    "pattern": "\\d+",
    "replacement": " "
  }
}
However, when I tried to add it to the settings, I got the following error:
Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.char_filter] because of a missing '.'
Below is the entire settings section of my configuration (note: it worked before I added the number replacer):
"settings": {
  "analysis": {
    "analyzer": {
      "t_analyzer": {
        "tokenizer": "t_tokenizer"
      },
      "major_words_analyzer": {
        "type": "standard",
        "stopwords": "_english_"
      },
      "char_filter": [
        "number_filter"
      ]
    },
    "tokenizer": {
      "t_tokenizer": {
        "type": "standard"
      }
    },
    "char_filter": {
      "number_filter": {
        "type": "pattern_replace",
        "pattern": "\\d+",
        "replacement": " "
      }
    }
  }
}
EDIT: here are the relevant field settings:
},
"narrative": {
  "type": "text",
  "store": "true",
  "analyzer": "t_analyzer",
  "fielddata": "true",
  "fields": {
    "raw": {
      "type": "text"
    }
  }
},
"narrativePhrases": {
  "type": "text",
  "analyzer": "major_words_analyzer",
  "fielddata": "true",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
},
EDIT: here is the query I run afterwards:
POST /test_narrative/_search?size=0
{
  "aggs": {
    "incidents_by_month": {
      "date_histogram": {
        "field": "eventDate",
        "interval": "month",
        "min_doc_count": 5
      },
      "aggs": {
        "top_phrases": {
          "significant_text": {
            "field": "narrative",
            "size": 10
          }
        }
      }
    }
  }
}
I still get numbers in the results:
{
  "key": "personally",
  "doc_count": 3,
  "score": 5.22625236294896,
  "bg_count": 36
},
{
  "key": "2011",
  "doc_count": 4,
  "score": 2.4786045712321703,
  "bg_count": 132
}
Answered on 2018-10-25 20:33:45
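It looks like you have put char_filter in the wrong place in your settings above.
According to the custom analyzer documentation, char_filter is one of the parameters of the custom analyzer you are defining, so it has to go inside t_analyzer and/or major_words_analyzer, depending on your needs. For example: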
"t_analyzer": {
  "tokenizer": "t_tokenizer",
  "char_filter": [
    "number_filter"
  ]
}
If you intend to use the char_filter in both analyzers, your settings should look like this:
PUT numberindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "t_analyzer": {
          "tokenizer": "t_tokenizer",
          "char_filter": [
            "number_filter"
          ]
        },
        "major_words_analyzer": {
          "type": "standard",
          "stopwords": "_english_",
          "char_filter": [
            "number_filter"
          ]
        }
      },
      "tokenizer": {
        "t_tokenizer": {
          "type": "standard"
        }
      },
      "char_filter": {
        "number_filter": {
          "type": "pattern_replace",
          "pattern": "\\d+",
          "replacement": ""
        }
      }
    }
  }
}
Hope this helps!
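One caveat worth checking: as far as I know, char_filter is only applied by custom analyzers (an analyzer with "type": "custom", or one that specifies a tokenizer and no type). major_words_analyzer above is declared with the built-in "type": "standard", so its char_filter entry is likely ignored. The sketch below rebuilds it as a custom analyzer that is roughly equivalent to the standard analyzer (standard tokenizer, lowercase, English stopwords) with number_filter added. The index name numberindex_custom and the token filter name english_stop are hypothetical. The trailing _analyze request lets you confirm that the digits are stripped before you reindex your data.

```json
PUT numberindex_custom
{
  "settings": {
    "analysis": {
      "analyzer": {
        "t_analyzer": {
          "tokenizer": "t_tokenizer",
          "char_filter": ["number_filter"]
        },
        "major_words_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["number_filter"],
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "t_tokenizer": {
          "type": "standard"
        }
      },
      "char_filter": {
        "number_filter": {
          "type": "pattern_replace",
          "pattern": "\\d+",
          "replacement": ""
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST numberindex_custom/_analyze
{
  "analyzer": "major_words_analyzer",
  "text": "The zip code 90210 was filed in 2011"
}
```

The response should contain only word tokens, with 90210 and 2011 gone. Also note the effect of the replacement value: with "" a token such as abc123def becomes abcdef, whereas " " (as in the question) splits it into abc and def.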
https://stackoverflow.com/questions/52997001