This question is a continuation of my previous question. I have some text on which I want to search both numbers and words.
My text:
8080.foobar.getFooLabelFrombar(test.java:91)
I want to search on getFooLabelFrombar, fooBar, 8080, and 91.
Earlier, I was using the simple analyzer, which tokenizes the above text into the tokens below.
{
  "tokens": [
    {
      "token": "foobar",
      "start_offset": 10,
      "end_offset": 16,
      "type": "word",
      "position": 2
    },
    {
      "token": "getfoolabelfrombar",
      "start_offset": 17,
      "end_offset": 35,
      "type": "word",
      "position": 3
    },
    {
      "token": "test",
      "start_offset": 36,
      "end_offset": 40,
      "type": "word",
      "position": 4
    },
    {
      "token": "java",
      "start_offset": 41,
      "end_offset": 45,
      "type": "word",
      "position": 5
    }
  ]
}
With this, searches on foobar and getFooLabelFrombar return results, but searches on 8080 and 91 do not, because the simple analyzer does not tokenize numbers.
As suggested in the previous post, I changed the analyzer to standard. Now the numbers are searchable, but the other two word search strings are not. The standard analyzer creates the following tokens:
{
  "tokens": [
    {
      "token": "8080",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<NUM>",
      "position": 1
    },
    {
      "token": "foobar.getfoolabelfrombar",
      "start_offset": 5,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "test.java",
      "start_offset": 36,
      "end_offset": 45,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "91",
      "start_offset": 46,
      "end_offset": 48,
      "type": "<NUM>",
      "position": 4
    }
  ]
}
I went through all the existing analyzers in ES, but none of them seems to meet my requirement. I tried creating the custom analyzer below, but it does not work either.
{
  "analysis" : {
    "analyzer" : {
      "my_analyzer" : {
        "tokenizer" : "letter",
        "filter" : ["lowercase", "extract_numbers"]
      }
    },
    "filter" : {
      "extract_numbers" : {
        "type" : "keep_types",
        "types" : [ "<NUM>", "<ALPHANUM>", "word" ]
      }
    }
  }
}
Please suggest how I can build a custom analyzer that meets my requirement.
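A possible reason this attempt fails, offered as a sketch rather than a confirmed diagnosis: the letter tokenizer splits on every non-letter character and discards it, so 8080 and 91 are dropped before the keep_types filter ever runs, and a token filter can only keep or remove tokens that the tokenizer emitted. One way to check this is to send an ad hoc tokenizer and filter chain to the _analyze API. The body below is meant for POST /_analyze and assumes Elasticsearch 5.x or later, where the parameter is named filter.

```json
{
  "tokenizer": "letter",
  "filter": ["lowercase"],
  "text": "8080.foobar.getFooLabelFrombar(test.java:91)"
}
```

If the response contains no 8080 or 91 tokens, the numbers are lost at the tokenizer stage and no token filter can bring them back.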
Answered on 2017-10-22 11:41:52
How about using a character filter to replace the dots with spaces?
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["replace_dots"]
        }
      },
      "char_filter": {
        "replace_dots": {
          "type": "mapping",
          "mappings": [
            ". => \\u0020"
          ]
        }
      }
    }
  }
}
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "8080.foobar.getFooLabelFrombar(test.java:91)"
}
It outputs what you need:
{
  "tokens" : [
    {
      "token" : "8080",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<NUM>",
      "position" : 0
    },
    {
      "token" : "foobar",
      "start_offset" : 10,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "getFooLabelFrombar",
      "start_offset" : 17,
      "end_offset" : 35,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "test",
      "start_offset" : 36,
      "end_offset" : 40,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "java",
      "start_offset" : 41,
      "end_offset" : 45,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "91",
      "start_offset" : 46,
      "end_offset" : 48,
      "type" : "<NUM>",
      "position" : 6
    }
  ]
}
https://stackoverflow.com/questions/46873023
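The output above keeps getFooLabelFrombar in its original case, so a lowercase query term may not match it. A minimal sketch of a case-insensitive variant, assuming it is acceptable to add the built-in lowercase token filter to the same analyzer: the body below would replace the one sent with PUT /my_index, and everything except the "type" and "filter" lines is unchanged from the answer.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["replace_dots"],
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "replace_dots": {
          "type": "mapping",
          "mappings": [
            ". => \\u0020"
          ]
        }
      }
    }
  }
}
```

With this variant the token would be indexed as getfoolabelfrombar, and a match query on a field that uses my_analyzer would lowercase the query terms the same way.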