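I am trying to get multi-word synonyms to work with the keyword tokenizer and the _analyze API. However, the API returns the expected result for single-word synonyms but not for multi-word synonyms. Here are my settings and analysis chain: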
curl -XPOST "http://localhost:9200/test" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "type": "synonym",
            "synonyms": [
              "foo bar, fooo bar",
              "bazzz, baz"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}'
Now test it with the _analyze API:
curl 'localhost:9200/test/_analyze?analyzer=my_synonyms&text=baz'
This call returns what I expect ('bazzz' returns the same result):
{
  "tokens": [
    {
      "position": 1,
      "type": "SYNONYM",
      "end_offset": 3,
      "start_offset": 0,
      "token": "bazzz"
    },
    {
      "position": 1,
      "type": "SYNONYM",
      "end_offset": 3,
      "start_offset": 0,
      "token": "baz"
    }
  ]
}
Now, when I make the same call with multi-word synonym text, the API returns only a single token of type "word" and no synonyms:
curl 'localhost:9200/test/_analyze?analyzer=my_synonyms&text=foo+bar'
{
  "tokens": [
    {
      "position": 1,
      "type": "word",
      "end_offset": 7,
      "start_offset": 0,
      "token": "foo bar"
    }
  ]
}
Why doesn't _analyze return the "foo bar" and "fooo bar" tokens with type SYNONYM?
Posted on 2014-08-13 02:17:47
The "tokenizer": "keyword" key-value pair also needs to be added to the my_syn_filt filter declaration, as follows:
curl -XPOST "http://localhost:9200/test" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "foo bar, fooo bar",
              "bazzz, baz"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}'
With the settings above, the _analyze API returns the desired synonym tokens:
{
  "tokens": [
    {
      "position": 1,
      "type": "SYNONYM",
      "end_offset": 7,
      "start_offset": 0,
      "token": "foo bar"
    },
    {
      "position": 1,
      "type": "SYNONYM",
      "end_offset": 7,
      "start_offset": 0,
      "token": "fooo bar"
    }
  ]
}
https://stackoverflow.com/questions/25207158
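As far as I understand it, the fix works because the synonym filter parses its own rules with the tokenizer named in its "tokenizer" setting, which defaults to whitespace in this Elasticsearch generation. The rule "foo bar" is then stored as the two-token sequence foo, bar, which can never match the single token "foo bar" that the analyzer's keyword tokenizer emits. The toy model below is only a sketch of that mismatch, not Elasticsearch code: the function names are made up for illustration, and it compares only the whole token stream against the rules, whereas the real filter also matches sub-sequences.

```python
# Toy model (NOT Elasticsearch code): why multi-word synonym rules need the
# same tokenizer as the analyzer. All names here are hypothetical.

RULES = ["foo bar, fooo bar", "bazzz, baz"]


def whitespace_tokenize(text):
    """Assumed default for parsing synonym rules: split on whitespace."""
    return text.split()


def keyword_tokenize(text):
    """Keyword tokenizer: the whole input becomes a single token."""
    return [text]


def parse_rules(rules, rule_tokenizer):
    """Map each synonym (as a token sequence) to its whole equivalence group."""
    table = {}
    for rule in rules:
        group = [tuple(rule_tokenizer(term.strip())) for term in rule.split(",")]
        for seq in group:
            table[seq] = group
    return table


def analyze(text, rule_tokenizer, rules=RULES):
    """Keyword-tokenize and lowercase the input, then look it up in the rules."""
    tokens = tuple(keyword_tokenize(text.lower()))
    table = parse_rules(rules, rule_tokenizer)
    if tokens in table:
        return [(" ".join(seq), "SYNONYM") for seq in table[tokens]]
    return [(tok, "word") for tok in tokens]


if __name__ == "__main__":
    print(analyze("baz", whitespace_tokenize))
    print(analyze("foo bar", whitespace_tokenize))
    print(analyze("foo bar", keyword_tokenize))
```

In this model, single-word rules match under either rule tokenizer, while "foo bar" only comes back as SYNONYM tokens once the rules are parsed with the keyword tokenizer, mirroring the two _analyze responses above.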