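I have started experimenting with Elasticsearch ingest pipelines and processors as a possibly faster way to build what I can only describe as an "inverted index".
Here is what I am trying to do: I have an index of documents. Each document looks like the following: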
{
  "id": "DOC1",
  "title": "Quiz no. 1",
  "questions": [
    {
      "question": "Who was the first person to walk on the Moon?",
      "choices": [
        { "answer": "Michael Jackson", "correct": false },
        { "answer": "Neil Armstrong", "correct": true }
      ]
    },
    {
      "question": "Who wrote the Macbeth?",
      "choices": [
        { "answer": "William Shakespeare", "correct": true },
        { "answer": "Dante Alighieri", "correct": false },
        { "answer": "Arthur Conan Doyle", "correct": false }
      ]
    }
  ]
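}
I am trying to understand whether there is some magic combination of reindexing, pipelines, and processors that would let me build a questions index automatically. Here is an example of what that index would contain: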
[
  {
    "question_id": "<randomly-generated-value-1>",
    "document_id": "DOC1",
    "question": "Who was the first person to walk on the Moon?",
    "choices": [
      { "answer": "Michael Jackson", "correct": false },
      { "answer": "Neil Armstrong", "correct": true }
    ]
  },
  {
    "question_id": "<randomly-generated-value-2>",
    "document_id": "DOC1",
    "question": "Who wrote the Macbeth?",
    "choices": [
      { "answer": "William Shakespeare", "correct": true },
      { "answer": "Dante Alighieri", "correct": false },
      { "answer": "Arthur Conan Doyle", "correct": false }
    ]
  }
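]
The Elasticsearch documentation mentions that you can reindex through a specific pipeline. Working from the simulate pipeline documentation, I have been trying several processors, including the foreach one, but I cannot tell whether the documents produced by a pipeline are always 1:1 with the source index, or whether one source document can generate multiple target documents (which is exactly what I need).
Here is the pipeline I am trying with the simulate API: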
{
  "pipeline": {
    "description": "Inverts the documents index into a questions index",
    "processors": [
      {
        "rename": {
          "field": "id",
          "target_field": "document_id",
          "ignore_missing": false
        }
      },
      {
        "foreach": {
          "field": "questions",
          "processor": {
            "rename": {
              "field": "_ingest._value.question",
              "target_field": "question"
            }
          }
        }
      },
      {
        "foreach": {
          "field": "questions",
          "processor": {
            "rename": {
              "field": "_ingest._value.choices",
              "target_field": "choices"
            }
          }
        }
      },
      {
        "remove": {
          "field": "questions"
        }
      }
    ]
  }
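}
This almost works. The problem with this approach is that it produces only one resulting document, corresponding to the first question. The second question is absent from the simulate pipeline output, so I am wondering whether a pipeline of processors can output multiple target documents from a single source document, or whether it is forced to keep a 1:1 relationship.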
Posted on 2020-03-09 13:31:31
This answer seems to suggest that what I want to achieve is not possible.
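If the pipeline really is limited to one output document per input, one possible workaround is to do the split on the client and index the results yourself. The following is only a minimal sketch in Python under that assumption, not something taken from the linked answer: the function names `split_into_questions` and `to_bulk_actions` and the target index name `questions` are hypothetical, and the sample document is the one from the question.

```python
import uuid
from typing import Any, Dict, Iterator


def split_into_questions(doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat question document per entry in doc["questions"]."""
    for item in doc.get("questions", []):
        yield {
            "question_id": uuid.uuid4().hex,  # randomly generated value
            "document_id": doc["id"],         # back-reference to the source document
            "question": item["question"],
            "choices": item["choices"],
        }


def to_bulk_actions(doc: Dict[str, Any],
                    target_index: str = "questions") -> Iterator[Dict[str, Any]]:
    """Wrap each question document as a bulk-style action dict.

    The index name "questions" is an assumption made for this example.
    """
    for question in split_into_questions(doc):
        yield {
            "_index": target_index,
            "_id": question["question_id"],
            "_source": question,
        }


# Sample source document, copied from the question above.
quiz = {
    "id": "DOC1",
    "title": "Quiz no. 1",
    "questions": [
        {
            "question": "Who was the first person to walk on the Moon?",
            "choices": [
                {"answer": "Michael Jackson", "correct": False},
                {"answer": "Neil Armstrong", "correct": True},
            ],
        },
        {
            "question": "Who wrote the Macbeth?",
            "choices": [
                {"answer": "William Shakespeare", "correct": True},
                {"answer": "Dante Alighieri", "correct": False},
                {"answer": "Arthur Conan Doyle", "correct": False},
            ],
        },
    ],
}

question_docs = list(split_into_questions(quiz))
actions = list(to_bulk_actions(quiz))
```

The action dictionaries use the `_index` / `_id` / `_source` shape that, as far as I know, the official Python client's `elasticsearch.helpers.bulk` accepts, and the source documents could be read with `elasticsearch.helpers.scan`; treat both as assumptions to check against the client version you are running.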
https://stackoverflow.com/questions/60601635