I'm trying to import data into Elasticsearch from a JSON file that contains one document per line. Just the data, nothing else.
Here is how I create the index and insert a single document manually:
DELETE /tests
PUT /tests
{}

PUT /tests/test/_mapping
{
  "test": {
    "properties": {
      "env": {"type": "keyword"},
      "uid": {"type": "keyword"},
      "ok":  {"type": "boolean"}
    }
  }
}

POST /tests/test
{"env": "dev", "uid": 12346, "ok": true}

GET /tests/_search
{"query": {"match_all": {}}}

Everything works fine: no errors, the document is indexed correctly and can be found in ES.
Now, let's try to do the same with elasticdump.
Here is the content of the file I'm trying to import:
cat ./data.json
{"env":"prod","uid":1111,"ok":true}
{"env":"prod","uid":2222,"ok":true}

And here is how I try to import it:
elasticdump \
--input="./data.json" \
--output="http://elk:9200" \
--output-index="tests/test" \
--debug \
--limit=10000 \
--headers='{"Content-Type": "application/json"}' \
--type=data

But I get the error Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes.
Here is the full output:
root@node-tools:/data# elasticdump \
> --input="./s.json" \
> --output="http://elk:9200" \
> --output-index="tests/test" \
> --debug \
> --limit=10000 \
> --headers='{"Content-Type": "application/json"}' \
> --type=data
Tue, 16 Apr 2019 16:26:28 GMT | starting dump
Tue, 16 Apr 2019 16:26:28 GMT | got 2 objects from source file (offset: 0)
Tue, 16 Apr 2019 16:26:28 GMT [debug] | discovered elasticsearch output major version: 6
Tue, 16 Apr 2019 16:26:28 GMT [debug] | thisUrl: http://elk:9200/tests/test/_bulk, payload.body: "{\"index\":{\"_index\":\"tests\",\"_type\":\"test\"}}\nundefined\n{\"index\":{\"_index\":\"tests\",\"_type\":\"test\"}}\nundefined\n"
{ _index: 'tests',
_type: 'test',
_id: 'ndj4JmoBindjidtNmyKf',
status: 400,
error:
{ type: 'mapper_parsing_exception',
reason: 'failed to parse',
caused_by:
{ type: 'not_x_content_exception',
reason:
'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes' } } }
{ _index: 'tests',
_type: 'test',
_id: 'ntj4JmoBindjidtNmyKf',
status: 400,
error:
{ type: 'mapper_parsing_exception',
reason: 'failed to parse',
caused_by:
{ type: 'not_x_content_exception',
reason:
'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes' } } }
Tue, 16 Apr 2019 16:26:28 GMT | sent 2 objects to destination elasticsearch, wrote 0
Tue, 16 Apr 2019 16:26:28 GMT | got 0 objects from source file (offset: 2)
Tue, 16 Apr 2019 16:26:28 GMT | Total Writes: 0
Tue, 16 Apr 2019 16:26:28 GMT | dump complete

What am I doing wrong? Why does manual insertion work fine while _bulk throws an error? Any ideas?
UPD
Tried python's elasticsearch_loader — works fine:
elasticsearch_loader \
--es-host="http://elk:9200" \
--index="tests" \
--type="test" \
json --json-lines ./data.json

More information can be found here: https://github.com/taskrabbit/elasticsearch-dump/issues/534
Posted on 2019-04-17 12:55:45
The JSON documents should be provided wrapped in _source.
WAS: {"env":"prod","uid":1111,"ok":true}
NOW: {"_source":{"env":"prod","uid":1111,"ok":true}}
This can be done on the fly by elasticdump with the --transform argument:
elasticdump \
--input="./data.json" \
--output="http://elk:9200" \
--output-index="tests/test" \
--debug \
--limit=10000 \
--type=data \
--transform="doc._source=Object.assign({},doc)"

Thanks to the folks on GitHub. More details here: https://github.com/taskrabbit/elasticsearch-dump/issues/534
https://stackoverflow.com/questions/55712797