I have a very large JSON file containing a list of objects like the following:
{
"_index":"pelias",
"_type":"address",
"_id":"jf808cdawi46z",
"_score":1,
"_source":{
"center_point":{
"lon":106.66307,
"lat":10.959882
},
"name":{
"default":"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam"
}
}
}
{
"_index":"pelias",
"_type":"address",
"_id":"jf808cdawi46z",
"_score":1,
"_source":{
"center_point":{
"lon":106.66307,
"lat":10.959882
},
"name":{
"default":"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam"
}
}
}
I am using jq to convert it to CSV, like this:
"address","lat","lon"
"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam",10.959882,106.66307
"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam",10.959882,106.66307
This is the code I am using:
cat pelias_minify.json | jq -r -s '. | [.[] | {lat: ._source.center_point.lat, lon: ._source.center_point.lon, address: ._source.name.default}] | (map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' > pes.csv
The problem is that the file is over 2 GB in size. I have been reading up on streaming, but I still can't work out how to apply it here. Any help would be appreciated. Thanks a lot.
Update: I tried the following code, which lets me stream the file:
cat pelias_minify.json | jq -cn --stream 'fromstream(0|truncate_stream(inputs)) | {lat: ._source.center_point.lat, lon: ._source.center_point.lon, address: ._source.name.default}'
Output:
{"center_point":{"lon":106.66307,"lat":10.959882},"name":{"default":"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam"}}
{"center_point":{"lon":106.66307,"lat":10.959882},"name":{"default":"375/20 Bùi Quốc Khánh, Chánh Nghĩa, Bình Dương, Việt Nam"}}
Posted on 2021-09-13 06:00:23
As peak suggested, your use case does not call for jq's streaming parser. As long as you can filter out the fields required for the CSV output efficiently, you should be fine.
jq -r -cn '["address","lat","lon"], (inputs | [._source.name.default,._source.center_point.lat,._source.center_point.lon]) | @csv'
Posted on 2021-09-13 05:42:21
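For reference, the filter above can be exercised end to end like this. This is a sketch: the two-entity sample file is hypothetical stand-in data for pelias_minify.json, and the rest is the invocation from the answer with the input/output filenames from the question.

```shell
# Create a small stand-in for pelias_minify.json: a stream of top-level
# JSON entities (not a JSON array). Hypothetical sample data.
printf '%s\n' \
  '{"_source":{"center_point":{"lon":106.66307,"lat":10.959882},"name":{"default":"Addr 1"}}}' \
  '{"_source":{"center_point":{"lon":106.7,"lat":10.96},"name":{"default":"Addr 2"}}}' \
  > pelias_minify.json

# -n stops jq from consuming the first entity implicitly; `inputs` then
# reads the entities one at a time, so the file is never slurped into memory.
jq -rn '["address","lat","lon"],
        (inputs
         | [._source.name.default,
            ._source.center_point.lat,
            ._source.center_point.lon])
        | @csv' pelias_minify.json > pes.csv
```

Note that -n matters here: without it, the first entity is bound to `.` before `inputs` runs, so its row would silently be missing from the CSV.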
Since the input file is simply a stream of JSON entities, and each output line appears to depend on only one of those entities, the memory problem is most easily avoided by dropping the -s command-line option and adjusting the jq program accordingly. There should be no need for the --stream option.
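If you nonetheless want to stay with --stream (as in the update above), note that depth-1 truncation strips one leading path component, so the `_source` prefix disappears from the reassembled entities — consistent with the output shown in the update. A sketch under that assumption, again with hypothetical sample data standing in for pelias_minify.json:

```shell
# Hypothetical one-entity sample standing in for pelias_minify.json.
printf '%s\n' \
  '{"_index":"pelias","_score":1,"_source":{"center_point":{"lon":106.66307,"lat":10.959882},"name":{"default":"Addr 1"}}}' \
  > pelias_minify.json

# 1|truncate_stream drops one leading path component, so scalar top-level
# fields (_index, _score, ...) vanish entirely and each _source object is
# rebuilt with center_point/name at its top level.
jq -rn --stream '
  ["address","lat","lon"],
  (fromstream(1|truncate_stream(inputs))
   | [.name.default, .center_point.lat, .center_point.lon])
  | @csv' pelias_minify.json
```

For this input shape — many small top-level entities — this buys nothing over the plain `inputs` version; --stream mainly pays off when a single entity (e.g. one huge top-level array) does not fit in memory.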
https://stackoverflow.com/questions/69157508