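I'm importing several TB of CSV data into Neo4J for a project I've been working on. I have enough fast storage for the estimated 6.6 TiB, but the machine only has 32 GB of memory, and the import tool suggests 203 GB to complete the import.
When I run the import, I see the following (I assume it exits because it runs out of memory). Is there any way to import this large dataset with the limited memory I have? Or, if not with what I have, with the ~128 GB maximum that this machine's motherboard can support?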
Available resources:
Total machine memory: 30.73GiB
Free machine memory: 14.92GiB
Max heap memory : 6.828GiB
Processors: 16
Configured max memory: 21.51GiB
High-IO: true
WARNING: estimated number of nodes 37583174424 may exceed capacity 34359738367 of selected record format
WARNING: 14.62GiB memory may not be sufficient to complete this import. Suggested memory distribution is:
heap size: 5.026GiB
minimum free and available memory excluding heap size: 202.6GiB
Import starting 2022-10-08 19:01:43.942+0000
Estimated number of nodes: 15.14 G
Estimated number of node properties: 97.72 G
Estimated number of relationships: 37.58 G
Estimated number of relationship properties: 0.00
Estimated disk space usage: 6.598TiB
Estimated required memory usage: 202.6GiB
(1/4) Node import 2022-10-08 19:01:43.953+0000
Estimated number of nodes: 15.14 G
Estimated disk space usage: 5.436TiB
Estimated required memory usage: 202.6GiB
.......... .......... .......... .......... .......... 5% ∆1h 38m 2s 867ms
neo4j@79d2b0538617:~/import$
Posted on 2022-10-10 19:18:40
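The two warnings in the question's log can be checked with simple arithmetic. The node capacity 34359738367 is exactly 2^35 - 1. The suggested memory, 5.026 GiB of heap plus 202.6 GiB outside the heap, is larger than both the 30.73 GiB installed and the ~128 GB the motherboard can take. The sketch below only reuses numbers printed in the log and stated in the question. It says nothing about how the importer actually allocates memory, and it treats "128" as GiB, which is the more generous reading.

```python
# Numbers copied from the neo4j-admin import log and the question.
NODE_ESTIMATE_IN_WARNING = 37_583_174_424
RECORD_FORMAT_CAPACITY = 34_359_738_367

SUGGESTED_HEAP_GIB = 5.026
SUGGESTED_OFF_HEAP_GIB = 202.6  # "minimum free and available memory excluding heap size"
TOTAL_MACHINE_GIB = 30.73
MOTHERBOARD_MAX_GIB = 128       # "~128" from the question, read generously as GiB


def check_estimates() -> dict:
    """Compare the importer's printed estimates with the available hardware."""
    suggested_total = SUGGESTED_HEAP_GIB + SUGGESTED_OFF_HEAP_GIB
    return {
        # The capacity in the warning is one less than a power of two.
        "capacity_is_2_pow_35_minus_1": RECORD_FORMAT_CAPACITY == 2**35 - 1,
        "nodes_over_capacity": NODE_ESTIMATE_IN_WARNING > RECORD_FORMAT_CAPACITY,
        "suggested_total_gib": round(suggested_total, 3),
        "fits_current_machine": suggested_total <= TOTAL_MACHINE_GIB,
        "fits_128_gib": suggested_total <= MOTHERBOARD_MAX_GIB,
    }


if __name__ == "__main__":
    for key, value in check_estimates().items():
        print(f"{key}: {value}")
```

Under either reading of "128", the suggested total of about 207.6 GiB stays above the maximum the machine can hold.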
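If you are trying to follow the Operations Manual section on Neo4j Admin import, and your CSV matches movies.csv from that example, I'd suggest a more manual USING PERIODIC COMMIT LOAD CSV... neo4j/import/myfile.csv. Next, open a Browser instance, run the following (adjusted for your data), and leave it running until tomorrow: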
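// Neo4j 4.x syntax; in Neo4j Browser, prefix the query with :auto
USING PERIODIC COMMIT LOAD CSV FROM 'file:///myfile.csv' AS line
// column 3 holds the node's labels, separated by ';'
WITH line[3] AS nodeLabels, {
  id: line[0],
  title: line[1],
  year: toInteger(line[2])
} AS nodeProps
CALL apoc.create.node(SPLIT(nodeLabels, ';'), nodeProps) YIELD node
RETURN count(node);
Note: there are many ways to approach this, depending on your source data and the model you want to create. This solution only gives you some tools to help you work around the memory limit. If it is a simple CSV with headers, and you don't care which label the nodes initially get, you can skip the complex APOC call and probably just do the following: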
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///myfile.csv' AS line
CREATE (a :ImportedNode)
SET a = line
Files per label
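The original asker mentioned having a separate CSV for each label. In that case, a single large command that handles all of them may help, rather than running each step by hand.
Suppose there are two label types, each with a unique "id" property, where one type also has a "parent_id" that references the other label.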
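With many per-label files, typing the list of file/label pairs by hand gets tedious. Below is a minimal Python sketch that prints a Cypher list literal suitable for an UNWIND. It assumes a hypothetical naming convention in which each file is named after its label (city.csv becomes City), and `unwind_list_for` is an illustrative helper, not part of Neo4j.

```python
import json
import sys
from pathlib import Path


def unwind_list_for(filenames) -> str:
    """Build a Cypher list literal of {file, label} maps from CSV file names.

    Hypothetical convention: the label is the capitalized file stem,
    e.g. city.csv -> City. Non-CSV names are skipped.
    """
    entries = []
    for name in sorted(filenames):
        if not name.endswith(".csv"):
            continue
        label = Path(name).stem.capitalize()
        # json.dumps yields double-quoted, escaped strings, which Cypher accepts
        entries.append(f"{{ file: {json.dumps(name)}, label: {json.dumps(label)} }}")
    return "[\n  " + ",\n  ".join(entries) + "\n]"


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory (path is an assumption)
    import_dir = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(unwind_list_for(p.name for p in import_dir.glob("*.csv")))
```

The printed list can be pasted in front of `AS importFile` in place of a hand-written one.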
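// USING PERIODIC COMMIT has to open the query, so it cannot follow UNWIND.
// Neo4j 4.4+ batches with CALL { ... } IN TRANSACTIONS (prefix with :auto in Browser).
UNWIND [
  { file: 'country.csv', label: 'Country'},
  { file: 'city.csv', label: 'City'}
] AS importFile
// WITH HEADERS makes each line a map, so line.id reads the "id" column
LOAD CSV WITH HEADERS FROM 'file:///' + importFile.file AS line
CALL {
  WITH importFile, line
  CALL apoc.merge.node([importFile.label], {id: line.id}) YIELD node
  SET node = line
} IN TRANSACTIONS OF 10000 ROWS // batch size is an arbitrary choice
;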
// then build the relationships
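// this runs as one transaction; on a very large graph it may need batching like the node import
MATCH (city :City)
WHERE city.parent_id IS NOT NULL
// an index or uniqueness constraint on :Country(id) keeps this lookup fast
MATCH (country :Country {id: city.parent_id})
MERGE (city)-[:IN]->(country)
https://stackoverflow.com/questions/74005267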
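In the relationship step of the answer, a City whose parent_id matches no Country is dropped by the second MATCH. No :IN relationship is created for it, and nothing is reported. A minimal Python sketch can find such rows before importing. It assumes both files have header rows with the "id" and "parent_id" columns from the example, and it keeps every country id in memory, so it is only practical when the referenced file is small, as a country list would be. `dangling_parent_ids` is a hypothetical helper.

```python
import csv
import io


def dangling_parent_ids(country_csv: str, city_csv: str) -> list:
    """Return ids of city rows whose parent_id matches no country id.

    Both arguments are CSV text with a header row; the column names
    'id' and 'parent_id' follow the example in the answer.
    """
    country_ids = {row["id"] for row in csv.DictReader(io.StringIO(country_csv))}
    missing = []
    for row in csv.DictReader(io.StringIO(city_csv)):
        parent = row.get("parent_id") or None  # empty field -> no parent, skip it
        if parent is not None and parent not in country_ids:
            missing.append(row["id"])
    return missing


if __name__ == "__main__":
    countries = "id,name\nc1,France\nc2,Japan\n"
    cities = "id,name,parent_id\nx1,Paris,c1\nx2,Osaka,c2\nx3,Atlantis,c9\nx4,Nowhere,\n"
    print(dangling_parent_ids(countries, cities))  # ['x3']
```

For real files, `io.StringIO` can be swapped for an open file handle, so that the city file is streamed row by row instead of being held in memory.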