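I have two CSV files.
The first contains ~500M records in the following format:
id,name
10000023432,Tom User
13943423235,Blah Person
The second contains ~1.5B friend relationships in the following format:
fromId,toId
10000023432,13943423235
I used the OrientDB ETL tool to create the vertices from the first CSV file. Now I just need to create the edges that represent the friendships between them.
So far I have tried several configurations of the ETL JSON file. The most recent one is: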
{
  "config": { "parallel": true },
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "Person", "skipDuplicates": true } },
    { "edge": {
        "class": "FriendsWith",
        "joinFieldName": "from",
        "lookup": "Person.id",
        "unresolvedLinkAction": "SKIP",
        "targetVertexFields": {
          "id": "${input.to}"
        },
        "direction": "out"
      }
    },
    { "code": {
        "language": "Javascript",
        "code": "print('Current record: ' + record); record;"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        { "name": "FriendsWith", "extends": "E" }
      ],
      "indexes": [
        { "class": "Person", "fields": ["id:long"], "type": "UNIQUE" }
      ]
    }
  }
}
Unfortunately, in addition to creating the edges, this also creates vertices with "from" and "to" properties.
When I try to remove the vertex transformer, the ETL process throws an error:
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
    at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
    at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
    at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
    at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
    at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
    ... 2 more
What am I missing here?
Answered 2016-04-26 12:04:32
You can import the edges with these ETL transformers:
"transformers": [
{ "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "toId",
"lookup": "Person.id",
"direction": "out"
}
},
{ "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
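]
The "merge" transformer joins the current CSV row with the related Person record (this is a bit odd, but for some reason it is needed to associate fromId with the source person).
The "field" transformer removes the CSV fields that the merge step added. You can also try the import without the "field" transformer to see the difference.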
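To make the order of the four transformers easier to follow, here is a toy in-memory model of the chain in Java. It is only a sketch of the behavior described in this answer, not OrientDB's implementation: the class `EtlPipelineModel`, its methods, and the choice to skip rows whose source person cannot be found are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the transformer chain; NOT OrientDB code.
final class EtlPipelineModel {
    // Stands in for the Person class and its unique index on id.
    final Map<Long, Map<String, Object>> persons = new HashMap<>();
    // Each entry is {fromId, toId}, standing in for a FriendsWith edge.
    final List<long[]> edges = new ArrayList<>();

    void addPerson(long id, String name) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("id", id);
        fields.put("name", name);
        persons.put(id, fields);
    }

    // Processes one "fromId,toId" row; returns true if an edge was recorded.
    boolean processRow(long fromId, long toId) {
        // "merge": join the row with the existing source Person record.
        Map<String, Object> current = persons.get(fromId);
        if (current == null) {
            return false; // assumption of this model: unresolved rows are skipped
        }
        current.put("fromId", fromId);
        current.put("toId", toId);

        // "vertex": the current record is already a Person, so no new vertex appears.

        // "edge": look up the target by toId and link the current vertex to it.
        boolean linked = persons.containsKey(toId);
        if (linked) {
            edges.add(new long[] {fromId, toId});
        }

        // "field" with operation "remove": drop the CSV columns from the vertex.
        current.remove("fromId");
        current.remove("toId");
        return linked;
    }

    public static void main(String[] args) {
        EtlPipelineModel model = new EtlPipelineModel();
        model.addPerson(10000023432L, "Tom User");
        model.addPerson(13943423235L, "Blah Person");
        boolean linked = model.processRow(10000023432L, 13943423235L);
        System.out.println("linked=" + linked + ", vertices=" + model.persons.size()
                + ", edges=" + model.edges.size());
    }
}
```

In this model the number of Person entries stays at two after the row is processed, which is the point of merging first: the row is attached to an existing record instead of becoming a new vertex that carries the CSV columns.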
Answered 2015-11-13 14:06:59
Using Java, you can read the CSV file and then create the edges:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.orientechnologies.orient.client.remote.OServerAdmin;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

String nomeYourDb = "nomeYourDb";
String csvFile = "path_to_file";
String cvsSplitBy = ","; // your separator
try {
    OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/" + nomeYourDb).connect("root", "root");
    if (serverAdmin.existsDatabase()) {
        OrientGraph g = new OrientGraph("remote:localhost/" + nomeYourDb);
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            br.readLine(); // skip the header line (fromId,toId)
            String line;
            while ((line = br.readLine()) != null) {
                String[] ids = line.split(cvsSplitBy);
                // look up both endpoints by id and connect them with one edge
                String personFrom = "(select from Person where id='" + ids[0] + "')";
                String personTo = "(select from Person where id='" + ids[1] + "')";
                String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                g.command(new OCommandSQL(query)).execute();
            }
        } finally {
            g.shutdown(); // release the graph connection when done
        }
    }
    serverAdmin.close();
} catch (IOException e) {
    e.printStackTrace();
}
https://stackoverflow.com/questions/33679571
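The Java answer above concatenates raw CSV fields into the SQL string, so a malformed row would produce a broken command. As a sketch only, the helper below validates a row before building the same `create edge` command. `EdgeCommands` and `buildCreateEdge` are hypothetical names introduced for illustration, and the helper assumes both ids are plain decimal longs, as in the sample data.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OrientDB.
final class EdgeCommands {
    // Turns one "fromId,toId" row into a CREATE EDGE command,
    // or returns empty when the row does not hold exactly two numeric ids.
    static Optional<String> buildCreateEdge(String line, String separator) {
        // Quote the separator so characters such as "|" are not treated as regex syntax.
        String[] ids = line.trim().split(Pattern.quote(separator));
        if (ids.length != 2) {
            return Optional.empty();
        }
        try {
            long from = Long.parseLong(ids[0].trim());
            long to = Long.parseLong(ids[1].trim());
            String personFrom = "(select from Person where id='" + from + "')";
            String personTo = "(select from Person where id='" + to + "')";
            return Optional.of("create edge FriendsWith from " + personFrom + " to " + personTo);
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric field, e.g. the header row
        }
    }

    public static void main(String[] args) {
        // The header row yields no command; the data row yields one.
        System.out.println(buildCreateEdge("fromId,toId", ",").isPresent());
        System.out.println(buildCreateEdge("10000023432,13943423235", ",").orElse("skipped"));
    }
}
```

Because the header row fails the numeric check, a loop that uses this helper would not need separate header handling; each non-empty result could be passed to `g.command(new OCommandSQL(...)).execute()` as in the answer.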