Graph databases: insert performance in OrientDB through the Java API is very slow


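I am evaluating replacing a relational database with a graph database, and I am trying to copy the data from the original database into OrientDB (version 2.0). I create an embedded server with OServer server = OServerMain.create(), passing false for storage.useWAL. I then create a non-transactional graph: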

OrientBaseGraph graph = new OrientGraphNoTx("plocal:"+db);
graph.declareIntent(new OIntentMassiveInsert());
graph.getRawGraph().declareIntent(new OIntentMassiveInsert());
graph.getRawGraph().setValidationEnabled(false);

I create a vertex type for every table. I start by creating all the vertices of each type, passing their properties in a single call:

OrientVertex node = graph.addVertex("class:"+lbl, properties);

I then create the edges between these vertices. Some (but not all) of these edges have properties:

if (props!=null){
    nfrom.addEdge(linkName, nto,null,null,props);
} else {
    nfrom.addEdge(linkName, nto);
}

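I have tried with and without edge classes and saw no performance improvement. In total I have 328,822 vertices and 831,293 edges. The total run time is up to about 25 minutes! Most of that time (at least 20 minutes) is spent inserting edges, not vertices.

On the same machine, reading the same data from the same relational database and writing it to Titan with the BerkeleyDB backend, I transfer the data in 2 minutes.

What makes OrientDB 10 times slower than the competition? What am I doing wrong?

Thanks.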

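When you have many edges, it is much better to use a transactional graph and commit every X items. Also, disable the transaction log. For example: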

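// Transactional graph (OrientGraph instead of OrientGraphNoTx)
OrientBaseGraph graph = new OrientGraph("plocal:"+db);
try{
  // Disable the transaction log for the current transaction
  graph.getRawGraph().getTransaction().setUsingLog(false);

  int saved = 0;
  while(){ // this is your loop
    ....

    saved++;

    // Commit every 5000 items
    if( saved % 5000 == 0 ){
      graph.commit();
      // Set it again for the next transaction
      graph.getRawGraph().getTransaction().setUsingLog(false);
    }
  }
  // Commit whatever is left in the last, partial batch
  graph.commit();

} finally {
  graph.close();
}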
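
The commit-every-X pattern above can be tried on its own, without OrientDB on the classpath. What follows is only a minimal sketch: the class BatchCommitSketch and the method insertInBatches are hypothetical names that belong to no library, and the two callbacks stand in for whatever the real loop does (for example nfrom.addEdge(...) and graph.commit()). It mirrors the control flow of the snippet above, including the final commit after the loop.

```java
import java.util.function.Consumer;
import java.util.stream.IntStream;

/** Dependency-free sketch of "commit every N items" (hypothetical names, not an OrientDB API). */
class BatchCommitSketch {

    /**
     * Calls insertOne for every item, runs commit after each full batch,
     * and runs commit once more after the loop for the last partial batch.
     * Returns how many times commit was run.
     */
    static <T> int insertInBatches(Iterable<T> items, int batchSize,
                                   Consumer<? super T> insertOne, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saved = 0;
        int commits = 0;
        for (T item : items) {
            insertOne.accept(item);   // stand-in for e.g. nfrom.addEdge(...)
            saved++;
            if (saved % batchSize == 0) {
                commit.run();         // stand-in for e.g. graph.commit()
                commits++;
            }
        }
        commit.run();                 // final commit, as in the snippet above
        commits++;
        return commits;
    }

    public static void main(String[] args) {
        // Edge count taken from the question; the inserts here are no-ops.
        int edgeCount = 831_293;
        Iterable<Integer> edges = () -> IntStream.range(0, edgeCount).iterator();
        int commits = insertInBatches(edges, 5000, e -> { }, () -> { });
        System.out.println(commits + " commits for " + edgeCount + " edges");
    }
}
```

With the edge count from the question and a batch size of 5000, the loop commits 166 times and the final commit brings the total to 167, so the batch size directly controls how many records each transaction has to hold.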

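Hmm, the performance tuning page in the Graph API documentation says "avoid transactions whenever possible". I did use this kind of transactional setup in my tests, without setUsingLog(false) and with a smaller batch size. If I use 5000 as you suggest, I get:

com.orientechnologies.orient.core.exception.ODatabaseException: Error on saving record ... Caused by: java.lang.StackOverflowError at com.orientechnologies.orient.core.tx.OTransactionOptimistic.addRecord(OTransactionOptimistic.java:287)

Retried with -Xss4M: no StackOverflowError this time, 27 minutes. So no gain. OrientDB is still 10 times slower than Titan.

Can you share the code? What kind of input file do you have?

There is no input file; we read the data from a relational database. So no, I can't really share the code, but colleagues have seen similar performance when randomly inserting vertices with lots of edges.

Could you also share the Titan code (a skeleton)? Then I would have a valid reference.

My experience with OrientDB (as an embedded database) has been very good as far as performance is concerned.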