Cassandra timeout when traversing from a node with millions of edges

cassandra gremlin

I have a graph in which some nodes have millions of incident edges, running Titan 0.5.2 on Cassandra. For example, the following reproduces such a graph:

mgmt = g.getManagementSystem()
vidp = mgmt.makePropertyKey('vid').dataType(Integer.class).make()
mgmt.buildIndex('by_vid',Vertex.class).addKey(vidp).buildCompositeIndex()
mgmt.commit()

def v0 = g.addVertex([vid: 0, type: 'start'])
def random = new Random()
for(i in 1..10000000) {
  def v = g.addVertex([vid: i, type: 'claim'])
  v.addEdge('is-a', v0)
  def n = random.nextInt(i)
  def vr = g.V('vid', n).next()
  v.addEdge('test', vr) 
  if (i%10000 == 0) { g.commit(); }
}

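So we have 10 million vertices, all linked to v0, plus some random links between the vertices. This query:

g.V('vid', 0).in('is-a')[0]

works fine, as do

g.V('vid', 0).in('is-a')[100]

and

g.V('vid', 0).in('is-a')[1000]

However, if I try to traverse further, for example

g.V('vid', 0).in('is-a').out('test')[0]

the lookup gets stuck, and eventually I get a read timeout exception from Cassandra: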

com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception

Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after Duration[4000 ms]
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:86)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)

Caused by: com.netflix.astyanax.connectionpool.exceptions.TimeoutException: TimeoutException: [host=127.0.0.1(127.0.0.1):9160, latency=10000(10001), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:188)
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:

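I also see high load on the Cassandra process, and it becomes unresponsive (that is, attempts to connect to it time out). So my question is: why, even though stepping onto a node with this many edges works fine, is it not possible to traverse further from that node? And how can I make it work?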

It looks like you have effectively simulated a supernode. When you call

g.V('vid', 0).in('is-a')[0]

you are requesting only a single object, which is a fast lookup. Likewise,

g.V('vid', 0).in('is-a')[100]

also requests only a single object, so it is still fast. But when you run the query

g.V('vid', 0).in('is-a').out('test')[0]

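you have asked: "starting from all of those millions of vertices, find every vertex connected by an outgoing 'test' edge, and return the first one." The first thing it has to do is walk all of those millions of 'is-a' edges, and only then can it return the "first" vertex you asked for. Try this instead: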

g.V('vid', 0).in('is-a')[0].out('test')[0]

This does not traverse all of those vertices.
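
If more than one neighbor is needed, the same idea can be extended by bounding the fan-out from the supernode before taking the next step. The following is a minimal sketch rather than a tested fix: it assumes the open graph `g` from the question and the Gremlin 2.x console that ships with Titan 0.5.x, and the bound of 10 is an arbitrary value chosen for illustration.

```groovy
import com.tinkerpop.blueprints.Direction

// Assumption: `g` is the open TitanGraph from the question.

// Follow only the first 10 'is-a' edges into v0, then continue
// the traversal from those 10 vertices.
g.V('vid', 0).in('is-a')[0..9].out('test')

// The same bound written as an explicit vertex query on v0,
// which passes the limit to the edge lookup itself.
v0 = g.V('vid', 0).next()
v0.query().direction(Direction.IN).labels('is-a').limit(10).vertices().each { v ->
    // Print the vids reachable over 'test' edges from this neighbor.
    println v.out('test').vid.toList()
}
```

Note that both forms return results for a bounded sample of the neighbors only, not for the full set of vertices attached to the supernode.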