Why does Neo4j take 10x as long to run a Cypher query when using count()?
I started with the following query:
PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN PContains
LIMIT 10
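I get "5834 total db hits in 119 ms". The graph correctly shows 9 nodes and the 8 edges connecting them. I then ran a nearly identical query, changed only so that it returns count(distinct()).
This gives "1382270 total db hits in 1771 ms". The result is correct: 8. But why is count(distinct()) so much slower and so much more expensive? Should I be doing this some other way?
I am running Neo4j 2.3.1.
Edit 1
To make sure I am comparing apples to apples, and to highlight the problem, here is a pair of similar queries and their results: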
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN SPrimePackage
LIMIT 10
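Note that this returns "SPrimePackage" rather than the original "PContains". The result is "5834 total db hits in 740 ms".
The exact same query with "count()" gives "1382270 total db hits in 2731 ms". Note that the only difference is the "count()". Intuitively, I would expect "count()" to add a single counting step, but it clearly does far more than that. Why does "count()" trigger all this extra work?
[UPDATED]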
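If you compare the PROFILE output of the two (edited) queries, you will probably find that the only significant difference is the presence of an EagerAggregation operation in the COUNT() version of the query. Aggregation functions use EagerAggregation to collect in memory all the data being aggregated before actually performing the aggregation function (COUNT(), in this case). That extra work is not needed if you are not using an aggregation function.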
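The effect of an eager aggregation sitting under a LIMIT can be sketched with a toy model. This is not Neo4j's execution engine, only a Python generator standing in for a lazy row pipeline. The figure of 3296 rows is borrowed from the count() result reported in the comments, and the cost of one "db hit" per row is an assumption made purely for illustration.

```python
from itertools import islice

def matching_rows(total_rows, hits):
    """Lazily yield rows; each row produced costs one simulated db hit."""
    for i in range(total_rows):
        hits[0] += 1      # assumption: one store access per row produced
        yield i % 8       # only 8 distinct values, repeated many times

# RETURN x LIMIT 10: the consumer stops pulling after 10 rows.
limit_hits = [0]
first_ten = list(islice(matching_rows(3296, limit_hits), 10))

# RETURN count(DISTINCT x): the aggregation has to see every row first.
count_hits = [0]
distinct_count = len(set(matching_rows(3296, count_hits)))

print(limit_hits[0], count_hits[0], distinct_count)
```

In this model the LIMIT version touches only the rows it returns, while the counting version has to drain the whole pipeline before it can produce its single result row. That would be consistent with the gap between 5834 and 1382270 db hits in the question.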
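The following query still uses COUNT() to get the count, but it greatly reduces the amount of data that has to be aggregated, and therefore the amount of work done in the EagerAggregation step: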
PROFILE
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })
USING INDEX SBase:Snapshot(timestamp)
WHERE (SBase)-[:contains]->()
MATCH (s:Snapshot { timestamp:1454983521.642284 })-[:contains]->(SPrimePackage)
USING INDEX s:Snapshot(timestamp)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN COUNT(DISTINCT SPrimePackage)
LIMIT 10;
The above query assumes you have already created an index on :Snapshot(timestamp), which greatly speeds up the search for the 2 :Snapshot nodes.
With some simple data, the profile I get is:
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| +ProduceResults | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | COUNT(DISTINCT SPrimePackage) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +Limit | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | Literal(10) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +EagerAggregation | 1 | 1 | 0 | COUNT(DISTINCT SPrimePackage) | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +AntiSemiApply | 1 | 7 | 0 | anon[180], s -- SBase, SPrimePackage | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(Into) | 1 | 0 | 34 | anon[266] -- SBase, SPrimePackage | (SBase)-[:contains]->(SPrimePackage) |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument | 4 | 8 | 0 | SBase, SPrimePackage | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +CartesianProduct | 4 | 8 | 0 | SBase -- anon[180], SPrimePackage, s | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All) | 4 | 8 | 10 | anon[180], SPrimePackage -- s | (s)-[:contains]->(SPrimePackage) |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +NodeIndexSeek | 2 | 2 | 4 | s | :Snapshot(timestamp) |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +SemiApply | 1 | 2 | 0 | SBase | |
| |\ +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All) | 4 | 0 | 2 | anon[112], anon[126] -- SBase | (SBase)-[:contains]->() |
| | | +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument | 2 | 2 | 0 | SBase | |
| | +----------------+------+---------+--------------------------------------+--------------------------------------+
| +NodeIndexSeek | 2 | 2 | 3 | SBase | :Snapshot(timestamp) |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
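In addition to using the index, the above query does not bother to find all the nodes contained by SBase, since we only need to find a single contained node in order to identify a matching SBase node. The SemiApply operation completes as soon as a single (SBase)-[:contains]->() match is found, so the first MATCH clause produces one row per SBase rather than N rows. From the information in your question, I am guessing N is around 8.

Counting is not the problem; the difference arises because the query spends its time on a huge number of db hits. I think the solution you are looking for is a path-based query, but I am not good at those, so I hope someone else can provide what you need.

Hi @Supamiu, thanks for your comment. If I run the query with count() instead of count(distinct()), I get "1382270 total db hits in 1454 ms". However, the count is then no longer what I want: 3296. Interestingly, 3296 is evenly divisible by 8, which is the answer I was looking for.

Oh, I was hoping for a better result, not one with no noticeable difference... Try adding an index on the timestamp property.

This is definitely a useful answer. Once I changed "MATCH Base=(SBase:Snapshot{timestamp:…})-[:contains]->()" to "MATCH(SBase:Snapshot{timestamp:…})", the result was 3363 total db hits in 1290 ms. That is a much better approach, thank you. Strictly speaking, though, the question was "why is there a difference between SPrimePackage and count(SPrimePackage)?" If you can amend your answer to highlight the answer to that question, I will accept it.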
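The row multiplication discussed in the answer and comments can be sketched in a few lines of Python. This is a toy model, not Neo4j internals. The helper names `expand_all` and `semi_apply` are hypothetical. The figure of 412 contained nodes is not stated anywhere in the thread: it is an assumption chosen so that 412 × 8 reproduces the 3296 reported for count().

```python
def expand_all(contained):
    """Model of MATCH (SBase)-[:contains]->(): one row per contained node."""
    return [("SBase", node) for node in contained]

def semi_apply(contained):
    """Model of WHERE (SBase)-[:contains]->(): one row if any match exists."""
    # any() stops at the first element, much as SemiApply stops at the first match
    return [("SBase",)] if any(True for _ in contained) else []

base_nodes = [f"pkg{i}" for i in range(412)]   # hypothetical: 412 contained nodes
new_packages = [f"new{i}" for i in range(8)]   # the 8 packages found only in Prime

# Every row from the first MATCH is combined with every surviving Prime package.
rows_expand = [(b, p) for b in expand_all(base_nodes) for p in new_packages]
rows_semi = [(b, p) for b in semi_apply(base_nodes) for p in new_packages]

print(len(rows_expand))                    # count(x) over the multiplied rows
print(len({p for _, p in rows_expand}))    # count(DISTINCT x) over the same rows
print(len(rows_semi))                      # count(x) after the rewrite
```

Under these assumptions the pattern in the first MATCH inflates the row count to 3296, DISTINCT collapses it back to 8, and the SemiApply-style rewrite never produces the duplicate rows in the first place.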
The index assumed by the query in the answer can be created with:
CREATE INDEX ON :Snapshot(timestamp);