Slow performance of a Neo4j Cypher query for similarity calculation
I am very new to Neo4j and graph databases, and I am trying to reproduce the tutorial from the Cypher cookbook. A random data set contains 100 foods and 1500 persons; all persons are related to foods by an ATE relationship with an integer "times" property. Both foods and persons are labeled and have a "name" property, which is indexed by the auto index.
neo4j-sh (?)$ dbinfo -g "Primitive count"
{
"NumberOfNodeIdsInUse": 1600,
"NumberOfPropertyIdsInUse": 151600,
"NumberOfRelationshipIdsInUse": 150000,
"NumberOfRelationshipTypeIdsInUse": 1
}
neo4j-sh (?)$ index --indexes
Node indexes:
node_auto_index
Relationship indexes:
relationship_auto_index
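Running the modified query from the cookbook in the neo4j shell never finishes (perhaps because there are too many nodes/relationships?).
Is there any way to make a query like this run faster, so that it can be used in "real time"?
System and Neo4j
Windows 7 (64-bit), Intel Core i7-2600K, 8 GB RAM, Neo4j database on an SSD drive
Neo4j Community Edition: 2.1.0-M01 (also tested on 2.0.1 stable)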
neo4j-community.options
-Xmx2048m
-Xms2048m
neo4j.properties
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M
node_auto_indexing=true
node_keys_indexable=name
relationship_auto_indexing=true
relationship_keys_indexable=times
(503 KB zipped)
Profile output
ColumnFilter(symKeys=["similarity", "you", "you.name", "me", "me.name"], returnItemNames=["me.name", "you.name", "similarity"], _rows=100, _db_hits=0)
Sort(descr=["SortItem(similarity,false)"], _rows=100, _db_hits=0)
Extract(symKeys=["me", "you", "similarity"], exprKeys=["me.name", "you.name"], _rows=100, _db_hits=200)
ColumnFilter(symKeys=["me", "you", " INTERNAL_AGGREGATEcb085cf5-8982-4a83-ba3d-9642de570c59"], returnItemNames=["me", "you", "similarity"], _rows=100, _db_hits=0)
EagerAggregation(keys=["me", "you"], aggregates=["(INTERNAL_AGGREGATEcb085cf5-8982-4a83-ba3d-9642de570c59,Sum(Divide(Multiply(Subtract(Literal(1),AbsFunction(Subtract(Divide(Property(r1,times(1)),H1),Divide(Property(r2,times(1)),H2)))),Add(Property(r1,times(1)),Property(r2,times(1)))),Add(H1,H2))))"], _rows=100, _db_hits=40000)
SimplePatternMatcher(g="(you)-['r2']-(food),(me)-['r1']-(food)", _rows=10000, _db_hits=0)
ColumnFilter(symKeys=["me", "you", " INTERNAL_AGGREGATE677cd11c-ae53-4d7b-8df6-732ffed28bbf", " INTERNAL_AGGREGATEb5eb877c-de01-4e7a-9596-03cd94cfa47a"], returnItemNames=["me", "H1", "H2", "you"], _rows=100, _db_hits=0)
EagerAggregation(keys=["me", "you"], aggregates=["( INTERNAL_AGGREGATE677cd11c-ae53-4d7b-8df6-732ffed28bbf,Distinct(Count(r1),r1))", "( INTERNAL_AGGREGATEb5eb877c-de01-4e7a-9596-03cd94cfa47a,Distinct(Count(r2),r2))"], _rows=100, _db_hits=0)
SimplePatternMatcher(g="(you)-['r2']-(food),(me)-['r1']-(food)", _rows=10000, _db_hits=0)
ColumnFilter(symKeys=["me", "food", "you", "r2"], returnItemNames=["me", "you"], _rows=100, _db_hits=0)
Slice(limit="Literal(100)", _rows=100, _db_hits=0)
Filter(pred="NOT(me == you)", _rows=100, _db_hits=0)
SimplePatternMatcher(g="(you)-['r2']-(food)", _rows=100, _db_hits=0)
ColumnFilter(symKeys=["food", "me", "r1"], returnItemNames=["me", "food"], _rows=1, _db_hits=0)
Filter(pred="Property(me,name(0)) == {name}", _rows=1,_db_hits=148901)
TraversalMatcher(start={"label": "Person", "producer": "NodeByLabel", "identifiers": ["me"]}, trail="(me)-[r1:ATE WHERE true AND true]->(food)", _rows=148901, _db_hits=148901)
You are performing the same match several times. Does this work better?
EXPORT name="Florida Goyette"
MATCH (me:Person { name: {name}})-[r1:ATE]->(food)<-[r2:ATE]-(you:Person)
WITH me,r1,r2,count(DISTINCT r1) AS H1,count(DISTINCT r2) AS H2,you
LIMIT 100
RETURN SUM((1-ABS(r1.times/H1-r2.times/H2))*(r1.times+r2.times)/(H1+H2)) AS similarity;
You are using the wrong type of index. Create a label index with
CREATE INDEX ON :Person(name)
Check schema indexes and constraints with
neo4j shell
schema
schema ls -l :User
or
neo4j browser
:schema
:schema ls -l :User
The query may need optimizing as well, but start here.
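To see why the index type matters: in the profile output, the plan starts from every Person through NodeByLabel and only afterwards filters on the name, which costs 148901 db hits. The following Python sketch is only an analogy, not Neo4j's implementation. The record layout, the names `find_by_scan` and `find_by_index`, and the hit counter are all made up for illustration. It contrasts checking every record with a single keyed lookup.

```python
# Hypothetical in-memory stand-in for 1500 Person nodes.
people = [{"name": f"person-{i}"} for i in range(1500)]


def find_by_scan(name):
    """Touch every record and filter on name, like a label scan plus Filter."""
    hits = 0
    found = None
    for person in people:
        hits += 1  # one "hit" per record inspected
        if person["name"] == name and found is None:
            found = person
    return found, hits


# Built once up front; loosely analogous to CREATE INDEX ON :Person(name).
index = {person["name"]: person for person in people}


def find_by_index(name):
    """Go straight to the matching record with a single keyed lookup."""
    return index.get(name), 1
```

In this toy model the scan touches all 1500 records for every lookup, while the keyed lookup touches one regardless of how many records exist.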
No. 1 row, 361023 ms.
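After setting up the proper index, it now returns in 1400 ms. But the results are inaccurate.
Can you share the results? Yes, the current version finds all paths and then filters.
Thanks. I misread the indexing documentation and used the "legacy" index.
I have set -Xmx and -Xms to 3072m (currently on my laptop). The Windows "Resource Monitor" shows 3.4 GB commit and 1.3 GB working set for neo4j-community.exe. I changed the index as @jjaderberg suggested. The first query returns 149900 rows in 700 ms (after the first run). The second query returns 149000 rows in 140588 ms (after the first run). I could not get your third query to work; I get a syntax exception (SyntaxException: Invalid input '.': expected whitespace) at "a + (1-ABS(r[0].times/H1-r[1].times/H2))*".
The error is reported at r[0].times. Is what I created what you wanted?
Edit: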
MATCH (me:Person { name: {name}})-[r1:ATE]->(food)<-[r2:ATE]-(you:Person) RETURN count(*)
MATCH (me:Person { name: {name}})-[r1:ATE]->(food)<-[r2:ATE]-(you:Person)
WITH me,count(DISTINCT r1) AS H1,count(DISTINCT r2) AS H2,you
MATCH (me)-[r1:ATE]->(food)<-[r2:ATE]-(you)
RETURN COUNT(*)
MATCH (me:Person { name: {name}})-[r1:ATE]->(food)<-[r2:ATE]-(you:Person)
WITH me,
     collect([r1,r2]) AS rels,
     count(DISTINCT r1) AS H1,
     count(DISTINCT r2) AS H2,
     you
RETURN me,you,
       reduce(a=0,r IN rels |
         a + (1-ABS(r[0].times/H1-r[1].times/H2))*
             (r[0].times+r[1].times)
             /(H1+H2)) AS similarity
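For checking results by hand, the similarity expression from the queries above can be reproduced outside the database. The following is a minimal Python sketch under stated assumptions: the function name `similarity` and the dict-based input (food name mapped to the "times" value) are hypothetical, and it assumes each person has at most one ATE relationship per food, so that count(DISTINCT r1) and count(DISTINCT r2) both equal the number of shared foods. It uses floating-point division, whereas Cypher divides two integers as integers, so values may differ from what the query returns.

```python
def similarity(me, you):
    """Sum the per-food similarity term over the foods both people ate.

    me, you: dict mapping food name -> number of times eaten.
    """
    shared = me.keys() & you.keys()
    if not shared:
        return 0.0
    # Assumption: one ATE relationship per (person, food) pair, so both
    # distinct-relationship counts equal the number of shared foods.
    h1 = h2 = len(shared)
    total = 0.0
    for food in shared:
        t1, t2 = me[food], you[food]
        # Same term as in the RETURN clause:
        # (1 - ABS(t1/H1 - t2/H2)) * (t1 + t2) / (H1 + H2)
        total += (1 - abs(t1 / h1 - t2 / h2)) * (t1 + t2) / (h1 + h2)
    return total
```

For example, two people who ate the same two foods 2 and 4 times each get a score of 3.0, while any food eaten by only one of them is ignored.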