Neo4j performance challenges: how to improve?
For the past few weeks I've been wrestling with Neo4j, trying to solve some extremely challenging performance problems. At this point I need some extra help, because I can't decide how to move forward.
I have a graph with a total of roughly 12.5 million nodes and 64 million relationships. The purpose of the graph is to analyze suspicious financial behavior, so it contains customers, accounts, transactions, etc.
Here are some examples of the performance challenges:
- A query for the total number of nodes takes 96064 ms to complete, which is very long:
neo4j-sh (?)$ MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 12519940 |
+----------+
1 row
96064 ms
- A query for the total number of relationships takes 919449 ms to complete, which seems absurd:
neo4j-sh (?)$ MATCH ()-[r]-() return count(r);
+----------+
| count(r) |
+----------+
| 64062508 |
+----------+
1 row
919449 ms
- I have 6.6 million Transaction nodes. When I search for transactions with an amount over $8,000, the query takes 653637 ms, which is also far too long:
neo4j-sh (?)$ MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
653637 ms
Indexes:
ON :Transaction(baseamount) ONLINE
ON :Transaction(type) ONLINE
ON :Transaction(amount) ONLINE
ON :Transaction(currency) ONLINE
ON :Transaction(basecurrency) ONLINE
ON :Transaction(transactionid) ONLINE (for uniqueness constraint)
Query profile:
neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);
+----------+
| count(t) |
+----------+
| 10696 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+NodeByLabel
+------------------+---------+----------+-------------+------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+---------+----------+-------------+------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(t) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10696 | 13216382 | | Property(t,amount(62)) > { AUTODOUBLE0} |
| NodeByLabel | 6608191 | 6608192 | t, t | :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+
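It helps to put the timings above on a per-record basis. The Python sketch below is just arithmetic on the figures shown in the shell output and PROFILE, with nothing Neo4j-specific in it. The plain node count works out to about 7.7 µs per node. The amount filter costs about 98.9 µs per Transaction node, spends exactly two DbHits per node in the Filter step, and keeps only 0.16% of the nodes. That pattern fits a label scan that reads the amount property of every node, rather than an index lookup.

```python
# Figures copied from the neo4j-shell output and the PROFILE above.
TIMINGS = {
    # query: (elapsed ms, records counted/scanned)
    "MATCH (n) RETURN count(n)": (96_064, 12_519_940),
    "MATCH ()-[r]-() RETURN count(r)": (919_449, 64_062_508),
    "MATCH (t:Transaction) WHERE t.amount > 8000.00 ...": (653_637, 6_608_191),
}
LABEL_SCAN_ROWS = 6_608_191   # rows produced by NodeByLabel
FILTER_DB_HITS = 13_216_382   # DbHits spent in Filter
FILTER_ROWS = 10_696          # rows kept by Filter


def micros_per_record(elapsed_ms: int, records: int) -> float:
    """Average wall-clock time per record, in microseconds."""
    return elapsed_ms * 1000 / records


db_hits_per_node = FILTER_DB_HITS / LABEL_SCAN_ROWS
kept_pct = FILTER_ROWS / LABEL_SCAN_ROWS * 100

if __name__ == "__main__":
    for query, (ms, records) in TIMINGS.items():
        print(f"{micros_per_record(ms, records):6.1f} us/record  {query}")
    print(f"Filter DbHits per scanned node: {db_hits_per_node:.1f}")
    print(f"Nodes kept by the filter: {kept_pct:.2f}%")
```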
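- I'm running these in the neo4j shell
- These performance problems are starting to make me seriously doubt whether I can use Neo4j at all, and they seem to contradict the potential the platform promises
- I fully admit I may have misunderstood something (I'm relatively new to Neo4j), so any guidance on what to fix or what to look at would be greatly appreciated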
CPU
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping : 3
microcode : 0x12
cpu MHz : 4230.625
cache size : 6144 KB
Memory
$ cat /proc/meminfo
MemTotal: 16115020 kB
MemFree: 224856 kB
MemAvailable: 8807160 kB
Buffers: 124356 kB
Cached: 8429964 kB
SwapCached: 8388 kB
Disk
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data1--vg-root 219G 32G 177G 16% /
Neo4J.properties
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true
Neo4J wrapper.properties
wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties
#********************************************************************
# JVM Parameters
#********************************************************************
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow
# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144
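Here is a rough memory budget for this configuration, as a Python sketch of my own arithmetic. Treating "G" as 1024 MiB is an assumption. The mapped_memory settings add up to 2,674 MiB. Together with the 6,144 MiB maximum heap, the JVM may use about 8.6 GiB of the roughly 15.4 GiB of RAM. That leaves under 7 GiB for the OS and its page cache, far less than the 26G store. The total is roughly in line with the 8.964g RES reported for the java process by top.

```python
# Settings copied from the configuration above; treating "G" as 1024 MiB is an assumption.
MAPPED_MEMORY = {
    "neostore.nodestore.db.mapped_memory": "200M",
    "neostore.relationshipstore.db.mapped_memory": "1G",
    "neostore.relationshipgroupstore.db.mapped_memory": "200M",
    "neostore.propertystore.db.mapped_memory": "500M",
    "neostore.propertystore.db.strings.mapped_memory": "500M",
    "neostore.propertystore.db.arrays.mapped_memory": "50M",
    "neostore.propertystore.db.index.keys.mapped_memory": "200M",
}
MAX_HEAP_MIB = 6144          # wrapper.java.maxmemory
MEM_TOTAL_KB = 16_115_020    # MemTotal from /proc/meminfo


def to_mib(value: str) -> int:
    """Convert a size such as '200M' or '1G' to MiB."""
    units = {"M": 1, "G": 1024}
    return int(value[:-1]) * units[value[-1].upper()]


mapped_total_mib = sum(to_mib(v) for v in MAPPED_MEMORY.values())
jvm_total_mib = mapped_total_mib + MAX_HEAP_MIB
ram_mib = MEM_TOTAL_KB / 1024

if __name__ == "__main__":
    print(f"mapped_memory total: {mapped_total_mib} MiB")
    print(f"mapped + max heap:   {jvm_total_mib} MiB ({jvm_total_mib / 1024:.1f} GiB)")
    print(f"RAM:                 {ram_mib:.0f} MiB ({ram_mib / 1024:.1f} GiB)")
    print(f"left for OS and page cache: {ram_mib - jvm_total_mib:.0f} MiB")
```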
Other:
- Changed the Linux open-files limit to 40k
- I'm not running anything else on this machine: no X Windows, no other DB servers. Here is a snippet of top while the query is running:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15785 neo4j     20   0 12.192g 8.964g 2.475g S 100.2 58.3 227:50.98 java
    1 root      20   0   33464   2132   1140 S   0.0  0.0   0:02.36 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
- The total size of the files in the graph.db directory is:
data/graph.db$ du --max-depth=1 -h
1.9G    ./schema
36K     ./index
26G     .
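- Data loading is very erratic. Some merges take less than 60 seconds (even with 200K to 300K inserts), while others run for more than 3 hours (11898514 ms for a 189,999-row CSV file merging on a date)
- I'm getting constant GC thread blocking: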
2015-03-27 14:56:26.347+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
2015-03-27 14:56:39.011+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
2015-03-27 14:56:57.533+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
2015-03-27 14:57:17.345+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
2015-03-27 14:57:29.955+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
2015-03-27 14:58:14.311+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
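To see how much of the wall-clock time goes to GC pauses, you can total up these warnings. The Python sketch below does that. The helper name and regex are my own, and they assume the exact "GC Monitor" line format shown above. On these six lines it reports about 70.6 s of blocked application threads within roughly 108 s between the first and last warning. The first pause started before its warning was written, so the ratio is only approximate.

```python
import re
from datetime import datetime
from typing import Tuple

# Matches the "GC Monitor" warnings in the format shown above (an assumption for other versions).
GC_WARNING = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000 WARN .*?"
    r"GC Monitor: Application threads blocked for (?P<ms>\d+)ms"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def summarize_gc_blocks(log_text: str) -> Tuple[int, int, float]:
    """Return (number of warnings, total blocked ms, seconds from first to last warning)."""
    matches = list(GC_WARNING.finditer(log_text))
    if not matches:
        return 0, 0, 0.0
    total_ms = sum(int(m.group("ms")) for m in matches)
    first = datetime.strptime(matches[0].group("ts"), TS_FORMAT)
    last = datetime.strptime(matches[-1].group("ts"), TS_FORMAT)
    return len(matches), total_ms, (last - first).total_seconds()


SAMPLE = """\
2015-03-27 14:56:26.347+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
2015-03-27 14:56:39.011+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
2015-03-27 14:56:57.533+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
2015-03-27 14:57:17.345+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
2015-03-27 14:57:29.955+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
2015-03-27 14:58:14.311+0000 WARN [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
"""

if __name__ == "__main__":
    count, blocked_ms, window_s = summarize_gc_blocks(SAMPLE)
    print(f"{count} warnings, {blocked_ms / 1000:.1f} s blocked, "
          f"{window_s:.1f} s between first and last warning")
```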
Update 1: Thanks very much for your help. I've just moved house, so my reply was delayed.
/data/graph.db$ ls -lah neostore.*
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.id
-rw-rw-r-- 1 neo4j neo4j 110 Apr 2 13:03 neostore.labeltokenstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.id
-rw-rw-r-- 1 neo4j neo4j 874 Apr 2 13:03 neostore.labeltokenstore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.labeltokenstore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 200M Apr 2 13:03 neostore.nodestore.db
-rw-rw-r-- 1 neo4j neo4j 41 Apr 2 13:03 neostore.nodestore.db.id
-rw-rw-r-- 1 neo4j neo4j 68 Apr 2 13:03 neostore.nodestore.db.labels
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.nodestore.db.labels.id
-rw-rw-r-- 1 neo4j neo4j 2.8G Apr 2 13:03 neostore.propertystore.db
-rw-rw-r-- 1 neo4j neo4j 128 Apr 2 13:03 neostore.propertystore.db.arrays
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.arrays.id
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.id
-rw-rw-r-- 1 neo4j neo4j 720 Apr 2 13:03 neostore.propertystore.db.index
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.id
-rw-rw-r-- 1 neo4j neo4j 3.1K Apr 2 13:03 neostore.propertystore.db.index.keys
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.index.keys.id
-rw-rw-r-- 1 neo4j neo4j 1.7K Apr 2 13:03 neostore.propertystore.db.strings
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.propertystore.db.strings.id
-rw-rw-r-- 1 neo4j neo4j 47M Apr 2 13:03 neostore.relationshipgroupstore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshipgroupstore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.1G Apr 2 13:03 neostore.relationshipstore.db
-rw-rw-r-- 1 neo4j neo4j 1.6M Apr 2 13:03 neostore.relationshipstore.db.id
-rw-rw-r-- 1 neo4j neo4j 165 Apr 2 13:03 neostore.relationshiptypestore.db
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.id
-rw-rw-r-- 1 neo4j neo4j 1.3K Apr 2 13:03 neostore.relationshiptypestore.db.names
-rw-rw-r-- 1 neo4j neo4j 9 Apr 2 13:03 neostore.relationshiptypestore.db.names.id
-rw-rw-r-- 1 neo4j neo4j 3.5K Apr 2 13:03 neostore.schemastore.db
-rw-rw-r-- 1 neo4j neo4j 25 Apr 2 13:03 neostore.schemastore.db.id
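One way to read this listing is against the mapped_memory settings in Neo4J.properties. The Python sketch below is my own arithmetic on the rounded `ls -h` sizes, treating K/M/G as powers of 1024 (an assumption). It computes what fraction of each store file fits in its mapped-memory window. The 2.8G property store, which the t.amount filter has to read for every Transaction node, has only 500M mapped, about 17% of it. Meanwhile, 700M of mapped memory is assigned to the strings and index-keys stores, which are only a few kilobytes on disk. If that reading is right, it would be consistent with StoreFileChannel.read() dominating the jvmtop profile.

```python
# Store sizes as printed by `ls -lah` above (rounded by -h) and the matching
# mapped_memory settings from Neo4J.properties, converted to MiB.
# Treating K/M/G as powers of 1024 is an assumption about the rounding.
STORES = {
    # store file: (size on disk in MiB, mapped_memory in MiB)
    "neostore.nodestore.db": (200, 200),
    "neostore.relationshipstore.db": (1.1 * 1024, 1024),
    "neostore.relationshipgroupstore.db": (47, 200),
    "neostore.propertystore.db": (2.8 * 1024, 500),
    "neostore.propertystore.db.strings": (1.7 / 1024, 500),
    "neostore.propertystore.db.index.keys": (3.1 / 1024, 200),
}


def coverage(store: str) -> float:
    """Fraction of the store file that fits in its mapped_memory window (at most 1.0)."""
    size_mib, mapped_mib = STORES[store]
    return min(1.0, mapped_mib / size_mib)


def unused_mib(store: str) -> float:
    """Mapped memory left over because the file is smaller than its window."""
    size_mib, mapped_mib = STORES[store]
    return max(0.0, mapped_mib - size_mib)


if __name__ == "__main__":
    for name, (size, mapped) in STORES.items():
        print(f"{name:38s} {size:8.1f} MiB on disk, {mapped:5d} MiB mapped, "
              f"coverage {coverage(name):4.0%}, unused {unused_mib(name):6.1f} MiB")
```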
JvmTop 0.8.0 alpha - 16:12:59, amd64, 4 cpus, Linux 3.16.0-33, load avg 0.30
http://code.google.com/p/jvmtop
Profiling PID 4260: org.neo4j.server.Bootstrapper
68.67% ( 14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
18.73% ( 3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
2.86% ( 0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
1.11% ( 0.23s) org.neo4j.helpers.Counter.inc()
0.87% ( 0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
0.65% ( 0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
0.63% ( 0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
0.62% ( 0.13s) scala.collection.mutable.OpenHashMap.<init>()
0.62% ( 0.13s) scala.collection.mutable.AbstractSeq.<init>()
0.62% ( 0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
0.61% ( 0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
0.61% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
0.61% ( 0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
0.61% ( 0.12s) scala.collection.AbstractTraversable.<init>()
0.61% ( 0.12s) scala.collection.immutable.List.toStream()
0.60% ( 0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
0.57% ( 0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
0.37% ( 0.08s) org.parboiled.scala.Parser$class.rule()
0.06% ( 0.01s) scala.util.DynamicVariable.value()
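Unfortunately, schema indexes (i.e. indexes created with CREATE INDEX ON :Label(property)) do not yet support greater-than/less-than conditions. Neo4j therefore falls back to scanning all nodes with the given label and filtering on their properties, which is of course expensive.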
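I see two different ways to solve this:
1) If your conditions always have a predefined maximum granularity, e.g. tens of USD, you can build an "amount tree" similar to a time tree (see)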
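2) If you don't know the granularity in advance, another option is to set up a manual or automatic index on the amount property (see). The simplest approach is probably auto indexing. Set the following options in neo4j.properties: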
node_auto_indexing=true
node_keys_indexable=amount
Note that this will not automatically add your existing transactions to the index. It only adds transactions written after auto indexing was enabled.
You can then query the auto index with:
START t=node:node_auto_index("amount:[6000 TO 999999999]")
RETURN count(t)
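Option 1 can be illustrated with an in-memory analogue of an amount tree. This Python sketch is hypothetical: the class name, the two levels (thousands and tens of USD) and the method names are my own. Each amount is filed under a tens-of-USD leaf bucket inside a thousands bucket. A "greater than" query can then skip whole subtrees below the threshold and only needs to check individual amounts in the single boundary leaf. In the graph version, the buckets would be nodes and each Transaction would be linked to its leaf.

```python
from collections import defaultdict
from typing import Dict, List


class AmountTree:
    """Hypothetical in-memory analogue of an 'amount tree' with fixed-width leaf buckets."""

    def __init__(self, granularity: int = 10) -> None:
        self.granularity = granularity
        # thousands bucket -> leaf bucket (multiple of granularity) -> amounts in that leaf
        self.tree: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))

    def add(self, amount: float) -> None:
        leaf = int(amount // self.granularity) * self.granularity
        self.tree[leaf // 1000 * 1000][leaf].append(amount)

    def count_greater_than(self, threshold: float) -> int:
        boundary = int(threshold // self.granularity) * self.granularity
        total = 0
        for top, leaves in self.tree.items():
            if top + 1000 <= boundary:
                continue  # every leaf in this subtree lies below the threshold
            for leaf, amounts in leaves.items():
                if leaf > boundary:
                    total += len(amounts)  # the whole leaf is above the threshold
                elif leaf == boundary:
                    # only the boundary leaf needs a per-amount check
                    total += sum(1 for a in amounts if a > threshold)
        return total


if __name__ == "__main__":
    tree = AmountTree()
    for amount in [120.0, 7999.99, 8000.00, 8000.01, 8150.5, 12000.0]:
        tree.add(amount)
    print(tree.count_greater_than(8000.00))
```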
Could you show the sizes of your neostore.* files?
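All of your queries are full scans and are mostly limited by disk speed. Was each of these queries the first one you ran, i.e. on a cold cache? Can you test your disk performance? Also check the disk scheduler: it should be noop or deadline. The garbage collection doesn't look good either. Could you take a thread dump (kill -3), attach a profiler to your Neo4j instance and report back, e.g. with jvmtop.sh --profile? Your MERGE and CREATE performance also doesn't sound right. Something seems to be wrong with your machine; overall, these numbers are way off.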
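The disk-scheduler check suggested above can be done from sysfs. This is a small Python sketch with helper names of my own. It assumes a Linux kernel that exposes /sys/block/<device>/queue/scheduler, where the kernel marks the active scheduler with brackets, e.g. "noop deadline [cfq]". As root, you can switch it by writing a name such as deadline to the same file. For an LVM volume such as /dev/mapper/data1--vg-root, the setting that matters is most likely the one on the underlying physical disk.

```python
from pathlib import Path
from typing import Optional


def active_scheduler(scheduler_line: str) -> Optional[str]:
    """Return the scheduler shown in [brackets] in a /sys/block/<dev>/queue/scheduler line."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # e.g. "none" on devices without a selectable scheduler


def report(block_root: str = "/sys/block") -> None:
    """Print the active I/O scheduler for every block device that exposes one."""
    for sched in sorted(Path(block_root).glob("*/queue/scheduler")):
        device = sched.parent.parent.name
        print(device, active_scheduler(sched.read_text()))


if __name__ == "__main__":
    report()
```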