Neo4j performance challenges - how to improve?


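For the past few weeks I have been wrestling with Neo4j, trying to resolve some very challenging performance problems. At this point I need some extra help, because I can't decide how to move forward.

I have a graph with a total of about 12.5M nodes and 64M relationships. The purpose of the graph is to analyze suspicious financial behavior, so it contains customers, accounts, transactions, etc.

Here is an example of the performance challenges:

  • A query for the total number of nodes takes 96064 ms to complete, which is very long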

    neo4j-sh (?)$ MATCH (n) RETURN count(n);
    +----------+
    | count(n) |
    +----------+
    | 12519940 |
    +----------+
    1 row
    96064 ms
    
  • A query for the total number of relationships takes 919449 ms to complete, which seems absurd

    neo4j-sh (?)$ MATCH ()-[r]-() return count(r);
    +----------+
    | count(r) |
    +----------+
    | 64062508 |
    +----------+
    1 row
    919449 ms
    
  • I have 6.6M Transaction nodes. When I search for transactions with an amount above $8,000, the query takes 653637 ms, which is also far too long

    neo4j-sh (?)$ MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);        
    +----------+
    | count(t) |
    +----------+
    | 10696    |
    +----------+
    1 row
    653637 ms 
    
Relevant schema

 ON :Transaction(baseamount)    ONLINE                             
 ON :Transaction(type)          ONLINE                             
 ON :Transaction(amount)        ONLINE                             
 ON :Transaction(currency)      ONLINE                             
 ON :Transaction(basecurrency)  ONLINE                             
 ON :Transaction(transactionid) ONLINE (for uniqueness constraint)
Query profile:

neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);  
+----------+
| count(t) |
+----------+
| 10696    |
+----------+
1 row

ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +NodeByLabel

+------------------+---------+----------+-------------+------------------------------------------+
|         Operator |    Rows |   DbHits | Identifiers |                                    Other |
+------------------+---------+----------+-------------+------------------------------------------+
|     ColumnFilter |       1 |        0 |             |                    keep columns count(t) |
| EagerAggregation |       1 |        0 |             |                                          |
|           Filter |   10696 | 13216382 |             | Property(t,amount(62)) > {  AUTODOUBLE0} |
|      NodeByLabel | 6608191 |  6608192 |        t, t |                             :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+
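  • I am running these in the neo4j shell

  • The performance challenges here are starting to make me seriously doubt whether I can use Neo4j at all, and they seem at odds with the potential the platform offers

  • I fully admit that I may have misunderstood something (I am relatively new to Neo4j), so guidance on what to fix or what to look at is greatly appreciated

Here are the details of my setup:

System: Linux, Ubuntu, 16 GB memory, 3.5 GHz i5 processor, 256 GB SSD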

CPU

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 4230.625
cache size  : 6144 KB
Memory

$ cat /proc/meminfo
MemTotal:       16115020 kB
MemFree:          224856 kB
MemAvailable:    8807160 kB
Buffers:          124356 kB
Cached:          8429964 kB
SwapCached:         8388 kB
Disk

$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/data1--vg-root  219G   32G  177G  16% /
neo4j.properties

neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true

neo4j-wrapper.properties

wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

#********************************************************************
# JVM Parameters
#********************************************************************

wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow

# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144
Other:

  • Changed the Linux open-files limit to 40k

  • I am not running anything else on this machine: no X Windows, no other DB server. Here is a snippet of top while running a query:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                
    15785 neo4j     20   0 12.192g 8.964g 2.475g S 100.2 58.3 227:50.98 java                                                                                                                   
    1 root      20   0   33464   2132   1140 S   0.0  0.0   0:02.36 init                                                                                                                   
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    
  • The total file size in the graph.db directory is:

    data/graph.db$ du --max-depth=1 -h
    1.9G    ./schema
    36K ./index
    26G .
    
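  • Data loading is very hit or miss. Some merges take under 60 seconds (even for 200K to 300K inserts), while others run for more than 3 hours (11898514 ms for a CSV file of 189999 rows with a date merge)

  • I get constant GC thread blocks: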

    2015-03-27 14:56:26.347+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
    2015-03-27 14:56:39.011+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
    2015-03-27 14:56:57.533+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
    2015-03-27 14:57:17.345+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
    2015-03-27 14:57:29.955+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
    2015-03-27 14:58:14.311+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
    
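Please let me know if I should add anything else that is relevant to the discussion


Update 1

Many thanks for your help. I just moved, so my reply was delayed.

  • Size of the neostore files: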

    /data/graph.db$ ls -lah neostore.*
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.id
    -rw-rw-r-- 1 neo4j neo4j  110 Apr  2 13:03 neostore.labeltokenstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  874 Apr  2 13:03 neostore.labeltokenstore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 200M Apr  2 13:03 neostore.nodestore.db
    -rw-rw-r-- 1 neo4j neo4j   41 Apr  2 13:03 neostore.nodestore.db.id
    -rw-rw-r-- 1 neo4j neo4j   68 Apr  2 13:03 neostore.nodestore.db.labels
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.nodestore.db.labels.id
    -rw-rw-r-- 1 neo4j neo4j 2.8G Apr  2 13:03 neostore.propertystore.db
    -rw-rw-r-- 1 neo4j neo4j  128 Apr  2 13:03 neostore.propertystore.db.arrays
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.arrays.id
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.id
    -rw-rw-r-- 1 neo4j neo4j  720 Apr  2 13:03 neostore.propertystore.db.index
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.id
    -rw-rw-r-- 1 neo4j neo4j 3.1K Apr  2 13:03 neostore.propertystore.db.index.keys
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.keys.id
    -rw-rw-r-- 1 neo4j neo4j 1.7K Apr  2 13:03 neostore.propertystore.db.strings
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.strings.id
    -rw-rw-r-- 1 neo4j neo4j  47M Apr  2 13:03 neostore.relationshipgroupstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshipgroupstore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.1G Apr  2 13:03 neostore.relationshipstore.db
    -rw-rw-r-- 1 neo4j neo4j 1.6M Apr  2 13:03 neostore.relationshipstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  165 Apr  2 13:03 neostore.relationshiptypestore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.3K Apr  2 13:03 neostore.relationshiptypestore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 3.5K Apr  2 13:03 neostore.schemastore.db
    -rw-rw-r-- 1 neo4j neo4j   25 Apr  2 13:03 neostore.schemastore.db.id
    
  • I read that the mapped memory settings were replaced by another cache, so I have commented out those settings

  • Java profiler

       JvmTop 0.8.0 alpha - 16:12:59,  amd64,  4 cpus, Linux 3.16.0-33, load avg 0.30
       http://code.google.com/p/jvmtop
    
       Profiling PID 4260:            org.neo4j.server.Bootstrapper 
    
        68.67% (    14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
        18.73% (     3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
         2.86% (     0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
         1.11% (     0.23s) org.neo4j.helpers.Counter.inc()
         0.87% (     0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
         0.65% (     0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
         0.63% (     0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
         0.62% (     0.13s) scala.collection.mutable.OpenHashMap.<init>()
         0.62% (     0.13s) scala.collection.mutable.AbstractSeq.<init>()
         0.62% (     0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
         0.61% (     0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
         0.61% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
         0.61% (     0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
         0.61% (     0.12s) scala.collection.AbstractTraversable.<init>()
         0.61% (     0.12s) scala.collection.immutable.List.toStream()
         0.60% (     0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
         0.57% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
         0.37% (     0.08s) org.parboiled.scala.Parser$class.rule()
         0.06% (     0.01s) scala.util.DynamicVariable.value()
    
    

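  • Unfortunately, schema indexes (i.e. those created with CREATE INDEX ON :Label(property)) do not yet support greater-than/less-than conditions. Neo4j therefore falls back to scanning all nodes with the given label and filtering on their properties, which is of course expensive.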

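    I see two different approaches to tackle this:

    1) If your conditions always have a predefined maximum granularity, e.g. tens of USD, you could build an "amount tree" similar to a time tree.

    2) If you don't know the granularity upfront, the other option is to set up a manual or auto index for the amount property. The easiest thing is probably to use auto indexing. Set the following options in neo4j.properties: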

    node_auto_indexing=true
    node_keys_indexable=amount
    
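    Note that this will not automatically add all existing transactions to that index. Only transactions written after auto indexing was enabled are added.

    You can then query the index with: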

    START t=node:node_auto_index("amount:[6000 TO 999999999]")
    RETURN count(t)
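
Since auto indexing only picks up writes made after it is enabled, existing Transaction nodes would need to be written again to land in the index. A minimal sketch of one way to do that, assuming that re-setting `amount` to its current value counts as such a write: the helper below (the name `backfill_statements` and the batch size are hypothetical) only prints batched Cypher statements and does not touch the database. It also assumes the label scan returns nodes in a stable order between statements, which is not guaranteed without an ORDER BY.

```shell
#!/bin/sh
# Hypothetical helper: print one Cypher statement per batch that re-sets
# t.amount to itself, so that each existing node is written again.
# $1 = total number of Transaction nodes, $2 = batch size (must be > 0).
backfill_statements() {
    total=$1
    batch=$2
    skip=0
    while [ "$skip" -lt "$total" ]; do
        printf 'MATCH (t:Transaction) WITH t SKIP %d LIMIT %d SET t.amount = t.amount;\n' "$skip" "$batch"
        skip=$((skip + batch))
    done
}

# 6608191 is the Transaction count from the question's PROFILE output;
# 50000 is an arbitrary batch size chosen for illustration.
backfill_statements 6608191 50000
```

The printed statements could be saved to a file and run through the neo4j shell. Small batches keep each transaction short, which may matter given the GC pauses reported in the question.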
    

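  • Can you show the sizes of your neostore.* files? All of your queries are full scans and are mostly limited by disk speed. Was each of them the first query you ran? Can you test your disk performance, and check the disk scheduler, which should be noop or deadline? The garbage collection doesn't look good either. Can you take a thread dump (kill -3) and attach a profiler to your Neo4j instance, e.g. jvmtop.sh --profile, and report back? Also, your merge and create performance doesn't sound right. Something seems to be off with your machine or with the setup in general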
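
The disk scheduler check can be sketched as follows, assuming a Linux system that exposes schedulers under /sys/block/*/queue/scheduler, where the active one is shown in square brackets. The function name `active_scheduler` is hypothetical.

```shell
#!/bin/sh
# Read a scheduler line such as "noop deadline [cfq]" on stdin and print
# the active scheduler, i.e. the name inside the square brackets.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done
```

If the device holding graph.db reports something other than noop or deadline, that would be the first thing to change before re-running the queries. The thread dump can be taken separately with kill -3 on the Neo4j java process ID. The JVM writes the dump to its standard output rather than to the terminal that sent the signal.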