Import: neo4j-admin import is very slow

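I am trying out Neo4j with the Yelp challenge dataset, and one aspect I am interested in is bulk import. Unfortunately, the import takes a very long time and ends with an out-of-memory error. The node import mostly goes smoothly; the import then starts slowing down at around 65% to 70% of the relationships and eventually finishes with the error mentioned above. I have set the following in the conf file: dbms.memory.heap.initial_size=5g, dbms.memory.heap.max_size=10g, dbms.memory.pagecache.size=10g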

sudo neo4j-admin import --mode=csv --nodes:Business "node_business_headers.csv,node_business.csv" \
--nodes:Categories "node_category_headers.csv,node_category.csv" \
--nodes:User "node_user_headers.csv,node_user.csv" \
--nodes:Review "node_review_headers.csv,node_review.csv" \
--relationships:IS_FRIEND_WITH "edge_friends_headers.csv,edge_friends.csv" \
--relationships:WROTE "edge_wrote_review_headers.csv,edge_wrote_review.csv" \
--relationships:ABOUT "edge_about_business_headers.csv,edge_about_business.csv" \
--relationships:BELONG_TO "edge_belongto_category_headers.csv,edge_belongto_category.csv" \
--ignore-missing-nodes --database=mygraph.db
Neo4j version: 3.4.5
Importing the contents of these files into /var/lib/neo4j/data/databases/mygraph.db:
Nodes:
:Business
/home/user/graph_data/yelp_challenge/data/node_business_headers.csv
/home/user/graph_data/yelp_challenge/data/node_business.csv

:Categories
/home/user/graph_data/yelp_challenge/data/node_category_headers.csv
/home/user/graph_data/yelp_challenge/data/node_category.csv

:User
/home/user/graph_data/yelp_challenge/data/node_user_headers.csv
/home/user/graph_data/yelp_challenge/data/node_user.csv

:Review
/home/user/graph_data/yelp_challenge/data/node_review_headers.csv
/home/user/graph_data/yelp_challenge/data/node_review.csv
Relationships:
:IS_FRIEND_WITH
/home/user/graph_data/yelp_challenge/data/edge_friends_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_friends.csv

:WROTE
/home/user/graph_data/yelp_challenge/data/edge_wrote_review_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_wrote_review.csv

:ABOUT
/home/user/graph_data/yelp_challenge/data/edge_about_business_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_about_business.csv

:BELONG_TO
/home/user/graph_data/yelp_challenge/data/edge_belongto_category_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_belongto_category.csv

Available resources:
Total machine memory: 31.26 GB
Free machine memory: 24.63 GB
Max heap memory : 6.95 GB
Processors: 16
Configured max memory: 21.88 GB
High-IO: false

Import starting 2018-08-16 23:09:15.820+0100
Estimated number of nodes: 6.76 M
Estimated number of node properties: 36.60 M
Estimated number of relationships: 60.82 M
Estimated number of relationship properties: 0.00 
Estimated disk space usage: 2.75 GB
Estimated required memory usage: 1.08 GB

InteractiveReporterInteractions command list (end with ENTER):
c: Print more detailed information about current stage
i: Print more detailed information

(1/4) Node import 2018-08-16 23:09:15.833+0100
Estimated number of nodes: 6.76 M
Estimated disk space usage: 848.51 MB
Estimated required memory usage: 1.08 GB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

(2/4) Relationship import 2018-08-16 23:09:22.174+0100
Estimated number of relationships: 60.82 M
Estimated disk space usage: 1.93 GB
Estimated required memory usage: 1.07 GB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%


IMPORT DONE in 25m 43s 310ms. 
Data statistics is not available.
Peak memory usage: 1.07 GB
There were bad entries which were skipped and logged into /home/user/graph_data/yelp_challenge/data/import.report
WARNING Import failed. The store files in /var/lib/neo4j/data/databases/mygraph.db are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.csv.reader.Extractors$StringExtractor.extract0(Extractors.java:427)
at org.neo4j.csv.reader.Extractors$AbstractSingleValueExtractor.extract(Extractors.java:360)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:305)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:311)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:112)
at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
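Try the following:

  • Check whether the import.report file is being created and whether it is large.
  • Before invoking the import, try setting the HEAP_SIZE environment variable to 10g.
  • From the documentation, I see that it is best to keep the initial and max heap in neo4j.conf at the same value, to avoid unnecessary garbage collection.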
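
The first two suggestions can be sketched in shell as follows. This is a minimal sketch, not a verified fix: the `REPORT` variable and the `report_size` helper are hypothetical names introduced for illustration, and the report path has to be adjusted to wherever the import log says `import.report` was written. The log in the question reports "Max heap memory : 6.95 GB" despite the 10g setting in the conf file, which suggests the heap settings there did not reach the import tool; `HEAP_SIZE` is the environment variable `neo4j-admin` reads for its own JVM heap.

```shell
#!/bin/sh
# Assumed location of the report; the import log prints the actual path.
REPORT="${REPORT:-import.report}"

# Print the size of a file in bytes, or 0 if it does not exist.
report_size() {
  if [ -f "$1" ]; then
    wc -c < "$1" | tr -d ' '
  else
    echo 0
  fi
}

echo "import.report size in bytes: $(report_size "$REPORT")"

# sudo typically resets the environment, so a plain `export HEAP_SIZE=10g`
# may not reach the tool. Passing the variable on the sudo command line is
# one way around that (subject to the local sudoers policy):
#
#   sudo HEAP_SIZE=10g neo4j-admin import --mode=csv \
#     <same --nodes / --relationships arguments as in the question>
```

A very large report would indicate that many rows are being skipped as bad entries, which is worth investigating before raising the heap further.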

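  • Answers should be just that: answers. Your wording reads as if you are asking further questions (in which case you should use a comment to clarify).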