Solr Nutch2.3.1在注入、解析、获取和生成时挂起


I have read various SO threads about why generating/injecting/parsing/fetching takes so long (or hangs), but with no luck. I have already tried to implement the solutions from the following threads, without success:

(1)

(2)

as well as various other threads.

I am using Nutch 2.3.1 and HBase 0.94.27. I have been following the tutorials and was able to set everything up successfully. But whenever I issue any nutch command, it just hangs.
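For context, Nutch 2.3.1 talks to HBase through Gora, and a job that hangs right after "InjectorJob: starting" is very often a storage misconfiguration rather than a crawl problem. A minimal sketch of the storage settings involved (these are the standard values from the Nutch 2.x tutorial; adjust paths and versions to your install):

```xml
<!-- conf/nutch-site.xml: tell Nutch to store crawl data in HBase via Gora -->
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
```

together with `gora.datastore.default=org.apache.gora.hbase.store.HBaseStore` in `conf/gora.properties`, and the `gora-hbase` dependency uncommented in `ivy/ivy.xml` before rebuilding the runtime with `ant runtime`. If any of these three disagree, the jobs typically start, fail to reach the store, and appear to hang.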

Here are the logs I get when I run these commands:

Console output (inject, generate, fetch, parse, updatedb):

root@ubuntu:~/apache-nutch-2.3.1/runtime/local# ./bin/nutch inject seed/urls.txt 
InjectorJob: starting at 2016-05-04 09:59:12
InjectorJob: Injecting urlDir: seed/urls.txt
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch generate -topN 40
GeneratorJob: starting at 2016-05-04 09:54:08
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch fetch -all
FetcherJob: starting at 2016-05-04 10:00:14
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch parse -all
ParserJob: starting at 2016-05-04 10:00:43
ParserJob: resuming:    false
ParserJob: forced reparse:      false
ParserJob: parsing all
root@ubuntu:~/apache-nutch-2.3.1/runtime/local# bin/nutch updatedb -all
DbUpdaterJob: starting at 2016-05-04 10:02:24
DbUpdaterJob: updatinging all
Here are the HBase logs:

2016-05-04 10:00:47,214 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000e, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,215 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:45216 which had sessionid 0x1547b2be4bc000e
2016-05-04 10:00:47,215 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1547b2be4bc000d, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-05-04 10:00:47,216 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:59934 which had sessionid 0x1547b2be4bc000d
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:10,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000c
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:22,003 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000b
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000e
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1547b2be4bc000d
2016-05-04 10:02:25,195 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59938
2016-05-04 10:02:25,202 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59938
2016-05-04 10:02:25,204 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc000f with negotiated timeout 40000 for client /127.0.0.1:59938
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:59940
2016-05-04 10:02:25,822 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:59940
2016-05-04 10:02:25,825 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1547b2be4bc0010 with negotiated timeout 40000 for client /127.0.0.1:59940
2016-05-04 10:04:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
2016-05-04 10:04:28,372 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@25e5c862
2016-05-04 10:04:28,379 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 0 catalog row(s) and gc'd 0 unreferenced parent region(s)
2016-05-04 10:09:15,530 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=2.02 MB, free=243.82 MB, max=245.84 MB, blocks=3, accesses=27, hits=24, hitRatio=88.88%, , cachingAccesses=27, cachingHits=24, cachingHitsRatio=88.88%, , evictions=0, evicted=0, evictedPerRun=NaN
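One thing worth reading out of the log above: ZooKeeper repeatedly expires sessions after the negotiated 40000 ms timeout, which means the Nutch client connected and then went silent — so the hang is on the client (Nutch/Gora) side, not inside ZooKeeper or HBase. A quick sanity check, run here against sample lines copied from the excerpt above (on a live system you would grep the real HBase log instead):

```shell
# Count the expired sessions in the HBase log excerpt; several of these in a
# row means the Nutch job stopped heartbeating, i.e. it is stuck client-side.
count=$(grep -c "Expiring session" <<'EOF'
2016-05-04 10:01:10,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000c, timeout of 40000ms exceeded
2016-05-04 10:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000b, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000e, timeout of 40000ms exceeded
2016-05-04 10:01:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1547b2be4bc000d, timeout of 40000ms exceeded
EOF
)
echo "$count"
```

If the same grep against the live log keeps growing while a job "runs", the job is dead or blocked and the sessions are just timing out behind it.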
And here is hadoop.log:

2016-05-04 10:42:18,132 INFO  crawl.InjectorJob - InjectorJob: starting at 2016-05-04 10:42:18
2016-05-04 10:42:18,134 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: seed/urls.txt
2016-05-04 10:42:18,527 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What exactly is the problem? I have configured everything correctly, yet it still hangs. Why?

Have you tried disabling IPv6 and using only IPv4 with Hadoop + HBase?

I am using Mongo instead of HBase now, because the Hadoop + HBase combination was too complex. Thanks for your comment. I am now running into an error when indexing the data into ES. Can you provide any information on that?
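Regarding the IPv6 suggestion: the ZooKeeper log above does show a client coming in over the IPv6 loopback (/0:0:0:0:0:0:0:1), which has been known to confuse Hadoop-era daemons. A sketch of the usual fix — forcing the JVMs onto the IPv4 stack; the env-script names are the standard Hadoop/HBase ones, adjust to your layout:

```shell
# Add to hadoop-env.sh and hbase-env.sh: make the JVMs prefer IPv4 sockets
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"
```

Restart HBase afterwards. Alternatively IPv6 can be disabled system-wide on Linux via `sysctl` (`net.ipv6.conf.all.disable_ipv6=1`), but the JVM flag is the less invasive option.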