Java Ignition.start() hangs with the warning "Still waiting for initial partition map exchange"


When we try to start the Ignite grid, it hangs. While it is stuck, I see the following warnings in the log:

WARN  9981874     90     2019-11-22 23:52:35     org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi     exchange-worker-#658%ReplicatedGrid_application_1571067740090_0645_1%     Failed to connect shared memory endpoint to port (is shared memory server endpoint up and running?): 48100
WARN  9981874     90     2019-11-22 23:53:05     org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager     Executor task launch worker-2     Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.

After that, it writes the following warning continuously:

WARN  9981874     90     2019-11-22 23:53:35     org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager     Executor task launch worker-2     Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c54ef27b-049e-466e-a594-3fffb01bd1e3, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.141.26.13:47512, /127.0.0.1:47512, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47512, /10.142.225.122:47512], discPort=47512, order=14, intOrder=9, lastExchangeTime=1574447015048, loc=true, ver=1.7.0#20160801-sha1:383273e3, isClient=false], topVer=14, nodeId8=c54ef27b, msg=null, type=NODE_JOINED, tstamp=1574446951740], crd=TcpDiscoveryNode [id=81757f48-d3e7-466d-9058-8edc84496f4f, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47511, /10.141.26.13:47511, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47511, /10.142.225.122:47511], discPort=47511, order=1, intOrder=1, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0], nodeId=c54ef27b, evt=NODE_JOINED], added=false, initFut=GridFutureAdapter [resFlag=0, res=null, startTime=1574446955042, endTime=0, ignoreInterrupts=false, state=INIT], init=false, topSnapshot=null, lastVer=null, partReleaseFut=null, affChangeMsg=null, skipPreload=false, clientOnlyExchange=false, initTs=1574446955052, centralizedAff=false, evtLatch=0, remaining=[88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, 81757f48-d3e7-466d-9058-8edc84496f4f, 6727f535-2e69-4520-a1e0-8b2a2380134d], srvNodes=[TcpDiscoveryNode [id=81757f48-d3e7-466d-9058-8edc84496f4f, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47511, /10.141.26.13:47511, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47511, 
/10.142.225.122:47511], discPort=47511, order=1, intOrder=1, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], TcpDiscoveryNode [id=6727f535-2e69-4520-a1e0-8b2a2380134d, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.142.225.122:47515, /127.0.0.1:47515, /10.141.26.13:47515, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47515], discPort=47515, order=5, intOrder=4, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], TcpDiscoveryNode [id=88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.142.225.122:47516, /10.141.26.13:47516, /127.0.0.1:47516, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47516], discPort=47516, order=8, intOrder=6, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], TcpDiscoveryNode [id=c54ef27b-049e-466e-a594-3fffb01bd1e3, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.141.26.13:47512, /127.0.0.1:47512, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47512, /10.142.225.122:47512], discPort=47512, order=14, intOrder=9, lastExchangeTime=1574447015048, loc=true, ver=1.7.0#20160801-sha1:383273e3, isClient=false]], super=GridFutureAdapter [resFlag=0, res=null, startTime=1574446955042, endTime=0, ignoreInterrupts=false, state=INIT]]]
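The warning above can go on for up to 15 hours, and we then have to restart the Ignite server. Sometimes the grid starts fine. I tried searching on Google but could not find any relevant answer. Please help me understand this issue. Thanks.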

I have also observed that there are many nodes on the same machine that share the same IP addresses but use different ports.

Below are the logs where this can be seen:

 - :19/11/18 18:46:00 WARN GridCachePartitionExchangeManager: Still
   waiting for initial partition map exchange
   [fut=GridDhtPartitionsExchangeFuture [dummy=false,
   forcePreload=false, reassign=false, discoEvt=DiscoveryEvent
   [evtNode=TcpDiscoveryNode [id=3e9e1886-6e9c-491c-8145-e73c3f46ce8b,
   addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/10.142.225.62:47520, /0:0:0:0:0:0:0:1%lo:47520,
   /127.0.0.1:47520, rocraappsl11.rjil.ril.com/10.141.180.83:47520,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47520,
   /10.141.173.24:47520], discPort=47520, order=25, intOrder=17,
   lastExchangeTime=1574082959655, loc=true,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], topVer=25,
   nodeId8=3e9e1886, msg=null, type=NODE_JOINED, tstamp=1574077885508],

        crd=TcpDiscoveryNode [id=cc93a5f4-9ebb-4525-ab85-faf1fdbc0512, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/0:0:0:0:0:0:0:1%lo:47511, /127.0.0.1:47511,
   /10.141.173.24:47511, /10.142.225.62:47511,
   rocraappsl11.rjil.ril.com/10.141.180.83:47511,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47511], discPort=47511,
   order=1, intOrder=1, lastExchangeTime=1574077885387, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false],
   exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
   [topVer=25, minorTopVer=0], nodeId=3e9e1886, evt=NODE_JOINED],
   added=false, initFut=GridFutureAdapter [resFlag=0, res=null,
   startTime=1574077889490, endTime=0, ignoreInterrupts=false,
   state=INIT], init=false, topSnapshot=null, lastVer=null,
   partReleaseFut=null, affChangeMsg=null, skipPreload=false,
   clientOnlyExchange=false, initTs=1574077889500, centralizedAff=false,
   evtLatch=0, remaining=[9058a0e1-3273-4332-8e22-5fb684454a19,
   cc93a5f4-9ebb-4525-ab85-faf1fdbc0512,
   543ff498-148f-47d4-975b-c89d6615298b,
   33c47d08-a4f2-401f-98b2-0506c0166597,
   939e6462-4f17-455f-ad45-9d9e20d4543e,
   ca1b4061-60da-4d3f-8145-9aa12fca7ee6,
   12fa00f8-3512-4a84-b916-16d6f3534a9f,
   be853f0a-ec96-45eb-9d30-9e407606efac], 

        srvNodes=


        [TcpDiscoveryNode [id=cc93a5f4-9ebb-4525-ab85-faf1fdbc0512, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/0:0:0:0:0:0:0:1%lo:47511, /127.0.0.1:47511,
   /10.141.173.24:47511, /10.142.225.62:47511,
   rocraappsl11.rjil.ril.com/10.141.180.83:47511,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47511], discPort=47511,
   order=1, intOrder=1, lastExchangeTime=1574077885387, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=543ff498-148f-47d4-975b-c89d6615298b, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/0:0:0:0:0:0:0:1%lo:47514,
   rocraappsl11.rjil.ril.com/10.141.180.83:47514, /127.0.0.1:47514,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47514,
   /10.141.173.24:47514, /10.142.225.62:47514], discPort=47514, order=3,
   intOrder=3, lastExchangeTime=1574077885387, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false],

        TcpDiscoveryNode [id=12fa00f8-3512-4a84-b916-16d6f3534a9f, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[rocraappsl11.rjil.ril.com/10.141.180.83:47515,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47515,
   /0:0:0:0:0:0:0:1%lo:47515, /127.0.0.1:47515, /10.141.173.24:47515,
   /10.142.225.62:47515], discPort=47515, order=4, intOrder=4,
   lastExchangeTime=1574077885397, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=9058a0e1-3273-4332-8e22-5fb684454a19, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[rocraappsl11.rjil.ril.com/10.141.180.83:47516,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47516,
   /10.141.173.24:47516, /10.142.225.62:47516,
   /0:0:0:0:0:0:0:1%lo:47516, /127.0.0.1:47516], discPort=47516,
   order=5, intOrder=5, lastExchangeTime=1574077885397, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=be853f0a-ec96-45eb-9d30-9e407606efac, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47517,
   /10.141.173.24:47517, /10.142.225.62:47517,
   /0:0:0:0:0:0:0:1%lo:47517,
   rocraappsl11.rjil.ril.com/10.141.180.83:47517, /127.0.0.1:47517],
   discPort=47517, order=6, intOrder=6, lastExchangeTime=1574077885397,
   loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false],

        TcpDiscoveryNode [id=939e6462-4f17-455f-ad45-9d9e20d4543e, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/0:0:0:0:0:0:0:1%lo:47519, /127.0.0.1:47519,
   /10.141.173.24:47519, /10.142.225.62:47519,
   rocraappsl11.rjil.ril.com/10.141.180.83:47519,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47519], discPort=47519,
   order=7, intOrder=7, lastExchangeTime=1574077885397, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=ca1b4061-60da-4d3f-8145-9aa12fca7ee6, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47518,
   /10.141.173.24:47518, /10.142.225.62:47518,
   /0:0:0:0:0:0:0:1%lo:47518,
   rocraappsl11.rjil.ril.com/10.141.180.83:47518, /127.0.0.1:47518],
   discPort=47518, order=14, intOrder=11,
   lastExchangeTime=1574077885397, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=33c47d08-a4f2-401f-98b2-0506c0166597, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/0:0:0:0:0:0:0:1%lo:47512, /10.142.225.62:47512,
   /127.0.0.1:47512, rocraappsl11.rjil.ril.com/10.141.180.83:47512,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47512,
   /10.141.173.24:47512], discPort=47512, order=18, intOrder=13,
   lastExchangeTime=1574077885397, loc=false,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false], 

        TcpDiscoveryNode [id=3e9e1886-6e9c-491c-8145-e73c3f46ce8b, addrs=[0:0:0:0:0:0:0:1%lo, 10.141.173.24, 10.141.180.83,
   10.142.225.62, 127.0.0.1, 2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2],
   sockAddrs=[/10.142.225.62:47520, /0:0:0:0:0:0:0:1%lo:47520,
   /127.0.0.1:47520, rocraappsl11.rjil.ril.com/10.141.180.83:47520,
   /2405:200:a60:f9f1:1602:ecff:fe68:f381%bond2:47520,
   /10.141.173.24:47520], discPort=47520, order=25, intOrder=17,
   lastExchangeTime=1574082959655, loc=true,
   ver=1.7.0#20160801-sha1:383273e3, isClient=false]],
   super=GridFutureAdapter [resFlag=0, res=null,
   startTime=1574077889490, endTime=0, ignoreInterrupts=false,
   state=INIT]]]

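I think this means that you have nodes that were started but are not functioning, not available, or not fully active, so your new node cannot complete the process of joining them. You should be able to find out why by looking at their logs. Alternatively, if those nodes are not needed, kill them.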

In this case, there appear to be several such nodes:

WARN  9981874     90     2019-11-22 23:53:35     org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager     Executor task launch worker-2     Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryEvent [
    evtNode=TcpDiscoveryNode [id=c54ef27b-049e-466e-a594-3fffb01bd1e3, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.141.26.13:47512, /127.0.0.1:47512, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47512, /10.142.225.122:47512], discPort=47512, order=14, intOrder=9, lastExchangeTime=1574447015048, loc=true, ver=1.7.0#20160801-sha1:383273e3, isClient=false], topVer=14, nodeId8=c54ef27b, msg=null, type=NODE_JOINED, tstamp=1574446951740], 
    crd=TcpDiscoveryNode [id=81757f48-d3e7-466d-9058-8edc84496f4f, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47511, /10.141.26.13:47511, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47511, /10.142.225.122:47511], discPort=47511, order=1, intOrder=1, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0], nodeId=c54ef27b, evt=NODE_JOINED], added=false, initFut=GridFutureAdapter [resFlag=0, res=null, startTime=1574446955042, endTime=0, ignoreInterrupts=false, state=INIT], init=false, topSnapshot=null, lastVer=null, partReleaseFut=null, affChangeMsg=null, skipPreload=false, clientOnlyExchange=false, initTs=1574446955052, centralizedAff=false, evtLatch=0, remaining=[88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, 81757f48-d3e7-466d-9058-8edc84496f4f, 6727f535-2e69-4520-a1e0-8b2a2380134d], 
srvNodes=[
    TcpDiscoveryNode [id=81757f48-d3e7-466d-9058-8edc84496f4f, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47511, /10.141.26.13:47511, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47511, /10.142.225.122:47511], discPort=47511, order=1, intOrder=1, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], 
    TcpDiscoveryNode [id=6727f535-2e69-4520-a1e0-8b2a2380134d, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.142.225.122:47515, /127.0.0.1:47515, /10.141.26.13:47515, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47515], discPort=47515, order=5, intOrder=4, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], 
    TcpDiscoveryNode [id=88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.142.225.122:47516, /10.141.26.13:47516, /127.0.0.1:47516, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47516], discPort=47516, order=8, intOrder=6, lastExchangeTime=1574446951670, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], 
    TcpDiscoveryNode [id=c54ef27b-049e-466e-a594-3fffb01bd1e3, addrs=[10.141.180.69, 10.141.26.13, 10.142.225.122, 127.0.0.1], sockAddrs=[/10.141.26.13:47512, /127.0.0.1:47512, nvmbd2bgt130d00.rjil.ril.com/10.141.180.69:47512, /10.142.225.122:47512], discPort=47512, order=14, intOrder=9, lastExchangeTime=1574447015048, loc=true, ver=1.7.0#20160801-sha1:383273e3, isClient=false]], super=GridFutureAdapter [resFlag=0, res=null, startTime=1574446955042, endTime=0, ignoreInterrupts=false, state=INIT]]]
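To see which nodes the exchange is still waiting on, one option is to pull the `remaining=[...]` IDs out of the warning and match them against the `discPort` of each `TcpDiscoveryNode` entry in the same line. The following is only a minimal sketch, assuming the log format shown above; the class and method names (`PendingExchangeNodes`, `pendingNodes`) are hypothetical helpers and not part of the Ignite API, and the sample string is a shortened version of the warning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Ignite): lists the nodes that a
// "Still waiting for initial partition map exchange" warning is waiting on.
class PendingExchangeNodes {
    // The comma-separated node IDs inside remaining=[...]
    private static final Pattern REMAINING =
        Pattern.compile("remaining=\\[([^\\]]*)\\]");

    // A node ID followed (lazily) by the first discPort after it
    private static final Pattern NODE = Pattern.compile(
        "TcpDiscoveryNode\\s+\\[id=([0-9a-f-]{36}).*?discPort=(\\d+)",
        Pattern.DOTALL);

    // IDs of the nodes that have not answered the exchange yet
    static List<String> remainingIds(String logLine) {
        List<String> ids = new ArrayList<>();
        Matcher m = REMAINING.matcher(logLine);
        if (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(trimmed);
                }
            }
        }
        return ids;
    }

    // Node ID -> discovery port for every TcpDiscoveryNode in the line
    static Map<String, Integer> discoveryPorts(String logLine) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        Matcher m = NODE.matcher(logLine);
        while (m.find()) {
            ports.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return ports;
    }

    // Pending node ID -> discovery port (-1 if the port is not in the line)
    static Map<String, Integer> pendingNodes(String logLine) {
        Map<String, Integer> ports = discoveryPorts(logLine);
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (String id : remainingIds(logLine)) {
            pending.put(id, ports.getOrDefault(id, -1));
        }
        return pending;
    }

    public static void main(String[] args) {
        // Shortened sample built from the IDs and ports in the warning above
        String sample =
            "remaining=[88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, "
            + "81757f48-d3e7-466d-9058-8edc84496f4f, "
            + "6727f535-2e69-4520-a1e0-8b2a2380134d], srvNodes=["
            + "TcpDiscoveryNode [id=81757f48-d3e7-466d-9058-8edc84496f4f, discPort=47511], "
            + "TcpDiscoveryNode [id=6727f535-2e69-4520-a1e0-8b2a2380134d, discPort=47515], "
            + "TcpDiscoveryNode [id=88ed2a8c-c0b5-4b93-b3e4-529d00d5b118, discPort=47516], "
            + "TcpDiscoveryNode [id=c54ef27b-049e-466e-a594-3fffb01bd1e3, discPort=47512]]";

        pendingNodes(sample).forEach((id, port) ->
            System.out.println("still waiting on node " + id + " (discPort " + port + ")"));
    }
}
```

The discovery ports give a starting point for finding the corresponding processes on the machine (and their logs) before deciding whether to kill them.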

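Can you share your configuration? It looks like you have enabled some unusual options (for example, shared memory). What does your topology look like? How many nodes and machines are there?

We have a cluster of 5 machines. On each machine we start an Ignite server with the IP address 172.0.0.1, and we are using local caches.

Do you mean that a grid is already running and it is trying to start another grid?

It is trying to join an existing grid.

Is there any way to get more details, such as which grid it is trying to join? The grid's IP address or grid name, anything that could give a clue.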