Python: Salt minion does not respond when running long-running commands


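I have an environment where communication between the salt master and the salt minion appears to be well established: TCP ports 4505 and 4506 are open, the key has been accepted, and some functions of the test module work correctly: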

root@minion01 # telnet master01 4505
Trying 100.134.0.200...
Connected to master01.
Escape character is '^]'.

[user@master01 ~]$ salt-key -L
Accepted Keys:
minion01

[user@master01 ~]$ salt 'minion01' test.ping
minion01:
    True

[user@master01 ~]$ salt 'minion01' test.version
minion01:
    2015.8.8
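
The telnet check above can also be scripted. This is a minimal sketch using only the Python standard library; the helper name port_is_open is made up for illustration and is not part of Salt. It only shows that a TCP connection can be opened, which says nothing about whether job returns actually make it back.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call from the minion, using the host name from the question:
#   port_is_open("master01", 4505)   # publish port
#   port_is_open("master01", 4506)   # return port
```
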
However, when I try to execute some binaries, I get no response:

[user@master01 ~]$ salt 'minion01' cmd.script 'salt://bin/test01' args='bla'
minion01:
    Minion did not return. [No response]
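I also found an interesting function in the test module for debugging this kind of problem, test.rand_sleep. If I run it with the debug flag: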

[user@master01 ~]$ salt 'minion01' test.rand_sleep -l debug
[DEBUG   ] Reading configuration from /salt/etc/master
[DEBUG   ] Missing configuration file: /home/user/.saltrc
[DEBUG   ] Configuration file path: /salt/etc/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /salt/etc/master
[DEBUG   ] Missing configuration file: /home/user/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: ipc:///salt/salt/cache/.salt-unix/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///salt/salt/cache/.salt-unix/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/salt/pki/master', 'master01_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /salt/salt/cache/jobs/59/be0dc5d330a8a183114e4826349b02/.minions.p
[DEBUG   ] get_iter_returns for jid 20161018122035908234 sent to set(['minion01']) will timeout at 12:20:40.920216
[DEBUG   ] Checking whether jid 20161018122035908234 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/salt/pki/master', 'master01_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] LazyLoaded no_return.output
minion01:
    Minion did not return. [No response]
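
The jid in the debug output above looks like a timestamp (year, month, day, hour, minute, second, microseconds), and the logged "will timeout at" value sits about five seconds after it. The sketch below decodes the jid under that assumption and adds a 5-second timeout. Both the format string and the 5-second value are inferred from this log, not taken from the Salt source, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed jid layout: YYYYMMDDhhmmss followed by six digits of microseconds.
JID_FORMAT = "%Y%m%d%H%M%S%f"


def jid_to_datetime(jid):
    """Decode a jid string into the time at which the job was created."""
    return datetime.strptime(jid, JID_FORMAT)


def expected_deadline(jid, timeout_seconds=5):
    """Return the time at which the CLI would stop waiting for returns."""
    return jid_to_datetime(jid) + timedelta(seconds=timeout_seconds)


started = jid_to_datetime("20161018122035908234")
deadline = expected_deadline("20161018122035908234")
print(started)   # 2016-10-18 12:20:35.908234
print(deadline)  # 2016-10-18 12:20:40.908234
```

The computed deadline is within a few milliseconds of the logged 12:20:40.920216, which is consistent with the CLI giving up after roughly five seconds unless a longer timeout is passed.
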
If I sniff the traffic, I actually see packets going out to the minion as expected:

[root@master01 ~]# tcpdump dst 100.134.0.239 -i any and portrange 4505-4506
12:24:57.051480 IP master01.4505 > minion01.49194: Flags [P.], seq 5698:5893, ack 1, win 115, length 195
12:25:02.064182 IP master01.4505 > minion01.49194: Flags [P.], seq 5893:6104, ack 1, win 115, length 211
12:25:04.296926 IP master01.4506 > minion01.49747: Flags [S.], seq 2770281963, ack 425526514, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
12:25:04.297389 IP master01.4506 > minion01.49747: Flags [P.], seq 1:11, ack 1, win 115, length 10
12:25:04.297402 IP master01.4506 > minion01.49747: Flags [.], ack 11, win 115, length 0
12:25:04.297419 IP master01.4506 > minion01.49747: Flags [P.], seq 11:12, ack 11, win 115, length 1
12:25:04.297680 IP master01.4506 > minion01.49747: Flags [P.], seq 12:13, ack 13, win 115, length 1
12:25:04.297704 IP master01.4506 > minion01.49747: Flags [P.], seq 13:15, ack 13, win 115, length 2
12:25:04.302246 IP master01.4506 > minion01.49748: Flags [S.], seq 2644063969, ack 425575524, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
12:25:04.302570 IP master01.4506 > minion01.49748: Flags [P.], seq 1:11, ack 1, win 115, length 10
...
And coming back:

[root@master01 ~]# tcpdump src 100.134.0.239 -i any and portrange 4505-4506
12:27:09.254011 IP minion01.49194 > master01.4505: Flags [.], ack 819763160, win 49640, length 0
12:27:11.606309 IP minion01.49782 > master01.4506: Flags [S], seq 462752770, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
12:27:11.606671 IP minion01.49782 > master01.4506: Flags [.], ack 1832475241, win 49640, length 0
12:27:11.606856 IP minion01.49782 > master01.4506: Flags [P.], seq 0:10, ack 1, win 49640, length 10
12:27:11.607084 IP minion01.49782 > master01.4506: Flags [.], ack 12, win 49640, length 0
12:27:11.607133 IP minion01.49782 > master01.4506: Flags [P.], seq 10:12, ack 12, win 49640, length 2
12:27:11.607532 IP minion01.49782 > master01.4506: Flags [.], ack 13, win 49640, length 0
12:27:11.607608 IP minion01.49782 > master01.4506: Flags [P.], seq 12:14, ack 15, win 49640, length 2
12:27:11.611069 IP minion01.49783 > master01.4506: Flags [S], seq 462854740, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
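It looks like the communication is suddenly interrupted, but why? How can some functions of the test module work while others do not?

Thanks in advance. Any clue or hint would be appreciated.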

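Update: if I run the salt cmd.script command with the timeout flag -t, it does work. However, I do not need that option in many other cases. The main difference between this case and the successful ones is the absence of the following debug message: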

[DEBUG   ] Checking whether jid 20161019054212008948 is still running
This is the case even though the minion is configured with custom keepalive settings:

root@minion01 # cat /salt/etc/minion | grep -v ^# | grep -i keepalive
tcp_keepalive_idle: 60
tcp_keepalive_cnt: 3
tcp_keepalive_intvl: 5
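
For reference, here is a plain-socket sketch of what these three values mean at the operating-system level. It assumes the minion options correspond to the usual TCP keepalive socket options (idle time before the first probe, number of probes, interval between probes); that mapping is an assumption rather than something the configuration output shows. The helper name enable_keepalive is hypothetical, and the three tuning constants are platform-specific, so any that are missing are skipped.

```python
import socket


def enable_keepalive(sock, idle=60, count=3, interval=5):
    """Turn on TCP keepalive and apply the tuning values where the platform supports them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
        ("TCP_KEEPINTVL", interval),  # seconds between probes
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:  # constant not defined on every platform
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With these values the first probe is sent only after 60 seconds of idleness, so keepalive on its own would not be expected to change anything inside a wait of a few seconds.
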

By the way, there is only one network element between the salt master (master01) and the salt minion (minion01): an internal firewall. The MTU is correctly set to 1500 everywhere.

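What does date say on both machines? Are the clocks in sync?

The date command reports the same time on both machines. Both are configured against the same NTP servers. Thanks for the idea.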