Python script fails randomly - what tools can I use to determine the cause?


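I have a Python script that I wrote as part of a pool automation project. I have modified it heavily over time, improving it and adding features, so I never had the chance to let it run for long stretches until recently, when I got it (almost) where I want it. Now that I let it run all the time, it randomly fails and gets restarted (via watchdog support).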

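I run the script through systemd on a Raspberry Pi 3, with watchdog support, because I want/need it to run all the time. The watchdog catches the script when it fails and restarts it as expected, but I would rather find out what causes the failure in the first place.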
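For context, this is roughly how a `Type=notify` service talks to the systemd watchdog. It is only a minimal sketch, not the code from my script: the helper name `sd_notify` and its `notify_socket` parameter are made up for illustration. The `NOTIFY_SOCKET` environment variable and the `READY=1` / `WATCHDOG=1` messages are the standard systemd notification protocol.

```python
import os
import socket


def sd_notify(message, notify_socket=None):
    """Send one state message to systemd; return False if no socket is known.

    `notify_socket` is a hypothetical override for testing; systemd
    normally passes the path in the NOTIFY_SOCKET environment variable.
    """
    path = notify_socket or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with Type=notify
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading "@" means an abstract socket
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(message.encode("ascii"), path)
    finally:
        sock.close()
    return True
```

The idea is to call `sd_notify("READY=1")` once after startup and `sd_notify("WATCHDOG=1")` on every pass of the main loop. If a ping does not arrive within `WatchdogSec`, systemd treats the service as failed and restarts it, so a loop iteration that blocks for too long looks the same as a crash.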

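The script connects to a MySQL database, gets some information about the pool water level and how many watts my pool pump is using, and then determines whether the pool needs to be filled. If it does, it uses a relay to open a sprinkler valve connected to the pool; otherwise it does nothing. It also checks whether the sprinklers are running, whether the pool pump is running, and whether someone has thrown the physical disconnect switch. It uses a number of status LEDs and two switches, plus an LCD screen that talks to the Pi over serial.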
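The fill decision described above can be sketched roughly as follows. This is a simplified illustration and not the actual code: the function name `should_fill`, its parameters, the `PUMP_RUNNING_WATTS` threshold, and the assumption that each of the three checks blocks filling are all hypothetical.

```python
# Hypothetical threshold: above this the pump is considered to be running.
PUMP_RUNNING_WATTS = 100


def should_fill(level_ok, pump_watts, sprinklers_running, disconnect_thrown):
    """Return True only when the pool is low and nothing blocks filling.

    Assumption for this sketch: the disconnect switch, running sprinklers,
    and a running pump each prevent the fill valve from opening.
    """
    if disconnect_thrown:
        return False
    if sprinklers_running:
        return False
    if pump_watts >= PUMP_RUNNING_WATTS:
        return False
    return not level_ok
```

With readings like the ones in my log (level OK, 12 watts, sprinklers off), `should_fill(True, 12, False, False)` returns `False`, so the relay stays off.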

Apart from sshd and system services, this script is pretty much the only thing running on the Pi: no Apache, no Node-RED, no FTP, and so on.

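I keep an ssh session open to the Pi, and that session never drops, even when the script fails. A continuous ping to the Pi shows zero packet loss, even when the script fails. When the script fails and restarts, my syslog shows the following: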

Jun  6 08:08:56 scruffy systemd[1]: Unit pool_control.service entered failed state.
Jun  6 08:08:57 scruffy systemd[1]: pool_control.service holdoff time over, scheduling restart.
Jun  6 08:08:57 scruffy systemd[1]: Stopping Installing Python script for Pool Fill Control /w watchdog...
Jun  6 08:08:57 scruffy systemd[1]: Starting Installing Python script for Pool Fill Control /w watchdog...
Jun  6 08:08:58 scruffy systemd[1]: Started Installing Python script for Pool Fill Control /w watchdog.
Jun  6 08:08:58 scruffy kernel: [34864.219647] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.

When the script fails and restarts, dmesg shows:

[    8.938912] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.
[34864.219647] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.

My program log does not show anything unusual:

2016-06-06 13:26:24,387 INFO Notify socket = /run/systemd/notify
2016-06-06 13:26:24,616 DEBUG PushBullet Notification Sent - Pool fill control started successfully
2016-06-06 13:26:24,617 INFO pool_fill_control.py V2.6 (2016-06-05) started
2016-06-06 13:26:25,182 DEBUG Sprinklers are not running (RACHIO).
2016-06-06 13:26:25,183 DEBUG SPRINKLER_RUN_LED should be OFF. This is a BLUE LED
2016-06-06 13:26:25,184 DEBUG Watchdog Ping Sent
2016-06-06 13:26:25,611 DEBUG get_pool_level returned 1
2016-06-06 13:26:25,764 DEBUG pool_pump_running_watts returned 12 watts in use by pump.
2016-06-06 13:26:25,765 DEBUG PUMP_RUN_LED should be OFF. This is the YELLOW LED
2016-06-06 13:26:25,766 DEBUG POOL_FILLING_LED should be OFF. This is a BLUE LED
2016-06-06 13:26:25,766 DEBUG Pool Level OK (PFC_LEVEL_OK) sent to MightyHat

While the script is running, this is the output of top:

top - 13:29:36 up 15:01,  3 users,  load average: 0.05, 0.07, 0.05
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  1.2 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:    947760 total,   390032 used,   557728 free,   114444 buffers
KiB Swap:   102396 total,        0 used,   102396 free.    97648 cached Mem

And meminfo:

root scruffy: log #  cat /proc/meminfo 
MemTotal:         947760 kB
MemFree:          558160 kB
MemAvailable:     864020 kB
Buffers:          114460 kB
Cached:            97640 kB
SwapCached:            0 kB
Active:           202888 kB
Inactive:          31192 kB
Active(anon):      23672 kB
Inactive(anon):     6140 kB
Active(file):     179216 kB
Inactive(file):    25052 kB
Unevictable:        1744 kB
Mlocked:            1744 kB
SwapTotal:        102396 kB
SwapFree:         102396 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:         23844 kB
Mapped:            19188 kB
Shmem:              6424 kB
Slab:             140780 kB
SReclaimable:     132312 kB
SUnreclaim:         8468 kB
KernelStack:        1000 kB
PageTables:          668 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      576276 kB
Committed_AS:      92620 kB
VmallocTotal:    1114112 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:           8192 kB
CmaFree:            3736 kB

Here is some more system information:

root scruffy: log #  uptime
13:41:58 up 15:14,  3 users,  load average: 0.02, 0.04, 0.05

root scruffy: log #  uname -a
Linux scruffy 4.4.9-v7+ #884 SMP Fri May 6 17:28:59 BST 2016 armv7l GNU/Linux

Here is the systemd unit file that starts and stops the script:

# This script starts and stops our pool fill control python script

[Unit]
Description=Installing Python script for Pool Fill Control /w watchdog
Requires=basic.target
After=multi-user.target

[Service]
Type=notify
WatchdogSec=70s
ExecStart=/usr/bin/python /root/pool_control/pool_fill_control.py
ExecStop=/root/pool_control/setupgpio.sh
Restart=always

# The number of times the service is restarted within a time period can be set
# If that condition is met, the RPi can be rebooted
#
StartLimitBurst=4
StartLimitInterval=180s
# actions can be none|reboot|reboot-force|reboot-immediate
StartLimitAction=none

# The following are defined in the /etc/systemd/system.conf file and are
# global for all services
#
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#
# They can also be set on a per process here:
# if they are not defined here, they fall back to the system.conf values
TimeoutStartSec=2s
TimeoutStopSec=2s

[Install]
WantedBy=multi-user.target
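
I have tried running it on a fresh install of Jessie and moving it to another Pi, with the same result: after an indeterminate amount of time the script fails and the watchdog restarts it.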

The script in question is fairly long, so I am not sure of the proper way to post it here, but I do have it on GitHub:

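I am looking for guidance on how to troubleshoot the code to determine what is causing it to fail, or on whether there is some glaring error in the code that jumps right out at someone with more Python experience. I do not have much experience; this is my first (what I consider real) Python script. Eventually I want to hook it up to an internal website, with a web page that replicates the physical functions (button presses, LEDs), but I want the script working properly before going any further.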
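One starting point for this kind of troubleshooting, given that the program log shows nothing at the moment of failure, is to make sure an uncaught exception ends up in that log. The following is only a minimal sketch using the standard `logging` module and `sys.excepthook`. The logger name and the function name `log_uncaught` are assumptions for illustration, and the hook only sees exceptions raised in the main thread.

```python
import logging
import sys

# Assumed logger name; reuse whatever logger the script already configures.
log = logging.getLogger("pool_fill_control")


def log_uncaught(exc_type, exc_value, exc_tb):
    """Write an uncaught exception, with its traceback, to the program log."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl-C behave as usual when running interactively.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return
    log.critical("Uncaught exception", exc_info=(exc_type, exc_value, exc_tb))


# Installed once at startup, before entering the main loop.
sys.excepthook = log_uncaught
```

If the log still shows nothing after a failure, the process was probably not ended by a Python exception. In that case the restart may instead come from a missed watchdog ping or from a signal, which would show up in the systemd journal rather than in the program log.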


Any help or guidance would be greatly appreciated.

Still having the same problem, hoping someone can point me in the right direction!

Ouch, really? Is there not a single Python guru here who can help point me in the right direction to see why this script fails at random times...?