Python threading/subprocess: thread object invalid while the subprocess is still running
Tags: python, multithreading, python-2.7

I'm writing some monitoring tools. One of them takes a list of NFS-mounted filesystems and tries to write a test file to each share. I spawn one thread for each filer being tested. After a timeout, I want to terminate any threads that are still running. I don't need to collect the threads' output, and I don't need to join or attach to any of them; I just need to stop whatever is still running after the timeout.

So I deliberately run a sleep(300) in each thread. That way I know the process spawned in each thread is still running at the end, when I loop over all active threads and try to kill them. This works for 5-20 threads, but eventually fails on one thread, saying self.pid is no longer valid. The thread on which this happens is random and not necessarily the same one as on the previous run.
import logging
import os
import re
import signal
import subprocess
import sys
import threading
import time
from ConfigParser import SafeConfigParser

class NFSWriteTestThread(threading.Thread):

    def __init__(self, filer, base_mnt_point, number):
        super(NFSWriteTestThread, self).__init__()
        self.tname = filer
        self.tnum = number
        self.filer = filer
        self.mntpt = base_mnt_point
        self.process = None
        self.pid = None

    def run(self):
        start = time.time()
        # self.process = subprocess.Popen(['/bin/dd', 'if=/dev/zero', 'bs=1M', 'count=5', 'of=' + self.testfile], shell=False)
        self.process = subprocess.Popen(['/bin/sleep', '300'], shell=False)
        time.sleep(1)
        logger.debug("DEBUG: %s=%d" % (self.tname, self.process.pid))
        self.pid = self.process.pid
        logger.info(" NFS write test command initiaited on '%s', pid=%d" % (self.filer, self.pid))
        self.process.wait()
        # self.output, self.error = self.process.communicate()
        end = time.time()
        logger.info(" NFS write test for '%s' completed in %d seconds" % (self.filer, end - start))
        return

    def getThreadName(self):
        return self.tname

    def getThreadNum(self):
        return self.tnum

    def getThreadPID(self):
        if self.pid:
            return self.pid
        else:
            return "unknown"

    def isAlive(self):
        if not self.process:
            logger.debug("Error: self.process is invalid (%s)" % type(self.process))
        # if self.process.poll():
        #     logger.info("NFS write test operation for thread '%s' is still active" % self.filer)
        # else:
        #     logger.info("NFS write test operation for thread '%s' is inactive" % self.filer)
        return

    def terminate(self):
        os.kill(self.process.pid, signal.SIGTERM)
        return

    def kill(self):
        os.kill(self.process.pid, signal.SIGKILL)
        return

def initLogging(config):
    logfile = os.path.join(config['logdir'], config['logfilename'])
    fformat = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S %Z")
    cformat = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S %Z")
    clogger = None
    flogger = None
    if config['debug']:
        loglevel = logging.DEBUG
    if not os.path.exists(config['logdir']):
        os.makedirs(config['logdir'])
        os.chmod(config['logdir'], 0700)
        os.chown(config['logdir'], 0, 0)
    try:
        logger = logging.getLogger('main')
        logger.setLevel(logging.DEBUG)
        # Define a file logger
        flogger = logging.FileHandler(logfile, 'w')
        flogger.setLevel(logging.DEBUG)
        flogger.setFormatter(fformat)
        logger.addHandler(flogger)
        # Define a console logger if verbose
        if config['verbose']:
            clogger = logging.StreamHandler()
            clogger.setLevel(logging.DEBUG)
            clogger.setFormatter(cformat)
            logger.addHandler(clogger)
    except Exception, error:
        print "Error: Unable to initialize file logging: %s" % error
        sys.exit(1)
    logger.info("Script initiated.")
    logger.info("Using the following configuration:")
    for key, value in sorted(config.iteritems()):
        logger.info(" %20s = '%-s'" % (key, value))
    return logger

def parseConfigFile(cfg):
    if not os.path.isfile(cfg['cfgfile']) or not os.access(cfg['cfgfile'], os.R_OK):
        print "Error: '%s' does not exist or is not readable, terminating." % cfg['cfgfile']
        sys.exit(1)
    config = SafeConfigParser()
    config.read(cfg['cfgfile'])
    _cfg = dict(config.items(cfg['cfgfilestanza']))
    _cfgfilers = config.get(cfg['cfgfilestanza'], 'managed_filers')
    _tmpfilers = _cfgfilers.split(',')
    # populate a list containing all filers which will be merged into the global cfg[] dict
    _cfg['filers'] = []
    for _f in _tmpfilers:
        _cfg['filers'].append(_f.strip())
    return _cfg

logger = initLogging(cfg)
cfg.update(parseConfigFile(cfg))
threads = []
numThreads = 0
for filer in cfg['filers']:
    numThreads += 1
    logger.debug(" spawning NFS wite test thread for '%s', thread number %s" % (filer, numThreads))
    t = NFSWriteTestThread(filer, cfg['base_mnt_point'], numThreads)
    t.start()
    threads.append(t)
    # time.sleep(1)
logger.info("spawned %d NFS write test child threads" % numThreads)
logger.info("sleeping for %d seconds" % cfg['timeout'])
time.sleep(cfg['timeout'])
if (threading.activeCount() > 1):
    logger.info("there are %d NFS write test threads active after the timeout:" % (threading.activeCount() - 1))
    for thr in threading.enumerate():
        logger.debug("theadname=%s" % thr.name)
        if re.match("MainThread", thr.getName()):
            pass
        else:
            logger.info("thread '%s' (thread %d) is still alive" % (thr.getThreadName(), thr.getThreadNum()))
            # thr.isAlive()
            logger.info("killing thread for '%s' (pid=XX) with SIGTERM" % (thr.getThreadName()))
            # logger.info("killing thread for '%s' (pid=%d) with SIGTERM" % (thr.getThreadName(), thr.getThreadPID()))
            thr.kill()
logger.info("Script complete")
sys.exit(0)

Here is the output:
2014-11-10 09:00:22 CST there are 173 NFS write test threads active after the timeout:
2014-11-10 09:00:22 CST theadname=Thread-165
2014-11-10 09:00:22 CST thread 'hostname1' (thread 165) is still alive
2014-11-10 09:00:22 CST killing thread for 'hostname1' (pid=XX) with SIGTERM
2014-11-10 09:00:22 CST theadname=Thread-97
2014-11-10 09:00:22 CST thread 'hostname2' (thread 97) is still alive
2014-11-10 09:00:22 CST NFS write test for 'hostname1' completed in 60 seconds
2014-11-10 09:00:22 CST killing thread for 'hostname2' (pid=XX) with SIGTERM
2014-11-10 09:00:22 CST theadname=Thread-66
2014-11-10 09:00:22 CST thread 'hostname3' (thread 66) is still alive
2014-11-10 09:00:22 CST NFS write test for 'hostname2' completed in 60 seconds
2014-11-10 09:00:22 CST killing thread for 'hostname3' (pid=XX) with SIGTERM
2014-11-10 09:00:22 CST theadname=Thread-121
2014-11-10 09:00:22 CST thread 'hostname4' (thread 121) is still alive
2014-11-10 09:00:22 CST killing thread for 'hostname4' (pid=XX) with SIGTERM
Traceback (most recent call last):
2014-11-10 09:00:22 CST NFS write test for 'hostname3' completed in 60 seconds
File "./NFSWriteTestCheck.py", line 199, in <module>
thr.kill()
File "./NFSWriteTestCheck.py", line 84, in kill
os.kill(self.process.pid, signal.SIGKILL)
AttributeError: 'NoneType' object has no attribute
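
When this error appears, the process is still running, as verified with ps in a shell. Why is the thread object no longer valid? At the moment the error is thrown, the thread's execution should be at this point: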
self.process.wait()
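I'm scratching my head here, wondering whether I've hit a bug or something.

You can't stop a thread; you can only interrupt what the thread is doing. In your case, the thread is waiting on subprocess.call. Setting an event has no effect, because the thread is not waiting on the event. The solution here is to kill the child process, which means you will need the Popen object.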
I put the implementation directly in the run method so that the Popen object is close at hand.
class NFSWriteTestThread(threading.Thread):

    def __init__(self, filer, mntpoint, number):
        super(NFSWriteTestThread, self).__init__()
        self.name = filer
        self.filer = filer
        self.mntpt = mntpoint
        self.tnum = number
        self._proc = None

    def run(self):
        # Build the test file path from the values stored in __init__.
        testfile = "%s/%s/test/test.%s" % (self.mntpt, self.filer, self.filer)
        testcmd = "/bin/bash -c '/bin/dd if=/dev/zero bs=1024 count=1024 of=" + testfile + " >/dev/null 2>/dev/null; sleep 120'"
        self._proc = subprocess.Popen(testcmd, shell=True)
        self._proc.wait()
        return

    def getName(self):
        return self.name

    def kill(self):
        # _proc is None until run() has started the child process.
        if self._proc:
            self._proc.terminate()

    def stopped(self):
        if self._proc:
            return self._proc.poll() is not None
        else:
            return True
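
As a rough sketch of how a thread class like this might be driven from the main program: the version below keeps its own list of threads instead of walking threading.enumerate(), and only signals children that have not exited. The names ChildProcessThread and run_with_timeout are made up for illustration, and a Python one-liner stands in for the real dd/sleep command so the example runs anywhere; treat it as a sketch under those assumptions rather than the answer's exact code.

```python
import subprocess
import sys
import threading
import time


class ChildProcessThread(threading.Thread):
    """Hypothetical stand-in for NFSWriteTestThread: runs one child process."""

    def __init__(self, label, seconds):
        super(ChildProcessThread, self).__init__()
        self.label = label
        self.seconds = seconds
        self._proc = None

    def run(self):
        # A Python one-liner stands in for the real dd/sleep command.
        cmd = [sys.executable, "-c", "import time; time.sleep(%d)" % self.seconds]
        self._proc = subprocess.Popen(cmd)
        self._proc.wait()

    def kill(self):
        # _proc stays None until Popen() has returned, so guard against that.
        if self._proc is not None:
            self._proc.terminate()

    def stopped(self):
        return self._proc is None or self._proc.poll() is not None


def run_with_timeout(jobs, timeout):
    """Start one thread per (label, seconds) job; kill stragglers after timeout."""
    threads = [ChildProcessThread(label, seconds) for label, seconds in jobs]
    for t in threads:
        t.start()
    time.sleep(timeout)
    killed = []
    for t in threads:  # iterate over our own list, not threading.enumerate()
        if not t.stopped():
            t.kill()
            killed.append(t.label)
    for t in threads:
        t.join(10)  # each thread ends once its child process has exited
    return threads, killed


if __name__ == "__main__":
    threads, killed = run_with_timeout([("fast", 0), ("slow", 30)], 3)
    print("killed after timeout: %s" % killed)
```

Keeping a private list means every object in the loop is known to have the kill() and stopped() methods, which is not guaranteed for whatever threading.enumerate() happens to return.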

I tried this approach before, but it kept complaining "AttributeError: 'Popen' object has no attribute 'terminate'". I will update my original code to show my current version.
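
terminate is new in Python 2.6, so it should also be in 2.7. If you are on Red Hat 5, for example, the default Python is 2.4, which does not have it. terminate is just a way to make Linux and Windows behave the same; you can import signal and then call self._proc.send_signal(signal.SIGTERM).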
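
Note that Popen.send_signal was also added in Python 2.6, so on an older interpreter os.kill is the portable fallback on POSIX systems. A minimal sketch of a helper that picks whichever is available; the name terminate_process is invented here, and the one-liner child is only a placeholder command:

```python
import os
import signal
import subprocess
import sys


def terminate_process(proc):
    """Ask a child started with subprocess.Popen to exit.

    Uses Popen.terminate() when the running Python provides it (2.6+),
    and falls back to os.kill() with SIGTERM otherwise (POSIX only).
    Returns False if the child had already exited, True if it was signalled.
    """
    if proc.poll() is not None:
        return False  # already exited; nothing to signal
    if hasattr(proc, "terminate"):
        proc.terminate()
    else:
        os.kill(proc.pid, signal.SIGTERM)
    return True


if __name__ == "__main__":
    # Placeholder child: a Python one-liner that sleeps for 30 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    terminate_process(child)
    child.wait()
    print("child exited with return code %r" % child.returncode)
```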
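
I've had quite a time with this. I gave up on terminate() and kill() and am now using os.kill, which works. However, I get inconsistent results when evaluating process.pid. Even when I know 100% that the process is still running (verified with ps) and the thread is currently running sleep(300), self.pid is occasionally unset.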
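
Since no output is collected and the threads are never joined, one alternative worth considering is to drop the threads entirely and keep the Popen objects in the main program, so there is no attribute such as self.pid shared across threads. This is a different technique from the one discussed above, not what anyone in the thread reported using. The function name run_all_with_deadline, the poll_interval parameter, and the placeholder commands are all assumptions made for this sketch.

```python
import subprocess
import sys
import time


def run_all_with_deadline(commands, timeout, poll_interval=0.1):
    """Start every command, wait up to `timeout` seconds, kill the rest.

    `commands` maps a label (for example a filer name) to an argv list.
    Returns {label: returncode}, with None for children that had to be killed.
    """
    procs = dict((label, subprocess.Popen(argv)) for label, argv in commands.items())
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(p.poll() is not None for p in procs.values()):
            break  # everything finished before the deadline
        time.sleep(poll_interval)
    results = {}
    for label, proc in procs.items():
        if proc.poll() is None:
            proc.kill()   # still running after the deadline
            proc.wait()   # reap the child so it does not linger as a zombie
            results[label] = None
        else:
            results[label] = proc.returncode
    return results


if __name__ == "__main__":
    def sleeper(seconds):
        # Placeholder for the real write test command.
        return [sys.executable, "-c", "import time; time.sleep(%d)" % seconds]

    print(run_all_with_deadline({"fast": sleeper(0), "slow": sleeper(30)}, 4))
```

One caveat for the NFS case: a child blocked in uninterruptible I/O on a hung mount may not exit until that I/O returns, so the wait() after kill() could block longer than expected.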