Celery .delay hangs (recent, not an auth problem)
I am running Celery 2.2.4/djcelery 2.2.4, using RabbitMQ 2.1.1 as the backend. I recently brought two new Celery servers online. I had been running 2 workers across two machines with a total of about 18 threads, and on my new souped-up boxes (36g RAM + dual hyper-threaded quad-core) I am running 10 workers with 8 threads each, for a total of 180 threads. My tasks are all pretty small, so this should be no problem.
The nodes have been running fine for the last few days, but today I noticed that .delay() is hanging. When I interrupt it, I see a traceback that points here:
File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 324, in delay
return self.apply_async(args, kwargs)
File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 449, in apply_async
publish.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/kombu/compat.py", line 108, in close
self.backend.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/channel.py", line 194, in close
(20, 41), # Channel.close_ok
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
self.channel_id, allowed_methods)
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_method
self.method_reader.read_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 212, in read_method
self._next_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 127, in _next_method
frame_type, channel, payload = self.source.read_frame()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 109, in read_frame
frame_type, channel, size = unpack('>BHI', self._read(7))
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 200, in _read
s = self.sock.recv(65536)
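The last frame above is a plain blocking sock.recv with no timeout, so a broker that accepts the TCP connection but never answers makes the caller wait indefinitely. The following is a minimal sketch in modern Python, not amqplib's code, of how a socket timeout turns that silent stall into something the caller can detect. The names recv_with_timeout and start_silent_server are hypothetical, and the silent server is only a stand-in for a stalled broker.

```python
import socket
import threading


def recv_with_timeout(host, port, timeout_s=1.0):
    """Connect, then wait for data; return None if the peer stays silent."""
    sock = socket.create_connection((host, port), timeout=timeout_s)
    try:
        sock.settimeout(timeout_s)
        try:
            return sock.recv(65536)
        except socket.timeout:
            # The peer accepted the connection but sent nothing in time.
            return None
    finally:
        sock.close()


def start_silent_server():
    """Hypothetical stand-in for a stalled broker: accepts, never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    held = []

    def accept_once():
        conn, _ = server.accept()
        held.append(conn)  # keep it open so the client sees silence, not EOF

    threading.Thread(target=accept_once, daemon=True).start()
    return server, held
```

Calling recv_with_timeout against the silent server returns None after the timeout instead of blocking, which is the behaviour the hung .delay() call lacks.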
I have checked the Rabbit logs, and I believe the process trying to connect shows up as:
=INFO REPORT==== 12-Jun-2011::22:58:12 ===
accepted TCP connection on 0.0.0.0:5672 from x.x.x.x:48569
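I have the Celery log level set to INFO, but I don't see anything particularly interesting in the Celery logs, except that 2 of the workers can't connect to the broker: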
[2011-06-12 22:41:08,033: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...
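All of the other nodes are able to connect without issue.
I know there was a post of a similar nature last year, but I'm pretty certain this is different. Could it be that the sheer number of workers is creating some kind of race condition in amqplib? I found a thread that seems to indicate that amqplib is not thread-safe; I'm not sure whether this matters for Celery.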
EDIT: I have tried celeryctl purge on both nodes. On one it succeeds, but on the other it fails with the following AMQP error:
AMQPConnectionException(reply_code, reply_text, (class_id, method_id))
amqplib.client_0_8.exceptions.AMQPConnectionException:
(530, u"NOT_ALLOWED - cannot redeclare exchange 'XXXXX' in vhost 'XXXXX'
with different type, durable or autodelete value", (40, 10), 'Channel.exchange_declare')
On both nodes, inspect stats hangs with the same "can't close connection" traceback shown above. I'm at a loss here.
EDIT2: I was able to delete the offending exchange using exchange.delete from camqadm, and now the second node hangs too :(
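EDIT3: One thing that also changed recently is that I added an additional vhost to rabbitmq, which my staging node connects to.
Hopefully this will save somebody a lot of time, though it certainly does not save me any embarrassment:
/var was full on the server that was running rabbit. With all of the nodes that I added, rabbit was doing a lot more logging, and it filled up /var. I couldn't write to /var/lib/rabbitmq, and so no messages were going through.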
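A check along these lines could have caught the problem earlier. This is only a sketch using the Python standard library, not anything RabbitMQ or Celery provides; the function names and the 10% threshold are assumptions chosen for illustration.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path, min_free=0.10):
    """True when less than min_free (an assumed 10% threshold) of the disk is free."""
    return free_fraction(path) < min_free
```

On the broker host this would be pointed at the partition in question, for example is_low_on_space("/var"), and run from cron or a monitoring agent so that a filling log partition raises an alert before the broker stops accepting messages.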