Python 3.x Celery canvas: how to distribute the elements of a task's result list to a chain, and then chain further tasks
I'm currently learning Celery and am trying to build a DAG-like data-processing pipeline with Celery canvas. The pipeline should contain tasks that run on the whole list of objects as well as tasks that run on a single object and are distributed across the objects. I implemented a dataclass that holds my objects and a few dummy tasks, just to try out the pipeline architecture. I use a Docker container that runs Redis for me, without any extra configuration. I also wrote a custom JSON encoder/decoder for the dataclass. I know the example tasks don't make sense; they are only there to show an MVP of my problem.

The dataclass:
from dataclasses import dataclass

@dataclass
class Car:
    car_id: int
    color: str
    tires: str
    doors: int
The JSON encoder/decoder for the dataclass:
## https://stackoverflow.com/questions/43092113/create-a-class-that-support-json-serialization-for-use-with-celery
import json
import collections.abc
import six

def is_iterable(arg):
    # Strings are iterable too, but are treated as scalars here
    return isinstance(arg, collections.abc.Iterable) and not isinstance(arg, six.string_types)

class GenericJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return super().default(obj)
        except TypeError:
            pass
        # Fallback: store the class location and the attribute dict
        cls = type(obj)
        result = {
            '__custom__': True,
            '__module__': cls.__module__,
            '__name__': cls.__name__,
            'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__
        }
        return result

class GenericJSONDecoder(json.JSONDecoder):
    def decode(self, str):
        result = super().decode(str)
        return GenericJSONDecoder.instantiate_object(result)

    @staticmethod
    def instantiate_object(result):
        if not isinstance(result, dict):  # or
            if is_iterable(result):
                return [GenericJSONDecoder.instantiate_object(v) for v in result]
            else:
                return result
        if not result.get('__custom__', False):
            return {k: GenericJSONDecoder.instantiate_object(v) for k, v in result.items()}
        # Custom object: import its module and rebuild the instance
        import sys
        module = result['__module__']
        if module not in sys.modules:
            __import__(module)
        cls = getattr(sys.modules[module], result['__name__'])
        if hasattr(cls, '__json_decode__'):
            return cls.__json_decode__(result['data'])
        instance = cls.__new__(cls)
        data = {k: GenericJSONDecoder.instantiate_object(v) for k, v in result['data'].items()}
        instance.__dict__.update(data)
        return instance

def dumps(obj, *args, **kwargs):
    return json.dumps(obj, *args, cls=GenericJSONEncoder, **kwargs)

def loads(obj, *args, **kwargs):
    return json.loads(obj, *args, cls=GenericJSONDecoder, **kwargs)
My tasks:
@app.task
def get_cars_from_db():
    return [
        Car(car_id=1, color=None, tires=None, doors=2),
        Car(car_id=2, color=None, tires=None, doors=4),
        Car(car_id=3, color=None, tires=None, doors=4),
        Car(car_id=1, color=None, tires=None, doors=4),
    ]

@app.task
def paint_car(car: Car):
    car.color = "blue"
    return car

@app.task
def filter_out_two_door(car: Car):
    if car.doors == 2:
        return None
    return car

@app.task
def filter_none(cars: [Car]):
    return [c for c in cars if c]

@app.task
def change_tires(car: Car):
    car.tires = "winter"
    return car

@app.task
def write_back_whatever(cars: [Car]):
    print(cars)

@app.task
def dmap(args_iter, celery_task):
    """
    Takes an iterator of argument tuples and queues them up for celery to run with the function.
    """
    print(args_iter)
    print(celery_task)
    return group(celery_task(arg) for arg in args_iter)
My Celery configuration:
from celery import Celery,subtask,group
from kombu.serialization import register, registry
from utils.json_encoders import dumps, loads
register("pipelineJSON",dumps,loads,content_type='application/x-pipelineJSON',content_encoding="utf-8")
registry.enable('pipelineJSON')
app = Celery('pipeline', broker='redis://localhost:6379/0',backend='redis://localhost:6379/0')
app.conf["accept_content"]=["application/x-pipelineJSON","pipelineJSON"]
app.conf["result_serializer"]="pipelineJSON"
app.conf["task_serializer"]="pipelineJSON"
Now I try to build and execute the following workflow:
paint_and_filter = paint_car.s() | filter_out_two_door.s()
workflow = (
    get_cars_from_db.s() | dmap.s(paint_and_filter) | filter_none.s()
    | dmap.s(change_tires.s()) | write_back_whatever.s()
)
# A signature has to be sent to a worker before a result can be fetched
workflow.delay().get()
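To make the intended result concrete, here is a minimal plain-Python sketch (no Celery, no broker) of what the workflow is meant to compute. The function names mirror the tasks above; `run_pipeline` is a hypothetical helper introduced only for this illustration, and `Optional` type hints are added because the sample data uses `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Car:
    car_id: int
    color: Optional[str]
    tires: Optional[str]
    doors: int


def get_cars_from_db() -> List[Car]:
    return [Car(1, None, None, 2), Car(2, None, None, 4),
            Car(3, None, None, 4), Car(1, None, None, 4)]


def paint_car(car: Car) -> Car:
    car.color = "blue"
    return car


def filter_out_two_door(car: Car) -> Optional[Car]:
    return None if car.doors == 2 else car


def change_tires(car: Car) -> Car:
    car.tires = "winter"
    return car


def run_pipeline() -> List[Car]:
    cars = get_cars_from_db()
    # Per-car step 1: paint_car | filter_out_two_door, applied to every element
    per_car = [filter_out_two_door(paint_car(c)) for c in cars]
    # List step: drop the None placeholders left by the filter
    kept = [c for c in per_car if c]
    # Per-car step 2: change_tires, applied to every remaining element
    return [change_tires(c) for c in kept]


result = run_pipeline()
```

In the Celery version, the two list comprehensions marked "per-car" are the parts that should fan out to workers, and the list step in between has to wait for all of them.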
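My problem is that I can't pass the list result of the get_cars_from_db task on to another chain. I read through Stack Overflow and GitHub and came across dmap, but I haven't managed to get it to work. With the example code I provided, the worker throws the following exception: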
return group(celery_task(arg) for arg in args_iter)
TypeError: 'dict' object is not callable
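A likely reading of this error: Celery signatures are dict subclasses, so when one is passed as a task argument and goes through a JSON-based serializer, the worker receives a plain dict that no longer has `__call__`. The sketch below reproduces that effect with the standard library only; `Sig` is a hypothetical stand-in, not Celery's actual class.

```python
import json


class Sig(dict):
    """Hypothetical stand-in for a signature: a dict that is also callable."""

    def __call__(self, arg):
        return f"{self['task']}({arg!r})"


sig = Sig(task="paint_car")
direct = sig(1)  # works, because Sig defines __call__

# Roughly what a JSON serializer does to the argument on its way to the worker
received = json.loads(json.dumps(sig))

try:
    received(1)
except TypeError as exc:
    error = str(exc)  # the subclass, and with it __call__, was lost in transit

# Rebuilding the wrapper from the plain dict makes it usable again
rebuilt = Sig(received)
```

In Celery itself, the corresponding rebuild step would presumably be `celery.signature(celery_task)` (or its older alias `subtask`) inside `dmap`. Note also that calling a signature directly runs the task inline rather than queueing it, so something like `.clone((arg,))` per element is probably closer to the intent; treat both points as assumptions to verify against the Celery documentation.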
I also tried wrapping celery_task in subtask before calling it, like this:
return group(subtask(celery_task)(arg) for arg in args_iter)
This produces the following error on the worker:
File "/Users/utils/json_encoders.py", line 23, in default
'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__
kombu.exceptions.EncodeError: 'mappingproxy' object has no attribute '__dict__'
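The `mappingproxy` message suggests that the encoder's `obj.__dict__` fallback was handed a class (or something class-like) rather than an instance; which object that was cannot be read off the traceback. A minimal sketch of the mechanism, using a hypothetical reduced encoder and the standard library only:

```python
import json


class DictDumpEncoder(json.JSONEncoder):
    """Hypothetical, reduced version of the __dict__ fallback in GenericJSONEncoder."""

    def default(self, obj):
        # Fall back to the object's attribute dictionary
        return {"__name__": type(obj).__name__, "data": obj.__dict__}


class Car:
    def __init__(self, doors):
        self.doors = doors


# An instance works: its __dict__ is a plain dict of JSON-friendly values
encoded = json.dumps(Car(4), cls=DictDumpEncoder)

# A class object does not: its __dict__ is a mappingproxy, which json hands
# back to default(), and a mappingproxy has no __dict__ of its own
try:
    json.dumps(Car, cls=DictDumpEncoder)
except AttributeError as exc:
    error = str(exc)
```

So the fallback is only safe for plain instances; anything else that reaches `default()` either needs its own handling or should not be sent through the serializer at all.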
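I tried to draw a picture of what I want to achieve:
I'm using Celery 5.02 and Python 3.8.3.
I'd really appreciate it if someone could help. How do I get this dmap to work? Is there a different or better solution for what I'm trying to achieve? Thanks in advance.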