[Source Code Analysis] Parallel Distributed Task Queue Celery: How Child Processes Handle Messages

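0x00 Abstract

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages. It is an asynchronous task queue focused on real-time processing, and it also supports task scheduling. The previous article introduced Celery's multiprocessing model; this article explains how a child process handles messages.

After reading this article you should be able to trace the following flow:

  • how the parent process sends a message to a child process;
  • how the child process receives the parent's message;
  • how the child process unpacks the message step by step, peeling off layer after layer to get everything it needs to run the task;
  • how the child process runs the task once it has that information;
  • why Celery needs so many layers of wrapping.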

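0x01 Background

Let us first recap the previous article. In the Celery worker, apply_async is the call that hands work to the Pool: when a user's task message arrives, Celery calls into the Pool through it.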

def apply_async(self, func, args=(), kwds={}, ...):
    # (construction of `result` is elided in this excerpt)
    if self.threads:
        self._taskqueue.put(([(TASK, (result._job, None,
                                      func, args, kwds))], None))
    else:
        self._quick_put((TASK, (result._job, None, func, args, kwds)))
    return result

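Next, as billiard/pool.py shows, when the pool runs with helper threads (self.threads is true) it uses self._taskqueue as the medium for passing the message to the TaskHandler, which in turn forwards it to a child process.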

class Pool(object):
    '''
    Class which supports an async version of applying functions to arguments.
    '''
    Worker = Worker
    Supervisor = Supervisor
    TaskHandler = TaskHandler
    TimeoutHandler = TimeoutHandler
    ResultHandler = ResultHandler

    def __init__(self, processes=None, initializer=None, initargs=(), ...):
        # (the rest of __init__ is elided in this excerpt)
        self._task_handler = self.TaskHandler(self._taskqueue,
                                              self._quick_put,
                                              self._outqueue,
                                              self._pool,
                                              self._cache)
        if threads:
            self._task_handler.start()

At this point the logic matches the diagram from the previous article:

                           +
  Consumer                 |
                   message |
                           v         strategy  +------------------------------------+
              +------------+------+            | strategies                         |
              | on_task_received  | <--------+ |                                    |
              |                   |            |[myTest.add : task_message_handler] |
              +------------+------+            +------------------------------------+
                           |
                           |
+------------------------------------------------------------------------------------+
  strategy                 |
                           |
                           |
                           v                Request [myTest.add]
              +------------+-------------+                       +---------------------+
              | task_message_handler     | <-------------------+ | create_request_cls  |
              |                          |                       |                     |
              +------------+-------------+                       +---------------------+
                           | _process_task_sem
                           |
+------------------------------------------------------------------------------------+
  Worker                   | req[{Request} myTest.add]
                           v
                  +--------+-----------+
                  | WorkController     |
                  |                    |
                  |            pool +-------------------------+
                  +--------+-----------+                      |
                           |                                  |
                           |               apply_async        v
               +-----------+----------+                   +---+-------------------+
               |{Request} myTest.add  | +---------------> | TaskPool              |
               +----------------------+                   +----+------------------+
                                          myTest.add           |
                                                               |
+--------------------------------------------------------------------------------------+
                                                               |
                                                               v
                                                          +----+------------------+
                                                          | billiard.pool.Pool    |
                                                          +-------+---------------+
                                                                  |
                                                                  |
  Pool             +---------------------------+                  |
                   | TaskHandler               |                  |
                   |                           |                  |  self._taskqueue.put
                   |              _taskqueue   | <----------------+
                   |                           |
                   +------------+--------------+
                                |
                                |  put(task)
                                |
+--------------------------------------------------------------------------------------+
                                |
  Sub process                   |
                                v
                          self._inqueue


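Following _taskqueue, we arrive at the TaskHandler.

0x02 Parent process: TaskHandler

This part describes how the parent process passes a task message to a child process.

We are still in the parent process here. The code lives in billiard/pool.py, and the call stack is: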

_send_bytes, connection.py:314
send, connection.py:233
body, pool.py:596
run, pool.py:504
_bootstrap_inner, threading.py:926
_bootstrap, threading.py:890

The variables are:

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>
additional_info = {PyDBAdditionalThreadInfo} State:2 Stop:None Cmd: 107 Kill:False
cache = {dict: 1} {0: <%s: 0 ack:False ready:False>}
daemon = {bool} True
name = {str} 'Thread-16'
outqueue = {SimpleQueue} <billiard.queues.SimpleQueue object at 0x000001E2C07DD6C8>
pool = {list: 8} [<SpawnProcess(SpawnPoolWorker-1, started daemon)>, <SpawnProcess(SpawnPoolWorker-2, started daemon)>, <SpawnProcess(SpawnPoolWorker-3, started daemon)>, <SpawnProcess(SpawnPoolWorker-4, started daemon)>, <SpawnProcess(SpawnPoolWorker-5, started daemon)>, <SpawnProcess(SpawnPoolWorker-6, started daemon)>, <SpawnProcess(SpawnPoolWorker-7, started daemon)>, <SpawnProcess(SpawnPoolWorker-8, started daemon)>]
taskqueue = {Queue} <queue.Queue object at 0x000001E2C07DD208>
_args = {tuple: 0} ()
_children = {WeakKeyDictionary: 0} <WeakKeyDictionary at 0x1e2c0883448>
_daemonic = {bool} True
_kwargs = {dict: 0} {}
_name = {str} 'Thread-16'
_parent = {_MainThread} <_MainThread(MainThread, started 13408)>
_pid = {NoneType} None
_start_called = {bool} True
_started = {Event} <threading.Event object at 0x000001E2C0883D88>
_state = {int} 0
_stderr = {LoggingProxy} <celery.utils.log.LoggingProxy object at 0x000001E2C07DD188>
_target = {NoneType} None
_tstate_lock = {lock} <locked _thread.lock object at 0x000001E2C081FDB0>
_was_started = {bool} True

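2.1 Sending the message

After the parent process receives a task message, it calls put(task) to write the message into the pipe between the parent process and the child process.

Note the earlier assignments: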

self._taskqueue = Queue()

def _setup_queues(self):
    self._inqueue = Queue()
    self._outqueue = Queue()
    self._quick_put = self._inqueue.put
    self._quick_get = self._outqueue.get

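In other words, when the TaskHandler receives a message, it sends it to the child process through put, which is bound to self._inqueue.put, the write side of the pipe. self._taskqueue is only an intermediate hand-off point.

So the variables at this point are: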

put = {method} <bound method _ConnectionBase.send of <billiard.connection.PipeConnection object at 0x000001E2C07DD2C8>>

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>

task = {tuple: 2}
0 = {int} 2
1 = {tuple: 5} (0, None, <function _trace_task_ret at 0x000001E2BFCA3438>, ('myTest.add', 'dee72291-5614-4106-a7bf-007023286e9e', {'lang': 'py', 'task': 'myTest.add', 'id': 'dee72291-5614-4106-a7bf-007023286e9e', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen17456@DESKTOP-0GO3RPO', 'reply_to': '21660796-c7e7-3736-9d42-e1be6ff7eaa8', 'correlation_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'hostname': 'celery@DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {})
__len__ = {int} 2

taskqueue = {Queue} <queue.Queue object at 0x000001E2C07DD208>

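The code is as follows. The loop writes each task into the pipe; once taskqueue.get returns the None sentinel and the loop ends, tell_others notifies the result handler and the workers.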

class TaskHandler(PoolThread):

    def __init__(self, taskqueue, put, outqueue, pool, cache):
        self.taskqueue = taskqueue
        self.put = put
        self.outqueue = outqueue
        self.pool = pool
        self.cache = cache
        super(TaskHandler, self).__init__()

    def body(self):
        cache = self.cache
        taskqueue = self.taskqueue
        put = self.put

        # the loop ends when taskqueue.get() returns the None sentinel
        for taskseq, set_length in iter(taskqueue.get, None):
            task = None
            i = -1
            try:
                for i, task in enumerate(taskseq):
                    try:
                        put(task)  # write the task into the pipe
                    ...  # error handling elided in this excerpt
                break
            ...  # error handling elided in this excerpt

        self.tell_others()

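2.2 Notifying the others

tell_others tells the result handler and the workers that no more work is coming: it puts a None sentinel on outqueue for the result handler and writes one None into the pipe for each process in the pool.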

def tell_others(self):
    outqueue = self.outqueue
    put = self.put
    pool = self.pool

    try:
        # tell result handler to finish when cache is empty
        outqueue.put(None)

        # tell workers there is no more work
        for p in pool:
            put(None)
    ...  # except clause elided in this excerpt
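
The forwarding loop and its shutdown path can be sketched with the standard library alone. This is a minimal illustration, not billiard's implementation: task_handler and demo are hypothetical names, a queue.Queue stands in for self._taskqueue and self._outqueue, and a one-way multiprocessing.Pipe stands in for the parent-to-child channel.

```python
import queue
import threading
from multiprocessing import Pipe

TASK = 2  # message-type tag; the article's dump shows TASK = 2


def task_handler(taskqueue, put, outqueue, n_workers):
    """Forward task batches until a None sentinel arrives, then tell others."""
    for taskseq, _set_length in iter(taskqueue.get, None):
        for task in taskseq:
            put(task)  # write into the parent -> child channel
    # Shutdown path: wake the result handler and every worker.
    outqueue.put(None)
    for _ in range(n_workers):
        put(None)


def demo():
    taskqueue = queue.Queue()            # plays the role of self._taskqueue
    outqueue = queue.Queue()             # plays the role of self._outqueue
    reader, writer = Pipe(duplex=False)  # plays the role of the inqueue pipe

    handler = threading.Thread(
        target=task_handler, args=(taskqueue, writer.send, outqueue, 2))
    handler.start()

    taskqueue.put(([(TASK, (0, None, "add", (2, 8), {}))], None))
    taskqueue.put(None)  # sentinel: no more batches
    handler.join()

    received = [reader.recv() for _ in range(3)]
    return received, outqueue.get()
```

Calling demo() returns the task tuple followed by one None per simulated worker, plus the None that was placed on the result queue.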

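0x03 Child process: Worker

This part describes how the Worker child process receives a task and executes it.

Now that the task message has been sent through the pipe, execution continues in the child process. Note that self is now billiard.pool.Worker.

3.1 The child-process loop

Inside the worker, the message loop (which unpacks the message in several stages) works as follows:

  • call wait_for_job to wait for a message that the parent process writes into the pipe;
  • after receiving the message req, unpack it: type_, args_ = req;
  • unpack args_ again: job, i, fun, args, kwargs = args_. This yields the job, the function the child process must run, that function's arguments, and so on;
  • send an ACK to the parent process;
  • if wait_for_syn is configured, wait for the parent's confirmation, and skip the job on a NACK;
  • call the user-defined function indirectly through fun: result = (True, prepare_result(fun(*args, **kwargs))). Note that fun here is _trace_task_ret; the user-defined function is called inside _trace_task_ret;
  • do the follow-up work, such as sending READY with the result to the parent process.

The code is as follows: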

def workloop(self, debug=debug, now=monotonic, pid=None):
    pid = pid or os.getpid()
    put = self.outq.put
    inqW_fd = self.inqW_fd
    synqW_fd = self.synqW_fd
    maxtasks = self.maxtasks
    prepare_result = self.prepare_result

    wait_for_job = self.wait_for_job
    _wait_for_syn = self.wait_for_syn

    def wait_for_syn(jid):
        i = 0
        while 1:
            if i > 60:
                error('!!!WAIT FOR ACK TIMEOUT: job:%r fd:%r!!!',
                      jid, self.synq._reader.fileno(), exc_info=1)
            req = _wait_for_syn()
            if req:
                type_, args = req  # unpack the parent's sync reply (ACK or NACK)
                if type_ == NACK:
                    return False
                assert type_ == ACK
                return True
            i += 1

    completed = 0
    try:
        while maxtasks is None or (maxtasks and completed < maxtasks):
            req = wait_for_job()
            if req:
                type_, args_ = req
                assert type_ == TASK
                # second unpacking; fun is _trace_task_ret, which calls the
                # user-defined function internally
                job, i, fun, args, kwargs = args_
                put((ACK, (job, i, now(), pid, synqW_fd)))
                if _wait_for_syn:
                    confirm = wait_for_syn(job)
                    if not confirm:
                        continue  # received NACK
                result = (True, prepare_result(fun(*args, **kwargs)))
                put((READY, (job, i, result, inqW_fd)))
                completed += 1
                if max_memory_per_child > 0:
                    used_kb = mem_rss()
                    if used_kb > 0 and used_kb > max_memory_per_child:
                        warning(MAXMEM_USED_FMT.format(
                            used_kb, max_memory_per_child))
                        return EX_RECYCLE
        if maxtasks:
            return EX_RECYCLE if completed == maxtasks else EX_FAILURE
        return EX_OK
    finally:
        self._ensure_messages_consumed(completed=completed)

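The variables at this point are shown below. req is the message the parent process sent through the pipe; the child process first unpacks it into type_ and args_.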

prepare_result = {method} <bound method Worker.prepare_result of <billiard.pool.Worker object at 0x000001BFAE5AE308>>

put = {method} <bound method _SimpleQueue.put of <billiard.queues.SimpleQueue object at 0x000001BFAE1BE7C8>>

type_ = 2 // pool.py defines TASK = 2

req = {tuple: 2} (2, (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))

self = {Worker} <billiard.pool.Worker object at 0x000001BFAE5AE308>

kwargs = {dict: 0} {}

args_ = (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))
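
The receive, ACK, run, READY cycle can be sketched in a few lines. This is a simplified stand-in rather than billiard's workloop: a thread plays the child process, queue.Queue objects play the pipes, there is no sync handshake or memory check, and the ACK and READY values are assumed for the sketch (only TASK = 2 is confirmed by the dump above).

```python
import os
import queue
import threading
import time

TASK = 2             # confirmed by the dump above
ACK, READY = 0, 1    # assumed values, for illustration only


def workloop(inq, outq, maxtasks=None):
    """Minimal worker loop: receive, ACK, run, report READY."""
    pid = os.getpid()
    completed = 0
    while maxtasks is None or completed < maxtasks:
        req = inq.get()
        if req is None:                    # sentinel: no more work
            break
        type_, args_ = req                 # first layer: tag + payload
        assert type_ == TASK
        job, i, fun, args, kwargs = args_  # second layer: what to run
        outq.put((ACK, (job, i, time.monotonic(), pid)))
        try:
            result = (True, fun(*args, **kwargs))
        except Exception as exc:
            result = (False, exc)
        outq.put((READY, (job, i, result)))
        completed += 1
    return completed


def add(x, y):
    return x + y


def demo():
    inq, outq = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=workloop, args=(inq, outq))
    worker.start()
    inq.put((TASK, (6, None, add, (2, 8), {})))
    inq.put(None)
    worker.join()
    return outq.get(), outq.get()
```

The parent side sees two messages per job: first the ACK carrying the job id, then READY carrying a (success, value) pair.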

Extending the earlier diagram downward gives:

                                                               +
                                                               |
                                                               |
                                                               v
                                                          +----+------------------+
                                                          | billiard.pool.Pool    |
                                                          +-------+---------------+
                                                                  |
                                                                  |
  Pool             +---------------------------+                  |
                   | TaskHandler               |                  |
                   |                           |                  |  self._taskqueue.put
                   |              _taskqueue   | <----------------+
                   |                           |
                   +------------+--------------+
                                |
                                |  put(task)
                                |
+--------------------------------------------------------------------------------------+
                                |
  billiard.pool.Worker          |  get                                   Sub process
                                v
                     +----------+-----------------------------+
                     | workloop                               |
                     |                                        |
                     |                                        |
                     |          wait_for_job                  |
                     |                                        |
                     +----------------------------------------+


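3.2 Receiving the parent's message

wait_for_job eventually ends up in the receive function built by _make_recv_method, which reads from the pipe connection conn.

What it reads is the message req sent by the parent process, shown earlier.

Recall what the parent process wrote: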

put = {method} <bound method _ConnectionBase.send of <billiard.connection.PipeConnection object at 0x000001E2C07DD2C8>>

self = {TaskHandler} <TaskHandler(Thread-16, started daemon 14980)>

task = {tuple: 2}
0 = {int} 2
1 = {tuple: 5} (0, None, <function _trace_task_ret at 0x000001E2BFCA3438>, ('myTest.add', 'dee72291-5614-4106-a7bf-007023286e9e', {'lang': 'py', 'task': 'myTest.add', 'id': 'dee72291-5614-4106-a7bf-007023286e9e', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen17456@DESKTOP-0GO3RPO', 'reply_to': '21660796-c7e7-3736-9d42-e1be6ff7eaa8', 'correlation_id': 'dee72291-5614-4106-a7bf-007023286e9e', 'hostname': 'celery@DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {})
__len__ = {int} 2

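As you can see, what the parent process wrote is exactly what the child process reads. The child reads it through the function returned by _make_recv_method, which uses the read functions of the pipe connection conn.

We are now in the child process.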

    def _make_recv_method(self, conn):
        get = conn.get

        if hasattr(conn, '_reader'):
            _poll = conn._reader.poll
            if hasattr(conn, 'get_payload') and conn.get_payload:
                get_payload = conn.get_payload

                def _recv(timeout, loads=pickle_loads):
                    return True, loads(get_payload())
            else:
                def _recv(timeout):  # noqa
                    if _poll(timeout):
                        return True, get()
                    return False, None
        else:
            def _recv(timeout):  # noqa
                try:
                    return True, get(timeout=timeout)
                except Queue.Empty:
                    return False, None
        return _recv
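
The poll-then-read branch can be tried out with the standard library. The sketch below assumes a plain multiprocessing.Pipe instead of billiard's queue objects; make_recv and demo are hypothetical names. It returns the same (ready, payload) pair shape as the _recv closures above.

```python
from multiprocessing import Pipe


def make_recv(conn):
    """Build a receive function that waits at most `timeout` seconds."""
    def _recv(timeout):
        if conn.poll(timeout):      # True once data is ready to read
            return True, conn.recv()
        return False, None
    return _recv


def demo():
    reader, writer = Pipe(duplex=False)
    recv = make_recv(reader)
    empty = recv(0.01)              # nothing has been written yet
    writer.send((2, (6, None, "fun", (), {})))
    ready = recv(1.0)
    return empty, ready
```

Returning a flag together with the payload lets the caller tell "no message within the timeout" apart from a message whose value happens to be None.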

3.3 Parsing the message

After reading the message, the child process unpacks it: job, i, fun, args, kwargs = args_

This simply splits the earlier args_ into its parts.

args_ = (6, None, <function _trace_task_ret at 0x000001BFAE53EA68>, ('myTest.add', '2c6d431f-a86a-4972-886b-472662401d20', {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen14656@DESKTOP-0GO3RPO', 'reply_to': '3c9cc3a7-65d6-349b-ba66-399dc47b7cad', 'correlation_id': '2c6d431f-a86a-4972-886b-472662401d20', 'hostname': 'DESKTOP-0GO3RPO', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [2, 8], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}, b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8'), {}))

This gives:

job = {int} 6

i = {NoneType} None

fun = {function} <function _trace_task_ret at 0x000001BFAE53EA68>

kwargs = {dict: 0} {}

args = {tuple: 6}
0 = {str} 'myTest.add'
1 = {str} '2c6d431f-a86a-4972-886b-472662401d20'
2 = {dict: 26} {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20',
3 = {bytes: 81} b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]'
4 = {str} 'application/json'
5 = {str} 'utf-8'
__len__ = {int} 6

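Now the child process knows which function to call (here myTest.add) and with which arguments (here (2, 8)).

To recap the read-and-parse flow:

  • the parent process writes task;
  • the child process reads it as req;
  • the child process unpacks req into type_ and args_;
  • the child process unpacks args_ into job, i, fun, args, kwargs. Here fun is _trace_task_ret, and the user-defined function is called inside _trace_task_ret;
  • only args contains the user-defined function's name and its arguments.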
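
The layers listed above can be peeled off in plain Python. The tuple below copies the shape of the debugger dump, with the request dict shortened; fake_trace_task_ret and peel are hypothetical names used only for this illustration.

```python
def fake_trace_task_ret(*args, **kwargs):
    """Hypothetical stand-in for the wrapper function carried in the message."""


# Shape copied from the debugger dump; the request dict is shortened.
req = (2, (6, None, fake_trace_task_ret,
           ('myTest.add',
            '2c6d431f-a86a-4972-886b-472662401d20',
            {'task': 'myTest.add', 'argsrepr': '(2, 8)'},
            b'[[2, 8], {}, {"callbacks": null, "errbacks": null, '
            b'"chain": null, "chord": null}]',
            'application/json',
            'utf-8'),
           {}))


def peel(req):
    """Unpack the three layers of a pool message into a flat dict."""
    type_, args_ = req                              # layer 1: tag + payload
    job, i, fun, args, kwargs = args_               # layer 2: pool bookkeeping
    name, uuid, request, body, ctype, cenc = args   # layer 3: task identity
    return {'type': type_, 'job': job, 'fun': fun.__name__,
            'task_name': name, 'uuid': uuid, 'body': body,
            'content_type': ctype, 'content_encoding': cenc}


layers = peel(req)
```

Note that the task's own arguments are still serialized inside body at this point; they are only decoded later, inside the wrapper function.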

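3.3.1 Where the parent process configures the callback function

As just mentioned, the fun obtained from the first unpacking is _trace_task_ret, and the user-defined function is called inside _trace_task_ret.

We need to see where the parent process configures this callback function fun.

From the previous article we know that, when a task arrives, task_message_handler uses the Request class to hand the task to the process pool.

Note: the Worker scope in this diagram is celery/apps/worker.py. It belongs to Celery's own logic and has nothing to do with child processes. Celery has several classes with the same name, which is easy to trip over.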

                           +
  Consumer                 |
                   message |
                           v         strategy  +------------------------------------+
              +------------+------+            | strategies                         |
              | on_task_received  | <--------+ |                                    |
              |                   |            |[myTest.add : task_message_handler] |
              +------------+------+            +------------------------------------+
                           |
                           |
+------------------------------------------------------------------------------------+
  strategy                 |
                           |
                           |
                           v                Request [myTest.add]
              +------------+-------------+                       +---------------------+
              | task_message_handler     | <-------------------+ | create_request_cls  |
              |                          |                       |                     |
              +------------+-------------+                       +---------------------+
                           | _process_task_sem
                           |
+--------------------------------------------------------------------------------------+
  Worker                   | req[{Request} myTest.add]
                           v
                  +--------+-----------+
                  | WorkController     |
                  |                    |        apply_async
                  |            pool +-------------------------+
                  +--------+-----------+                      |
                           |                                  |
                           |                                  v
               +-----------+----------+                   +---+-------+
               |{Request} myTest.add  | +---------------> | TaskPool  |
               +----------------------+                   +-----------+
                                          myTest.add


The apply_async called here is the pool's apply_async method.

In Request.execute_using_pool, the first argument passed to pool.apply_async is trace_task_ret, which confirms that trace_task_ret is the function the parent process hands down.

class Request:
    """A request for task execution."""

    def execute_using_pool(self, pool, **kwargs):
        """Used by the worker to send this task to the pool."""
        # (setup of task_id, task, and the time limits is elided in this excerpt)
        result = pool.apply_async(
            trace_task_ret,  # this is the function handed to the pool
            # the user-defined task is identified only inside these args
            args=(self._type, task_id, self._request_dict, self._body,
                  self._content_type, self._content_encoding),
            accept_callback=self.on_accepted,
            timeout_callback=self.on_timeout,
            callback=self.on_success,
            error_callback=self.on_failure,
            soft_timeout=soft_time_limit or task.soft_time_limit,
            timeout=time_limit or task.time_limit,
            correlation_id=task_id,
        )
        # cannot create weakref to None
        self._apply_result = maybe(ref, result)
        return result

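3.4 Calling the function

From the above, the function the Pool calls is _trace_task_ret. It is a uniform outer wrapper around user functions: the Pool only needs to call _trace_task_ret, and _trace_task_ret calls the user function internally.

Why not call the user function myTest.add directly, instead of wrapping it in _trace_task_ret? The word trace in the name gives it away: the wrapper is a trade-off among extensibility, debugging, tracing, and execution speed.

Two places in the code matter most.

3.4.1 Getting the Celery app

The first key point is obtaining the Celery app that was set up in the child process beforehand: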

app = app or current_app._get_current_object()

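This raises a question: the Celery app lives in the parent process, so how does the child process get it?

Some multiprocessing mechanisms copy the parent's variables into the child, but that is not guaranteed, so there must be a mechanism by which the parent process sets up the Celery app for the child process.

For a detailed walkthrough of how the parent process configures the Celery app for the child and how the child obtains it, see the previous article.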

3.4.2 Getting the task

The second key point is how to get the task that was registered beforehand:

R, I, T, Rstr = trace_task(app.tasks[name], uuid, args, kwargs, request, app=app)

app.tasks is populated beforehand and holds every task in Celery, both built-in tasks and user tasks.

So app.tasks[name] looks up the task itself by its name.

app.tasks = {TaskRegistry: 9}
NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>
'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest at 0x1bfae596d48>
'celery.chord' = {chord} <@task: celery.chord of myTest at 0x1bfae596d48>
'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest at 0x1bfae596d48>
'celery.chunks' = {chunks} <@task: celery.chunks of myTest at 0x1bfae596d48>
'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest at 0x1bfae596d48>
'celery.group' = {group} <@task: celery.group of myTest at 0x1bfae596d48>
'celery.map' = {xmap} <@task: celery.map of myTest at 0x1bfae596d48>
'celery.chain' = {chain} <@task: celery.chain of myTest at 0x1bfae596d48>
'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest at 0x1bfae596d48>
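
The name-based lookup can be illustrated with a small registry. This is a sketch of the idea only, not Celery's TaskRegistry: SketchRegistry, tasks, and run_by_name are hypothetical names. The point is that a generic entry function plus the task's name is enough to find and run the user function on the receiving side.

```python
class SketchRegistry(dict):
    """Hypothetical name -> callable registry, filled in at import time."""

    def register(self, name):
        def decorator(fun):
            self[name] = fun
            return fun
        return decorator


tasks = SketchRegistry()


@tasks.register('myTest.add')
def add(x, y):
    return x + y


def run_by_name(name, args, kwargs):
    """Generic entry point: look the task up by name, then call it."""
    try:
        task = tasks[name]
    except KeyError:
        raise LookupError(f'task {name!r} is not registered') from None
    return task(*args, **kwargs)


result = run_by_name('myTest.add', [2, 8], {})
```

Because both sides build the same registry when they import the task module, only the name and the serialized arguments need to travel between them.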

The logic now looks like this:

                                                                  +
                                                                  |
                                                                  |
                                                                  v
                                                          +-------+---------------+
                                                          | billiard.pool.Pool    |
                                                          +-------+---------------+
                                                                  |
                                                                  |
                   +---------------------------+                  |
                   | TaskHandler               |                  |
                   |                           |                  |  self._taskqueue.put
                   |              _taskqueue   | <----------------+
                   |                           |
                   +------------+--------------+
                                |
                                |  put(task)                                     Pool
                                |
+-------------------------------------------------------------------------------------+
                                |
                                |  get        billiard.pool.Worker        Sub process
                                v
               +----------------+------+          +--------------------------------------------------+
               | workloop              |          | app.tasks                                        |
               |                       |          |                                                  |
               | wait_for_job          |          |'celery.chord' = @task: celery.chord of myTest    |
               |                       |          |'celery.chunks' = @task: celery.chunks of myTest  |
               | app.tasks[name]       | <--------+'celery.group' = @task: celery.group of myTest    |
               |                       |          | ......                                           |
               |                       |          |                                                  |
               +-----------------------+          +--------------------------------------------------+


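3.4.3 Calling the task

Now that we know which task to call, let us see how it is called.

3.4.3.1 Getting the task

As shown above, the callback function is passed down from the parent process: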

fun = {function} <function _trace_task_ret at 0x000001BFAE53EA68>

_trace_task_ret is defined in celery/app/trace.py.

Its logic is:

  • Get the Celery app into app.

  • Extract the message content and update the request, for example:

    • request = {dict: 26}
      'lang' = {str} 'py'
      'task' = {str} 'myTest.add'
      'id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
      'shadow' = {NoneType} None
      'eta' = {NoneType} None
      'expires' = {NoneType} None
      'group' = {NoneType} None
      'group_index' = {NoneType} None
      'retries' = {int} 0
      'timelimit' = {list: 2} [None, None]
      'root_id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
      'parent_id' = {NoneType} None
      'argsrepr' = {str} '(2, 8)'
      'kwargsrepr' = {str} '{}'
      'origin' = {str} 'gen17060@DESKTOP-0GO3RPO'
      'reply_to' = {str} '5a520373-7712-3326-9ce8-325df14aa2ad'
      'correlation_id' = {str} 'a8928c1e-1e56-4502-9929-80a01b1bbfd8'
      'hostname' = {str} 'DESKTOP-0GO3RPO'
      'delivery_info' = {dict: 4} {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}
      'args' = {list: 2} [2, 8]
      'kwargs' = {dict: 0} {}
      'is_eager' = {bool} False
      'callbacks' = {NoneType} None
      'errbacks' = {NoneType} None
      'chain' = {NoneType} None
      'chord' = {NoneType} None
      __len__ = {int} 26
  • Get the user Task from the task name.

  • Call the user Task with the request.

The code is as follows:

def trace_task(task, uuid, args, kwargs, request={}, **opts):
    """Trace task execution."""
    try:
        if task.__trace__ is None:
            task.__trace__ = build_tracer(task.name, task, **opts)
        # call the tracer that was installed when the strategies were updated
        return task.__trace__(uuid, args, kwargs, request)
    ...  # except clause elided in this excerpt


def _trace_task_ret(name, uuid, request, body, content_type,
                    content_encoding, loads=loads_message, app=None,
                    **extra_request):
    app = app or current_app._get_current_object()  # get the Celery app
    embed = None
    if content_type:
        accept = prepare_accept_content(app.conf.accept_content)
        args, kwargs, embed = loads(
            body, content_type, content_encoding, accept=accept,
        )
    else:
        args, kwargs, embed = body
    request.update({
        'args': args, 'kwargs': kwargs,
        'hostname': hostname, 'is_eager': False,
    }, **embed or {})
    # call trace_task to execute the task
    R, I, T, Rstr = trace_task(app.tasks[name],
                               uuid, args, kwargs, request, app=app)
    return (1, R, T) if I else (0, Rstr, T)


trace_task_ret = _trace_task_ret

The variables at this point are:

accept = {set: 1} {'application/json'}
app = {Celery} <Celery myTest at 0x1bfae596d48>
args = {list: 2} [2, 8]
body = {bytes: 81} b'[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]'
content_encoding = {str} 'utf-8'
content_type = {str} 'application/json'
embed = {dict: 4} {'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}
extra_request = {dict: 0} {}
kwargs = {dict: 0} {}
loads = {method} <bound method SerializerRegistry.loads of <kombu.serialization.SerializerRegistry object at 0x000001BFAE329408>>
name = {str} 'myTest.add'
request = {dict: 26} {'lang': 'py', 'task': 'myTest.add', 'id': '2c6d431f-a86a-4972-886b-472662401d20', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2c6d431f-a86a-4972-886b-472662401d20',
uuid = {str} '2c6d431f-a86a-4972-886b-472662401d20'
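
The decode-and-merge step explains why the child-side request dump has 26 entries, including is_eager and the four embed keys that do not appear in the parent-side dump. A minimal sketch, assuming plain JSON in place of the serializer registry; merge_request is a hypothetical name and the request dict is shortened:

```python
import json


def merge_request(request, body, content_type, content_encoding, hostname):
    """Decode the body and fold its parts into the request dict."""
    if content_type:
        # Stand-in for the serializer registry: plain JSON only.
        args, kwargs, embed = json.loads(body.decode(content_encoding))
    else:
        args, kwargs, embed = body
    request.update({
        'args': args, 'kwargs': kwargs,
        'hostname': hostname, 'is_eager': False,
    }, **embed or {})
    return args, kwargs


request = {'task': 'myTest.add', 'args': [2, 8], 'kwargs': {}}
body = (b'[[2, 8], {}, {"callbacks": null, "errbacks": null, '
        b'"chain": null, "chord": null}]')
args, kwargs = merge_request(request, body, 'application/json', 'utf-8',
                             'DESKTOP-0GO3RPO')
```

After the call, the shortened request has grown from three keys to nine: hostname and is_eager from the fixed part of the update, and callbacks, errbacks, chain, and chord from embed.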

3.4.3.2 Calling the task

The call goes through trace_task, which is defined as follows:

def trace_task(task, uuid, args, kwargs, request=None, **opts):
    """Trace task execution."""
    request = {} if not request else request
    try:
        if task.__trace__ is None:
            task.__trace__ = build_tracer(task.name, task, **opts)
        return task.__trace__(uuid, args, kwargs, request)
    ...  # except clause elided in this excerpt

The tracer is installed in update_strategies as follows:

task.__trace__ = build_tracer(name, task, loader, self.hostname,
                              app=self.app)

Part of build_tracer is shown below:

def build_tracer(name, task, loader=None, hostname=None, store_errors=True,
                 Info=TraceInfo, eager=False, propagate=False, app=None,
                 monotonic=monotonic, truncate=truncate,
                 trace_ok_t=trace_ok_t, IGNORE_STATES=IGNORE_STATES):

    # use the task itself if it defines a custom __call__, otherwise its run method
    fun = task if task_has_custom(task, '__call__') else task.run
    ...

    def trace_task(uuid, args, kwargs, request=None):
        # R      - is the possibly prepared return value.
        # I      - is the Info object.
        # T      - runtime
        # Rstr   - textual representation of return value
        # retval - is the always unmodified return value.
        # state  - is the resulting task state.

        # This function is very long because we've unrolled all the calls
        # for performance reasons, and because the function is so long
        # we want the main variables (I, and R) to stand out visually from the
        # the rest of the variables, so breaking PEP8 is worth it ;)
        R = I = T = Rstr = retval = state = None
        task_request = None
        time_start = monotonic()
        ...
        # -*- TRACE -*-
        try:
            R = retval = fun(*args, **kwargs)  # run the actual task function
            state = SUCCESS
        except Reject as exc:
            ...
    return trace_task

The fun called here is the function the task is actually meant to run (myTest.add). At this point the task executes and its return value is captured.
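
The build-once, call-many pattern behind __trace__ can be sketched as a closure factory. This is a simplified illustration under assumptions, not Celery's build_tracer: make_tracer, SketchTask, and run_traced are hypothetical names, and the tracer records only the return value, the error, the runtime, and a state string.

```python
import time


def make_tracer(fun):
    """Return a closure that runs `fun` and reports outcome and runtime."""
    def trace(uuid, args, kwargs, request=None):
        start = time.monotonic()
        try:
            retval = fun(*args, **kwargs)
            state, error = 'SUCCESS', None
        except Exception as exc:
            retval, state, error = None, 'FAILURE', exc
        runtime = time.monotonic() - start
        return retval, error, runtime, state
    return trace


class SketchTask:
    """Hypothetical task object; the tracer is cached on first use."""
    __trace__ = None

    def __init__(self, name, run):
        self.name = name
        self.run = run


def run_traced(task, uuid, args, kwargs, request=None):
    if task.__trace__ is None:          # build the tracer lazily, once
        task.__trace__ = make_tracer(task.run)
    return task.__trace__(uuid, args, kwargs, request)


add_task = SketchTask('myTest.add', lambda x, y: x + y)
outcome = run_traced(add_task, 'some-id', (2, 8), {})

bad_task = SketchTask('myTest.div', lambda x, y: x / y)
failure = run_traced(bad_task, 'other-id', (1, 0), {})
```

Wrapping the user function this way means failures come back as data instead of propagating, so the calling loop can always report a result for the job.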

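This completes one full consumption cycle.

Starting with the next article, we will look at some of Celery's auxiliary features, such as load balancing and fault tolerance.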

