Configuring and Running Django + Celery in Docker Containers

 Justyna Ilczuk  Oct 25, 2016  0 Comments

After reading this blog post, you will be able to configure Celery with Django, PostgreSQL, Redis, and RabbitMQ, and then run everything in Docker containers.

Today, you'll learn how to set up a distributed task processing system for quick prototyping. You will configure Celery with Django, PostgreSQL, Redis, and RabbitMQ, and then run everything in Docker containers. You'll need some working knowledge of Docker for this tutorial, which you can get in one my previous posts here.

Django is a well-known Python web framework, and Celery is a distributed task queue. You'll use PostgreSQL as a regular database to store jobs, RabbitMQ as message broker, and Redis as a task storage backend.

Motivation

When you build a web application, sooner or later you'll have to implement some kind of offline task processing.

Example:

Alice wants to convert her cat photos from .jpg to .png or create a .pdf from her collection of .jpg cat files. Doing either of these tasks in one HTTP request will take too long to execute and will unnecessarily burden the web server - meaning we can't serve other requests at the same time. The common solution is to execute the task in the background - often on another machine - and poll for the result.

A simple setup for an offline task processing could look like this:

1. Alice uploads a picture.
2. Web server schedules job on worker.
3. Worker gets job and converts photo.
4. Worker creates some result of the task (in this case, a converted photo).
5. Web browser polls for the result.
6. Web browser gets the result from the server.

This setup looks clear, but it has a serious flaw - it doesn't scale well. What if Alice has a lot of cat pictures and one server wouldn't be enough to process them all at once? Or, if there was some other very big job and all other jobs would be blocked by it? Does she care if all of the images are processed at once? What if processing fails at some point?

Frankly, there is a solution that won't kill your machine every time you get a bigger selection of images. You need something between the web server and worker: a broker. The web server would schedule new tasks by communicating with the broker, and the broker would communicate with workers to actually execute these tasks. You probably also want to buffer your tasks, retry if they fail, and monitor how many of them were processed.

You would have to create queues for tasks with different priorities, or for those suitable for different kinds of workers.

All of this can be greatly simplified by using Celery - an open-source, distributed tasks queue. It works like a charm after you configure it - as long as you do so correctly.

How Celery is built

Celery consists of:

  • Tasks, as defined in your app
  • A broker that routes tasks to workers and queues
  • Workers doing the actual work
  • A storage backend

You can watch a more in-depth introduction to Celery here or jump straight to Celery's getting started guide.

Your setup

Start with the standard Django project structure. It can be created with django-admin, by running in shell:

$ django-admin startproject myproject

Which creates a project structure:

.
└── myproject
├── manage.py
└── myproject
├── __init__.py
├── settings.py
├── urls.py
└── wsgi.py

At the end of this tutorial, it'll look like this:

.
├── Dockerfile
├── docker-compose.yml
├── myproject
│ ├── manage.py
│ └── myproject
│ ├── celeryconf.py
│ ├── __init__.py
│ ├── models.py
│ ├── serializers.py
│ ├── settings.py
│ ├── tasks.py
│ ├── urls.py
│ ├── views.py
│ └── wsgi.py
├── requirements.txt
├── run_celery.sh
└── run_web.sh

Creating containers

Since we are working with Docker 1.12, we need a proper Dockerfile to specify how our image will be built.

Custom container

Dockerfile

# use base python image with python 2.7
FROM python:2.7 # add requirements.txt to the image
ADD requirements.txt /app/requirements.txt # set working directory to /app/
WORKDIR /app/ # install python dependencies
RUN pip install -r requirements.txt # create unprivileged user
RUN adduser --disabled-password --gecos '' myuser

Our dependencies are:

requirements.txt

Django==1.9.8
celery==3.1.20
djangorestframework==3.3.1
psycopg2==2.5.3
redis==2.10.5

I've frozen versions of dependencies to make sure that you will have a working setup. If you wish, you can update any of them, but it's not guaranteed to work.

Choosing images for services

Now we only need to set up RabbitMQ, PostgreSQL, and Redis. Since Docker introduced its official library, I use its official images whenever possible. However, even these can be broken sometimes. When that happens, you'll have to use something else.

Here are images I tested and selected for this project:

Using docker-compose to set up a multicontainer app

Now you'll use docker-compose to combine your own containers with the ones we chose in the last section.

docker-compose.yml

version: '2'

services:
# PostgreSQL database
db:
image: postgres:9.4
hostname: db
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=postgres
ports:
- "5432:5432" # Redis
redis:
image: redis:2.8.19
hostname: redis # RabbitMQ
rabbit:
hostname: rabbit
image: rabbitmq:3.6.0
environment:
- RABBITMQ_DEFAULT_USER=admin
- RABBITMQ_DEFAULT_PASS=mypass
ports:
- "5672:5672" # we forward this port because it's useful for debugging
- "15672:15672" # here, we can access rabbitmq management plugin # Django web server
web:
build:
context: .
dockerfile: Dockerfile
hostname: web
command: ./run_web.sh
volumes:
- .:/app # mount current directory inside container
ports:
- "8000:8000"
# set up links so that web knows about db, rabbit and redis
links:
- db
- rabbit
- redis
depends_on:
- db # Celery worker
worker:
build:
context: .
dockerfile: Dockerfile
command: ./run_celery.sh
volumes:
- .:/app
links:
- db
- rabbit
- redis
depends_on:
- rabbit

Configuring the web server and worker

You've probably noticed that both the worker and web server run some starting scripts. Here they are (make sure they're executable):

run_web.sh

#!/bin/sh

# wait for PSQL server to start
sleep 10 cd myproject
# prepare init migration
su -m myuser -c "python manage.py makemigrations myproject"
# migrate db, so we have the latest db schema
su -m myuser -c "python manage.py migrate"
# start development server on public ip interface, on port 8000
su -m myuser -c "python manage.py runserver 0.0.0.0:8000"

run_celery.sh

#!/bin/sh

# wait for RabbitMQ server to start
sleep 10 cd myproject
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery worker -A myproject.celeryconf -Q default -n default@%h"

The first script - run_web.sh - will migrate the database and start the Django development server on port 8000. 
The second one , run_celery.sh, will start a Celery worker listening on a queue default.

At this stage, these scripts won't work as we'd like them to because we haven't yet configured them. Our app still doesn't know that we want to use PostgreSQL as the database, or where to find it (in a container somewhere). We also have to configure Redis and RabbitMQ.

But before we get to that, there are some useful Celery settings that will make your system perform better. Below are the complete settings of this Django app.

myproject/settings.py

import os

from kombu import Exchange, Queue

BASE_DIR = os.path.dirname(os.path.dirname(__file__))

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'megg_yej86ln@xao^+)it4e&ueu#!4tl9p1h%2sjr7ey0)m25f' # SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
TEMPLATE_DEBUG = True
ALLOWED_HOSTS = [] # Application definition INSTALLED_APPS = (
'rest_framework',
'myproject',
'django.contrib.sites',
'django.contrib.staticfiles', # required by Django 1.9
'django.contrib.auth',
'django.contrib.contenttypes', ) MIDDLEWARE_CLASSES = (
) REST_FRAMEWORK = {
'DEFAULT_PERMISSION_CLASSES': ('rest_framework.permissions.AllowAny',),
'PAGINATE_BY': 10
} ROOT_URLCONF = 'myproject.urls' WSGI_APPLICATION = 'myproject.wsgi.application' # Localization ant timezone settings TIME_ZONE = 'UTC'
USE_TZ = True CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC" LANGUAGE_CODE = 'en-us'
USE_I18N = True
USE_L10N = True # Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.7/howto/static-files/
STATIC_URL = '/static/' # Database Condocker-composeuration
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': os.environ.get('DB_ENV_DB', 'postgres'),
'USER': os.environ.get('DB_ENV_POSTGRES_USER', 'postgres'),
'PASSWORD': os.environ.get('DB_ENV_POSTGRES_PASSWORD', 'postgres'),
'HOST': os.environ.get('DB_PORT_5432_TCP_ADDR', 'db'),
'PORT': os.environ.get('DB_PORT_5432_TCP_PORT', ''),
},
} # Redis REDIS_PORT = 6379
REDIS_DB = 0
REDIS_HOST = os.environ.get('REDIS_PORT_6379_TCP_ADDR', 'redis') RABBIT_HOSTNAME = os.environ.get('RABBIT_PORT_5672_TCP', 'rabbit') if RABBIT_HOSTNAME.startswith('tcp://'):
RABBIT_HOSTNAME = RABBIT_HOSTNAME.split('//')[1] BROKER_URL = os.environ.get('BROKER_URL',
'')
if not BROKER_URL:
BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
user=os.environ.get('RABBIT_ENV_USER', 'admin'),
password=os.environ.get('RABBIT_ENV_RABBITMQ_PASS', 'mypass'),
hostname=RABBIT_HOSTNAME,
vhost=os.environ.get('RABBIT_ENV_VHOST', '')) # We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
BROKER_URL += BROKER_HEARTBEAT BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10 # Celery configuration # configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
) # Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False # By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600 # Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1 # Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json'] CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000

Those settings will configure the Django app so that it will discover the PostgreSQL database, Redis cache, and Celery.

Now, it's time to connect Celery to the app. Create a file celeryconf.py and paste in this code:

myproject/celeryconf.py

import os

from celery import Celery
from django.conf import settings os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings") app = Celery('myproject') CELERY_TIMEZONE = 'UTC' app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

That should be enough to connect Celery to our app, so the run_X scripts will work. You can read more about first steps with Django and Celery here.

Defining tasks

Celery looks for tasks inside the tasks.py file in each Django app. Usually, tasks are created either with a decorator, or by inheriting the Celery Task Class.

Here's how you can create a task using decorator:

@app.task
def power(n):
"""Return 2 to the n'th power"""
return 2 ** n

And here's how you can create a task by inheriting after the Celery Task Class:

class PowerTask(app.Task):
def run(self, n):
"""Return 2 to the n'th power"""
return 2 ** n

Both are fine and good for slightly different use cases.

myproject/tasks.py

from functools import wraps

from myproject.celeryconf import app
from .models import Job # decorator to avoid code duplication def update_job(fn):
"""Decorator that will update Job with result of the function""" # wraps will make the name and docstring of fn available for introspection
@wraps(fn)
def wrapper(job_id, *args, **kwargs):
job = Job.objects.get(id=job_id)
job.status = 'started'
job.save()
try:
# execute the function fn
result = fn(*args, **kwargs)
job.result = result
job.status = 'finished'
job.save()
except:
job.result = None
job.status = 'failed'
job.save()
return wrapper # two simple numerical tasks that can be computationally intensive @app.task
@update_job
def power(n):
"""Return 2 to the n'th power"""
return 2 ** n @app.task
@update_job
def fib(n):
"""Return the n'th Fibonacci number.
"""
if n < 0:
raise ValueError("Fibonacci numbers are only defined for n >= 0.")
return _fib(n) def _fib(n):
if n == 0 or n == 1:
return n
else:
return _fib(n - 1) + _fib(n - 2) # mapping from names to tasks TASK_MAPPING = {
'power': power,
'fibonacci': fib
}

Building an API for scheduling tasks

If you have tasks in your system, how do you run them? In this section, you'll create a user interface for job scheduling. In a backend application, the API will be your user interface. Let's use the Django REST Framework for your API.

To make it as simple as possible, your app will have one model and only one ViewSet (endpoint with many HTTP methods).

Create your model, called Job, in myproject/models.py.

from django.db import models

class Job(models.Model):
"""Class describing a computational job""" # currently, available types of job are:
TYPES = (
('fibonacci', 'fibonacci'),
('power', 'power'),
) # list of statuses that job can have
STATUSES = (
('pending', 'pending'),
('started', 'started'),
('finished', 'finished'),
('failed', 'failed'),
) type = models.CharField(choices=TYPES, max_length=20)
status = models.CharField(choices=STATUSES, max_length=20) created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
argument = models.PositiveIntegerField()
result = models.IntegerField(null=True) def save(self, *args, **kwargs):
"""Save model and if job is in pending state, schedule it"""
super(Job, self).save(*args, **kwargs)
if self.status == 'pending':
from .tasks import TASK_MAPPING
task = TASK_MAPPING[self.type]
task.delay(job_id=self.id, n=self.argument)

Then create a serializerview, and URL configuration to access it.

myproject/serializers.py

from rest_framework import serializers

from .models import Job

class JobSerializer(serializers.HyperlinkedModelSerializer):
class Meta:
model = Job

myproject/views.py

from rest_framework import mixins, viewsets

from .models import Job
from .serializers import JobSerializer class JobViewSet(mixins.CreateModelMixin,
mixins.ListModelMixin,
mixins.RetrieveModelMixin,
viewsets.GenericViewSet):
"""
API endpoint that allows jobs to be viewed or created.
"""
queryset = Job.objects.all()
serializer_class = JobSerializer

myproject/urls.py

from django.conf.urls import url, include
from rest_framework import routers from myproject import views router = routers.DefaultRouter()
# register job endpoint in the router
router.register(r'jobs', views.JobViewSet) # Wire up our API using automatic URL routing.
# Additionally, we include login URLs for the browsable API.
urlpatterns = [
url(r'^', include(router.urls)),
url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework'))
]

For completeness, there is also myproject/wsgi.py, defining WSGI config for the project:

import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings") from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

and manage.py

#!/usr/bin/env python
import os
import sys if __name__ == "__main__":
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings") from django.core.management import execute_from_command_line execute_from_command_line(sys.argv)

Leave __init__.py empty.

That's all. Uh... lots of code. Luckily, everything is on GitHub, so you can just fork it.

Running the setup

Since everything is run from Docker Compose, make sure you have both Docker and Docker Compose installed before you try to start the app:

$ cd /path/to/myproject/where/is/docker-compose.yml
$ docker-compose build
$ docker-compose up

The last command will start five different containers, so just start using your API and have some fun with Celery in the meantime.

Accessing the API

Navigate in your browser to 127.0.0.1:8000 to browse your API and schedule some jobs.

Scale things out

Currently, we have only one instance of each container. We can get information about our group of containers with the docker-compose ps command.

$ docker-compose ps
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------------
dockerdjangocelery_db_1 /docker-entrypoint.sh postgres Up 0.0.0.0:5432->5432/tcp
dockerdjangocelery_rabbit_1 /docker-entrypoint.sh rabb ... Up 0.0.0.0:15672->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp
dockerdjangocelery_redis_1 /entrypoint.sh redis-server Up 6379/tcp
dockerdjangocelery_web_1 ./run_web.sh Up 0.0.0.0:8000->8000/tcp
dockerdjangocelery_worker_1 ./run_celery.sh Up

Scaling out a container with docker-compose is extremely easy. Just use the docker-compose scale command with the container name and amount:

$ docker-compose scale worker=5
Creating and starting dockerdjangocelery_worker_2 ... done
Creating and starting dockerdjangocelery_worker_3 ... done
Creating and starting dockerdjangocelery_worker_4 ... done
Creating and starting dockerdjangocelery_worker_5 ... done

Output says that docker-compose just created an additional four worker containers for us. We can double-check it with the docker-compose ps command again:

$ docker-compose ps
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------------
dockerdjangocelery_db_1 /docker-entrypoint.sh postgres Up 0.0.0.0:5432->5432/tcp
dockerdjangocelery_rabbit_1 /docker-entrypoint.sh rabb ... Up 0.0.0.0:15672->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp
dockerdjangocelery_redis_1 /entrypoint.sh redis-server Up 6379/tcp
dockerdjangocelery_web_1 ./run_web.sh Up 0.0.0.0:8000->8000/tcp
dockerdjangocelery_worker_1 ./run_celery.sh Up
dockerdjangocelery_worker_2 ./run_celery.sh Up
dockerdjangocelery_worker_3 ./run_celery.sh Up
dockerdjangocelery_worker_4 ./run_celery.sh Up
dockerdjangocelery_worker_5 ./run_celery.sh Up

You'll see there five powerful Celery workers. Nice!

Summary

Congrats! You just married Django with Celery to build a distributed asynchronous computation system. I think you'll agree it was pretty easy to build an API, and even easier to scale workers for it! However, life isn't always so nice to us, and sometimes we have to troubleshoot.

Contribution

Original article written by Justyna Ilczuk, updated by Michał Kobus.

ENGINEERING | DOCKER | CELERY | DJANGO | DOCKER COMPOSEShare:

Configuring and Running Django + Celery in Docker Containers的更多相关文章

  1. Running Elixir in Docker Containers

    转自:https://www.poeticoding.com/running-elixir-in-docker-containers/ One of the wonderful things abou ...

  2. Understanding how uid and gid work in Docker containers

    转自:https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf Un ...

  3. Removing Docker Containers and Images

    Removing Docker Containers and Images In a recent post aboutDocker, we looked into some things that ...

  4. [Docker] Run Short-Lived Docker Containers

    Learn the benefits of running one-off, short-lived Docker containers. Short-Lived containers are use ...

  5. [Docker] Run, Stop and Remove Docker Containers

    In this lesson, we'll find out the basics of running Docker containers. We'll go over how to downloa ...

  6. [Docker] Prune Old Unused Docker Containers and Images

    In this lesson, we will look at docker container prune to remove old docker containers. We can also ...

  7. Django+Celery+xadmin实现异步任务和定时任务

    Django+Celery+xadmin实现异步任务和定时任务 关注公众号"轻松学编程"了解更多. 一.celery介绍 1.简介 [官网]http://www.celerypro ...

  8. django+celery+redis环境搭建

    初次尝试搭建django+celery+redis环境,记录下来,慢慢学习~ 1.安装apache 下载httpd-2.0.63.tar.gz,解压tar zxvf httpd-2.0.63.tar. ...

  9. django celery redis 定时任务

    0.目的 在开发项目中,经常有一些操作时间比较长(生产环境中超过了nginx的timeout时间),或者是间隔一段时间就要执行的任务. 在这种情况下,使用celery就是一个很好的选择.   cele ...

随机推荐

  1. bzoj 3566: [SHOI2014]概率充电器 数学期望+换根dp

    题意:给定一颗树,树上每个点通电概率为 $q[i]$%,每条边通电的概率为 $p[i]$%,求期望充入电的点的个数. 期望在任何时候都具有线性性,所以可以分别求每个点通电的概率(这种情况下期望=概率 ...

  2. COGS 1583. [POJ3237]树的维护

    二次联通门 : COGS 1583. [POJ3237]树的维护 /* COGS 1583. [POJ3237]树的维护 树链剖分 + 边权化点权 线段树 单点修改 + 区间取相反数 + 查询区间最大 ...

  3. P2502 [HAOI2006]旅行——暴力和并查集的完美结合

    P2502 [HAOI2006]旅行 一定要看清题目数据范围再决定用什么算法,我只看着是一个蓝题就想到了记录最短路径+最小生成树,但是我被绕进去了: 看到只有5000的边,我们完全可以枚举最小边和最大 ...

  4. Codeforces 1239E. Turtle 折半

    原文链接www.cnblogs.com/zhouzhendong/p/CF1239E.html 前言 咕了这么久之后,我的博客复活了! 题解 结论1 存在一个最优解\(A\)数组,满足\(\foral ...

  5. 解决Spring Boot 拦截器注入service为空的问题

    问题:在自定义拦截器中,使用了@Autowaire注解注入了封装JPA方法的Service,结果发现无法注入,注入的service为空 0.原因分析 拦截器加载的时间点在springcontext之前 ...

  6. StarUML自动生成Java代码

    下载一个starUML 链接:https://pan.baidu.com/s/1pIGNVmhtwBxMrCG9LHdkCQ 提取码:c4i6 复制这段内容后打开百度网盘手机App,操作更方便哦 添加 ...

  7. crontab 定时访问指定url,定时脚本

    一.contab格式说明 二.定时访问url: 1.连接远程主机,连接成功后,输入命令  crontab -e : 2.参照VI编辑器.按字母 i 进入编辑模式,输入需要执行的脚本:(在这里之前要检查 ...

  8. Spring Boot|监控-Actuator

    Spring Boot 为我们提供了一个生产级特性-Actuator,包含很多实际有用的API,下面我们就一起来看看这些API. 一.Actuator 首先在程序中引入Actuator <!-- ...

  9. ssh端口映射总结

    版权声明:本文为博主原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明. 本文链接:https://blog.csdn.net/boliang319/article/det ...

  10. php中函数 isset(), empty(), is_null() 的区别

    NULL:当你在你的脚本中写下这样一行代码 $myvariable; //此处你想定义一个变量,但未赋值.会有Notice: Undefined variable echo $myvariable + ...