https://www.syncano.io/blog/configuring-running-django-celery-docker-containers-pt-1/

Update: Fig has been deprecated and replaced by Docker Compose. Docker Compose should serve as a drop-in replacement for the fig.sh used in this article.

Today, you'll learn how to set up a distributed task processing system for quick prototyping. You will configure Celery with Django, Postgres, Redis, and Rabbitmq, and then run everything in Docker containers. You'll need some working knowledge of Docker for this tutorial, which you can get in one of my previous posts here.

Django is a well-known Python web framework, and Celery is a distributed task queue. You'll use Postgres as a regular database to store jobs, Rabbitmq to route tasks to different queues, and Redis as a task storage backend.

Note: Although I don't demonstrate it in this post, Redis can be used in a variety of different ways: 
  • As a key-value store
  • As a cache
  • To publish and/or subscribe
  • For distributed locking

Motivation

When you build a web application, sooner or later you'll have to implement some kind of offline task processing.

Example: 
A user wants to convert her cat photos from .jpg to .png or create a .pdf from her collection of .jpg cat files. Doing either of these tasks in one HTTP request will take too long to execute and will unnecessarily burden the web server - meaning we can't serve other requests at the same time. The common solution is to execute the task in the background - often on another machine - and poll for the result.

A simple setup for an offline task processing could look like this:

  • User uploads cat picture
  • Web server schedules job on worker
  • Worker gets job and converts photo
  • Worker creates some result of the task (in this case, a converted photo)
  • Web browser polls for the result
  • Web browser gets the result from db/redis

This setup looks nice, but it has one flaw - it doesn't scale. What if the user has a lot of cat pictures and one server isn't enough? Or what if there's one very big job that blocks all the other jobs? This is why you need to be prepared to scale.

To scale, you need something between the web server and worker: a broker. The web server would schedule new tasks by communicating with the broker, and the broker would communicate with the workers. You probably also want to buffer your tasks, retry if they fail, and monitor how many of them were processed.

You would have to create queues for tasks with different priorities or for those suitable for a different kind of worker.

All of this can be greatly simplified by using Celery - an open-source distributed task queue. It works like a charm after you configure it - as long as you do so correctly.
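
To give you a taste of what that configuration looks like, here is a sketch of the relevant Celery settings for two queues, with one task routed to a dedicated queue. The 'heavy' queue, its exchange, and the generate_pdf task path are hypothetical names used only for illustration:

from kombu import Exchange, Queue

# two queues: a catch-all 'default' queue and a hypothetical 'heavy' queue
# for long-running jobs that should not block everything else
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('heavy', Exchange('heavy'), routing_key='heavy'),
)

# route a (hypothetical) expensive task to the 'heavy' queue;
# a separate pool of workers would then be started with -Q heavy
CELERY_ROUTES = {
    'myproject.tasks.generate_pdf': {'queue': 'heavy', 'routing_key': 'heavy'},
}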

How Celery is built

Celery consists of:

  • Tasks, as defined in your app
  • A broker that routes tasks to workers and queues
  • Workers doing the actual work
  • A storage backend

You can watch a more in-depth introduction to Celery here or jump straight to Celery's getting started guide.
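
If you prefer code to bullet points, here is a minimal, self-contained sketch of those four pieces working together. The broker and backend URLs are placeholders; later in this post they will point at the Rabbitmq and Redis containers instead:

from celery import Celery

# the broker routes tasks to workers; the backend stores their results
app = Celery(
    'example',
    broker='amqp://guest:guest@localhost//',
    backend='redis://localhost:6379/0',
)

@app.task
def add(x, y):
    """A task, as defined in your app."""
    return x + y

# .delay() publishes a message to the broker and returns immediately;
# a running worker executes the task and writes the return value
# to the result backend
# result = add.delay(2, 3)
# print(result.get(timeout=5))  # -> 5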

Rapid prototyping

Sooner or later, you will end up with a pretty complex distributed system - and distributed systems come with pitfalls that you should be aware of:

  • Messages travel with a finite speed
  • Services are occasionally unavailable or unreliable
  • When you run tasks in parallel, they can run into race conditions
  • Deadlocks
  • Data corruption
  • Lost tasks

With Docker, it's much easier to test solutions on a system level - by prototyping different task designs and the interactions between them.

Your setup

Start with the standard Django project structure. It can be created with django-admin, if you have it installed.

$ tree -I *.pyc
.
├── Dockerfile
├── fig.yml
├── myproject
│   ├── manage.py
│   └── myproject
│       ├── celeryconf.py
│       ├── __init__.py
│       ├── models.py
│       ├── serializers.py
│       ├── settings.py
│       ├── tasks.py
│       ├── urls.py
│       ├── views.py
│       └── wsgi.py
├── README.md
├── requirements.txt
├── run_celery.sh
└── run_web.sh

Creating containers

Since we are working with Docker, we need a proper Dockerfile to specify how our image will be built.

Custom container

Dockerfile

# use base python image with python 2.7
FROM python:2.7

# add requirements.txt to the image
ADD requirements.txt /app/requirements.txt

# set working directory to /app/
WORKDIR /app/

# install python dependencies
RUN pip install -r requirements.txt

# create unprivileged user
RUN adduser --disabled-password --gecos '' myuser

Our dependencies are: 
requirements.txt

django==1.7.2
celery==3.1.17
djangorestframework==3.0.3
psycopg2==2.5.4
redis==2.10.3

I've frozen versions of dependencies to make sure that you will have a working setup. If you wish, you can update any of them, but it's not guaranteed to work.

Choosing images for services

Now we only need to set up Rabbitmq, Postgresql, and Redis. Since Docker introduced its official library, I use their official images whenever possible. However, even these can be broken sometimes. When that happens, you'll have to use something else.

Here are the images I tested and selected for this project:

  • postgres:9.4 for the database
  • redis:2.8.19 for the result backend
  • tutum/rabbitmq for the message broker

Using Fig.sh to set up a multi-container app

Now you'll use fig.sh to combine your own containers with the ones we chose in the last section. If you're not familiar with Fig.sh, check out my post on making your Docker workflow awesome with fig.

fig.yml

# database container
db:
  image: postgres:9.4
  environment:
    - POSTGRES_PASSWORD=mysecretpassword

# redis container
redis:
  image: redis:2.8.19

# rabbitmq container
rabbitmq:
  image: tutum/rabbitmq
  environment:
    - RABBITMQ_PASS=mypass
  ports:
    - "5672:5672"    # we forward this port because it's useful for debugging
    - "15672:15672"  # here, we can access the rabbitmq management plugin

# container with Django web server
web:
  build: .           # build using the default Dockerfile
  command: ./run_web.sh
  volumes:
    - .:/app         # mount current directory inside the container
  ports:
    - "8000:8000"
  # set up links so that web knows about db, rabbit and redis
  links:
    - db:db
    - rabbitmq:rabbit
    - redis:redis

# container with Celery worker
worker:
  build: .
  command: ./run_celery.sh
  volumes:
    - .:/app
  links:
    - db:db
    - rabbitmq:rabbit
    - redis:redis

Configuring the webserver and worker

You've probably noticed that both the worker and web server run some starting scripts. Here they are:

run_web.sh

#!/bin/sh

cd myproject
# migrate db, so we have the latest db schema
su -m myuser -c "python manage.py migrate"
# start development server on public ip interface, on port 8000
su -m myuser -c "python manage.py runserver 0.0.0.0:8000"

run_celery.sh

#!/bin/sh

cd myproject
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery worker -A myproject.celeryconf -Q default -n default@%h"

The first script, run_web.sh, will migrate the database and start the Django development server on port 8000.
The second one, run_celery.sh, will start a Celery worker listening on the queue named default (-Q picks the queue to consume from, and -n sets the worker's name, with %h expanded to the container's hostname).

At this stage, these scripts won't work as we'd like them to because we haven't yet configured the app. It still doesn't know that we want to use Postgres as the database, or where to find it (in a container somewhere). We also have to configure Redis and Rabbitmq.

But before we get to that, there are some useful Celery settings that will make your system perform better. Below are the complete settings of this Django app.

myproject/settings.py

import os

from kombu import Exchange, Queue

BASE_DIR = os.path.dirname(os.path.dirname(__file__))

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'megg_yej86ln@xao^+)it4e&ueu#!4tl9p1h%2sjr7ey0)m25f'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
TEMPLATE_DEBUG = True
ALLOWED_HOSTS = []

# Application definition
INSTALLED_APPS = (
    'django.contrib.staticfiles',
    'rest_framework',
    'myproject',
)

MIDDLEWARE_CLASSES = ()

REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': ('rest_framework.permissions.AllowAny',),
    'PAGINATE_BY': 10,
}

ROOT_URLCONF = 'myproject.urls'

WSGI_APPLICATION = 'myproject.wsgi.application'

# Localization and timezone settings
TIME_ZONE = 'UTC'
USE_TZ = True
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
LANGUAGE_CODE = 'en-us'
USE_I18N = True
USE_L10N = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.7/howto/static-files/
STATIC_URL = '/static/'

# Database configuration
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': os.environ.get('DB_ENV_DB', 'postgres'),
        'USER': os.environ.get('DB_ENV_POSTGRES_USER', 'postgres'),
        'PASSWORD': os.environ.get('DB_ENV_POSTGRES_PASSWORD', ''),
        'HOST': os.environ.get('DB_PORT_5432_TCP_ADDR', ''),
        'PORT': os.environ.get('DB_PORT_5432_TCP_PORT', ''),
    },
}

# Redis
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_HOST = os.environ.get('REDIS_PORT_6379_TCP_ADDR', '127.0.0.1')

RABBIT_HOSTNAME = os.environ.get('RABBIT_PORT_5672_TCP', 'localhost:5672')

if RABBIT_HOSTNAME.startswith('tcp://'):
    RABBIT_HOSTNAME = RABBIT_HOSTNAME.split('//')[1]

BROKER_URL = os.environ.get('BROKER_URL', '')
if not BROKER_URL:
    BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
        user=os.environ.get('RABBIT_ENV_USER', 'admin'),
        password=os.environ.get('RABBIT_ENV_RABBITMQ_PASS', 'mypass'),
        hostname=RABBIT_HOSTNAME,
        vhost=os.environ.get('RABBIT_ENV_VHOST', ''))

# We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
    BROKER_URL += BROKER_HEARTBEAT

BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10

# Celery configuration

# configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
)

# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False

# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600

# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1

# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']

CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000

These settings configure the Django app so that it can discover the PostgreSQL database, the Redis cache, and Celery.
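
The odd-looking environment variable names such as DB_PORT_5432_TCP_ADDR and DB_ENV_POSTGRES_PASSWORD are not arbitrary - they are exactly what Docker links (the links: entries in fig.yml) inject into the web and worker containers. A quick sketch you could run inside the web container to see what the settings above will pick up:

import os

# variables injected by the db, redis and rabbitmq links from fig.yml
for key in ('DB_PORT_5432_TCP_ADDR', 'DB_PORT_5432_TCP_PORT',
            'REDIS_PORT_6379_TCP_ADDR', 'RABBIT_PORT_5672_TCP'):
    print('{0} = {1}'.format(key, os.environ.get(key)))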

Now it's time to connect Celery to the app. Create the file celeryconf.py and paste in this code:

myproject/celeryconf.py

import os

from celery import Celery
from django.conf import settings

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery('myproject')

CELERY_TIMEZONE = 'UTC'

app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

That should be enough to connect Celery to our app so the run_X scripts will work. You can read more about first steps with Django and Celery here.
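
A quick, optional sanity check once the stack is up: open a Django shell inside the web container (for example with something like fig run web python myproject/manage.py shell) and ping the workers through the broker. An empty list simply means no worker has connected yet:

from myproject.celeryconf import app

# broadcasts a ping over the broker and collects replies from live workers
print(app.control.ping(timeout=1.0))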

Defining tasks

Celery looks for tasks inside the tasks.py file in each Django app. Usually, tasks are created either with a decorator or by inheriting from the Celery Task class.

Here's how you can create a task using decorator:

@app.task
def power(n):
    """Return 2 to the n'th power"""
    return 2 ** n

And here's how you can create a task by inheriting from the Celery Task class:

class PowerTask(app.Task):
    def run(self, n):
        """Return 2 to the n'th power"""
        return 2 ** n

Both approaches are fine and suit slightly different use cases.
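
Either way, the point is that the caller never runs the function directly - it only sends a message to the broker. A short sketch of how the decorator-based task above would be invoked (the queue name matches the one the worker listens on in run_celery.sh):

# returns an AsyncResult immediately; a worker does the actual computation
result = power.delay(10)

# the same call with explicit routing options
result = power.apply_async(args=(10,), queue='default')

# note: with CELERY_IGNORE_RESULT = True (see settings.py above) there is
# nothing to fetch; set it to False if you want result.get() to return 1024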

myproject/tasks.py

from functools import wraps

from myproject.celeryconf import app
from .models import Job


# decorator to avoid code duplication

def update_job(fn):
    """Decorator that will update Job with result of the function"""
    # wraps will make the name and docstring of fn available for introspection
    @wraps(fn)
    def wrapper(job_id, *args, **kwargs):
        job = Job.objects.get(id=job_id)
        job.status = 'started'
        job.save()
        try:
            # execute the function fn
            result = fn(*args, **kwargs)
            job.result = result
            job.status = 'finished'
            job.save()
        except:
            job.result = None
            job.status = 'failed'
            job.save()
    return wrapper


# two simple numerical tasks that can be computationally intensive

@app.task
@update_job
def power(n):
    """Return 2 to the n'th power"""
    return 2 ** n


@app.task
@update_job
def fib(n):
    """Return the n'th Fibonacci number."""
    if n < 0:
        raise ValueError("Fibonacci numbers are only defined for n >= 0.")
    return _fib(n)


def _fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return _fib(n - 1) + _fib(n - 2)


# mapping from names to tasks
TASK_MAPPING = {
    'power': power,
    'fibonacci': fib
}

Building an API for scheduling tasks

If you have tasks in your system, how do you run them? In this section, you'll create a user interface for job scheduling. In a backend application, the API will be your user interface. Let's use the Django REST Framework for your API.

To make it as simple as possible, your app will have one model and only one ViewSet (endpoint with many HTTP methods).

Create your model, called Job, in myproject/models.py.

from django.db import models


class Job(models.Model):
    """Class describing a computational job"""

    # currently, available types of job are:
    TYPES = (
        ('fibonacci', 'fibonacci'),
        ('power', 'power'),
    )

    # list of statuses that job can have
    STATUSES = (
        ('pending', 'pending'),
        ('started', 'started'),
        ('finished', 'finished'),
        ('failed', 'failed'),
    )

    type = models.CharField(choices=TYPES, max_length=20)
    status = models.CharField(choices=STATUSES, max_length=20)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    argument = models.PositiveIntegerField()
    result = models.IntegerField(null=True)

    def save(self, *args, **kwargs):
        """Save model and if job is in pending state, schedule it"""
        super(Job, self).save(*args, **kwargs)
        if self.status == 'pending':
            from .tasks import TASK_MAPPING
            task = TASK_MAPPING[self.type]
            task.delay(job_id=self.id, n=self.argument)
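
Because the scheduling happens inside save(), creating a Job row is all it takes to hand work to a Celery worker. A small sketch, e.g. from a Django shell (the field values match the choices defined above):

from myproject.models import Job

# create() calls save(), which dispatches the matching Celery task
job = Job.objects.create(type='power', argument=10, status='pending')

# a worker picks the job up, flips status to 'started', then stores the result;
# re-fetch the row to observe the change
job = Job.objects.get(id=job.id)
print('{0} {1}'.format(job.status, job.result))   # eventually: finished 1024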

Then create a serializer, a view, and a URL configuration to access it.

myproject/serializers.py

from rest_framework import serializers

from .models import Job


class JobSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Job

myproject/views.py

from rest_framework import mixins, viewsets

from .models import Job
from .serializers import JobSerializer


class JobViewSet(mixins.CreateModelMixin,
                 mixins.ListModelMixin,
                 mixins.RetrieveModelMixin,
                 viewsets.GenericViewSet):
    """
    API endpoint that allows jobs to be viewed or created.
    """
    queryset = Job.objects.all()
    serializer_class = JobSerializer

myproject/urls.py

from django.conf.urls import url, include
from rest_framework import routers

from myproject import views

router = routers.DefaultRouter()
# register job endpoint in the router
router.register(r'jobs', views.JobViewSet)

# Wire up our API using automatic URL routing.
# Additionally, we include login URLs for the browsable API.
urlpatterns = [
    url(r'^', include(router.urls)),
    url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework'))
]

For completeness, there is also myproject/wsgi.py, which defines the WSGI config for the project:

import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

from django.core.wsgi import get_wsgi_application

application = get_wsgi_application()

and manage.py

#!/usr/bin/env python
import os
import sys

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)

__init__.py is traditionally empty.

That's all. Uh... lots of code. Luckily everything is on github, so you can just fork it.

Running the setup

Since everything is run from Fig, make sure you have both Docker and Fig installed before you try to start the app:

$ cd /path/to/myproject/where/is/fig.yml
$ fig build
$ fig up

The last command will start five different containers, so just start using your API and have some fun with Celery in the meantime.

Accessing the API

Navigate in your browser to 127.0.0.1:8000 to browse your API and schedule some jobs.

Put this demo gif in the queue.
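
If you would rather script it than click around, here is a rough sketch using the requests library (not part of requirements.txt, so it is an extra install). It assumes the jobs endpoint is exposed at /jobs/ by the router in urls.py and that the serializer exposes the model fields shown above:

import time

import requests

BASE = 'http://127.0.0.1:8000/jobs/'

# schedule a job: compute the 10th Fibonacci number
response = requests.post(
    BASE, data={'type': 'fibonacci', 'argument': 10, 'status': 'pending'})
response.raise_for_status()
job_url = response.json()['url']   # HyperlinkedModelSerializer adds a url field

# poll until a worker has processed the job
while True:
    job = requests.get(job_url).json()
    if job['status'] in ('finished', 'failed'):
        print('{0}: {1}'.format(job['status'], job['result']))
        break
    time.sleep(1)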

Scale things out

Currently we have only one instance of each container. We can get information about our group of containers with the fig ps command.

✗ fig ps

            Name                          Command                State                      Ports
---------------------------------------------------------------------------------------------------------------------
dockerdjangocelery_db_1         /docker-entrypoint.sh postgres   Up    5432/tcp
dockerdjangocelery_rabbitmq_1   /run.sh                          Up    0.0.0.0:15672->15672/tcp, 0.0.0.0:5672->5672/tcp
dockerdjangocelery_redis_1      /entrypoint.sh redis-server      Up    6379/tcp
dockerdjangocelery_web_1        ./run_web.sh                     Up    0.0.0.0:8000->8000/tcp
dockerdjangocelery_web_run_5    bash                             Up    8000/tcp
dockerdjangocelery_worker_1     ./run_celery.sh                  Up

Scaling out a container with Fig is extremely easy. Just use the fig scale command with the container name and the number of instances you want:

✗ fig scale worker=5

Starting dockerdjangocelery_worker_2...
Starting dockerdjangocelery_worker_3...
Starting dockerdjangocelery_worker_4...
Starting dockerdjangocelery_worker_5...

The output says that Fig just created four additional worker containers for us. We can double-check it with the fig ps command again:

➜  docker-django-celery git:(master) ✗ fig ps

            Name                          Command                State                      Ports
---------------------------------------------------------------------------------------------------------------------
dockerdjangocelery_db_1         /docker-entrypoint.sh postgres   Up    5432/tcp
dockerdjangocelery_rabbitmq_1   /run.sh                          Up    0.0.0.0:15672->15672/tcp, 0.0.0.0:5672->5672/tcp
dockerdjangocelery_redis_1      /entrypoint.sh redis-server      Up    6379/tcp
dockerdjangocelery_web_1        ./run_web.sh                     Up    0.0.0.0:8000->8000/tcp
dockerdjangocelery_web_run_5    bash                             Up    8000/tcp
dockerdjangocelery_worker_1     ./run_celery.sh                  Up
dockerdjangocelery_worker_2     ./run_celery.sh                  Up
dockerdjangocelery_worker_3     ./run_celery.sh                  Up
dockerdjangocelery_worker_4     ./run_celery.sh                  Up
dockerdjangocelery_worker_5     ./run_celery.sh                  Up

You'll see five Celery workers there. Nice!

Summary

You just married Django with Celery to build a distributed asynchronous computation system. I think you'll agree it was pretty easy to build an API and even easier to scale workers for it! However, life isn't always so nice to us, and sometimes we have to troubleshoot.
