How to manage concurrency in Django models

The days of desktop systems serving single users are long gone — web applications nowadays are serving millions of users at the same time. With many users comes a wide range of new problems — concurrency problems.

In this article I’m going to present two approaches for managing concurrency in Django models.

 



The Problem

To demonstrate common concurrency issues we are going to work on a bank account model:

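from django.contrib.auth.models import User
from django.db import models


class Account(models.Model):
    id = models.AutoField(
        primary_key=True,
    )
    user = models.ForeignKey(
        User,
        on_delete=models.CASCADE,  # required since Django 2.0
    )
    balance = models.IntegerField(
        default=0,
    )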

To get started we are going to implement naive deposit and withdraw methods for an account instance:

# Methods of the Account model

def deposit(self, amount):
    self.balance += amount
    self.save()

def withdraw(self, amount):
    if amount > self.balance:
        raise errors.InsufficientFunds()
    self.balance -= amount
    self.save()

This seems innocent enough and it might even pass unit tests and integration tests on localhost. But, what happens when two users perform actions on the same account at the same time?

  1. User A fetches the account: balance is 100$.
  2. User B fetches the account: balance is 100$.
  3. User B withdraws 30$: balance is updated to 100$ - 30$ = 70$.
  4. User A deposits 50$: balance is updated to 100$ + 50$ = 150$.

What happened here?

User B asked to withdraw 30$ and user A deposited 50$ — we expect the balance to be 120$, but we ended up with 150$.

Why did it happen?

At step 4, when user A updated the balance, the amount he had stored in memory was stale (user B had already withdrawn 30$).

To prevent this situation from happening we need to make sure the resource we are working on is not altered while we are working on it.
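The interleaving above can be reproduced without a database. This is a minimal sketch, not the Django code itself: it assumes a plain dictionary stands in for the accounts table, and `fetch` and `save` are hypothetical helpers that mimic loading a model instance and calling `save()` on it.

```python
# A dictionary stands in for the accounts table (an assumption for illustration).
database = {1: {"balance": 100}}


def fetch(account_id):
    """Return an in-memory copy of the row, like loading a model instance."""
    return dict(database[account_id])


def save(account_id, account):
    """Overwrite the stored row with the in-memory copy, like calling save()."""
    database[account_id] = dict(account)


# Steps 1 and 2: both users fetch the account while the balance is 100.
account_a = fetch(1)
account_b = fetch(1)

# Step 3: user B withdraws 30 and saves.
account_b["balance"] -= 30
save(1, account_b)

# Step 4: user A deposits 50 using the stale balance of 100 and saves.
account_a["balance"] += 50
save(1, account_a)

final_balance = database[1]["balance"]
print(final_balance)  # 150, although 120 was expected: the withdrawal was lost
```

The last writer wins: user A's save silently overwrites user B's withdrawal because it was computed from a stale copy.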


Pessimistic Approach

The pessimistic approach dictates that you should lock the resource exclusively until you are finished with it. If nobody else can acquire a lock on the object while you are working on it, you can be sure the object was not changed.

To acquire a lock on a resource we use a database lock for several reasons:

  1. (relational) databases are very good at managing locks and maintaining consistency.
  2. The database is the lowest level at which data is accessed. Acquiring the lock at the lowest level also protects the data from other processes that modify it, for example direct updates in the DB, cron jobs, cleanup tasks, etc.
  3. A Django app can run on multiple processes (e.g. workers). Maintaining locks at the app level would require a lot of (unnecessary) work.

To lock an object in Django we use select_for_update.

Let’s use the pessimistic approach to implement a safe deposit and withdraw:

from django.db import transaction

# Methods of the Account model

@classmethod
def deposit(cls, id, amount):
    with transaction.atomic():
        # Lock the row until the transaction ends
        account = (
            cls.objects
            .select_for_update()
            .get(id=id)
        )
        account.balance += amount
        account.save()
    return account

@classmethod
def withdraw(cls, id, amount):
    with transaction.atomic():
        account = (
            cls.objects
            .select_for_update()
            .get(id=id)
        )
        if account.balance < amount:
            raise errors.InsufficientFunds()
        account.balance -= amount
        account.save()
    return account

What do we have here:

  1. We use select_for_update on our queryset to tell the database to lock the object until the transaction is done.
  2. Locking a row in the database requires a database transaction. We use Django's transaction.atomic() as a context manager to scope the transaction.
  3. We use a classmethod instead of an instance method. To acquire the lock we need to tell the database to lock the row, and to do that we need to be the ones fetching the object from the database. When operating on self the object is already fetched and we don't have any guarantee that it was locked.
  4. All the operations on the account are executed within the database transaction.

Let’s see how the scenario from earlier is prevented with our new implementation:

  1. User A asks to withdraw 30$:
    - User A acquires a lock on the account.
    - Balance is 100$.
  2. User B asks to deposit 50$:
    - User B tries to acquire a lock on the account, but it is already held by user A.
    - User B waits for the lock to release.
  3. User A withdraws 30$:
    - Balance is 70$.
    - Lock of user A on account is released.
  4. User B acquires a lock on the account.
    - Balance is 70$.
    - New balance is 70$ + 50$ = 120$.
  5. Lock of user B on account is released, balance is 120$.

Bug prevented!
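The same sequence can be simulated in plain Python. This is only an analogy, not how Django or the database implements row locks: it assumes a `threading.Lock` stands in for the row lock taken by select_for_update, a dictionary stands in for the table, and `ValueError` replaces the article's `errors.InsufficientFunds`.

```python
import threading

# One lock stands in for the database row lock (an assumption for illustration).
row_lock = threading.Lock()
database = {1: {"balance": 100}}


def deposit(account_id, amount):
    # Acquire the lock BEFORE reading, as select_for_update() does.
    with row_lock:
        account = dict(database[account_id])
        account["balance"] += amount
        database[account_id] = account


def withdraw(account_id, amount):
    with row_lock:
        account = dict(database[account_id])
        if account["balance"] < amount:
            raise ValueError("insufficient funds")
        account["balance"] -= amount
        database[account_id] = account


# User A withdraws 30 while user B deposits 50.
user_a = threading.Thread(target=withdraw, args=(1, 30))
user_b = threading.Thread(target=deposit, args=(1, 50))
user_a.start()
user_b.start()
user_a.join()
user_b.join()

final_balance = database[1]["balance"]
print(final_balance)  # 120 regardless of which thread takes the lock first
```

The important detail is that the read happens inside the locked section. Locking only around the write would still allow both users to read the stale balance of 100.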

What you need to know about select_for_update:

  • In our scenario user B waited for user A to release the lock. Instead of waiting we can tell Django not to wait for the lock to release and raise a DatabaseError instead. To do that we can set the nowait argument of select_for_update to True, …select_for_update(nowait=True).
  • Select related objects are also locked. When using select_for_update with select_related, the related objects are also locked.
    For example, if we were to select_related the user along with the account, both the user and the account would be locked. If someone tries to update the user's first name during a deposit, that update will have to wait until the lock on the user row is released.
    If you are using PostgreSQL or Oracle you can avoid this: since Django 2.0, select_for_update accepts an "of" argument that explicitly states which of the tables in the query to lock.



Optimistic Approach

Unlike the pessimistic approach, the optimistic approach does not require a lock on the object. The optimistic approach assumes collisions are not very common, and dictates that one should only make sure there were no changes made to the object at the time it is updated.

How can we implement such a thing with Django?

First, we add a column to keep track of changes made to the object:

version = models.IntegerField(
    default=0,
)

Then, when we update an object we make sure the version did not change:

# Methods of the Account model

def deposit(self, amount):
    # Update only if the row still has the version we fetched
    updated = Account.objects.filter(
        id=self.id,
        version=self.version,
    ).update(
        balance=self.balance + amount,
        version=self.version + 1,
    )
    return updated > 0

def withdraw(self, amount):
    if self.balance < amount:
        raise errors.InsufficientFunds()
    updated = Account.objects.filter(
        id=self.id,
        version=self.version,
    ).update(
        balance=self.balance - amount,
        version=self.version + 1,
    )
    return updated > 0

Let’s break it down:

  1. We operate directly on the instance (no classmethod).
  2. We rely on the fact that the version is incremented every time the object is updated.
  3. We update only if the version did not change:
    - If the object was not modified since we fetched it, then the object is updated.
    - If it was modified, then the filter will match zero records and the object will not be updated.
  4. Django returns the number of updated rows. If `updated` is zero it means someone else changed the object from the time we fetched it.

How does optimistic locking work in our scenario?

  1. User A fetches the account: balance is 100$, version is 0.
  2. User B fetches the account: balance is 100$, version is 0.
  3. User B asks to withdraw 30$:
    - Balance is updated to 100$ - 30$ = 70$.
    - Version is incremented to 1.
  4. User A asks to deposit 50$:
    - The calculated balance is 100$ + 50$ = 150$.
    - The account does not exist with version 0 -> nothing is updated.
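The scenario above can be traced with the SQL that a filter-then-update roughly translates to. This is a minimal sketch using Python's built-in sqlite3 module rather than the Django ORM; the table layout and the helper names `fetch` and `update_balance` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account (id, balance, version) VALUES (1, 100, 0)")


def fetch(account_id):
    """Load the row into memory, remembering the version we saw."""
    row = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return {"id": account_id, "balance": row[0], "version": row[1]}


def update_balance(account, new_balance):
    """Write only if the stored version still matches the fetched one."""
    cursor = conn.execute(
        "UPDATE account SET balance = ?, version = ? WHERE id = ? AND version = ?",
        (new_balance, account["version"] + 1, account["id"], account["version"]),
    )
    return cursor.rowcount > 0  # number of rows that matched and were updated


# Steps 1 and 2: both users fetch version 0.
account_a = fetch(1)
account_b = fetch(1)

# Step 3: user B withdraws 30, the version becomes 1.
b_updated = update_balance(account_b, account_b["balance"] - 30)

# Step 4: user A deposits 50 against version 0, which no longer exists.
a_updated = update_balance(account_a, account_a["balance"] + 50)

final = fetch(1)
print(b_updated, a_updated, final)
```

User A's update matches no rows, so the balance stays at 70$ with version 1, and user A learns from the return value that the deposit has to be retried.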

What you need to know about the optimistic approach:

  • Unlike the pessimistic approach, this approach requires an additional field and a lot of discipline.
    One way to overcome the discipline issue is to abstract this behavior. django-fsm implements optimistic locking using a version field as described above. django-optimistic-lock seems to do the same. We haven't used either of these packages but we've taken some inspiration from them.
  • In an environment with a lot of concurrent updates this approach might be wasteful.
  • This approach does not protect from modifications made to the object outside the app. If you have other tasks that modify the data directly (e.g. not through the model) you need to make sure they use the version as well.
  • Using the optimistic approach the function can fail and return False. In this case we will most likely want to retry the operation. Using the pessimistic approach with nowait=False the operation does not fail because of contention; it waits for the lock to release.
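A retry for the optimistic approach might look like the following. This is a sketch under stated assumptions: `retry_on_conflict` is a hypothetical helper, not part of Django, and the dictionary-based `deposit_50` only imitates a versioned update so the example can run on its own. The key point is that every attempt must re-fetch the object before trying again.

```python
def retry_on_conflict(operation, attempts=3):
    """Call operation() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if operation():
            return True
    return False


# A dictionary stands in for the stored row (an assumption for illustration).
store = {"balance": 100, "version": 0}

# Simulates one concurrent writer that sneaks in during our first attempt.
pending_interference = [lambda: store.update(balance=70, version=1)]


def deposit_50():
    snapshot = dict(store)  # re-fetch fresh state on every attempt
    if pending_interference:
        pending_interference.pop()()  # another user updates the row meanwhile
    if store["version"] != snapshot["version"]:
        return False  # version changed since the fetch: nothing is written
    store.update(
        balance=snapshot["balance"] + 50,
        version=snapshot["version"] + 1,
    )
    return True


succeeded = retry_on_conflict(deposit_50)
print(succeeded, store)
```

The first attempt fails because the version moved from 0 to 1; the second attempt starts from the fresh balance of 70 and ends at 120. Capping the number of attempts keeps a heavily contended object from retrying forever.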

Which one should I use?

Like any great question, the answer is “it depends”:

  • If your object has a lot of concurrent updates you are probably better off with the pessimistic approach.
  • If you have updates happening outside the ORM (for example, directly in the database) the pessimistic approach is safer.
  • If your method has side effects such as remote API calls or OS calls make sure they are safe. Some things to consider — can the remote call take a long time? Is the remote call idempotent (safe to retry)?
 