Async IO

I was recently reading a series on “Write Sequential Non-Blocking IO Code With Fibers in NodeJS” by Venkatesh.

Venki was essentially trying to emphasize that writing non-blocking code in NodeJS (either via callbacks, or using promises) can get hairy really fast. For example, this code demonstrates that aptly:

var express = require('express');
var app = express();

// `client` is a connected Redis client, created elsewhere in the program.
app.get('/users/:fbId', function(req, res) {
  // The route parameter is named fbId, so it has to be read under that name.
  var id = req.params.fbId;
  var key = 'user:' + id;

  client.get(key, function(err, reply) {
    if (err !== null) {
      res.send(500);
      return;
    }
    if (reply === null) {
      res.send(404);
      return;
    }
    res.send(200, {id: id, name: reply});
  });
});

The exact code is available on GitHub (so is the promises driven version, but I won’t bother inlining it.)

What we actually wanted to write (if it were possible) was:

var express = require('express');
var app = express();

app.get('/users/:fbId', function(req, res) {
  var id = req.params.fbId;
  var key = 'user:' + id;

  try {
    var reply = client.get(key);
    if (reply === null) {
      res.send(404);
      return;
    }
    res.send(200, {id: id, name: reply});
  }
  catch(err) {
    res.send(500);
  }
});

The magic would happen at the client.get(key) call above. Instead of having to provide a cascade of callbacks (what if we wanted to do another lookup after we got the value back from the first?), we could just write the calls serially, one after the other.

Well. Apparently we can!

Fibers

A fiber is a particularly lightweight thread of execution. Like threads, fibers share address space. However, fibers use co-operative multitasking while threads use pre-emptive multitasking. Threads often depend on the kernel’s thread scheduler to preempt a busy thread and resume another thread; fibers yield themselves to run another fiber while executing.

Fibers allow exactly this kind of black magic in NodeJS. It is still callbacks internally, but we are exposed to none of it in our application code. Sure, we will end up writing a bunch of wrappers (or have some tool generate them for us), but we get the sweet, sweet pleasure of writing async IO code without having to jump through all the hoops. This is what the wrapper code for the Redis client looks like:

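var Fiber = require('fibers');
var client = require('./redis-client');

exports.get = function(key) {
  var err, reply;
  // Remember the fiber running this call so the callback can resume it.
  var fiber = Fiber.current;

  client.get(key, function(_err, _reply) {
    err = _err;
    reply = _reply;
    // The Redis response has arrived: resume the suspended fiber.
    fiber.run();
  });

  // Suspend this fiber until the callback above calls fiber.run().
  Fiber.yield();

  if (err != null) {
    throw err;
  }
  return reply;
};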

(the real code is here in case you are curious)

I liked how the code looked. Having survived a ‘promising’ node.js project, I was definitely curious about this new style. Maybe this can be the saving grace (before generators and yield take over the JS world) for real-world server-side JavaScript.

Fibers you say

But the code (and the underlying technique which makes it tick) sounded very familiar, and reminded me of a similar technique which is used in Go to allow writing beautiful async IO code. For example, the same function from above in Go:

m.Get("/users/:id", func(db *DB, params martini.Params) (int, []byte) {
  str := params["id"]
  id, err := strconv.Atoi(str)
  if err != nil {
    return http.StatusBadRequest, []byte{}
  }

  u, err := db.LoadUser(id)
  if err != nil {
    return http.StatusNotFound, []byte{}
  }

  // Note: enc is not declared in this excerpt; it is presumably provided
  // elsewhere in the full program.
  return http.StatusOK, encoder.Must(enc.Encode(u))
})

Sure, there is a little more happening in here (Go is statically typed), but it's the exact same thing as the fibers example, without all the manual wrapping. Any call which does IO (like db.LoadUser above) blocks the currently executing goroutine (which, just like a fiber, is a lightweight thread). The natural question to ask is: if the goroutine gets blocked, how do other requests get processed? It's quite simple, actually. The Go runtime automatically schedules any other goroutine that is ready to run (because its IO call is done) on the thread on which the current goroutine was running.
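To see that scheduling behaviour in isolation, here is a minimal sketch. It is not from the benchmark repo: fakeLookup and serveAll are hypothetical names, and time.Sleep merely stands in for a blocking IO call such as db.LoadUser. Go code is limited to one executing thread via runtime.GOMAXPROCS(1), yet 100 "requests" that each wait 50 ms should finish in roughly the time of one wait rather than one hundred.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// lookupDelay is an assumed stand-in for the latency of one IO call.
const lookupDelay = 50 * time.Millisecond

// fakeLookup is hypothetical: it sleeps to simulate waiting on the network.
// Sleeping parks only the calling goroutine, not the underlying thread.
func fakeLookup(id int) string {
	time.Sleep(lookupDelay)
	return fmt.Sprintf("user-%d", id)
}

// serveAll handles n simulated requests, one goroutine per request,
// and returns how long it took for all of them to finish.
func serveAll(n int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			_ = fakeLookup(id) // blocks this goroutine only
		}(i)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// Allow only one thread to execute Go code at a time.
	runtime.GOMAXPROCS(1)
	elapsed := serveAll(100)
	fmt.Printf("100 requests of %v each finished in %v\n", lookupDelay, elapsed)
}
```

If the waits were serialized, the run would take about five seconds; because a waiting goroutine gives up its thread, the total stays close to a single wait.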

Since goroutines are lightweight (their stacks start at only a few kilobytes in Go 1.3beta1, compared to the much larger ~2 MB thread stacks), it is not unusual to have hundreds of thousands of goroutines alive in a single process, all humming along together. The best part: since the threads have to do less context switching (the same physical thread can continue running on the processor core while just the instruction pointer keeps changing as the goroutines shuffle in and out, much as in method calls), we are able to extract a lot more efficiency from the same unit of hardware. Without this, IO calls that cause the thread to block and wait could cripple the system and bring it to its knees. Read this article for more context on this.
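As a rough illustration of how cheap goroutines are, the sketch below (my own, with the hypothetical helper spawnMany) starts 100,000 goroutines that all wait on one channel, asks the runtime how many goroutines exist at that moment, and then releases them. Memory use depends on the Go version and is not measured here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spawnMany starts n goroutines that cannot finish until released.
// It returns the number of goroutines alive before the release and
// the number that ran to completion afterwards.
func spawnMany(n int) (live int, done int64) {
	release := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // wait here, much like a pending IO call
			atomic.AddInt64(&done, 1)
		}()
	}
	// None of the goroutines can exit before release is closed,
	// so at least n of them exist right now.
	live = runtime.NumGoroutine()
	close(release) // wake every waiting goroutine
	wg.Wait()
	return live, done
}

func main() {
	live, done := spawnMany(100000)
	fmt.Printf("goroutines alive at once: %d, completed: %d\n", live, done)
}
```

Trying the same thing with one OS thread per task would need on the order of 100,000 thread stacks, which is the comparison the paragraph above is making.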

Performance

A fellow ThoughtWorker asked me, “Does performance matter when choosing a framework?”

I know where he was coming from, and how we shouldn’t make decisions purely based on performance (we would all be writing assembly if that were the case). While it is true that for a startup (or even a well-established player) building the MVP and getting it to the users is paramount, you really don’t want to face the situation where you suddenly have a huge influx of users (say it goes viral) and you are caught between a ROCK (scaling horizontally by throwing compute units at the problem) and a HARD PLACE (having to rewrite the solution in a technology more amenable to scaling). Both of these options are expensive, and can potentially be a deal breaker.

Therefore, provided everything else is more or less equal, choosing the more performant one is never a bad thing.

With this context, I decided to compare the two solutions for their performance, given that they more or less looked the same. I decided to allow each system under test to use as many cores as it wanted, and then hit it with 100 concurrent users, each of which is going full tilt for around 20 seconds (I used the awesome wrk tool for benchmarking).

The results:

Platform   Variant      Requests/s   Latency
Golang     Stdlib       134566       3.81 ms
Golang     Gorilla      125092       4.28 ms
Golang     Martini      51330        9.51 ms
node.js    Stdlib       54510        7.78 ms
node.js    Callbacks*   36107        10.84 ms
node.js    Fibers*      27372        18.76 ms
node.js    Promises*    22665        17.15 ms

* The Callbacks, Fibers and Promises versions are created using Express. The Stdlib versions use the http support in the corresponding standard libraries.

All the numbers are in req/s as given by wrk (higher is better.) The latency details are in brackets (lower is better.) Clicking the numbers will take you to the corresponding code in the GitHub repo (the README has the detailed numbers.)

The tests were done on an updated Ubuntu 14.04 box with an Intel i7 4770 processor, 16 GB of RAM and an SSD.

As you can see, the fibers method of doing async IO in node.js comes with a noticeable loss in throughput compared to the pure callbacks-based approach, but in this micro-benchmark it still delivers more requests per second than the promises version (at a slightly higher latency).

At the same time, the default way of doing IO in Golang does very well for itself. More than 134,000 req/s with a 3.81 ms 99th percentile latency. All this without having to go through crazy callbacks/promises hoops. How cool is that?

How were the tests run?

Software versions

Go 1.3beta1
node.js 0.10.28
wrk 3.1.0

Command used to run

A more detailed description is available in the README but I will explain a simple version here:

Start the program (by say running ./start_martini.sh)
Run the benchmark (by running ./bench.sh)
Record the result
Rinse and repeat 3 times and take the best run
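To make concrete what a single benchmark run measures, here is a toy load generator. It is emphatically not wrk or bench.sh, and it did not produce any of the numbers above. The names hammer, newEchoServer, demoUsers and demoDuration are all made up for this sketch, and an in-process test server stands in for the system under test. A fixed number of concurrent "users" issue requests back to back until a deadline passes, and throughput is the number of successful responses divided by the duration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// Assumed demo settings; the real runs used 100 users for around 20 seconds.
const (
	demoUsers    = 10
	demoDuration = 500 * time.Millisecond
)

// newEchoServer starts a throwaway local server standing in for the
// system under test.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
}

// hammer runs `users` concurrent workers against url for roughly d and
// returns the number of HTTP 200 responses they received.
func hammer(url string, users int, d time.Duration) int64 {
	// Keep one idle connection per worker so connections get reused.
	client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: users}}
	defer client.CloseIdleConnections()

	deadline := time.Now().Add(d)
	var ok int64
	var wg sync.WaitGroup
	for i := 0; i < users; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				resp, err := client.Get(url)
				if err != nil {
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					atomic.AddInt64(&ok, 1)
				}
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	srv := newEchoServer()
	defer srv.Close()
	n := hammer(srv.URL, demoUsers, demoDuration)
	fmt.Printf("%d requests in %v (%.0f req/s)\n", n, demoDuration, float64(n)/demoDuration.Seconds())
}
```

A real tool like wrk also tracks per-request latency and is far more careful about its own overhead, so treat this only as a picture of the procedure.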

Notes

All cores on the Intel i7 4770 were set to the performance governor
Redis was not tweaked
ulimit was not raised

Summary

This is part 1 in a multipart series looking at how async IO (and programming in general) is done in various languages/platforms. We will be going in depth into one language/platform with every new article in the series. Future parts will look at Scala, Clojure, Java, C#, Python and Ruby based frameworks and try to present a holistic view of the async world.

But one thing is very clear: async IO is here to stay. Not embracing it would be foolhardy given the need to stay lean. Hope these articles help you understand the gravity of the decision.

Some might argue that what we did in Golang was not really async, as the call was blocking in nature. But Go is still able to provide awesome throughput despite blocking IO calls because the Go runtime essentially does the heavy lifting for you. When one goroutine is busy waiting for the results of an IO call to come back, other goroutines can take its place so that CPU cycles are not wasted. The fact that this mechanism allows us to get away with fewer threads than would be required otherwise is the icing on top.
