Scaling to 1 million active GraphQL subscriptions (live queries)

Hasura is a GraphQL engine on Postgres that provides instant GraphQL APIs with authorization. Read more at hasura.ioand on github.com/hasura/graphql-engine.

Hasura allows 'live queries' for clients (over GraphQL subscriptions). For example, a food ordering app can use a live query to show the live-status of an order for a particular user.

This document describes Hasura's architecture which lets you scale to handle a million active live queries.

TL;DR:

The setup: Each client (a web/mobile app) subscribes to data or a live-result with an auth token. The data is in Postgres. 1 million rows are updated in Postgres every second (ensuring a new result pushed per client). Hasura is the GraphQL API provider (with authorization).

The test: How many concurrent live subscriptions (clients) can Hasura handle? Does Hasura scale vertically and/or horizontally?

single instance configuration No. of active live queries CPU load average
1xCPU, 2GB RAM 5000 60%
2xCPU, 4GB RAM 10000 73%
4xCPU, 8GB RAM 20000 90%

At 1 million live queries, Postgres is under about 28% load with peak number of connections being around 850.

Notes on configuration:

  • AWS RDS postgres, Fargate, ELB were used with their default configurations and without any tuning.
  • RDS Postgres: 16xCPU, 64GB RAM, Postgres 11
  • Hasura running on Fargate (4xCPU, 8GB RAM per instance) with default configurations

GraphQL and subscriptions

GraphQL makes it easy for app developers to query for precisely the data they want from their API.

For example, let’s say we’re building a food delivery app. Here’s what the schema might look like on Postgres:

For an app screen showing a “order status” for the current user for their order, a GraphQL query would fetch the latest order status and the location of the agent delivering the order.

Underneath it, this query is sent as a string to a webserver that parses it, applies authorization rules and makes appropriate calls to things like databases to fetch the data for the app. It sends this data, in the exact shape that was requested, as a JSON.

Enter live-queries: Live queries is the idea of being able to subscribe to the latest result of a particular query. As the underlying data changes, the server should push the latest result to the client.

On the surface this is a perfect fit with GraphQL because GraphQL clients support subscriptions that take care of dealing with messy websocket connections. Converting a query to a live query might look as simple as replacing the word query with subscription for the client. That is, if the GraphQL server can implement it.

Implementing GraphQL live-queries

Implementing live-queries is painful. Once you have a database query that has all the authorization rules, it might be possible to incrementally compute the result of the query as events happen. But this is practically challenging to do at the web-service layer. For databases like Postgres, it is equivalent to the hard problem of keeping a materialized view up to date as underlying tables change. An alternative approach is to refetch all the data for a particular query (with the appropriate authorization rules for the specific client). This is the approach we currently take.

Secondly, building a webserver to handle websockets in a scalable way is also sometimes a little hairy, but certain frameworks and languages do make the concurrent programming required a little more tractable.

Refetching results for a GraphQL query

To understand why refetching a GraphQL query is hard, let’s look at how a GraphQL query is typically processed:

The authorization + data fetching logic has to run for each “node” in the GraphQL query. This is scary, because even a slightly large query fetching a collection could bring down the database quite easily. The N+1 query problem, also common with badly implemented ORMs, is bad for your database and makes it hard to optimally query Postgres. Data loader type patterns can alleviate the problem, but will still query the underlying Postgres database multiple times (reduces to as many nodes in the GraphQL query from as many items in the response).

For live queries, this problem becomes worse, because each client’s query will translate into an independent refetch. Even though the queries are the “same”, since the authorization rules create different session variables, independent fetches are required for each client.

Hasura approach

Is it possible to do better? What if declarative mapping from the data models to the GraphQL schema could be used to create a single SQL query to the database? This would avoid multiple hits to the database, whether there are a large number of items in the response or the number of nodes in the GraphQL query are large.

Idea #1: “Compile” a GraphQL query to a single SQL query

Part of Hasura is a transpiler that uses the metadata of mapping information for the data models to the GraphQL schema to “compile” GraphQL queries to the SQL queries to fetch data from the database.

GraphQL query → GraphQL AST → SQL AST → SQL

This gets rid of the N+1 query problem and allows the database to optimise data-fetching now that it can see the entire query.

But this in itself isn't enough as resolvers also enforce authorization rules by only fetching the data that is allowed. We will therefore need to embed these authorization rules into the generated SQL.

Idea #2: Make authorization declarative

Authorization when it comes to accessing data is essentially a constraint that depends on the values of data (or rows) being fetched combined with application-user specific “session variables” that are provided dynamically. For example, in the most trivial case, a row might container a user_id that denotes the data ownership. Or documents that are viewable by a user might be represented in a related table, document_viewers. In other scenarios the session variable itself might contain the data ownership information pertinent to a row, for eg, an account manager has access to any account [1,2,3…] where that information is not present in the current database but present in the session variable (probably provided by some other data system).

To model this, we implemented an authorization layer similar to Postgres RLS at the API layer to provide a declarative framework to configure access control. If you’re familiar with RLS, the analogy is that the “current session” variables in a SQL query are now HTTP “session-variables” coming from cookies or JWTs or HTTP headers.

Incidentally, because we had started the engineering work behind Hasura many years ago, we ended up implementing Postgres RLS features at the application layer before it landed in Postgres. We even had the same bug in our equivalent of the insert returning clause that Postgres RLS fixed 

一篇来自hasura graphql-engine 百万级别live query 的实践的更多相关文章

  1. epoll的内部实现 & 百万级别句柄监听 & lt和et模式非常好的解释

    epoll是Linux高效网络的基础,比如event poll(例如nodejs),是使用libev,而libev的底层就是epoll(只不过不同的平台可能用epoll,可能用kqueue). epo ...

  2. 百万级别数据Excel导出优化

    前提 这篇文章不是标题党,下文会通过一个仿真例子分析如何优化百万级别数据Excel导出. 笔者负责维护的一个数据查询和数据导出服务是一个相对远古的单点应用,在上一次云迁移之后扩展为双节点部署,但是发现 ...

  3. Facebook兆级别图片存储及每秒百万级别图片查询原理

    前言 Facebook(后面简称fb)是世界最大的社交平台,需要存储的数据时刻都在不断剧增(占比最大为图片,每天存储约20亿张,大概是微信的三倍). 那么问题来了,fb是如何存储兆级别的图片?并且又是 ...

  4. 使用表类型(Table Type-SqlServer)实现百万级别的数据一次性毫秒级别插入

    使用表类型(Table Type)实现百万级别的数据一次性插入 思路 1 创建表类型(TaBleType)         2 创建添加存储过程         3 使用C#语言构建一个DataTab ...

  5. JAVA使用POI如何导出百万级别数据(转)

    https://blog.csdn.net/happyljw/article/details/52809244   用过POI的人都知道,在POI以前的版本中并不支持大数据量的处理,如果数据量过多还会 ...

  6. Hasura GraphQL schema 生成是如何工作的

    不像大部分的graphql 引擎,使用标准的graphql 规范的处理模型,Hasura graphql 不存在resolver 的概念(实际上是有的,只是转换为了sql语法) 以下是Hasura g ...

  7. 通过torodb && hasura graphql 让mongodb 快速支持graphql api

    torodb 可以方便的将mongo 数据实时同步到pg,hasura graphql 可以方便的将pg 数据暴露为graphql api,集成在一起真的很方便 环境准备 docker-compose ...

  8. hasura graphql server 集成gatsby

    hasura graphql server 社区基于gatsby-source-graphql 开发了gatsby-postgres-graphql 插件, 可以快速的开发丰富的网站 基本使用 安装h ...

  9. hasura graphql server event trigger 试用

    hasura graphql server 是一个很不错的graphql 引擎,当前版本已经支持event triiger 了 使用此功能我们可以方便的集成webhook功能,实现灵活,稳定,快捷的消 ...

随机推荐

  1. java注解注意点

    注意:以后工作中代码中 不允许出现警告 自定义注解 1:自定义注解并没有发挥它的作用,而Eclipse自带的注解通过反射另外有一套代码,可以发挥它的作用,例如:跟踪代码...... 2:如果自定义的代 ...

  2. 四种方法获取可执行程序的文件路径(.NET Core / .NET Framework)

    原文:四种方法获取可执行程序的文件路径(.NET Core / .NET Framework) 本文介绍四种不同的获取可执行程序文件路径的方法.适用于 .NET Core 以及 .NET Framew ...

  3. 解决打开IE报错“无法启动...丢失api-ms-win-core-path-l1-1-0.dll”的问题

    打开IE突然发现报错 试了各种方法都不行 最终看这篇文章,才解决:https://www.yijile.com/log/577.html 打开IE设置选项,选择管理加载项,如图讲该选项禁用,就不报错. ...

  4. Java GC的工作原理详解

    JVM学习笔记之JVM内存管理和JVM垃圾回收的概念,JVM内存结构由堆.栈.本地方法栈.方法区等部分组成,另外JVM分别对新生代下载地址  和旧生代采用不同的垃圾回收机制. 首先来看一下JVM内存结 ...

  5. MySQL 统计上一周从周一到周日的用户

    这个功能按理说很常见,奇怪的是很难搜索到一个合适的.稍微整理了下,具体的就不展开了,注意这个表中的时间为毫秒,这条语句拷贝复制就能用.照顾大部分的无脑码农. SELECT case when FROM ...

  6. Java 之 Jedis

    一.客户端 Jedis 1.Jedis Jedis 是一款java操作 redis 数据库的工具. 2.使用步骤 (1)下载 Jedis 的 jar 包 (2)使用: //1. 获取连接 Jedis ...

  7. CCProxy代理

    只要局域网内有一台机器能够上网,其他机器就可以通过这台机器上安装的CCProxy来代理共享上网,最大程度的减少了硬件费用和上网费用.只需要在服务器上CCProxy代理服务器软件里进行帐号设置,就可以方 ...

  8. 使用Prometheus监控Linux系统各项指标

    首先在Linux系统上安装一个探测器node explorer, 下载地址https://prometheus.io/docs/guides/node-exporter/ 这个探测器会定期将linux ...

  9. Flask项目初始化

    数据库实现命令初始化 1.实现命令主脚本 # coding=utf-8 from functools import wraps from getpass import getpass import s ...

  10. 【Python】异常

    捕获异常 try: num = int(input("请输入一个整数:")) result = 8 / num print(result) except ValueError: p ...