SimdJsonSharp: Parsing gigabytes of JSON per second

C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale - https://arxiv.org/abs/1902.08318) fully ported from C to C#, I tried to keep the same format and API). The library accelerates JSON parsing and minification using SIMD instructions (AVX2). C# version uses System.Runtime.Intrinsics API.

UPD: Now it's also available as a set of pinvokes on top of the native lib as a .NETStandard 2.0 library, thus there are two implementations:

  1. Fully managed netcoreapp3.0 library (100% port from C to C#)
  2. netstandard2.0 library with native lib (autogenerated bindings for C)

Benchmarks

The following benchmark compares SimdJsonSharp with .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. Test json files can be found here.

1. Parse doubles

Open canada.json and parse all coordinates as System.Double:

|          Method |     fileName |    fileSize |      Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
| SimdJson | canada.json | 2,251.05 Kb | 4,733 ms | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 56,692 ms | 11.98 |
| JsonNet | canada.json | 2,251.05 Kb | 70,078 ms | 14.81 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 54,878 ms | 11.60 |

2. Count all tokens

|            Method |           fileName |    fileSize |         Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
| SimdJson | apache_builds.json | 127.28 Kb | 99.28 us | 1.00 |
| Utf8JsonReader | apache_builds.json | 127.28 Kb | 226.42 us | 2.28 |
| JsonNet | apache_builds.json | 127.28 Kb | 461.30 us | 4.64 |
| SpanJsonUtf8 | apache_builds.json | 127.28 Kb | 168.08 us | 1.69 |
| | | | | |
| SimdJson | canada.json | 2,251.05 Kb | 4,494.44 us | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 6,308.01 us | 1.40 |
| JsonNet | canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 6,679.82 us | 1.49 |
| | | | | |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 1,572.78 us | 1.00 |
| Utf8JsonReader | citm_catalog.json | 1,727.20 Kb | 3,786.10 us | 2.41 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 5,903.38 us | 3.75 |
| SpanJsonUtf8 | citm_catalog.json | 1,727.20 Kb | 3,021.13 us | 1.92 |
| | | | | |
| SimdJson | github_events.json | 65.13 Kb | 46.01 us | 1.00 |
| Utf8JsonReader | github_events.json | 65.13 Kb | 113.80 us | 2.47 |
| JsonNet | github_events.json | 65.13 Kb | 214.01 us | 4.65 |
| SpanJsonUtf8 | github_events.json | 65.13 Kb | 89.09 us | 1.94 |
| | | | | |
| SimdJson | gsoc-2018.json | 3,327.83 Kb | 2,209.42 us | 1.00 |
| Utf8JsonReader | gsoc-2018.json | 3,327.83 Kb | 4,010.10 us | 1.82 |
| JsonNet | gsoc-2018.json | 3,327.83 Kb | 6,729.44 us | 3.05 |
| SpanJsonUtf8 | gsoc-2018.json | 3,327.83 Kb | 2,759.59 us | 1.25 |
| | | | | |
| SimdJson | instruments.json | 220.35 Kb | 257.78 us | 1.00 |
| Utf8JsonReader | instruments.json | 220.35 Kb | 594.22 us | 2.31 |
| JsonNet | instruments.json | 220.35 Kb | 980.42 us | 3.80 |
| SpanJsonUtf8 | instruments.json | 220.35 Kb | 409.47 us | 1.59 |
| | | | | |
| SimdJson | truenull.json | 12.00 Kb | 16,032.6 ns | 1.00 |
| Utf8JsonReader | truenull.json | 12.00 Kb | 58,365.2 ns | 3.64 |
| JsonNet | truenull.json | 12.00 Kb | 60,977.3 ns | 3.80 |
| SpanJsonUtf8 | truenull.json | 12.00 Kb | 24,069.2 ns | 1.50 |

3. Json minification:

|                Method |           fileName |    fileSize |         Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
| SimdJsonNoValidation | apache_builds.json | 127.28 Kb | 186.8 us | 1.00 |
| SimdJson | apache_builds.json | 127.28 Kb | 262.5 us | 1.41 |
| JsonNet | apache_builds.json | 127.28 Kb | 1,802.6 us | 9.65 |
| | | | | |
| SimdJsonNoValidation | canada.json | 2,251.05 Kb | 4,130.7 us | 1.00 |
| SimdJson | canada.json | 2,251.05 Kb | 7,940.7 us | 1.92 |
| JsonNet | canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
| | | | | |
| SimdJsonNoValidation | citm_catalog.json | 1,727.20 Kb | 2,346.9 us | 1.00 |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 4,064.0 us | 1.75 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 34,831.0 us | 14.84 |

Usage

The C# API is not stable yet and currently fully copies the original C-style API thus it involves some Unsafe magic including pointers.

Add nuget package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings for a .NETStandard 2.0 package (.NET 4.x, .NET Core 2.x, etc).

dotnet add package SimdJsonSharp.Bindings
or
dotnet add package SimdJsonSharp.Managed

The following sample parses a file and iterate numeric tokens

byte[] bytes = File.ReadAllBytes(somefile);
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
while (iterator.MoveForward())
{
if (iterator.GetTokenType() == JsonTokenType.Number)
Console.WriteLine("integer: " + iterator.GetInteger());
}
}

UPD: for SimdJsonSharp.Bindings types are postfixed with 'N', e.g. ParsedJsonN

As you can see the API looks similiar to Utf8JsonReader that was introduced recently in .NET Core 3.0

Also it's possible to just validate JSON or minify it (remove whitespaces, etc):

string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);

Requirements

  • AVX2 enabled CPU

SimdJsonSharp:每秒解析千兆字节的JSON的更多相关文章

  1. 千兆以太网TCP协议的FPGA实现

    转自https://blog.csdn.net/zhipao6108/article/details/82386355 千兆以太网TCP协议的FPGA实现 Lzx 2017/4/20 写在前面,这应该 ...

  2. 【转】简谈基于FPGA的千兆以太网

    原文地址: http://blog.chinaaet.com/luhui/p/5100052903 大家好,又到了学习时间了,学习使人快乐.今天我们来简单的聊一聊以太网,以太网在FPGA学习中属于比较 ...

  3. AC6102 开发板千兆以太网UDP传输实验2

    AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...

  4. AC6102 开发板千兆以太网UDP传输实验

    AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...

  5. 最新IP数据库 存储优化 查询性能优化 每秒解析上千万

    高性能IP数据库格式详解 每秒解析1000多万ip  qqzeng-ip-ultimate.dat 3.0版 编码:UTF8     字节序:Little-Endian 返回规范字段(如:亚洲|中国| ...

  6. 千兆网口POE供电

    一.IEEE802.3af与at标准的解析 链接:http://www.winchen.com.cn/ShowNews2.asp?ID=21&ClassID=1 2003 年6 月,IEEE  ...

  7. FPGA千兆网UDP协议实现

    接着上一篇百兆网接口的设计与使用,我们接着来进行FPGA百兆网UDP(User Datagram Protocol)协议的设计. 1)UDP简介 在此,参考博主夜雨翛然的博文“https://www. ...

  8. 【转】基于TMS320C6455的千兆以太网设计

    基于TI公司最新DSP芯片TMS320C6455.设计并实现了以太网通信软硬件接口.采用TMS320C6455片内以太网接口模块EMAC/MDIO,结合片外AR8031 PHY芯片,在嵌入式操作系统D ...

  9. 369-双路千兆网络PCIe收发卡

    双路千兆网络PCIe收发卡 一.产品概述 PCIe网络收发卡要求能支持千兆光口,千兆电口:半高板卡.板卡插于服务器,室温工作. 支持2路千兆光口,千兆电口. FPGA选用型号 XC7A50T-1FGG ...

随机推荐

  1. 2、Hibernate持久化编写

    一.对于hibernate中的PO编写规则: 1. 必须提供一个无参数的public构造方法   2. 所有属性要private ,对外提供public 的get/set方法   3. 在PO类必须提 ...

  2. vs code搭建Django环境

    在网上找了很多博客,看了vs code的官方文档,最终拼凑起来,终于搭建起来了djangode开发虚拟环境(win10下) 一.新建项目文件夹 F:\Python\temp\django_demo(例 ...

  3. 计时 答题 demo

    <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <m ...

  4. Python中的函数参数有冒号 声明后有-> 箭头

    在python3.7 环境下 函数声明时能在参数后加冒号,如图: def f(ham: str, eggs: str = 'eggs') -> str : print("Annotat ...

  5. bower私服部署

    目录 bower私服部署 简介 工具清单 安装 安装nodejs 安装git 安装private-bower 配置private-bower 启动private-bower 开放端口 开机启动/注册为 ...

  6. [20190510]rman备份的疑问8.txt

    [20190510]rman备份的疑问8.txt --//上午测试rman备份多个文件,探究input memory buffer 的问题.--//补充测试5个文件的情况.--//http://blo ...

  7. Mysql—修改用户密码(重置密码)

    1.登录mysql [root@localhost ~]# mysql -uroot -p123456 [root@localhost ~]# mysql -hlocalhost -uroot -p1 ...

  8. 初学树型dp

    树型DP DFS的回溯是树形DP的重点以及核心,当回溯结束后,root的子树已经被遍历完并处理完了.这便是树形DP的最重要的特点 自己认为应该注意的点 好多人都说在更新当前节点时,它的儿子结点都给更新 ...

  9. drf框架知识总结

  10. day51_9_15_Django

    一.pycharm接受网页信息原理. 如何实现在后端接受浏览器的数据,并解析出有用的信息呢? 使用socket编写网络连接,然后通过浏览器访问ip+端口号. import socket def ind ...