SimdJsonSharp: Parsing gigabytes of JSON per second
A C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale, https://arxiv.org/abs/1902.08318), fully ported from C to C# (I tried to keep the same format and API). The library accelerates JSON parsing and minification using SIMD instructions (AVX2). The C# version uses the System.Runtime.Intrinsics API.
UPD: The library is now also available as a .NET Standard 2.0 package, implemented as a set of P/Invoke bindings on top of the native library, so there are two implementations:
- Fully managed netcoreapp3.0 library (100% port from C to C#)
- netstandard2.0 library with native lib (autogenerated bindings for C)
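To give a feel for the kind of code System.Runtime.Intrinsics enables, here is a minimal sketch that counts quote characters 32 bytes at a time with AVX2 and falls back to a scalar loop elsewhere. It is only an illustration, not code from the library: `QuoteCounter` and `CountQuotes` are hypothetical names, and the sketch uses a safe `MemoryMarshal.Read` load where a pointer-based port would typically use raw loads.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of SimdJsonSharp.
public static class QuoteCounter
{
    // Counts '"' bytes in UTF-8 input, 32 bytes per step when AVX2 is available.
    public static int CountQuotes(ReadOnlySpan<byte> utf8)
    {
        int count = 0;
        int i = 0;

        if (Avx2.IsSupported)
        {
            Vector256<byte> quote = Vector256.Create((byte)'"');
            for (; i + 32 <= utf8.Length; i += 32)
            {
                // Unaligned, bounds-checked read of 32 bytes into a vector.
                Vector256<byte> chunk = MemoryMarshal.Read<Vector256<byte>>(utf8.Slice(i, 32));
                // Matching bytes become 0xFF; MoveMask packs their top bits into 32 bits.
                uint mask = (uint)Avx2.MoveMask(Avx2.CompareEqual(chunk, quote));
                count += BitOperations.PopCount(mask);
            }
        }

        // Scalar tail, and the full fallback path on CPUs without AVX2.
        for (; i < utf8.Length; i++)
        {
            if (utf8[i] == (byte)'"') count++;
        }
        return count;
    }
}
```

Calling `QuoteCounter.CountQuotes(Encoding.UTF8.GetBytes(json))` gives the same result on both paths; only the speed differs.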
Benchmarks
The following benchmarks compare SimdJsonSharp with the .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. Test JSON files can be found here.
1. Parse doubles
Open canada.json and parse all coordinates as System.Double:
| Method | fileName | fileSize | Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
| SimdJson | canada.json | 2,251.05 Kb | 4,733 ms | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 56,692 ms | 11.98 |
| JsonNet | canada.json | 2,251.05 Kb | 70,078 ms | 14.81 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 54,878 ms | 11.60 |
2. Count all tokens
| Method | fileName | fileSize | Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
| SimdJson | apache_builds.json | 127.28 Kb | 99.28 us | 1.00 |
| Utf8JsonReader | apache_builds.json | 127.28 Kb | 226.42 us | 2.28 |
| JsonNet | apache_builds.json | 127.28 Kb | 461.30 us | 4.64 |
| SpanJsonUtf8 | apache_builds.json | 127.28 Kb | 168.08 us | 1.69 |
| | | | | |
| SimdJson | canada.json | 2,251.05 Kb | 4,494.44 us | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 6,308.01 us | 1.40 |
| JsonNet | canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 6,679.82 us | 1.49 |
| | | | | |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 1,572.78 us | 1.00 |
| Utf8JsonReader | citm_catalog.json | 1,727.20 Kb | 3,786.10 us | 2.41 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 5,903.38 us | 3.75 |
| SpanJsonUtf8 | citm_catalog.json | 1,727.20 Kb | 3,021.13 us | 1.92 |
| | | | | |
| SimdJson | github_events.json | 65.13 Kb | 46.01 us | 1.00 |
| Utf8JsonReader | github_events.json | 65.13 Kb | 113.80 us | 2.47 |
| JsonNet | github_events.json | 65.13 Kb | 214.01 us | 4.65 |
| SpanJsonUtf8 | github_events.json | 65.13 Kb | 89.09 us | 1.94 |
| | | | | |
| SimdJson | gsoc-2018.json | 3,327.83 Kb | 2,209.42 us | 1.00 |
| Utf8JsonReader | gsoc-2018.json | 3,327.83 Kb | 4,010.10 us | 1.82 |
| JsonNet | gsoc-2018.json | 3,327.83 Kb | 6,729.44 us | 3.05 |
| SpanJsonUtf8 | gsoc-2018.json | 3,327.83 Kb | 2,759.59 us | 1.25 |
| | | | | |
| SimdJson | instruments.json | 220.35 Kb | 257.78 us | 1.00 |
| Utf8JsonReader | instruments.json | 220.35 Kb | 594.22 us | 2.31 |
| JsonNet | instruments.json | 220.35 Kb | 980.42 us | 3.80 |
| SpanJsonUtf8 | instruments.json | 220.35 Kb | 409.47 us | 1.59 |
| | | | | |
| SimdJson | truenull.json | 12.00 Kb | 16,032.6 ns | 1.00 |
| Utf8JsonReader | truenull.json | 12.00 Kb | 58,365.2 ns | 3.64 |
| JsonNet | truenull.json | 12.00 Kb | 60,977.3 ns | 3.80 |
| SpanJsonUtf8 | truenull.json | 12.00 Kb | 24,069.2 ns | 1.50 |
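The tables above can be turned into throughput figures with a little arithmetic. The sketch below assumes that the `Kb` in the fileSize column means 1024 bytes and that Ratio is the method's mean divided by the SimdJson mean for the same file; `BenchmarkMath` is a hypothetical helper, not part of the benchmark project.

```csharp
// Hypothetical helper for reading the benchmark tables; not part of SimdJsonSharp.
public static class BenchmarkMath
{
    // Ratio column: a method's mean time divided by the baseline's mean time.
    public static double Ratio(double meanUs, double baselineMeanUs)
        => meanUs / baselineMeanUs;

    // Throughput in bytes per second, assuming sizeKb is in units of 1024 bytes
    // and meanUs is the mean time in microseconds.
    public static double BytesPerSecond(double sizeKb, double meanUs)
        => sizeKb * 1024.0 / (meanUs * 1e-6);
}
```

For example, `BenchmarkMath.Ratio(6308.01, 4494.44)` reproduces the 1.40 shown for Utf8JsonReader on canada.json, and `BenchmarkMath.BytesPerSecond(3327.83, 2209.42)` puts SimdJson at roughly 1.5 GB/s on gsoc-2018.json under these assumptions.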
3. JSON minification
| Method | fileName | fileSize | Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
| SimdJsonNoValidation | apache_builds.json | 127.28 Kb | 186.8 us | 1.00 |
| SimdJson | apache_builds.json | 127.28 Kb | 262.5 us | 1.41 |
| JsonNet | apache_builds.json | 127.28 Kb | 1,802.6 us | 9.65 |
| | | | | |
| SimdJsonNoValidation | canada.json | 2,251.05 Kb | 4,130.7 us | 1.00 |
| SimdJson | canada.json | 2,251.05 Kb | 7,940.7 us | 1.92 |
| JsonNet | canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
| | | | | |
| SimdJsonNoValidation | citm_catalog.json | 1,727.20 Kb | 2,346.9 us | 1.00 |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 4,064.0 us | 1.75 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 34,831.0 us | 14.84 |
Usage
The C# API is not stable yet. It currently mirrors the original C-style API closely, so it involves some unsafe code, including pointers.
Add the NuGet package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings (a .NET Standard 2.0 package for .NET 4.x, .NET Core 2.x, etc.):
dotnet add package SimdJsonSharp.Bindings
or
dotnet add package SimdJsonSharp.Managed
The following sample parses a file and iterates over its numeric tokens:
byte[] bytes = File.ReadAllBytes(somefile);
// `fixed` and pointers require an unsafe context
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
    while (iterator.MoveForward())
    {
        if (iterator.GetTokenType() == JsonTokenType.Number)
            Console.WriteLine("integer: " + iterator.GetInteger());
    }
}
UPD: in SimdJsonSharp.Bindings, type names are suffixed with 'N', e.g. ParsedJsonN.
As you can see, the API looks similar to Utf8JsonReader, which was introduced recently in .NET Core 3.0.
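For comparison, a roughly equivalent loop written against Utf8JsonReader might look like the sketch below. It uses only System.Text.Json; `Utf8ReaderSample` and `ReadIntegers` are hypothetical names, and the method collects the integers instead of printing them so it is easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical comparison sample; uses System.Text.Json only, not SimdJsonSharp.
public static class Utf8ReaderSample
{
    // Collects every number token that fits in a 64-bit integer.
    public static List<long> ReadIntegers(ReadOnlySpan<byte> utf8Json)
    {
        var result = new List<long>();
        var reader = new Utf8JsonReader(utf8Json);
        while (reader.Read())
        {
            // TryGetInt64 returns false for non-integer numbers such as 3.5.
            if (reader.TokenType == JsonTokenType.Number && reader.TryGetInt64(out long value))
                result.Add(value);
        }
        return result;
    }
}
```

The shape is the same in both APIs: advance token by token, check the token type, then read the value.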
It is also possible to just validate JSON or minify it (remove whitespace, etc.):
string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);
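To make clear what minification means here, the following is a naive scalar reference that drops whitespace outside string literals and leaves everything else untouched. It is only a sketch of the expected result, not how the library does it (the library uses SIMD); `NaiveMinifier` is a hypothetical name, and the sketch does not validate its input.

```csharp
using System.Text;

// Hypothetical scalar reference for illustration; not the library's SIMD implementation.
public static class NaiveMinifier
{
    public static string Minify(string json)
    {
        var sb = new StringBuilder(json.Length);
        bool inString = false; // currently inside a "..." literal
        bool escaped = false;  // previous character inside the literal was a backslash

        foreach (char c in json)
        {
            if (inString)
            {
                // Everything inside a string literal is kept verbatim.
                sb.Append(c);
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            }
            else if (c == '"')
            {
                inString = true;
                sb.Append(c);
            }
            else if (c != ' ' && c != '\t' && c != '\n' && c != '\r')
            {
                // Outside strings, only the four JSON whitespace characters are dropped.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

A reference like this can be handy for cross-checking the output of a faster minifier on small inputs.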
Requirements
- AVX2-enabled CPU
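On .NET Core 3.0, the AVX2 requirement can be checked at runtime before calling into the parser. A minimal sketch follows; `CpuCheck` is a hypothetical helper, and it tests only the AVX2 flag named above, not any other instruction sets a given build may rely on.

```csharp
using System.Runtime.Intrinsics.X86;

// Hypothetical startup check; not part of SimdJsonSharp.
public static class CpuCheck
{
    // True when the JIT can emit AVX2 instructions on the current CPU.
    public static bool HasAvx2 => Avx2.IsSupported;

    public static string Describe()
        => HasAvx2
            ? "AVX2 is available."
            : "AVX2 is not available; fall back to another JSON parser.";
}
```

An application could call `CpuCheck.HasAvx2` once at startup and route requests to Utf8JsonReader or Json.NET when it returns false.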