SimdJsonSharp:每秒解析千兆字节的JSON
SimdJsonSharp: Parsing gigabytes of JSON per second
C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale - https://arxiv.org/abs/1902.08318) fully ported from C to C#, I tried to keep the same format and API). The library accelerates JSON parsing and minification using SIMD instructions (AVX2). C# version uses System.Runtime.Intrinsics API.
UPD: Now it's also available as a set of pinvokes on top of the native lib as a .NETStandard 2.0 library, thus there are two implementations:
- Fully managed netcoreapp3.0 library (100% port from C to C#)
- netstandard2.0 library with native lib (autogenerated bindings for C)
Benchmarks
The following benchmark compares SimdJsonSharp with .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. Test json files can be found here.
1. Parse doubles
Open canada.json and parse all coordinates as System.Double:
| Method | fileName | fileSize | Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
| SimdJson | canada.json | 2,251.05 Kb | 4,733 ms | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 56,692 ms | 11.98 |
| JsonNet | canada.json | 2,251.05 Kb | 70,078 ms | 14.81 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 54,878 ms | 11.60 |
2. Count all tokens
| Method | fileName | fileSize | Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
| SimdJson | apache_builds.json | 127.28 Kb | 99.28 us | 1.00 |
| Utf8JsonReader | apache_builds.json | 127.28 Kb | 226.42 us | 2.28 |
| JsonNet | apache_builds.json | 127.28 Kb | 461.30 us | 4.64 |
| SpanJsonUtf8 | apache_builds.json | 127.28 Kb | 168.08 us | 1.69 |
| | | | | |
| SimdJson | canada.json | 2,251.05 Kb | 4,494.44 us | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 6,308.01 us | 1.40 |
| JsonNet | canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 6,679.82 us | 1.49 |
| | | | | |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 1,572.78 us | 1.00 |
| Utf8JsonReader | citm_catalog.json | 1,727.20 Kb | 3,786.10 us | 2.41 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 5,903.38 us | 3.75 |
| SpanJsonUtf8 | citm_catalog.json | 1,727.20 Kb | 3,021.13 us | 1.92 |
| | | | | |
| SimdJson | github_events.json | 65.13 Kb | 46.01 us | 1.00 |
| Utf8JsonReader | github_events.json | 65.13 Kb | 113.80 us | 2.47 |
| JsonNet | github_events.json | 65.13 Kb | 214.01 us | 4.65 |
| SpanJsonUtf8 | github_events.json | 65.13 Kb | 89.09 us | 1.94 |
| | | | | |
| SimdJson | gsoc-2018.json | 3,327.83 Kb | 2,209.42 us | 1.00 |
| Utf8JsonReader | gsoc-2018.json | 3,327.83 Kb | 4,010.10 us | 1.82 |
| JsonNet | gsoc-2018.json | 3,327.83 Kb | 6,729.44 us | 3.05 |
| SpanJsonUtf8 | gsoc-2018.json | 3,327.83 Kb | 2,759.59 us | 1.25 |
| | | | | |
| SimdJson | instruments.json | 220.35 Kb | 257.78 us | 1.00 |
| Utf8JsonReader | instruments.json | 220.35 Kb | 594.22 us | 2.31 |
| JsonNet | instruments.json | 220.35 Kb | 980.42 us | 3.80 |
| SpanJsonUtf8 | instruments.json | 220.35 Kb | 409.47 us | 1.59 |
| | | | | |
| SimdJson | truenull.json | 12.00 Kb | 16,032.6 ns | 1.00 |
| Utf8JsonReader | truenull.json | 12.00 Kb | 58,365.2 ns | 3.64 |
| JsonNet | truenull.json | 12.00 Kb | 60,977.3 ns | 3.80 |
| SpanJsonUtf8 | truenull.json | 12.00 Kb | 24,069.2 ns | 1.50 |
3. Json minification:
| Method | fileName | fileSize | Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
| SimdJsonNoValidation | apache_builds.json | 127.28 Kb | 186.8 us | 1.00 |
| SimdJson | apache_builds.json | 127.28 Kb | 262.5 us | 1.41 |
| JsonNet | apache_builds.json | 127.28 Kb | 1,802.6 us | 9.65 |
| | | | | |
| SimdJsonNoValidation | canada.json | 2,251.05 Kb | 4,130.7 us | 1.00 |
| SimdJson | canada.json | 2,251.05 Kb | 7,940.7 us | 1.92 |
| JsonNet | canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
| | | | | |
| SimdJsonNoValidation | citm_catalog.json | 1,727.20 Kb | 2,346.9 us | 1.00 |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 4,064.0 us | 1.75 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 34,831.0 us | 14.84 |
Usage
The C# API is not stable yet and currently fully copies the original C-style API thus it involves some Unsafe magic including pointers.
Add nuget package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings for a .NETStandard 2.0 package (.NET 4.x, .NET Core 2.x, etc).
dotnet add package SimdJsonSharp.Bindings
or
dotnet add package SimdJsonSharp.Managed
The following sample parses a file and iterate numeric tokens
byte[] bytes = File.ReadAllBytes(somefile);
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
while (iterator.MoveForward())
{
if (iterator.GetTokenType() == JsonTokenType.Number)
Console.WriteLine("integer: " + iterator.GetInteger());
}
}
UPD: for SimdJsonSharp.Bindings types are postfixed with 'N', e.g. ParsedJsonN
As you can see the API looks similiar to Utf8JsonReader that was introduced recently in .NET Core 3.0
Also it's possible to just validate JSON or minify it (remove whitespaces, etc):
string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);
Requirements
- AVX2 enabled CPU
SimdJsonSharp:每秒解析千兆字节的JSON的更多相关文章
- 千兆以太网TCP协议的FPGA实现
转自https://blog.csdn.net/zhipao6108/article/details/82386355 千兆以太网TCP协议的FPGA实现 Lzx 2017/4/20 写在前面,这应该 ...
- 【转】简谈基于FPGA的千兆以太网
原文地址: http://blog.chinaaet.com/luhui/p/5100052903 大家好,又到了学习时间了,学习使人快乐.今天我们来简单的聊一聊以太网,以太网在FPGA学习中属于比较 ...
- AC6102 开发板千兆以太网UDP传输实验2
AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...
- AC6102 开发板千兆以太网UDP传输实验
AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...
- 最新IP数据库 存储优化 查询性能优化 每秒解析上千万
高性能IP数据库格式详解 每秒解析1000多万ip qqzeng-ip-ultimate.dat 3.0版 编码:UTF8 字节序:Little-Endian 返回规范字段(如:亚洲|中国| ...
- 千兆网口POE供电
一.IEEE802.3af与at标准的解析 链接:http://www.winchen.com.cn/ShowNews2.asp?ID=21&ClassID=1 2003 年6 月,IEEE ...
- FPGA千兆网UDP协议实现
接着上一篇百兆网接口的设计与使用,我们接着来进行FPGA百兆网UDP(User Datagram Protocol)协议的设计. 1)UDP简介 在此,参考博主夜雨翛然的博文“https://www. ...
- 【转】基于TMS320C6455的千兆以太网设计
基于TI公司最新DSP芯片TMS320C6455.设计并实现了以太网通信软硬件接口.采用TMS320C6455片内以太网接口模块EMAC/MDIO,结合片外AR8031 PHY芯片,在嵌入式操作系统D ...
- 369-双路千兆网络PCIe收发卡
双路千兆网络PCIe收发卡 一.产品概述 PCIe网络收发卡要求能支持千兆光口,千兆电口:半高板卡.板卡插于服务器,室温工作. 支持2路千兆光口,千兆电口. FPGA选用型号 XC7A50T-1FGG ...
随机推荐
- Python中model转dict
问题 在query出来的行信息object中有一个dict变量,这个变量存储了字典信息 for u in session.query(User).all(): print u.__dict__ 但是这 ...
- GO Map的初步使用
一.集合(Map) 1.1 什么是Map 张三:13910101201 李四:13801010134 map是Go中的内置类型,它将一个值与一个键关联起来.可以使用相应的键检索值. Map 是一种无序 ...
- webpack资源处理
使用上篇已谈过,这篇纯代码!!~~ <!DOCTYPE html> <html lang="en"> <head> <meta chars ...
- JS基础语法---do-while循环 + 总结while循环和do-while循环
1. 总结:while循环和do-while循环 while循环特点:先判断,后循环,有可能一次循环体都不执行 do-while循环特点:先循环,后判断,至少执行一次循环体 对比体会: 1. ...
- Jquery绑定事件及动画效果
Jquery绑定事件及动画效果 本文转载于:https://blog.csdn.net/Day_and_Night_2017/article/details/85799522 绑定事件 bind(ty ...
- SQLi-LABS Page-3 (Stacked injections) Less-38-Less-45
Less-38 堆叠注入原理简介堆叠注入简介 Stacked injections: 堆叠注入.从名词的含义就可以看到应该是一堆 sql 语句(多条)一起执行.而在真实的运用中也是这样的, 我们知道在 ...
- [转]国内阿里Maven仓库镜像Maven配置文件Maven仓库速度快
原文地址:http://www.cnblogs.com/ae6623/p/4416256.html 国内连接maven官方的仓库更新依赖库,网速一般很慢,收集一些国内快速的maven仓库镜像以备用. ...
- MySQL 主从复制(实时热备)原理与配置
MySQL是现在普遍使用的数据库,但是如果宕机了必然会造成数据丢失.为了保证MySQL数据库的可靠性,就要会一些提高可靠性的技术.MySQL主从复制可以做到实时热备数据.本文介绍MySQL主从复制原理 ...
- 组装数据- 对象里面是key:value, value里面是数组的形式,如 {key:[aa,bb], key:[cc,dd]}
组合后 对象里面是key:value,value里面是数组的形式{key:[aa,bb], key:[cc,dd]} var chinaGeoCoordMap = { '无锡市': [121.4648 ...
- python3.5.3rc1学习三:文件操作
##全局变量与局部变量x = 6 def printFuc(): y = 8 z =9 print(y + z) print(x) printFuc()#print(y)#常见错误##name = & ...