Invariant to maintain: within a block, all leaves come before all internal nodes. Static: Huffman coding. Dynamic: Adaptive Huffman. Some concepts. Compression metrics: • Compress a 10 MB file to 2 MB • Compression ratio = 5, or 5:1 • Space savings = 0.8, or 80%. Symmetric vs. asymmetric: • Symmetric compression – requires same time for encoding and…
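The two compression metrics in the excerpt above can be checked with a few lines of Python. This is a minimal sketch; the function names `compression_ratio` and `space_savings` are illustrative and not taken from the post.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original size to compressed size (5 means 5:1)."""
    return original_size / compressed_size


def space_savings(original_size, compressed_size):
    """Fraction of the original size that compression removed."""
    return 1 - compressed_size / original_size


# The 10 MB -> 2 MB example from the excerpt (sizes in MB).
ratio = compression_ratio(10, 2)
savings = space_savings(10, 2)
print(f"ratio = {ratio:g}:1, savings = {savings:.0%}")
```

The two numbers describe the same result from different angles: savings = 1 - 1/ratio.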
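Another common compression method among the statistical methods, besides Huffman. Huffman coding's discrete (whole-bit) code lengths are an inherent flaw that cannot be remedied and keep it from reaching the Shannon limit, but arithmetic coding offers a better solution. Of course, the best things tend to come with all kinds of patents. Since 2012, it seems that some of it has become free to use. Encoding: each character is assigned a range whose size is its proportion (probability). Algorithm: Set low to 0.0 Set high t…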
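The range-narrowing idea described in the excerpt above (each character owns a sub-range of [0, 1) whose size is its probability) can be sketched as follows. This is the textbook form of arithmetic encoding, not necessarily the exact algorithm in the truncated post. The probability table, the function names, and the decoder are assumptions added for illustration. `Fraction` is used instead of floats so the sketch avoids rounding issues; practical coders use integer arithmetic with renormalization instead.

```python
from fractions import Fraction


def build_ranges(probabilities):
    """Give each symbol a half-open sub-range of [0, 1) sized by its probability."""
    ranges = {}
    start = Fraction(0)
    for symbol, p in probabilities.items():
        ranges[symbol] = (start, start + p)
        start += p
    return ranges


def encode(message, ranges):
    """Narrow [low, high) once per symbol; any value in the final interval encodes the message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high


def decode(value, length, ranges):
    """Invert encode: find the sub-range holding value, emit its symbol, rescale."""
    out = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in ranges.items():
            if sym_low <= value < sym_high:
                out.append(symbol)
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)


# Hypothetical symbol probabilities, chosen only for this example.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
ranges = build_ranges(probs)
low, high = encode("abca", ranges)
print(low, high)               # final interval for "abca"
print(decode(low, 4, ranges))  # recovers the message from the interval's lower bound
```

The decoder here needs the message length; real implementations usually reserve an end-of-message symbol instead.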
Relevant Readable Links Name Interesting topic Comment Edwin Chen Nonparametric Bayes 徐亦达 (Xu Yida) Dirichlet Process Learning goals: Dirichlet Process, HDP, HDP-HMM, IBP, CRM Alex Kendall Geometry and Uncertainty in Deep Learning for Computer Vision Semantic segmentation colah's blog Feature Visu…
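1 HANA is based on in-memory computing. It supports both row and column storage. It uses column storage, whose characteristics are high compression, fast queries, and space savings. ---SAP HANA supports both, but is particularly optimized for column-order storage. SAP HANA employs highly efficient compression methods, such as run-length encoding, cluster coding and dictionary co…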
Relationship: vocabulary vs. collection size. Heaps' law: M = kT^b, where M is the size of the vocabulary and T is the number of tokens in the collection. Typical values: 30 ≤ k ≤ 100 and b ≈ 0.5. In log-log form: log M = log k + b log T. Relationship: the count of each term in the vocabulary vs. that term's rank. Zipf's law: cf_i =…
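Heaps' law from the excerpt above is easy to try numerically. The sketch below predicts vocabulary size from M = kT^b and recovers k and b by a least-squares line fit on log M = log k + b log T. The function names and the sample values k = 50, b = 0.5 are assumptions picked from the typical range quoted above, not measurements from a real collection.

```python
import math


def heaps_vocabulary_size(num_tokens, k=50.0, b=0.5):
    """Predicted vocabulary size M = k * T^b (k and b are illustrative defaults)."""
    return k * num_tokens ** b


def fit_heaps(samples):
    """Estimate (k, b) from (T, M) pairs via a straight-line fit in log-log space."""
    xs = [math.log(t) for t, _ in samples]
    ys = [math.log(m) for _, m in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x  # intercept is log k
    return math.exp(intercept), slope


# Synthetic data generated from the law itself, so the fit should recover k and b.
samples = [(t, heaps_vocabulary_size(t)) for t in (10**3, 10**4, 10**5, 10**6)]
k_hat, b_hat = fit_heaps(samples)
print(f"k ~ {k_hat:.2f}, b ~ {b_hat:.3f}")
```

On a real collection, the (T, M) pairs would come from counting distinct terms after every T tokens read; the fit is then only approximate.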