static dictionary methods of text compression
Now I will introduce a way to compress a text. When we are confronted with numerous data, and the data has a similar structure, we can take advantage of the feature to improve the performance of compression. In most of times, we could take the method to compress a text as its feature of data structure.
we classify the method named dictionary method into two categories. One is static dictionary method, and the other is auto or dynamic dictionary method.
Now I plan to describe the first shortly with a routine example.
if we have much information about a structure of a text , it is available to take the static dictionary method. We could use many ways to implement the method varying with occasions, but a way named double letters code is popular with programmers.
To make it clearer, I prefer to take a simple example to explain the method, as follows.
Now there is a signal composed by five letters, that is 'a', 'b', 'c', 'd' and 'r'. Then we get a dictionary accroding to our signal knowledge. The dictionary is
| code | letter |
| 000 | a |
| 001 | b |
| 010 | c |
| 011 | d |
| 100 | r |
| 101 | ab |
| 110 | ac |
| 111 | ad |
Then I will code a sequence that is 'abracadabra'.
At first, the coder will read the first of two letters, which are 'ab'. After that, the coder have to find if the pair of letters is in our dictionary. If it does, the coder will return the letters's code and read the next letters. otherwise it will return the first letter's code and read the following letter. In this example, the coder will find the code in the dictionary, and return '101'. Following the step, the coder reads 'ra', but it cann't find the value of our dictionary by key 'ra'. So it have to return the code of 'r' that is '100', and read the letter 'c' following 'a' to compose of a new pair of letters that is 'ac'. The coder return '110'. Then read 'ad', return '110'. ...
The output is '101100110111101100000'.
The routine written by python is as follows.
def getCodeDict():
codeDict = {}
codeDict['a'] = ''
codeDict['b'] = ''
codeDict['c'] = ''
codeDict['d'] = ''
codeDict['r'] = ''
codeDict['ab'] = ''
codeDict['ac'] = ''
codeDict['ad'] = ''
return codeDict def compress(code):
print('start to compress')
result = ''
codeDict = getCodeDict()
offset = 2
unCodedCode = code
while unCodedCode != '':
targetCode = unCodedCode[0 : 2]
if targetCode in codeDict:
#find a pair of letters, and move two steps
result = result + codeDict[targetCode]
offset = 2
else :
#not find a pair of letters, and move only one step
result = result + codeDict[targetCode[0]]
offset = 1
unCodedCode = unCodedCode[offset : ]
print('complete to compress')
return result if __name__=='__main__':
signals = 'abracadabra'
result = compress(signals)
print(result)
static dictionary methods of text compression的更多相关文章
- Effective Java 01 Consider static factory methods instead of constructors
Advantage Unlike constructors, they have names. (BigInteger.probablePrime vs BigInteger(int, int, Ra ...
- public static void speckOnWin7(string text),在win7中读文字
public static void speckOnWin7(string text) { //洪丰写的,转载请注明 try { string lsSource = ""; ...
- Effective Java - Item 1: Consider static factory methods instead of constructors
考虑使用静态工厂方法来替代构造方法, 这样的做的好处有四点. 1. 更好的表意 有的构造方法实际上有特殊的含义, 使用静态工厂方法能更好的表达出他的意思. 例如 BigInteger(int, int ...
- 读Effective Java笔记之one:static Factory methods instead of Constructors (静态工厂方与构造器)
获取类的实例的方法有很多种,在这很多种方法中,它们各有优缺,各有特点.这里,只介绍2中方法 1.使用构造方法 public class Person { private String sex; /** ...
- Effective Java P2 Item1 Consider static factory methods instead of constructors
获得一个类的实例的传统方法是公共的构造方法,还可以提供一个公共的静态工厂方法(一个返回值为该类实例的简单静态方法), 例如Boolean(boolean 的封装类) public static Boo ...
- C#学习笔记-数据的传递(公共变量)以及Dictionary
看的代码越多,写的代码越多,就越是享受这些字符,终于渐渐懂得了那种传闻中的成就感,特别是自己从看不懂然后一步一步学,一个代码一个代码地敲,最后哪怕只是完成了一个小功能,也都是特别自豪的!这种自豪不用告 ...
- Dictionary<k,v>键值对的使用
using System; using System.Collections.Generic; using System.Linq; using System.Text; namespace Dict ...
- Convert HTML to Text(转载)
原文地址:http://www.blackbeltcoder.com/Articles/strings/convert-html-to-text Download Source Code Intro ...
- C#将URL中的参数转换成字典Dictionary<string, string>
/// <summary> /// 将获取的formData存入字典数组 /// </summary> public static Dictionary<String, ...
随机推荐
- GCD 中使用 dispatch group 进行同步操作
话不多说,先上代码,在分析 Code - (void)viewDidLoad { [super viewDidLoad]; dispatch_group_t group1 = dispatch_gro ...
- iOS 中长按手势回调会被触发过两次
Long-press gestures are continuous. The gesture begins (UIGestureRecognizerStateBegan) when the numb ...
- 安装Nginx并为node.js设置反向代理
最近看了反向代理和正向代理的东西,想到自己的node.js服务器是运行在3333端口的,也没有为他设置反向代理,node.js项目的一些静态文件是完全可以部署在Nginx上,以减少对node.js的请 ...
- D08——C语言基础学PYTHON
C语言基础学习PYTHON——基础学习D08 20180829内容纲要: socket网络编程 1 socket基础概念 2 socketserver 3 socket实现简单的SSH服务器端和 ...
- C++基础知识 基类指针、虚函数、多态性、纯虚函数、虚析构
一.基类指针.派生类指针 父类指针可以new一个子类对象 二.虚函数 有没有一个解决方法,使我们只定义一个对象指针,就可以调用父类,以及各个子类的同名函数? 有解决方案,这个对象指针必须是一个父类类型 ...
- java API连接虚拟机上的hbase
今天用本地的eclipse连接虚拟机上的hbase数据库,代码如下: public static void main(String[] args) throws Exception{ Configur ...
- 【NOIP2018】保卫王国 动态dp
此题场上打了一个正确的$44pts$,接着看错题疯狂$rush$“正确”的$44pts$,后来没$rush$完没将之前的代码$copy$回去,直接变零分了..... 这一题我们显然有一种$O(nm)$ ...
- 【xsy1201】 随机游走 高斯消元
题目大意:你有一个$n*m$的网格(有边界),你从$(1,1)$开始随机游走,求走到$(n,m)$的期望步数. 数据范围:$n≤10$,$m≤1000$. 我们令 $f[i][j]$表示从$(1,1) ...
- 【hdu4609】 3-idiots FFT
题外话:好久没写blog了啊-- 题目传送门 题目大意:给你m条长度为ai的线段,求在其中任选三条出来,能构成三角形的概率.即求在这n条线段中找出三条线段所能拼出的三角形数量除以$\binom{m}{ ...
- 【NOIP2016提高组】 Day2 T3 愤怒的小鸟
题目传送门:https://www.luogu.org/problemnew/show/P2831 说个题外话:NOIP2014也有一道题叫做愤怒的小鸟. 这题自测时算错了eps,导致被卡了精度,从1 ...