Static dictionary methods of text compression

This post introduces one way to compress text. When we have a large amount of data that shares a similar structure, we can take advantage of that structure to improve compression. In most cases, we can choose a compression method that suits the structure of the text.

Dictionary methods fall into two categories: static dictionary methods and adaptive (or dynamic) dictionary methods.

Here I describe the first kind briefly, with a small example program.

If we know a lot about the structure of a text in advance, a static dictionary method is a good fit. It can be implemented in many ways depending on the situation, but one that codes pairs of letters, usually called digram coding, is popular with programmers.

To make this clearer, here is a simple example.

Suppose the source alphabet consists of five letters: 'a', 'b', 'c', 'd' and 'r'. Based on what we know about the source, we build the following dictionary:

code	entry
000	a
001	b
010	c
011	d
100	r
101	ab
110	ac
111	ad

Now let us encode the sequence 'abracadabra'.

The encoder first reads the first two letters, 'ab', and checks whether this pair is in the dictionary. If it is, the encoder outputs the pair's code and moves on to the next two letters; otherwise it outputs the code of the first letter alone and forms the next pair starting from the second letter. Here 'ab' is in the dictionary, so the encoder outputs '101'. It then reads 'ra', which is not in the dictionary, so it outputs the code of 'r', '100', and pairs the remaining 'a' with the following 'c' to form 'ac', for which it outputs '110'. Next it reads 'ad' and outputs '111', and so on until the input is exhausted.

The output is '101100110111101100000': 21 bits, compared with 33 bits if each of the 11 letters were sent with its own 3-bit code.
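The walk-through above can be reproduced step by step. The following is only a minimal sketch, assuming the dictionary from the table; the names `CODES` and `parse_steps` are introduced here for illustration and are not part of the original routine. It records which dictionary entry the greedy encoder picks at each position.

```python
# Dictionary from the table above: single letters plus three pairs.
CODES = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}


def parse_steps(text, codes=CODES):
    """Return the (entry, code) pairs chosen by the greedy pair-first encoder."""
    steps = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        # Prefer the two-letter entry; otherwise fall back to a single letter.
        entry = pair if pair in codes else text[i]
        steps.append((entry, codes[entry]))
        i += len(entry)
    return steps


steps = parse_steps("abracadabra")
for entry, code in steps:
    print(f"{entry:>2} -> {code}")
print("".join(code for _, code in steps))
```

For 'abracadabra' the entries chosen are ab, r, ac, ad, ab, r, a, and joining their codes gives the 21-bit output quoted above.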

A Python implementation follows.

    def getCodeDict():
        # 3-bit codes taken from the dictionary table above
        codeDict = {}
        codeDict['a'] = '000'
        codeDict['b'] = '001'
        codeDict['c'] = '010'
        codeDict['d'] = '011'
        codeDict['r'] = '100'
        codeDict['ab'] = '101'
        codeDict['ac'] = '110'
        codeDict['ad'] = '111'
        return codeDict

    def compress(code):
        print('start to compress')
        result = ''
        codeDict = getCodeDict()
        unCodedCode = code
        while unCodedCode != '':
            # look at the next two letters (only one is left at the end of the input)
            targetCode = unCodedCode[0 : 2]
            if targetCode in codeDict:
                # found a dictionary entry: emit its code and move past it
                result = result + codeDict[targetCode]
                offset = len(targetCode)
            else:
                # pair not in the dictionary: emit the first letter's code and move only one step
                result = result + codeDict[targetCode[0]]
                offset = 1
            unCodedCode = unCodedCode[offset : ]
        print('complete to compress')
        return result

    if __name__ == '__main__':
        signals = 'abracadabra'
        result = compress(signals)
        print(result)
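Because every code in the dictionary is exactly 3 bits long, the output can be decoded by reading 3 bits at a time and looking each group up in the inverted dictionary. The post does not give a decoder, so the following is only a sketch under that assumption; `DECODE_TABLE` and `decompress` are hypothetical names chosen here.

```python
# Inverse of the dictionary table: 3-bit code -> dictionary entry.
DECODE_TABLE = {
    "000": "a", "001": "b", "010": "c", "011": "d", "100": "r",
    "101": "ab", "110": "ac", "111": "ad",
}


def decompress(bits, table=DECODE_TABLE, width=3):
    """Decode a bit string made of fixed-width codes back into text."""
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the code width")
    # Cut the input into fixed-width chunks and map each one to its entry.
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return "".join(table[chunk] for chunk in chunks)


print(decompress("101100110111101100000"))
```

Feeding the encoder's output through this function should give back 'abracadabra', which is a quick way to check that the encoder and the dictionary agree.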
