static dictionary methods of text compression
Now I will introduce a way to compress a text. When we are confronted with numerous data, and the data has a similar structure, we can take advantage of the feature to improve the performance of compression. In most of times, we could take the method to compress a text as its feature of data structure.
we classify the method named dictionary method into two categories. One is static dictionary method, and the other is auto or dynamic dictionary method.
Now I plan to describe the first shortly with a routine example.
if we have much information about a structure of a text , it is available to take the static dictionary method. We could use many ways to implement the method varying with occasions, but a way named double letters code is popular with programmers.
To make it clearer, I prefer to take a simple example to explain the method, as follows.
Now there is a signal composed by five letters, that is 'a', 'b', 'c', 'd' and 'r'. Then we get a dictionary accroding to our signal knowledge. The dictionary is
| code | letter |
| 000 | a |
| 001 | b |
| 010 | c |
| 011 | d |
| 100 | r |
| 101 | ab |
| 110 | ac |
| 111 | ad |
Then I will code a sequence that is 'abracadabra'.
At first, the coder will read the first of two letters, which are 'ab'. After that, the coder have to find if the pair of letters is in our dictionary. If it does, the coder will return the letters's code and read the next letters. otherwise it will return the first letter's code and read the following letter. In this example, the coder will find the code in the dictionary, and return '101'. Following the step, the coder reads 'ra', but it cann't find the value of our dictionary by key 'ra'. So it have to return the code of 'r' that is '100', and read the letter 'c' following 'a' to compose of a new pair of letters that is 'ac'. The coder return '110'. Then read 'ad', return '110'. ...
The output is '101100110111101100000'.
The routine written by python is as follows.
def getCodeDict():
codeDict = {}
codeDict['a'] = ''
codeDict['b'] = ''
codeDict['c'] = ''
codeDict['d'] = ''
codeDict['r'] = ''
codeDict['ab'] = ''
codeDict['ac'] = ''
codeDict['ad'] = ''
return codeDict def compress(code):
print('start to compress')
result = ''
codeDict = getCodeDict()
offset = 2
unCodedCode = code
while unCodedCode != '':
targetCode = unCodedCode[0 : 2]
if targetCode in codeDict:
#find a pair of letters, and move two steps
result = result + codeDict[targetCode]
offset = 2
else :
#not find a pair of letters, and move only one step
result = result + codeDict[targetCode[0]]
offset = 1
unCodedCode = unCodedCode[offset : ]
print('complete to compress')
return result if __name__=='__main__':
signals = 'abracadabra'
result = compress(signals)
print(result)
static dictionary methods of text compression的更多相关文章
- Effective Java 01 Consider static factory methods instead of constructors
Advantage Unlike constructors, they have names. (BigInteger.probablePrime vs BigInteger(int, int, Ra ...
- public static void speckOnWin7(string text),在win7中读文字
public static void speckOnWin7(string text) { //洪丰写的,转载请注明 try { string lsSource = ""; ...
- Effective Java - Item 1: Consider static factory methods instead of constructors
考虑使用静态工厂方法来替代构造方法, 这样的做的好处有四点. 1. 更好的表意 有的构造方法实际上有特殊的含义, 使用静态工厂方法能更好的表达出他的意思. 例如 BigInteger(int, int ...
- 读Effective Java笔记之one:static Factory methods instead of Constructors (静态工厂方与构造器)
获取类的实例的方法有很多种,在这很多种方法中,它们各有优缺,各有特点.这里,只介绍2中方法 1.使用构造方法 public class Person { private String sex; /** ...
- Effective Java P2 Item1 Consider static factory methods instead of constructors
获得一个类的实例的传统方法是公共的构造方法,还可以提供一个公共的静态工厂方法(一个返回值为该类实例的简单静态方法), 例如Boolean(boolean 的封装类) public static Boo ...
- C#学习笔记-数据的传递(公共变量)以及Dictionary
看的代码越多,写的代码越多,就越是享受这些字符,终于渐渐懂得了那种传闻中的成就感,特别是自己从看不懂然后一步一步学,一个代码一个代码地敲,最后哪怕只是完成了一个小功能,也都是特别自豪的!这种自豪不用告 ...
- Dictionary<k,v>键值对的使用
using System; using System.Collections.Generic; using System.Linq; using System.Text; namespace Dict ...
- Convert HTML to Text(转载)
原文地址:http://www.blackbeltcoder.com/Articles/strings/convert-html-to-text Download Source Code Intro ...
- C#将URL中的参数转换成字典Dictionary<string, string>
/// <summary> /// 将获取的formData存入字典数组 /// </summary> public static Dictionary<String, ...
随机推荐
- Android 流媒体播放 live streaming
安卓支持的协议 RTSP (RTP, SDP)HTTP/HTTPS progressive streamingDynamic adaptive streaming on HTTP => MPEG ...
- 在centOS 7 中安装 MySQL
知道来源:https://www.cnblogs.com/bigbrotherer/p/7241845.html 1 下载并安装MySQL官方的 Yum Repository [root@localh ...
- PHP 5.6 如何使用 CURL 上传文件
以前我们通过 PHP 的 cURL 上传文件是,是使用“@+文件全路径”的来实现的: curl_setopt(ch, CURLOPT_POSTFIELDS, array( 'file' => ' ...
- 对一致性hash原理的理解
一致性hash算法解决的核心问题是,当solt数发生变化的时候能够尽量少的移动数据.该算法最早在<Consistent Hashing and Random Trees:Distributed ...
- 六:MyBatis学习总结(六)——调用存储过程
一.提出需求 查询得到男性或女性的数量, 如果传入的是0就女性否则是男性 二.准备数据库表和存储过程 create table p_user( id int primary key auto_incr ...
- pythonweb框架Flask学习笔记02-一个简单的小程序
#-*- coding:utf-8 -*- #导入了Flask类 这个类的实例将会是我们的WSGI应用程序 from flask import Flask #创建一个Flask类的实例 第一个参数是应 ...
- eclipse 使用prolog编程
第一步:在电脑上安装swi-prolog 相应环境下载地址http://www.swi-prolog.org/download/stable 第二步: eclipse-help-install new ...
- [题解]luogu P4116 Qtree3
终于来到了Qtree3, 其实这是Qtree系列中最简单的一道题,并不需要线段树, 只要树链剖分的一点思想就吼了. 对于树链剖分剖出来的每一根重链,在重链上维护一个Set就好了, 每一个Set里存的都 ...
- 【CODECHEF】Children Trips 倍增
此题绝了,$O(n^{1.5}\ log\ n)$都可以过掉.... 题目大意:给你一颗$n$个点的树,每条边边权不是2就是$1$,有$m$个询问,每次询问一个人从$x$点走到$y$点,每天可以走的里 ...
- Java 并发编程——volatile与synchronized
一.Java并发基础 多线程的优点 资源利用率更好 程序设计在某些情况下更简单 程序响应更快 这一点可能对于做客户端开发的更加清楚,一般的UI操作都需要开启一个子线程去完成某个任务,否者会容易导致客户 ...