A Simple Analysis of Huffman Coding
1. What problem are we facing?
/**
* Q: What problem are we facing?
*
* A: Data compression is still a problem, even today. We want to reduce
* the space that data takes up, and this need grows stronger whenever we
* have to transmit data. Before reading on, it may help to try to design
* a working way to compress data yourself. I tried, and obviously failed.
* That is why I wrote this article. ^_^
*/
2. How can I solve it?
/**
* Q: How can I solve it?
*
* A: Where have problem is where have an answer, although it not always
* the best one. In 1951, a algorithm was introduced by David A. Huffman.
* It is different from the normal code and is a variable length code, which
* have different length of code for different symbol. Now, there are two
* problems:
*
* No.1: is variable length code possible? How can we know the length
* of current symbol?
*
* The answer is prefix code. Think about this, a tree like following:
*
*
*              O
*           1 /   \ 0
*            O     O
*         1 / \ 0  c
*          O   O
*          a   b
*
* This is a simple binary tree with three leaf nodes: a, b, and c. We label
* every left branch 1 and every right branch 0, so the path to leaf a is 11.
* In the same way we get the code of every leaf:
* a : 11
* b : 10
* c : 0
*
* So, almost by accident, we get a variable-length code. Because every symbol
* sits at a leaf, no codeword can be a prefix of another, so a decoder reading
* the bits from left to right always knows where one symbol ends.
*
*
* No.2: How can we use a variable-length code to compress a series of symbols?
*
* Once we can build variable-length codes, something interesting happens.
* Imagine data that consists of a series of symbols, where some symbols occur
* with a high proportion and others with a low proportion. If we give shorter
* codes to the frequent symbols, the data takes up less space than before.
* That is exactly what we want.
*
* So we now know that data can be compressed with a variable-length code. The
* next question is which variable-length code we want: what kind of code is
* optimal?
*/
3. What is Huffman coding?
/**
* Q: What is Huffman coding?
*
* A: Now the problem is: how do we build an optimal tree? Do you have any idea?
* Huffman introduced a greedy algorithm for it. It may look simple, but the
* result is valid (this will be demonstrated below). The simplest construction
* uses a priority queue in which the node with the lowest probability has the
* highest priority. The steps are as follows:
*
* 1. Create a leaf node for each symbol and add it to the priority queue.
* 2. While there is more than one node in the queue:
*    1. Remove the two nodes with the highest priority (the lowest probabilities).
*    2. Create a new node as the parent of those two nodes. Its probability
*       is the sum of the two nodes' probabilities.
*    3. Add the new node to the queue.
* 3. The remaining node is the root of the tree. Read its codes as we did above.
*
*/
4. Is it optimal?
/**
* Q: Is it optimal?
*
* A: Hard to say. I don't have a reliable way to measure this, so I would like
* to hear other people's advice on this issue; I believe there must be some
* exciting ideas. By the way, this article only deals with compressing
* independent symbols. Compressing related (correlated) symbols is another
* important issue, and that may be a much harder problem.
*
*/
5. Source code
/**
* Here is a simple example.
*/
#include <stdio.h>
#include <iostream>

/**
 * Slot state used by the Container below: ET_VALID marks a slot that holds
 * an element, ET_INVALID marks a free slot.
 */
enum ELEM_TYPE {
    ET_VALID,
    ET_INVALID,
    ET_MAX,
};

typedef int INDEX;

/**
 * A simple priority container: we push elements into it and always pop the
 * element with the highest priority (the smallest one). It is a class
 * template because the element type is not fixed.
 */
template <class ELEM>
class Container {
public:
    Container( int capacity);
    ~Container( );
    /*
     * push an element into this container.
     */
    bool push( ELEM item);
    /*
     * pop an element from this container. The smallest one has the highest
     * priority, so ELEM must overload operator '<'.
     */
    bool pop( ELEM &item );

private:
    bool _find_idle( INDEX &num);
    bool _set_elem( INDEX num, ELEM &elem);
    bool _get_elem( INDEX num, ELEM &elem);

    ELEM *ele;
    ELEM_TYPE *stat;
    int cap;
};

template <class ELEM>
Container<ELEM>::Container( int capacity)
{
    this->ele = new ELEM[capacity];
    this->stat = new ELEM_TYPE[capacity];

    int i;
    for( i=0; i<capacity; i++)
        this->stat[i] = ET_INVALID;
    this->cap = capacity;
}

template <class ELEM>
Container<ELEM>::~Container( )
{
    if( this->ele!=NULL )
        delete []this->ele;
    if( this->stat!=NULL )
        delete []this->stat;
    this->cap = 0;
}

template <class ELEM>
bool Container<ELEM>::push( ELEM item)
{
    INDEX num = -1;
    if( (!this->_find_idle( num))
      ||(!this->_set_elem( num, item)))
        return false;
    return true;
}

template <class ELEM>
bool Container<ELEM>::pop( ELEM &item )
{
    INDEX i = 0;
    INDEX Min;
    /*
     * find the first valid element. Test the bound before reading stat[i],
     * otherwise stat[cap] is read past the end of the array.
     */
    while( (i<this->cap)
         &&(this->stat[i]!=ET_VALID) )
        i++;
    if( i>=this->cap )
        return false;   // the container is empty

    for( Min = i ; i<this->cap; i++)
    {
        if( ( this->stat[i]==ET_VALID)
          &&( this->ele[i]<this->ele[Min]))
        {
            Min = i;
        }
    }
    return this->_get_elem( Min, item);
}

template <class ELEM>
bool Container<ELEM>::_find_idle( INDEX &num)
{
    INDEX i;
    for( i=0; i<this->cap; i++)
    {
        if( this->stat[i]==ET_INVALID )
        {
            num = i;
            return true;
        }
    }
    return false;
}

template <class ELEM>
bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem)
{
    if( (num>=this->cap)
      ||(num<0) )
        return false;
    this->stat[num] = ET_VALID;
    this->ele[num] = elem;
    return true;
}

template <class ELEM>
bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem)
{
    if( (num<0)
      ||(num>=this->cap))
        return false;
    this->stat[num] = ET_INVALID;
    elem = this->ele[num];
    return true;
}

/**
 * Define a symbol type. It records all the information about a symbol.
 */
typedef char SYMINDEX;
typedef int SYMFRE;

class Symbol {
public:
    /*
     * In the Huffman tree, a parent's frequency is the sum of its two
     * children's. For convenience, we overload operator '+'.
     */
    Symbol operator+ ( const Symbol &s) const;

    SYMINDEX sym;
    SYMFRE freq;
};

Symbol Symbol::operator+ ( const Symbol &s) const
{
    Symbol ret;
    ret.sym = '\0';   // a combined node has no symbol of its own
    ret.freq = this->freq + s.freq;
    return ret;
}

/**
 * Define a node of the binary tree. It is used to build a Huffman tree.
 */
class HTreeNode {
public:
    /*
     * The container compares two nodes, so a node must overload
     * operator '<'.
     */
    bool operator< ( const HTreeNode &n) const;

    HTreeNode *lchild;
    HTreeNode *rchild;
    Symbol sym;
};

bool HTreeNode::operator< ( const HTreeNode &n) const
{
    return this->sym.freq < n.sym.freq;
}

/**
 * This is the core structure. It builds a Huffman coding from the input symbols.
 */
class HuffmanCoding {
public:
    HuffmanCoding( );
    ~HuffmanCoding( );
    bool Set( Symbol s[], int num);
    bool Work( void);

private:
    /*
     * create a Huffman tree.
     */
    bool CreateTree( Symbol s[], int num );
    bool DestroyTree( );
    /*
     * read the Huffman coding from a Huffman tree.
     */
    bool ReadCoding( );
    bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);

    Symbol *sym;
    int sym_num;
    HTreeNode *root;
};

HuffmanCoding::HuffmanCoding( )
{
    this->sym = NULL;
    this->sym_num = 0;
    this->root = NULL;
}

HuffmanCoding::~HuffmanCoding( )
{
    if( this->sym!=NULL)
        delete []this->sym;
    this->sym_num = 0;
    this->DestroyTree( );
}

/**
 * Receive the input symbols from the caller. Strictly speaking this function
 * is not necessary, but it keeps the algorithm code more concise.
 */
bool HuffmanCoding::Set( Symbol s[ ], int num)
{
    this->DestroyTree( );
    /*
     * release symbols from an earlier call, then copy the new ones.
     */
    if( this->sym!=NULL )
        delete []this->sym;
    this->sym = NULL;
    this->sym_num = 0;
    if( (s==NULL)||(num<=0) )
        return false;

    this->sym = new Symbol[num];   // plain new throws on failure, never returns NULL
    for( int i=0; i<num; i++)
        this->sym[i] = s[i];
    this->sym_num = num;
    return true;
}
/**
 * The core function. It creates a Huffman tree, then reads the codes from it.
 */
bool HuffmanCoding::Work( void)
{
    // create a Huffman tree
    if( !this->CreateTree( this->sym, this->sym_num))
        return false;
    // read the Huffman coding
    if( !this->ReadCoding( ))
        return false;
    return true;
}

bool HuffmanCoding::CreateTree( Symbol s[], int num)
{
    /*
     * release a tree left over from an earlier call, and refuse empty input
     * (otherwise node1 below would be used uninitialised).
     */
    this->DestroyTree( );
    if( num<=0 )
        return false;

    /*
     * create a priority tank. It always pops the element with the highest
     * priority, i.e. the node with the smallest frequency.
     */
    Container<HTreeNode> tank(num);
    for( int i=0; i<num; i++)
    {
        HTreeNode node;
        node.lchild = NULL;
        node.rchild = NULL;
        node.sym = s[i];
        tank.push( node);
    }
    /*
     * always pop two nodes. If the second pop fails, only one node remained,
     * and it is the root node of this Huffman tree.
     */
    HTreeNode node1;
    HTreeNode node2;
    while( tank.pop( node1)
        && tank.pop( node2) )
    {
        HTreeNode parent;
        parent.lchild = new HTreeNode( node1);
        parent.rchild = new HTreeNode( node2);
        parent.sym = node1.sym + node2.sym;
        /*
         * push the new node back to the tank.
         */
        tank.push( parent);
    }
    this->root = new HTreeNode( node1);
    return true;
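}

/*
 * free every node of a tree built with new, children first.
 */
static void FreeNode( HTreeNode *node)
{
    if( node==NULL )
        return;
    FreeNode( node->lchild);
    FreeNode( node->rchild);
    delete node;
}

bool HuffmanCoding::DestroyTree( )
{
    FreeNode( this->root);
    this->root = NULL;
    return true;
}

bool HuffmanCoding::ReadCoding( )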
{
    if( this->root==NULL )
        return false;

    char *code;
    code = new char[this->sym_num + 1];
    /*
     * travel the Huffman tree and print the code of every valid symbol.
     * The tree is at most sym_num-1 levels deep, so sym_num+1 chars suffice.
     */
    this->TravelTree( this->root, code, 0);
    delete []code;
    return true;
}

#define LCHAR '1'
#define RCHAR '0'

bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur)
{
    buf[cur] = '\0';
    if( (parent->lchild==NULL)
      &&(parent->rchild==NULL) )
    {   // leaf node: print its symbol and code
        printf("[ %c] : %s\n", parent->sym.sym, buf);
    }
    if( parent->lchild!=NULL )
    {
        buf[cur] = LCHAR;
        this->TravelTree( parent->lchild, buf, cur + 1);
    }
    if( parent->rchild!=NULL )
    {
        buf[cur] = RCHAR;
        this->TravelTree( parent->rchild, buf, cur + 1);
    }
    return true;
}

static Symbol sArr[ ] = {
{ '0', 0},
{ '1', 1},
{ '2', 2},
{ '3', 3},
{ '4', 4},
{ '5', 5},
{ '6', 6},
{ '7', 7},
{ '8', 8},
{ '9', 9},
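};

int main()
{
    HuffmanCoding hcoding;
    /*
     * feed every entry of sArr to the coder, then build and print the codes.
     */
    if( !hcoding.Set( sArr, (int)(sizeof(sArr)/sizeof(sArr[0]))) )
        return 1;
    if( !hcoding.Work( ) )
        return 1;
    return 0;
}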