关于HuffmanCoding的简单分析

1.what's problem we faced?

/**

*    Q: what's problem we faced?

*

*    A: Data compression is still a problem, even now. we want to compress

*        the space of data. This desire is more and more stronger when we

*        need to deal with some operation about data transmission. Before

*        we start this article, it may be helpful if you try to provide a valid way

*        to compress data . I tried, but failed obviously. That why I write this

*        article. ^_^

*/

2. How can I solve it?

/**

*    Q: How can I solve it?

*

*    A: Where have problem is where have an answer, although it not always

*        the best one. In 1951, a algorithm was introduced by David A. Huffman.

*        It is different from the normal code and is a variable length code, which

*        have different length of code for different symbol. Now, there are two

*        problems:

*

*        No.1: is variable length code possible? How can we know the length

*                of current symbol?

*

*                The answer is prefix code. Think about this, a tree like following:

*

*

*                                         O

*                                   1 /     \ 0

*                                    O       O

*                               1 /    \ 0   c

*                                O      O

*                                a       b

*

*                This is a simple binary tree. There are three leaf node: a, b ,and c.we

*                label all of left branch as 1, and all of right branch as 0. So if we want

*                to arrive the leaf node a, the path is 11. In a similar way, we can get

*                all of nodes:

*                        a : 11

*                        b : 10

*                        c : 0

*

*                By accident, we get a variable length code.

*

*

*        No.2: How can we use variable length code to compress a series of symbol?

*

*                Now that we have a ability about variable length code. Some funny thing

*                will happen. Image this, In a data, which consist of a series of symbols,

*                some of symbols have occur at high proportion. some of symbols has occur

*                at low proportion. If we use some shorter code to indicate those symbols

*                which have a high proportion, the space of data will smaller than ever.

*                That is what we want.

*

*        Now, we have been know that we could compress a data by use variable length

*        code. However, the next problem is what kind of variable length code is what we

*        want. what kind of code is optimal ?

*/

3. What is HuffmanCoding ?

/**

* Q: What is HuffmanCoding ?

*

* A:Now,the problem is how can I create a optimal tree ? Do you have any idea?

*        Huffman was introduced a algorithm. It is looks like greedy algorithm. It is may

*        be simple, but the result is valid( this will be demonstrated below). The simplest

*        construction algorithm use a priority queue where the node with lowest probability

*        is given highest priority, the steps as following:

*

*        1. create a leaf node for each symbol, and add it to the priority queue.

*        2. while there is more than one node in the queue:

*            1. remove two nodes that have the highest priority.

*            2. create a new node as the parent node of the two nodes above. the

*                probability of this one is equal to the sum of the two nodes' probabilities.

*            3. add the new node to the queue.

*        3. the remaining node is the root of this tree. Read it's code as we do above.

*

*/

4. is it optimal ?

/**

*    Q: is it optimal ?

*

*    A: Hard to say. I haven't a valid method to measure this. About this issue, it is necessary to hear

*        about other people's advice. I believe there must be some exciting advice. By the way, this article

*        is just talk about compress of independent symbol, another important issue is about related symbol.

*        That maybe a serious problem.

*

*/

5. source code

/**

*    Here is an simple example

*/

#include <stdio.h>

#include <iostream>

/**

*    In a Huffman tree, some of nodes is valid symbol, and other is a combine node, which

*    haven't a valid symbol. we need to label it in our nodes.

*/

enum ELEM_TYPE {

        ET_VALID,

        ET_INVALID,

        ET_MAX,

};

typedef int    INDEX;

/**

*    this is a container, we push all of element to it, and pop element by a priority. It is

*    a class template since we don't know the type of data element.

*/

template <class ELEM>

class Container {

        public:

                Container( int capacity);

                ~Container( );

                /*

            *    push a element to this container.

            */

                bool push( ELEM item);

                /*

            *    pop a element from this container, the smallest one have the most priority.

            *    Of course, the element must have provide a reload function for operator '<'.

            */

                bool pop( ELEM &item );

        private:

                bool _find_idle( INDEX &num);

                bool _set_elem( INDEX num, ELEM &elem);

                bool _get_elem( INDEX num, ELEM &elem);

                ELEM                *ele;

                ELEM_TYPE    *stat;

                int                        cap;

};

template <class ELEM>

Container<ELEM>::Container(  int capacity)

{

        this->ele = new ELEM[capacity] ;

        this->stat = new ELEM_TYPE[capacity];

        int        i;

        for( i=0; i<capacity; i++)

                this->stat[i] = ET_INVALID;

        this->cap = capacity ;

}

template <class ELEM>

Container<ELEM>::~Container(  )

{

        if( this->ele!=NULL )

                delete []this->ele;

        if( this->stat!=NULL )

                delete []this->stat;

        this->cap = 0;

}

template <class ELEM>

bool Container<ELEM>::push( ELEM item)

{

        INDEX        num = -1;

        if( (!this->_find_idle( num))

                ||(!this->_set_elem( num, item)))

                return false;

        return true;

}

template <class ELEM>

bool Container<ELEM>::pop( ELEM &item )

{

        INDEX    i = 0;

        INDEX    Min;

        /*

       *    find the first valid element.

       */

        while( (this->stat[i]!=ET_VALID)

                        &&( i<this->cap))

                            i++;

        for( Min = i ; i<this->cap; i++)

        {

                if(  ( this->stat[i]==ET_VALID)

                      &&( this->ele[i]<this->ele[Min]))

                    {

                            Min = i;

                    }

        }

        return this->_get_elem( Min, item);

}

template <class ELEM>

bool Container<ELEM>::_find_idle( INDEX &num)

{

        INDEX        i;

        for( i=0; i<this->cap; i++)

        {

                if( this->stat[i]==ET_INVALID )

                {

                        num = i;

                        return true;

                }

        }

        return false;

}

template <class ELEM>

bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem)

{

        if( (num>=this->cap)

                ||(num<0) )

                    return false;

        this->stat[num] = ET_VALID;

        this->ele[num] = elem;

        return true;

}

template <class ELEM>

bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem)

{

        if( (num<0)

                ||(num>=this->cap))

                    return false;

        this->stat[num] = ET_INVALID;

        elem =  this->ele[num];

        return true;

}

/**

*    define a type of symbol. It will be used to record all information about a symbol.

*/

typedef char SYMINDEX;

typedef int SYMFRE;

class Symbol {

        public:

                /*

            *    In the Huffman tree, we need to compute the sum of two child symbol.

            *    For convenience,build a reload function is necessary.

            */

                Symbol operator + ( Symbol &s);

                SYMINDEX        sym;

                SYMFRE            freq;

};

Symbol Symbol::operator +( Symbol &s)

{

        Symbol        ret;

        ret.sym = '\0';

        ret.freq = this->freq + s.freq;

        return ret;

}

/**

*    define a node of binary tree. It will be used to create a Huffman tree.

*/

class HTreeNode {

        public:

                /*

            *    In the container, we need compare two nodes. So this node must

            *    provide a reload function about '<'.

            */

                bool operator< ( HTreeNode &n);

                HTreeNode        *lchild;

                HTreeNode        *rchild;

                Symbol                sym;

};

bool HTreeNode::operator < ( HTreeNode &n)

{

        return this->sym.freq<n.sym.freq?

true: false;

}

/**

*    This is the core structure. It will build a Huffman coding based on our input symbol.

*/

class HuffmanCoding {

        public:

                HuffmanCoding( );

                ~HuffmanCoding( );

                bool Set( Symbol s[], int num);

                bool Work( void);

	private:

                /*

            *    create a Huffman tree.

            */

                bool CreateTree(Symbol s[], int num );

                bool DestroyTree( );

                /*

            *    read Huffman coding from a Huffman tree.

            */

                bool ReadCoding( );

                bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);

                Symbol                *sym ;

                int                        sym_num ;

                HTreeNode        *root ;

};

HuffmanCoding::HuffmanCoding( )

{

        this->sym = NULL;

        this->sym_num = 0;

        this->root = NULL;

}

HuffmanCoding::~HuffmanCoding( )

{

        if( this->sym!=NULL)

                delete []this->sym;

        this->sym_num = 0;

        this->DestroyTree( );

}

/**

*    receive data from outside. Actually, this function is not necessary.But for make the

*    algorithm looks like more concise,maybe this function is  necessary.

*/

bool HuffmanCoding::Set( Symbol s [ ], int num)

{

        this->DestroyTree( );

        this->sym = new Symbol[num];

        for( int i=0; i<num; i++)

                this->sym[i] = s[i];

        if( NULL!=this->sym)

        {

                this->sym_num = num;

                return true;

        }

        else

        {

                this->sym_num = 0;

                return false;

        }

}

/**

*    The core function. In this function, we create a Huffman tree , then read it.

*/

bool HuffmanCoding::Work( void)

{

        //Create a Huffman tree

        if( !this->CreateTree( this->sym, this->sym_num))

                return false;

        //read Huffman coding

        if( !this->ReadCoding( ))

                return false;

        return true;

}

bool HuffmanCoding::CreateTree( Symbol s[], int num)

{

         /*

       *    create a priority tank. It always pop the element of the highest priority in the tank.

       */

        Container<HTreeNode>	tank(num);

        for( int i=0; i<this->sym_num; i++)

        {

                HTreeNode        node;

                node.lchild = NULL;

                node.rchild = NULL;

                node.sym = s[i];

                tank.push( node);

        }

        /*

       *    always pop two nodes, if fail, that's means there is only one node remain and it

       *    is the root node of this Huffman tree.

       */

        HTreeNode        node1;

        HTreeNode        node2;

        while(  tank.pop( node1)

                        && tank.pop( node2) )

        {

                HTreeNode        parent;

                parent.lchild = new HTreeNode;

                parent.rchild = new HTreeNode;

                *parent.lchild = node1;

                *parent.rchild = node2;

                parent.sym = node1.sym + node2.sym;

                /*

            *    push new node to the tank.

            */

                tank.push( parent);

        }

        this->root = new HTreeNode(node1);

        return true;

}

bool HuffmanCoding::DestroyTree( )

{

        return false;

}

bool HuffmanCoding::ReadCoding( )

{

        char        *code;

        code = new char[this->sym_num + 1];

        /*

       *    travel the Huffman tree and print the code of all valid symbols.

       */

        this->TravelTree( this->root, code, 0);

        delete []code;

        return true;

}

#define        LCHAR    '1'

#define        RCHAR    '0'

bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur)

{

        buf[cur] = '\0';

        if( (parent->lchild==NULL)

                &&(parent->rchild==NULL) )

                {//end node

                        printf("[ %c] : %s\n", parent->sym.sym, buf);

	        }

        if( parent->lchild!=NULL )

        {

                buf[cur] = LCHAR;

                this->TravelTree( parent->lchild, buf, cur + 1);

        }

        if( parent->rchild!=NULL )

        {

                buf[cur] = RCHAR;

                this->TravelTree( parent->rchild, buf, cur + 1);

        }

        return true;

}

static Symbol	sArr[ ] = {

        { '0', 0},

        { '1', 1},

        { '2', 2},

        { '3', 3},

        { '4', 4},

        { '5', 5},

        { '6', 6},

        { '7', 7},

        { '8', 8},

        { '9', 9},

};

int main()

{

        HuffmanCoding    hcoding;

        hcoding.Set( sArr, 10);

        hcoding.Work( );

        return 0;

}