JavaWeb: Base64 Encoding (Repost)

Basic Concepts

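The term Base64 originates from the MIME content transfer encoding specification. Base64 is not an encryption algorithm, even though an encoded string can look as if it were encrypted. It is a binary-to-text encoding: it maps arbitrary binary data to a string of ASCII characters so that the data can travel intact through environments that only support text, for example e-mail applications that support MIME, or XML documents that need to store complex data such as images.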

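To implement Base64, you first need to choose 64 characters to form the alphabet. The general rule is to pick 64 printable characters from a commonly used character set, which avoids losing data in transit (non-printable characters may be treated as special characters along the way and get lost). MIME's Base64, for example, uses the uppercase letters, the lowercase letters, and the digits 0-9 as its first 62 characters. Other implementations usually follow MIME here and differ only in the last 2 characters; UTF-7 is one example.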

An Example

The following text:

Man is distinguished, not only by his reason, but by this singular passion from
other animals, which is a lust of the mind, that by a perseverance of delight
in the continued and indefatigable generation of knowledge, exceeds the short
vehemence of any carnal pleasure.

becomes the following after MIME Base64 encoding:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Conversion Method

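Taking the conversion of the leading "Man" into "TWFu" as an example, here is the basic Base64 conversion process:

1. The ASCII codes of M, a, and n are 01001101, 01100001, and 01101110. Concatenated, they form the 24-bit string 010011010110000101101110.
2. Split it into 4 groups of 6 bits: 010011, 010110, 000101, 101110 (19, 22, 5, and 46 in decimal).
3. Finally, look up each value in the alphabet to get the 4 result characters T, W, F, and u (the alphabet defined by MIME is listed later in this article).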

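That is all there is to the basic idea of Base64: every 3 bytes (24 bits) become 4 characters. A 6-bit binary number can represent 64 distinct values, so once an alphabet of 64 characters is fixed and each character is assigned a unique value, a forward and a reverse mapping are enough to convert binary bytes to Base64 and back.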
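The three steps above can be sketched in a few lines of Java. This is only an illustration of the bit arithmetic, not the implementation given later in the article; the class name ManToBase64 and the helper encodeTriple are hypothetical names chosen for this sketch.

```java
class ManToBase64 {
    // The MIME Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes into four Base64 characters
    // (hypothetical helper, for illustration only).
    static String encodeTriple(int b1, int b2, int b3) {
        // Step 1: concatenate the three bytes into one 24-bit value.
        int bits = (b1 & 0xFF) << 16 | (b2 & 0xFF) << 8 | (b3 & 0xFF);
        StringBuilder sb = new StringBuilder(4);
        // Steps 2 and 3: take the four 6-bit groups, most significant first,
        // and look each one up in the alphabet.
        for (int shift = 18; shift >= 0; shift -= 6) {
            sb.append(ALPHABET.charAt((bits >>> shift) & 0x3F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeTriple('M', 'a', 'n')); // prints TWFu
    }
}
```

Packing the bytes into a single int first keeps the group extraction to one shift and one mask per character.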

Zero Padding

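After repeatedly converting every 3 bytes into 4 Base64 characters, one of the following 3 cases holds at the end:

1. No bytes are left.
2. 1 byte is left.
3. 2 bytes are left.

Case 1 needs no further work. How should cases 2 and 3 be handled?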

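In these cases, zero bits are appended to the leftover bytes until the number of bits is divisible by 6 (because Base64 encodes 6 bits at a time). If 1 byte (8 bits) is left, 4 zero bits are appended to make 12 bits, which split into 2 groups. If 2 bytes (16 bits) are left, only 2 zero bits are needed (18 bits) to make 3 groups. The groups are then mapped to characters in the usual way.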
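To make the zero padding concrete, the sketch below prints the 6-bit groups for 1 and 2 leftover bytes. It works on bit strings purely for readability; the class ZeroPadDemo and the method sixBitGroups are hypothetical names, and a real encoder would use shifts and masks instead.

```java
import java.util.ArrayList;
import java.util.List;

class ZeroPadDemo {
    // Splits leftover bytes into 6-bit groups, appending zero bits
    // until the total bit count is a multiple of 6.
    static List<String> sixBitGroups(byte[] tail) {
        StringBuilder bits = new StringBuilder();
        for (byte b : tail) {
            // Render each byte as exactly 8 binary digits.
            String s = Integer.toBinaryString(b & 0xFF);
            for (int i = s.length(); i < 8; i++) {
                bits.append('0');
            }
            bits.append(s);
        }
        // The zero padding described above: 8 bits -> 12, 16 bits -> 18.
        while (bits.length() % 6 != 0) {
            bits.append('0');
        }
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < bits.length(); i += 6) {
            groups.add(bits.substring(i, i + 6));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(sixBitGroups(new byte[] {'M'}));      // [010011, 010000]
        System.out.println(sixBitGroups(new byte[] {'M', 'a'})); // [010011, 010110, 000100]
    }
}
```

The groups 010011 and 010000 are 19 and 16, so a lone "M" encodes to "TQ"; the three groups for "Ma" are 19, 22, and 4, giving "TWE".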

Padding

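When decoding, every 4 characters are converted back into 3 bytes, and one of 3 cases holds at the end:

1. No characters are left.
2. 2 characters are left.
3. 3 characters are left.

These 3 cases correspond one-to-one to the 3 encoding cases above; reversing the zero-padding step restores the original bytes exactly.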

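Base64 strings often end with "=" characters, which are produced by padding. Padding means that in encoding cases 2 and 3 (1 or 2 bytes left over), "=" characters are appended so that the length of the encoded output is a multiple of 4.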

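It follows that case 2 (1 byte left) needs 2 "=" characters: the last byte encodes to 2 characters, and 2 "=" bring the total to exactly 4. Likewise, case 3 (2 bytes left) encodes to 3 characters and needs 1 "=".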

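Padding is not strictly necessary, because the number of missing bytes can be worked out from the length of the encoded content. Some implementations therefore require padding and others do not. One situation where padding is required is when several Base64-encoded files are concatenated into one.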
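The effect of padding can be observed with java.util.Base64, the encoder that ships with the JDK since Java 8 (not the Base64 class written in this article). The sketch below is a minimal demonstration; PaddingDemo and its two helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PaddingDemo {
    // Encodes ASCII text with or without trailing '=' padding.
    static String encode(String text, boolean padding) {
        Base64.Encoder encoder = padding
            ? Base64.getEncoder()
            : Base64.getEncoder().withoutPadding();
        return encoder.encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    // Decodes Base64 text; the JDK's basic decoder accepts input
    // with or without the trailing padding characters.
    static String decode(String code) {
        return new String(Base64.getDecoder().decode(code), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encode("M", true));   // TQ==  (1 byte left, 2 '=')
        System.out.println(encode("Ma", true));  // TWE=  (2 bytes left, 1 '=')
        System.out.println(encode("Man", true)); // TWFu  (nothing left, no '=')
        System.out.println(encode("M", false));  // TQ
        System.out.println(decode("TQ=="));      // M
        System.out.println(decode("TQ"));        // M
    }
}
```

Both "TQ==" and "TQ" decode to the same single byte, which is why padding can be optional when each encoded value is handled on its own.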

Implementation (Example)

Below is a Base64 alphabet. It consists of the uppercase letters, the lowercase letters, and the digits, plus the "+" and "/" symbols.

Value	Char	Value	Char	Value	Char	Value	Char
0	`A`	16	`Q`	32	`g`	48	`w`
1	`B`	17	`R`	33	`h`	49	`x`
2	`C`	18	`S`	34	`i`	50	`y`
3	`D`	19	`T`	35	`j`	51	`z`
4	`E`	20	`U`	36	`k`	52	`0`
5	`F`	21	`V`	37	`l`	53	`1`
6	`G`	22	`W`	38	`m`	54	`2`
7	`H`	23	`X`	39	`n`	55	`3`
8	`I`	24	`Y`	40	`o`	56	`4`
9	`J`	25	`Z`	41	`p`	57	`5`
10	`K`	26	`a`	42	`q`	58	`6`
11	`L`	27	`b`	43	`r`	59	`7`
12	`M`	28	`c`	44	`s`	60	`8`
13	`N`	29	`d`	45	`t`	61	`9`
14	`O`	30	`e`	46	`u`	62	`+`
15	`P`	31	`f`	47	`v`	63	`/`

With this alphabet we can write a simple Base64 implementation (the complete source code is attached at the end of this article):

The encode() method below converts a Java string into a byte array (Base64 operates on bytes) and then calls the encode() overload that does the actual encoding:

public String encode(String inputStr, String charset, boolean padding)
        throws UnsupportedEncodingException {
    String encodeStr = null;
    byte[] bytes = inputStr.getBytes(charset);
    encodeStr = encode(bytes, padding);
    return encodeStr;
}

The core of the encode() method is:

for (int i = 0; i < groups; i++) {
    byte_1 = bytes[3*i]   & 0xFF;
    byte_2 = bytes[3*i+1] & 0xFF;
    byte_3 = bytes[3*i+2] & 0xFF;

    group_6bit_1 =  byte_1 >>> 2;
    group_6bit_2 = (byte_1 &  0x03) << 4 | byte_2 >>> 4;
    group_6bit_3 = (byte_2 &  0x0F) << 2 | byte_3 >>> 6;
    group_6bit_4 =  byte_3 &  0x3F;

    sb.append(CHARSET[group_6bit_1])
      .append(CHARSET[group_6bit_2])
      .append(CHARSET[group_6bit_3])
      .append(CHARSET[group_6bit_4]);
}

That is, every 3 bytes are converted into 4 characters.

Of course, we also have to check whether any bytes are left over at the end and, if so, handle them separately:

if (tail == 1) {
    byte_1 = bytes[bytes.length-1] & 0xFF;

    group_6bit_1 =  byte_1 >>> 2;
    group_6bit_2 = (byte_1 &   0x03) << 4;

    sb.append(CHARSET[group_6bit_1])
      .append(CHARSET[group_6bit_2]);

    if (padding) {
        sb.append('=').append('=');
    }
} else if (tail == 2) {
    byte_1 = bytes[bytes.length-2] & 0xFF;
    byte_2 = bytes[bytes.length-1] & 0xFF;

    group_6bit_1 =  byte_1 >>> 2;
    group_6bit_2 = (byte_1 &   0x03) << 4 | byte_2 >>> 4;
    group_6bit_3 = (byte_2 &   0x0F) << 2;

    sb.append(CHARSET[group_6bit_1])
      .append(CHARSET[group_6bit_2])
      .append(CHARSET[group_6bit_3]);

    if (padding) {
        sb.append('=');
    }
}

Decoding works similarly; see the complete code for details.

Appendix: Source Code

package base64;

import java.io.UnsupportedEncodingException;

/**
 * This class provides a simple implementation of Base64 encoding and decoding.
 *
 * @author QiaoMingkui
 *
 */
public class Base64 {

    /*
     * charset
     */
    private static final char[] CHARSET = {
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
        'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
        'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
        'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
        'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
        'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
        'w', 'x', 'y', 'z', '0', '1', '2', '3',
        '4', '5', '6', '7', '8', '9', '+', '/'
    };

    /*
     * charset used to decode.
     */
    private static final int[] DECODE_CHARSET = new int[128];
    static {
        for (int i=0; i<64; i++) {
            DECODE_CHARSET[CHARSET[i]] = i;
        }
    }

    /**
     * A convenient method for encoding Java String,
     * it uses encode(byte[], boolean) to encode byte array.
     *
     * @param inputStr a string to be encoded.
     * @param charset charset name ("GBK" for example) that is used to convert inputStr into byte array.
     * @param padding whether using padding characters "="
     * @return encoded string
     * @throws UnsupportedEncodingException if charset is unsupported
     */
    public String encode(String inputStr, String charset, boolean padding)
            throws UnsupportedEncodingException {
        String encodeStr = null;
        byte[] bytes = inputStr.getBytes(charset);
        encodeStr = encode(bytes, padding);
        return encodeStr;
    }

    /**
     * Using Base64 to encode bytes.
     *
     * @param bytes byte array to be encoded
     * @param padding whether using padding characters "="
     * @return encoded string
     */
    public String encode(byte[] bytes, boolean padding) {
        // 4 6-bit groups
        int group_6bit_1,
            group_6bit_2,
            group_6bit_3,
            group_6bit_4;

        // bytes of a group
        int byte_1,
            byte_2,
            byte_3;

        // number of 3-byte groups
        int groups = bytes.length / 3;

        // at last, there might be 0, 1, or 2 byte(s) remained,
        // which needs to be encoded individually.
        int tail = bytes.length % 3;

        StringBuilder sb = new StringBuilder(groups * 4 + 4);

        // handle each 3-byte group
        for (int i = 0; i < groups; i++) {
            byte_1 = bytes[3*i]   & 0xFF;
            byte_2 = bytes[3*i+1] & 0xFF;
            byte_3 = bytes[3*i+2] & 0xFF;

            group_6bit_1 =  byte_1 >>> 2;
            group_6bit_2 = (byte_1 &  0x03) << 4 | byte_2 >>> 4;
            group_6bit_3 = (byte_2 &  0x0F) << 2 | byte_3 >>> 6;
            group_6bit_4 =  byte_3 &  0x3F;

            sb.append(CHARSET[group_6bit_1])
              .append(CHARSET[group_6bit_2])
              .append(CHARSET[group_6bit_3])
              .append(CHARSET[group_6bit_4]);
        }

        // handle last 1 or 2 byte(s)
        if (tail == 1) {
            byte_1 = bytes[bytes.length-1] & 0xFF;

            group_6bit_1 =  byte_1 >>> 2;
            group_6bit_2 = (byte_1 &   0x03) << 4;

            sb.append(CHARSET[group_6bit_1])
              .append(CHARSET[group_6bit_2]);

            if (padding) {
                sb.append('=').append('=');
            }
        } else if (tail == 2) {
            byte_1 = bytes[bytes.length-2] & 0xFF;
            byte_2 = bytes[bytes.length-1] & 0xFF;

            group_6bit_1 =  byte_1 >>> 2;
            group_6bit_2 = (byte_1 &   0x03) << 4 | byte_2 >>> 4;
            group_6bit_3 = (byte_2 &   0x0F) << 2;

            sb.append(CHARSET[group_6bit_1])
              .append(CHARSET[group_6bit_2])
              .append(CHARSET[group_6bit_3]);

            if (padding) {
                sb.append('=');
            }
        }

        return sb.toString();
    }

    /**
     * Decode a Base64 string to bytes (byte array).
     *
     * @param code Base64 string to be decoded
     * @return byte array
     */
    public byte[] decode(String code) {
        char[] chars = code.toCharArray();

        int group_6bit_1,
            group_6bit_2,
            group_6bit_3,
            group_6bit_4;

        int byte_1,
            byte_2,
            byte_3;

        int len = chars.length;

        // ignore up to two trailing '=' padding characters; the length
        // checks keep empty or 1-character input from causing an
        // ArrayIndexOutOfBoundsException
        if (len > 0 && chars[len - 1] == '=') {
            len--;
        }
        if (len > 0 && chars[len - 1] == '=') {
            len--;
        }

        int groups = len / 4;
        int tail = len % 4;

        // each group of characters (4 characters) will be converted into 3 bytes,
        // and last 2 or 3 characters will be converted into 1 or 2 byte(s).
        byte[] bytes = new byte[groups * 3 + (tail > 0 ? tail - 1 : 0)];

        int byteIdx = 0;

        // decode each group
        for (int i=0; i<groups; i++) {
            group_6bit_1 = DECODE_CHARSET[chars[4*i]];
            group_6bit_2 = DECODE_CHARSET[chars[4*i + 1]];
            group_6bit_3 = DECODE_CHARSET[chars[4*i + 2]];
            group_6bit_4 = DECODE_CHARSET[chars[4*i + 3]];

            byte_1 =  group_6bit_1         << 2 | group_6bit_2 >>> 4;
            byte_2 = (group_6bit_2 & 0x0F) << 4 | group_6bit_3 >>> 2;
            byte_3 = (group_6bit_3 & 0x03) << 6 | group_6bit_4;

            bytes[byteIdx++] = (byte) byte_1;
            bytes[byteIdx++] = (byte) byte_2;
            bytes[byteIdx++] = (byte) byte_3;
        }

        // decode last 2 or 3 characters
        if (tail == 2) {
            group_6bit_1 = DECODE_CHARSET[chars[len - 2]];
            group_6bit_2 = DECODE_CHARSET[chars[len - 1]];

            byte_1 = group_6bit_1 << 2 | group_6bit_2 >>> 4;

            bytes[byteIdx] = (byte) byte_1;
        } else if (tail == 3) {
            group_6bit_1 = DECODE_CHARSET[chars[len - 3]];
            group_6bit_2 = DECODE_CHARSET[chars[len - 2]];
            group_6bit_3 = DECODE_CHARSET[chars[len - 1]];

            byte_1 =  group_6bit_1         << 2 | group_6bit_2 >>> 4;
            byte_2 = (group_6bit_2 & 0x0F) << 4 | group_6bit_3 >>> 2;

            bytes[byteIdx++] = (byte) byte_1;
            bytes[byteIdx]   = (byte) byte_2;
        }

        return bytes;
    }

    /**
     * Test.
     * @param args
     */
    public static void main(String[] args) {
        Base64 base64 = new Base64();

        String str = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.";
        System.out.println(str);

        try {
            String encodeStr = base64.encode(str, "GBK", false);
            System.out.println(encodeStr);

            byte[] decodeBytes = base64.decode(encodeStr);
            String decodeStr = new String(decodeBytes, "GBK");
            System.out.println(decodeStr);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

Utility Classes

You can also use utility classes that already ship with Java for encoding and decoding, which simplifies the code.

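        // Assumes the JDK-internal classes sun.misc.BASE64Encoder and
        // sun.misc.BASE64Decoder have been imported.
        // decodeBuffer() throws java.io.IOException, which the
        // enclosing method must catch or declare.
        BASE64Encoder encoder = new BASE64Encoder();
        BASE64Decoder decoder = new BASE64Decoder();

        String str1 = encoder.encode("111".getBytes());
        String str2 = encoder.encode("222".getBytes());

        System.out.println(str1);
        System.out.println(new String(decoder.decodeBuffer(str1)));
        System.out.println(str2);
        System.out.println(new String(decoder.decodeBuffer(str2)));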

Output:

MTEx
111
MjIy
222
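The BASE64Encoder and BASE64Decoder classes used above belong to the JDK-internal sun.misc package and are not available on newer JDKs. Since Java 8 the standard library provides java.util.Base64 for the same purpose. The sketch below is one possible equivalent of the snippet above; the class name StandardBase64Demo and its helper methods are hypothetical, and UTF-8 is an assumption made here (the original snippet uses the platform default charset).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StandardBase64Demo {
    // Encodes a string's UTF-8 bytes as padded Base64 text.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes Base64 text back into a UTF-8 string.
    static String decode(String code) {
        return new String(Base64.getDecoder().decode(code), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str1 = encode("111");
        String str2 = encode("222");

        System.out.println(str1);         // MTEx
        System.out.println(decode(str1)); // 111
        System.out.println(str2);         // MjIy
        System.out.println(decode(str2)); // 222
    }
}
```

Unlike decodeBuffer(), Base64.Decoder.decode() throws no checked exception; invalid input is reported through an unchecked IllegalArgumentException.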

Reposted from:

"Base64编码简介" (An Introduction to Base64 Encoding)