A Comparative Study of Java Byte Streams, Character Streams, and Buffered Streams

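1. Introduction
2. Byte Operations and Character Operations
3. Efficiency Test of the Two Approaches
4. Byte Order (Endianness)
5. Overall Comparison
6. Conclusion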

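1. Introduction

IO stands for Input/Output. In Java, IO refers to the exchange of data between a program and the outside world. Data moving in and out of a program is abstracted as a stream. By direction relative to the program, streams are either input streams or output streams; by the unit of data they carry, they are either byte streams or character streams. The Java IO stream hierarchy is large and feature-rich.

This article examines the differences between byte-oriented and character-oriented operations in Java.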

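2. Byte Operations and Character Operations

Java's IO class hierarchy can be summarized in a diagram:

[Figure: overview of the Java IO class hierarchy]

The distinction resembles the one between binary files and text files in C. Characters are ultimately stored as bytes too: a character is binary data produced by a particular character encoding, so decoding it by the same rules yields text that a person can read directly. Ordinary binary data has no such interpretation and is meaningful only to a program that knows its format. The difference between byte operations and character operations therefore lies in how the data is interpreted.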
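To make the role of the encoding concrete, here is a minimal sketch (the class and method names are my own, chosen for illustration) showing that the same character becomes a different number of bytes under different charsets, and that decoding with the wrong charset does not give the original text back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengthDemo {
    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Interprets raw bytes as text under the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d"; // a single CJK character, code point U+4E2D
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));    // prints 3
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE)); // prints 2

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset used for encoding restores the text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // prints true
        // Decoding the same bytes as ISO-8859-1 yields three unrelated characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // prints 3
    }
}
```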

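In Java, byte input and output streams have two abstract base classes:

Byte input stream: InputStream
Byte output stream: OutputStream

Character input and output streams also have two abstract base classes:

Character input stream: Reader
Character output stream: Writer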

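Java also provides bridge classes between the two families. InputStreamReader decodes the bytes read from a byte stream into characters, and OutputStreamWriter encodes characters into bytes and writes them to a byte stream. Both wrap a byte stream; java.io has no class that wraps a Reader or Writer and exposes it as an InputStream or OutputStream. In effect:

character stream = byte stream + encoding table (charset)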
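The relationship can be sketched with the two bridge classes working against in-memory byte streams. The class name BridgeDemo and its helper methods are illustrative only; the charset is fixed to UTF-8 here as an assumption, not something the tests in this article do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BridgeDemo {
    // Characters in, bytes out: OutputStreamWriter encodes onto the byte stream.
    static byte[] encode(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Bytes in, characters out: InputStreamReader decodes from the byte stream.
    static String decode(byte[] bytes) {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                result.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("h\u00e9llo");
        System.out.println(bytes.length);  // prints 6: the accented letter takes two bytes
        System.out.println(decode(bytes).equals("h\u00e9llo")); // prints true
    }
}
```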

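Reading a whole byte array at a time is significantly more efficient than reading one byte at a time, so Java provides buffered byte stream classes, BufferedInputStream and BufferedOutputStream, and likewise buffered character stream classes, BufferedReader and BufferedWriter.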
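One way to see what the buffer buys is to count how many times the underlying stream is actually called. The sketch below uses a hypothetical CountingSink that discards data and only counts calls; the 8192-byte buffer size is passed explicitly rather than relying on a default.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferCallCountDemo {
    /** Hypothetical sink: discards everything, counts calls and bytes. */
    static final class CountingSink extends OutputStream {
        int calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    // Writes n single bytes straight to the sink.
    static int callsUnbuffered(int n) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = sink) {
            for (int i = 0; i < n; i++) {
                out.write('x');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    // Writes n single bytes through a BufferedOutputStream of the given size.
    static int callsBuffered(int n, int bufferSize) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < n; i++) {
                out.write('x');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println(callsUnbuffered(10_000));     // prints 10000
        System.out.println(callsBuffered(10_000, 8192)); // prints 2: one full buffer, one remainder
    }
}
```

With a real file behind the stream, each of those calls may be a system call, which is where the time goes.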

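As for when to use which: binary files whose content cannot be read directly as text, such as images, MP3 files, and videos, must be handled with byte streams, while text is better handled with character streams.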
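A small sketch of why binary data should not pass through a character stream: decoding arbitrary bytes as text and encoding them again does not, in general, reproduce the original bytes. The four sample bytes are the start of a typical JPEG header; the class name is illustrative and UTF-8 is an assumed charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryAsTextDemo {
    // True if the bytes are unchanged after being decoded to text and re-encoded.
    static boolean survivesTextRoundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] plainText = "hello".getBytes(StandardCharsets.UTF_8);

        // 0xFF is never valid in UTF-8, so it is replaced during decoding.
        System.out.println(survivesTextRoundTrip(jpegHeader)); // prints false
        System.out.println(survivesTextRoundTrip(plainText));  // prints true
    }
}
```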

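3. Efficiency Test of the Two Approaches

The test program below compares the efficiency of the two approaches.

3.1 Test code

I wrote eight methods that test input and output with byte streams and character streams, each with and without buffering.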

package com.verygood.island;

import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.platform.commons.annotation.Testable;

import java.io.*;

/**
 * @author <a href="mailto:kobe524348@gmail.com">黄钰朝</a>
 * @description
 * @date 2020-05-27 08:50
 */
@Testable
public class UnitTest {

    public static final String PATH = "C:\\Users\\Misterchaos\\Documents\\Java Develop Workplaces\\" +
            "Github repository\\island\\src\\test\\java\\com\\verygood\\island\\";

    /**
     * Data used for the output tests
     */
    public static byte[] outputbytes = null;
    public static char[] outputchars = null;

    // Number of times the data is written in each output test
    int count = 1;

    /**
     * File used for the input tests
     */
    public static final File inputFile = new File("C:\\Users\\Misterchaos\\Downloads\\安装包\\TEST.zip");

    @BeforeClass
    public static void before() {
        // Build a 36,000,000-character ASCII payload (about 35156 KB)
        StringBuilder stringBuilder = new StringBuilder("");
        for (int i = 0; i < 1000000; i++) {
            stringBuilder.append("stringstringstringstringstringstring");
        }
        outputbytes = stringBuilder.toString().getBytes();
        outputchars = stringBuilder.toString().toCharArray();
    }

    @Test
    public void test0() {
        System.out.println("--------------------------------------------------------");
        System.out.println("                  Output stream tests                   ");
        System.out.println("--------------------------------------------------------");
    }

    // Byte stream
    @Test
    public void test1() {
        try {
            System.out.println("******** Method 1: byte stream output **********");
            // Name of the file to create
            String name = PATH + "byte-stream-output.txt";
            File file = new File(name);
            // Create the output stream
            FileOutputStream fos = new FileOutputStream(file);
            // Write the data
            long s1 = System.currentTimeMillis(); // start timing
            writeBytes(fos);
            long s2 = System.currentTimeMillis(); // stop timing
            fos.close();
            System.out.println("Time to write file: " + (s2 - s1) + "ms");
            System.out.println("File size: " + file.length() / 1024 + "KB");
            file.delete();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Character stream
    @Test
    public void test2() {
        try {
            System.out.println("******** Method 2: character stream output **********");
            // Name of the file to create
            String name = PATH + "character-stream-output.txt";
            File file = new File(name);
            // Create the output stream
            FileWriter fileWriter = new FileWriter(file);
            // Write the data
            long s1 = System.currentTimeMillis(); // start timing
            writeChars(fileWriter);
            long s2 = System.currentTimeMillis(); // stop timing
            fileWriter.close();
            System.out.println("Time to write file: " + (s2 - s1) + "ms");
            System.out.println("File size: " + file.length() / 1024 + "KB");
            file.delete();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Buffered byte stream
    @Test
    public void test3() {
        try {
            System.out.println("******** Method 3: buffered byte stream output **********");
            // Name of the file to create
            String name = PATH + "buffered-byte-stream-output.txt";
            File file = new File(name);
            // Create the output stream
            FileOutputStream fileOutputStream = new FileOutputStream(file);
            BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
            // Write the data
            long s1 = System.currentTimeMillis(); // start timing
            writeBytes(bufferedOutputStream);
            long s2 = System.currentTimeMillis(); // stop timing
            bufferedOutputStream.close();
            System.out.println("Time to write file: " + (s2 - s1) + "ms");
            System.out.println("File size: " + file.length() / 1024 + "KB");
            file.delete();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Buffered character stream
    @Test
    public void test4() {
        try {
            System.out.println("******** Method 4: buffered character stream output **********");
            // Name of the file to create
            String name = PATH + "buffered-character-stream-output.txt";
            File file = new File(name);
            // Create the output stream
            BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(file));
            // Write the data
            // Note: unlike test1 to test3, this writes the whole array in one call
            long s1 = System.currentTimeMillis(); // start timing
            for (int i = 0; i < count; i++) {
                bufferedWriter.write(outputchars);
            }
            long s2 = System.currentTimeMillis(); // stop timing
            bufferedWriter.close();
            System.out.println("Time to write file: " + (s2 - s1) + "ms");
            System.out.println("File size: " + file.length() / 1024 + "KB");
            file.delete();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Test
    public void test5() {
        System.out.println("--------------------------------------------------------");
        System.out.println("                   Input stream tests                   ");
        System.out.println("--------------------------------------------------------");
    }

    // Byte stream
    @Test
    public void test6() {
        try {
            System.out.println("******** Method 1: byte stream input **********");
            long s1 = System.currentTimeMillis(); // start timing
            // Create the input stream
            FileInputStream fileInputStream = new FileInputStream(inputFile);
            // Read the data one byte at a time
            while (fileInputStream.read() != -1) {
            }
            fileInputStream.close();
            long s2 = System.currentTimeMillis(); // stop timing
            System.out.println("Time to read file: " + (s2 - s1) + "ms");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Character stream
    @Test
    public void test7() {
        try {
            System.out.println("******** Method 2: character stream input **********");
            long s1 = System.currentTimeMillis(); // start timing
            // Create the input stream
            FileReader fileReader = new FileReader(inputFile);
            // Read the data one character at a time
            while (fileReader.read() != -1) {
            }
            fileReader.close();
            long s2 = System.currentTimeMillis(); // stop timing
            System.out.println("Time to read file: " + (s2 - s1) + "ms");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Buffered byte stream
    @Test
    public void test8() {
        try {
            System.out.println("******** Method 3: buffered byte stream input **********");
            long s1 = System.currentTimeMillis(); // start timing
            // Create the input stream
            BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(inputFile));
            // Read the data one byte at a time
            while (bufferedInputStream.read() != -1) {
            }
            bufferedInputStream.close();
            long s2 = System.currentTimeMillis(); // stop timing
            System.out.println("Time to read file: " + (s2 - s1) + "ms");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Buffered character stream
    @Test
    public void test9() {
        try {
            System.out.println("******** Method 4: buffered character stream input **********");
            long s1 = System.currentTimeMillis(); // start timing
            // Create the input stream
            BufferedReader bufferedReader = new BufferedReader(new FileReader(inputFile));
            // Read the data one character at a time
            while (bufferedReader.read() != -1) {
            }
            bufferedReader.close();
            long s2 = System.currentTimeMillis(); // stop timing
            System.out.println("Time to read file: " + (s2 - s1) + "ms");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Writes the byte payload one byte at a time
     */
    private void writeBytes(OutputStream fos) throws IOException {
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < outputbytes.length; j++) {
                fos.write(outputbytes[j]);
            }
        }
    }

    /**
     * Writes the character payload one character at a time
     */
    private void writeChars(Writer writer) throws IOException {
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < outputchars.length; j++) {
                writer.write(outputchars[j]);
            }
        }
    }
}

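3.2 Test results

The results are as follows:

--------------------------------------------------------
                  Output stream tests
--------------------------------------------------------
******** Method 1: byte stream output **********
Time to write file: 153798ms
File size: 35156KB
******** Method 2: character stream output **********
Time to write file: 5503ms
File size: 35156KB
******** Method 3: buffered byte stream output **********
Time to write file: 514ms
File size: 35156KB
******** Method 4: buffered character stream output **********
Time to write file: 600ms
File size: 35156KB
--------------------------------------------------------
                   Input stream tests
--------------------------------------------------------
******** Method 1: byte stream input **********
Time to read file: 3643276ms
******** Method 2: character stream input **********
Time to read file: 93332ms
******** Method 3: buffered byte stream input **********
Time to read file: 4700ms
******** Method 4: buffered character stream input **********
Time to read file: 51538ms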

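3.3 Analysis of the results

Testing showed that when the whole payload is written to the file in a single call, the buffered output stream is actually slower: in my measurements it took roughly twice as long as the unbuffered stream. The source code of BufferedOutputStream explains why: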

    public synchronized void write(byte b[], int off, int len) throws IOException {
        if (len >= buf.length) {
            /* If the request length exceeds the size of the output buffer,
               flush the output buffer and then write the data directly.
               In this way buffered streams will cascade harmlessly. */
            flushBuffer();
            out.write(b, off, len);
            return;
        }
        if (len > buf.length - count) {
            flushBuffer();
        }
        System.arraycopy(b, off, buf, count, len);
        count += len;
    }

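The comment says it plainly: if the length of the write is greater than or equal to the buffer size, the buffer is flushed and the data is written directly to the underlying stream. In short, the buffer is not used at all!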
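This behavior can be observed without touching a file. The sketch below records the size of every chunk that reaches the underlying stream; RecordingSink is a hypothetical helper of mine, and the 16-byte buffer is chosen only to keep the numbers small.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LargeWriteBypassDemo {
    /** Hypothetical sink that records the size of every chunk it receives. */
    static final class RecordingSink extends OutputStream {
        final List<Integer> chunks = new ArrayList<>();

        @Override
        public void write(int b) {
            chunks.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            chunks.add(len);
        }
    }

    // Writes `pieces` arrays of `pieceSize` bytes through a buffer of `bufferSize`
    // bytes and returns the chunk sizes that reached the underlying stream.
    static List<Integer> chunksSeen(int pieces, int pieceSize, int bufferSize) {
        RecordingSink sink = new RecordingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            byte[] piece = new byte[pieceSize];
            for (int i = 0; i < pieces; i++) {
                out.write(piece);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.chunks;
    }

    public static void main(String[] args) {
        // One 100-byte write with a 16-byte buffer goes straight through.
        System.out.println(chunksSeen(1, 100, 16)); // prints [100]
        // Ten 4-byte writes are coalesced into buffer-sized chunks.
        System.out.println(chunksSeen(10, 4, 16));  // prints [16, 16, 8]
    }
}
```

In the first case the wrapper adds only overhead, which is consistent with the slowdown I measured for the single-call test.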

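I therefore redesigned the test to write one byte at a time instead of writing everything in one call; the results shown above come from this revised version. Here the buffered byte output stream has a very clear advantage: 514 ms versus 153798 ms, a speedup of roughly 300 times.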

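Comparing FileWriter with FileOutputStream, FileOutputStream turned out to be markedly slower. The source code shows why:

FileWriter writes through a StreamEncoder, and StreamEncoder maintains an internal buffer of 8192 bytes. This explains why FileOutputStream, which works on raw bytes and avoids the encoding cost, is nevertheless slower: FileWriter is in effect already buffered. It is therefore no surprise that wrapping FileWriter in a BufferedWriter yields a far smaller gain than buffering the byte stream does: 5503 ms versus 600 ms in the results above, about 9 times. Note that test4 writes the whole array in a single call, so this figure is not a like-for-like comparison.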
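The hidden buffer can be observed directly. FileWriter extends OutputStreamWriter, so this sketch uses an OutputStreamWriter over an in-memory sink and checks how many bytes have reached the sink before and after flush(). The class and method names are illustrative, and the exact buffer capacity is an implementation detail that may differ between JDK versions, which is why only 100 characters are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterInternalBufferDemo {
    // Returns {bytes in the sink before flush(), bytes in the sink after flush()}.
    static int[] sinkSizesAroundFlush(int charCount) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (int i = 0; i < charCount; i++) {
                writer.write('a'); // one character per call, as in writeChars
            }
            int before = sink.size(); // encoded bytes are still held by the writer
            writer.flush();
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sinkSizesAroundFlush(100);
        System.out.println(sizes[0]); // prints 0
        System.out.println(sizes[1]); // prints 100
    }
}
```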

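4. Byte Order (Endianness)

Byte order ("endian", "endianness", or "byte order") describes how a computer arranges the bytes that make up a multi-byte number. In big-endian order the most significant byte comes first and the least significant byte last; little-endian order is the reverse.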
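In Java the two orders can be compared with java.nio.ByteBuffer. This is a minimal sketch; the class and helper names are mine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndianDemo {
    // Lays out a 32-bit int as four bytes in the requested byte order.
    static byte[] intToBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        // Most significant byte first.
        System.out.println(Arrays.toString(intToBytes(value, ByteOrder.BIG_ENDIAN)));    // prints [1, 2, 3, 4]
        // Least significant byte first.
        System.out.println(Arrays.toString(intToBytes(value, ByteOrder.LITTLE_ENDIAN))); // prints [4, 3, 2, 1]
    }
}
```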

I wrote the following test code to compare the efficiency of binary mode and text mode in C:

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	/* Pass 1: read the file in text mode */
	FILE* fpRead = fopen("C:\\test.txt", "r");
	if (fpRead == NULL)
	{
		printf("Failed to open file\n");
		return 1;
	}

	clock_t start, finish;
	int a = 0;

	start = clock();
	/* Compare fgetc's return value with EOF; looping on feof() runs one extra failed read */
	while ((a = fgetc(fpRead)) != EOF)
	{
	}
	finish = clock();
	double text_duration = (double)(finish - start) / CLOCKS_PER_SEC;
	fclose(fpRead);

	/* Pass 2: read the same file in binary mode */
	fpRead = fopen("C:\\test.txt", "rb");
	if (fpRead == NULL)
	{
		printf("Failed to open file\n");
		return 1;
	}

	start = clock();
	while ((a = fgetc(fpRead)) != EOF)
	{
	}
	finish = clock();
	double binary_duration = (double)(finish - start) / CLOCKS_PER_SEC;
	fclose(fpRead);

	printf("\n");
	printf("Text mode elapsed: %f seconds\n", text_duration);
	printf("Binary mode elapsed: %f seconds\n", binary_duration);

	system("pause");
	return 0;
}

Output:

Text mode elapsed: 3.042000 seconds
Binary mode elapsed: 2.796000 seconds

Binary mode is slightly faster than text mode.

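5. Overall Comparison

Based on the experiments above, byte streams and character streams differ in the following ways:

With buffering on both sides, byte streams were faster than character streams in these tests (514 ms versus 600 ms for output, 4700 ms versus 51538 ms for input). When operations are frequent and each one moves only a small amount of data, buffering brings a clear efficiency gain.
In terms of what they operate on, the basic unit of a byte stream is the byte, and the basic unit of a character stream is the Unicode code unit (a Java char).
Byte streams are typically used for binary data. They can in fact carry any kind of data, but they do not directly support reading or writing Unicode code units. Character streams are typically used for text and do support reading and writing Unicode code units.
As the source code shows, file byte streams such as FileOutputStream use no buffer by default, whereas character streams such as FileWriter use one internally.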
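Putting these points together, one reasonable pattern for text is a buffered character stream with an explicit charset, closed by try-with-resources. The sketch below is only an illustration under those assumptions: it writes to a temporary file, reads it back character by character, and deletes it. The class name, the file prefix, and the choice of UTF-8 are mine.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedTextRoundTrip {
    // Writes the text to a temporary file and reads it back, one char per call.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("stream-demo", ".txt");
            try {
                try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    for (int i = 0; i < text.length(); i++) {
                        writer.write(text.charAt(i)); // small writes are absorbed by the buffer
                    }
                }
                StringBuilder result = new StringBuilder();
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    int c;
                    while ((c = reader.read()) != -1) {
                        result.append((char) c);
                    }
                }
                return result.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("stream test \u4e2d").equals("stream test \u4e2d")); // prints true
    }
}
```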

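6. Conclusion

While writing this post, the efficiency tests produced some puzzling results at first. Output streams wrapped in BufferedWriter and BufferedOutputStream were not faster but somewhat slower, and only reading the source code revealed why. The byte stream being markedly slower than the character stream was just as baffling, until it turned out that the character stream maintains a buffer internally.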