Big Data Applications: HBase Insert Performance Optimization, a Multi-threaded Parallel Insert Test Case

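I. Introduction:

The previous article discussed five parameters involved in HBase insert performance tuning and, from the parameter-configuration angle, provided experimental code for a performance test environment. Readers pointed out that inserting data from a single thread can only go so far. In my own tests on my virtual machine, single-threaded insertion reaches about 40,000 records per second (4w/s). The environment: dual-core CPU (1.83), 512 MB of VM memory, and a single-node cluster deployment. This article presents test code and measured results for a multi-threaded concurrent insert mode, in the hope that they are useful to you.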

II. Source Code:

 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Random;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Put;

 public class HBaseImportEx {

     static Configuration hbaseConfig = null;
     public static HTablePool pool = null;
     public static String tableName = "T_TEST_1";

     static {
         // Connection settings for the single-node test cluster
         Configuration HBASE_CONFIG = new Configuration();
         HBASE_CONFIG.set("hbase.master", "192.168.230.133:60000");
         HBASE_CONFIG.set("hbase.zookeeper.quorum", "192.168.230.133");
         HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
         hbaseConfig = HBaseConfiguration.create(HBASE_CONFIG);
         pool = new HTablePool(hbaseConfig, 1000);
     }

     /*
      * Single-threaded insert test
      */
     public static void SingleThreadInsert() throws IOException {
         System.out.println("---------Start SingleThreadInsert test----------");
         long start = System.currentTimeMillis();
         HTable table = (HTable) pool.getTable(tableName);
         // Buffer puts on the client instead of sending each one immediately
         table.setAutoFlush(false);
         table.setWriteBufferSize(24 * 1024 * 1024);
         // Build the test data
         List<Put> list = new ArrayList<Put>();
         int count = 10000;
         int batchSize = 10000; // rows per commit
         byte[] buffer = new byte[350];
         Random rand = new Random();
         for (int i = 0; i < count; i++) {
             Put put = new Put(String.format("row %d", i).getBytes());
             rand.nextBytes(buffer);
             put.add("f1".getBytes(), null, buffer);
             // Skip the write-ahead log (WAL)
             put.setWriteToWAL(false);
             list.add(put);
             // Commit once a full batch has accumulated
             if ((i + 1) % batchSize == 0) {
                 table.put(list);
                 list.clear();
                 table.flushCommits();
             }
         }
         // Commit the rows left over from a final partial batch
         if (!list.isEmpty()) {
             table.put(list);
             table.flushCommits();
         }
         long stop = System.currentTimeMillis();
         System.out.println("Rows inserted: " + count + ", elapsed: " + (stop - start) * 1.0 / 1000 + "s");
         System.out.println("---------End SingleThreadInsert test----------");
     }

     /*
      * Insert routine executed by each thread in the multi-threaded test
      */
     public static void InsertProcess() throws IOException {
         long start = System.currentTimeMillis();
         HTable table = (HTable) pool.getTable(tableName);
         // Buffer puts on the client instead of sending each one immediately
         table.setAutoFlush(false);
         table.setWriteBufferSize(24 * 1024 * 1024);
         // Build the test data
         List<Put> list = new ArrayList<Put>();
         int count = 10000;
         int batchSize = 10000; // rows per commit
         byte[] buffer = new byte[256];
         Random rand = new Random();
         for (int i = 0; i < count; i++) {
             Put put = new Put(String.format("row %d", i).getBytes());
             rand.nextBytes(buffer);
             put.add("f1".getBytes(), null, buffer);
             // Skip the write-ahead log (WAL)
             put.setWriteToWAL(false);
             list.add(put);
             // Commit once a full batch has accumulated
             if ((i + 1) % batchSize == 0) {
                 table.put(list);
                 list.clear();
                 table.flushCommits();
             }
         }
         // Commit the rows left over from a final partial batch
         if (!list.isEmpty()) {
             table.put(list);
             table.flushCommits();
         }
         long stop = System.currentTimeMillis();
         System.out.println("Thread:" + Thread.currentThread().getId() + " rows inserted: " + count + ", elapsed: " + (stop - start) * 1.0 / 1000 + "s");
     }

     /*
      * Multi-threaded insert test
      */
     public static void MultThreadInsert() throws InterruptedException {
         System.out.println("---------Start MultThreadInsert test----------");
         long start = System.currentTimeMillis();
         int threadNumber = 10;
         Thread[] threads = new Thread[threadNumber];
         // Start all worker threads
         for (int i = 0; i < threads.length; i++) {
             threads[i] = new ImportThread();
             threads[i].start();
         }
         // Wait for every worker to finish before stopping the clock
         for (int j = 0; j < threads.length; j++) {
             threads[j].join();
         }
         long stop = System.currentTimeMillis();
         System.out.println("MultThreadInsert: " + threadNumber * 10000 + ", elapsed: " + (stop - start) * 1.0 / 1000 + "s");
         System.out.println("---------End MultThreadInsert test----------");
     }

     /**
      * @param args
      */
     public static void main(String[] args) throws Exception {
         //SingleThreadInsert();
         MultThreadInsert();
     }

     public static class ImportThread extends Thread {

         public void HandleThread() {
             //this.TableName = "T_TEST_1";
         }

         public void run() {
             try {
                 InsertProcess();
             } catch (IOException e) {
                 e.printStackTrace();
             } finally {
                 // Request a garbage collection once the thread's work is done
                 System.gc();
             }
         }
     }
 }
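
The insert routines above accumulate rows in a list and submit them in batches. As a minimal sketch of that pattern, independent of HBase, the following class counts how many rows each commit would carry, including the remainder left after the last full batch. `BatchCommitSketch` and `RecordingSink` are hypothetical names introduced only for illustration; `RecordingSink.commit` stands in for `table.put(list)` followed by `table.flushCommits()`.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch of batched submission; no HBase classes are used.
class BatchCommitSketch {

    // Hypothetical stand-in for table.put(list) + table.flushCommits():
    // it only records how many rows each commit carried.
    static final class RecordingSink {
        final List<Integer> commitSizes = new ArrayList<>();

        void commit(List<String> rows) {
            commitSizes.add(rows.size());
        }
    }

    // Buffers rows, commits once per batchSize rows, then commits the remainder.
    static List<Integer> insert(int rowCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        RecordingSink sink = new RecordingSink();
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            buffer.add(String.format("row %d", i));
            if (buffer.size() == batchSize) {
                sink.commit(buffer);
                buffer.clear();
            }
        }
        // Rows left over from a final partial batch still need a commit.
        if (!buffer.isEmpty()) {
            sink.commit(buffer);
        }
        return sink.commitSizes;
    }

    public static void main(String[] args) {
        // 10000 rows in batches of 3000: three full commits plus one of 1000.
        System.out.println(insert(10000, 3000));
    }
}
```

Checking the buffer size rather than the loop index keeps the commit condition correct for any batch size, and the check after the loop makes sure no buffered rows are silently dropped.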

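III. Notes

1. The number of threads has to be determined by actual testing against the hardware of your own cluster. With too many threads, throughput drops and the total elapsed time goes up rather than down.

2. The number of rows per commit has a very noticeable effect on performance. You need to find the best value in your own environment, and it depends on the size in bytes of a single record.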
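
One way to find a suitable thread count empirically is to run the same workload at several thread counts and compare the elapsed times. The sketch below is only a harness outline under stated assumptions: `ThreadSweepSketch`, `fakeInsert` and `runOnce` are hypothetical names, and `fakeInsert` merely generates random 256-byte values instead of writing to HBase, so the timings it prints say nothing about a real cluster. In a real test the body of `fakeInsert` would be replaced by the actual insert routine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Harness sketch: run one workload at several thread counts and time each run.
class ThreadSweepSketch {

    // Hypothetical stand-in for the per-thread insert routine: it generates
    // rowsPerThread random values and returns how many rows it "wrote".
    static int fakeInsert(int rowsPerThread) {
        byte[] buffer = new byte[256];
        Random rand = new Random();
        int written = 0;
        for (int i = 0; i < rowsPerThread; i++) {
            rand.nextBytes(buffer);
            written++;
        }
        return written;
    }

    // Runs the workload on the given number of threads.
    // Returns {total rows written, elapsed milliseconds}.
    static long[] runOnce(int threads, int rowsPerThread) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> fakeInsert(rowsPerThread));
            }
            long start = System.currentTimeMillis();
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Integer> result : executor.invokeAll(tasks)) {
                total += result.get();
            }
            long elapsed = System.currentTimeMillis() - start;
            return new long[] {total, elapsed};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("workload failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // The thread counts below are arbitrary sample points, not recommendations.
        for (int threads : new int[] {1, 2, 5, 10, 20}) {
            long[] r = runOnce(threads, 10000);
            System.out.println("threads=" + threads + " rows=" + r[0] + " elapsedMs=" + r[1]);
        }
    }
}
```

The same loop could be extended with a second dimension for the number of rows per commit, so that both settings from the notes above are measured on the target environment rather than guessed.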

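IV. Test Results

---------Start MultThreadInsert test----------

Thread:8 rows inserted: 10000, elapsed: 1.328s
Thread:16 rows inserted: 10000, elapsed: 1.562s
Thread:11 rows inserted: 10000, elapsed: 1.562s
Thread:10 rows inserted: 10000, elapsed: 1.812s
Thread:13 rows inserted: 10000, elapsed: 2.0s
Thread:17 rows inserted: 10000, elapsed: 2.14s
Thread:14 rows inserted: 10000, elapsed: 2.265s
Thread:9 rows inserted: 10000, elapsed: 2.468s
Thread:15 rows inserted: 10000, elapsed: 2.562s
Thread:12 rows inserted: 10000, elapsed: 2.671s
MultThreadInsert: 100000, elapsed: 2.703s
---------End MultThreadInsert test----------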
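
To read these numbers as throughput, divide the row count by the elapsed time. The small sketch below does this for the totals reported above; `ThroughputSketch` and `rowsPerSecond` are hypothetical names used only for illustration. The overall figure comes out at roughly 37,000 rows per second, which can be compared with the single-threaded figure of about 40,000 per second (4w/s) quoted in the introduction.

```java
// Converts the reported totals into rows per second.
class ThroughputSketch {

    // Rows per second for a run that wrote `rows` rows in `seconds` seconds.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        // Figures taken from the multi-threaded run reported above.
        double overall = rowsPerSecond(100000, 2.703);
        double slowestThread = rowsPerSecond(10000, 2.671);
        System.out.printf("overall: %.0f rows/s%n", overall);
        System.out.printf("slowest thread: %.0f rows/s%n", slowestThread);
    }
}
```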

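Note: this technical topic is also being discussed live in the group "Hadoop高级交流群" (Hadoop Advanced Discussion Group): 293503507. You are welcome to follow along.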
