HBase in Practice

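The impact of too many column families

Every column family has its own MemStore, so with many families each MemStore gets a smaller share of memory. Flushes then produce more small files, which triggers more compactions and hurts performance.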

How many column families are appropriate?

The recommendation is 1 to 3.

Criteria for splitting data into column families:

1. Does the data have a similar format?

2. Does the data have a similar access pattern?

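Example 1: under the same rowkey we need to store a large piece of text and a picture.

The large text should be compressed before it is stored.

The picture should not be compressed. Image data is typically already in a compressed binary format, so compressing it again saves almost no space.

So the two kinds of data go into two separate column families:

create 'table', {NAME => 't', COMPRESSION => 'SNAPPY'}, {NAME => 'p'}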


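Example 2: an HBase table stores each user's profile (name, age, and so on) together with that user's daily site visits.

The profile data is small and rarely changes.

The daily visit data changes constantly and is large.

If both live in the same column family, the growing visit data keeps filling the MemStore and triggering flushes, and the flushed files in turn trigger compactions. Compaction works at the column family level, so every compaction rewrites the profile data along with the visit data.

The profile data is small and rarely changes, so rewriting it to disk on every compaction wastes resources.

So the profile data and the daily visit data should be stored in two separate column families.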

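Table schema design

1. Keep each region between 10 and 50 GB.

2. Keep each table to 50-100 regions.

3. Keep each table to 1-3 column families.

4. Keep column family names short, because the family name is stored in the data files along with the data.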
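
As a rough check on the first two guidelines, the sketch below estimates how many regions a table needs for a given data size and target region size. The class and method names are hypothetical, and the sizes in the usage note are made-up inputs rather than measurements.

```java
final class RegionEstimate {
    /** Number of regions needed to hold tableSizeGb when each region holds at most regionSizeGb. */
    static long regionsFor(long tableSizeGb, long regionSizeGb) {
        if (tableSizeGb < 0 || regionSizeGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: a partially filled region still counts as a region
        return (tableSizeGb + regionSizeGb - 1) / regionSizeGb;
    }

    public static void main(String[] args) {
        System.out.println(regionsFor(2000, 20) + " regions for 2000 GB at 20 GB per region");
    }
}
```

For example, a 2000 GB table at 20 GB per region needs 100 regions, which sits at the upper end of the 50-100 guideline; a larger table would call for larger regions.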

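RowKey design, part 1

Length:

A rowkey length of 10-100 bytes is commonly suggested, and shorter is better.

1. HFiles, the persistent data files, store data as KeyValues, and every KeyValue carries the full rowkey. With a 100-byte rowkey, 10 million cells spend 100 × 10,000,000 = 1,000,000,000 bytes, close to 1 GB, on rowkeys alone, which greatly reduces HFile storage efficiency.

2. The MemStore buffers part of the data in memory. An overly long rowkey lowers the share of memory that holds useful data, so less data can be cached and lookups become slower. Shorter rowkeys are therefore better.

3. A commonly cited suggestion: most operating systems today are 64-bit and align memory on 8-byte boundaries, so a rowkey whose length is a multiple of 8 bytes may make better use of that alignment.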
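
The arithmetic in point 1 and the multiple-of-8 suggestion in point 3 can be sketched as follows. The class and method names are hypothetical; the calculation counts rowkey bytes only and ignores the rest of the KeyValue and any compression or encoding.

```java
final class RowKeyOverhead {
    /** Bytes spent on rowkeys alone when every cell repeats a rowkey of the given length. */
    static long rowKeyBytes(int rowKeyLength, long cellCount) {
        return (long) rowKeyLength * cellCount;
    }

    /** Rounds a key length up to the next multiple of 8 bytes. */
    static int paddedLength(int length) {
        return (length + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        System.out.println(rowKeyBytes(100, 10_000_000L) + " bytes with 100-byte rowkeys");
        System.out.println(rowKeyBytes(16, 10_000_000L) + " bytes with 16-byte rowkeys");
        System.out.println("13 bytes padded to " + paddedLength(13));
    }
}
```

Cutting the rowkey from 100 to 16 bytes reduces the rowkey overhead for 10 million cells from 1,000,000,000 to 160,000,000 bytes.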

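RowKey design, part 2

Property: rows are stored sorted by rowkey in lexicographic order.

Rows with similar rowkeys are stored in the same region.

For example, suppose the rowkeys are website domain names: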

www.apache.org

mail.apache.org

jira.apache.org

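Reversing the domain name makes a better rowkey, because all hosts of the same domain then share a prefix and sort next to each other: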

org.apache.www

org.apache.mail

org.apache.jira
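
A minimal sketch of the reversal, using only the Java standard library. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DomainKeys {
    /** Reverses the dot-separated labels of a domain: www.apache.org becomes org.apache.www. */
    static String reverseDomain(String domain) {
        List<String> labels = Arrays.asList(domain.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        for (String domain : Arrays.asList("www.apache.org", "mail.apache.org", "jira.apache.org")) {
            System.out.println(domain + " -> " + reverseDomain(domain));
        }
    }
}
```

Applying the function twice returns the original domain, so the rowkey can be converted back when reading.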

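RowKey design, part 3

Because rows are stored in lexicographic rowkey order, a poorly designed rowkey can also cause:

Hotspotting: a large share of the requests go to a single region.

Three ways to avoid hotspotting:

1. Salting (adding a prefix that scatters keys across regions)

create 'test_salt', 'f', SPLITS => ['b','c','d']

Original rowkeys: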

boo0001

boo0002

boo0003

boo0004

boo0005

boo0003

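Salted rowkeys (the prefix does not depend on the key, so the repeated boo0003 ends up under two different prefixes):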

a-boo0001

b-boo0002

c-boo0003

d-boo0004

a-boo0005

d-boo0003
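
To see why the prefix helps, the sketch below maps a rowkey to one of the four regions created by SPLITS => ['b','c','d']. It is a plain-Java model of the lookup, not HBase code: the class name is hypothetical and regions are numbered 0 to 3 purely for illustration.

```java
import java.util.Arrays;
import java.util.TreeMap;

final class SaltedRegionLookup {
    // Region start keys for SPLITS => ['b','c','d']; the first region starts at the empty key
    private static final TreeMap<String, Integer> REGION_BY_START_KEY = new TreeMap<>();

    static {
        REGION_BY_START_KEY.put("", 0);
        REGION_BY_START_KEY.put("b", 1);
        REGION_BY_START_KEY.put("c", 2);
        REGION_BY_START_KEY.put("d", 3);
    }

    /** Returns the index of the region whose start key is the greatest one not above rowKey. */
    static int regionOf(String rowKey) {
        return REGION_BY_START_KEY.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        for (String key : Arrays.asList("boo0001", "boo0002", "a-boo0001", "b-boo0002", "c-boo0003", "d-boo0004")) {
            System.out.println(key + " -> region " + regionOf(key));
        }
    }
}
```

Without a salt, every boo... key falls into the region that starts at 'b'; with the salt, the same keys are spread over all four regions.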

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Prepends a one-letter salt so that sequential keys spread over the pre-split regions. */
public class KeySalter {
    private final AtomicInteger index = new AtomicInteger(0);
    private final String[] prefixes = {"a", "b", "c", "d"};

    /** Write path: pick the next prefix in round-robin order. */
    public String getRowKey(String originalKey) {
        // floorMod keeps the array index non-negative after the counter overflows
        String prefix = prefixes[Math.floorMod(index.getAndIncrement(), prefixes.length)];
        return prefix + "-" + originalKey;
    }

    /** Read path: the prefix chosen at write time is unknown, so return every candidate key. */
    public List<String> getAllRowKeys(String originalKey) {
        List<String> allKeys = new ArrayList<>();
        for (String prefix : prefixes) {
            allKeys.add(prefix + "-" + originalKey);
        }
        // For "boo0001": a-boo0001, b-boo0001, c-boo0001, d-boo0001
        return allKeys;
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Writes four rows to test_salt under salted rowkeys. */
public class SaltingTest {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_salt"))) {
            KeySalter keySalter = new KeySalter();
            List<String> rowkeys = Arrays.asList("boo0001", "boo0002", "boo0003", "boo0004");
            List<Put> puts = new ArrayList<>();
            for (String key : rowkeys) {
                // Salt the key before writing so that consecutive keys land in different regions
                Put put = new Put(Bytes.toBytes(keySalter.getRowKey(key)));
                put.addColumn(Bytes.toBytes("f"), null, Bytes.toBytes("value" + key));
                puts.add(put);
            }
            // Send all puts in one batch
            table.put(puts);
        }
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Reads one logical row from test_salt by trying every salt prefix. */
public class SaltingGetter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_salt"))) {
            KeySalter keySalter = new KeySalter();
            // Read boo0001: one candidate key per prefix
            List<String> allKeys = keySalter.getAllRowKeys("boo0001");
            List<Get> gets = new ArrayList<>();
            for (String key : allKeys) {
                gets.add(new Get(Bytes.toBytes(key)));
            }
            Result[] results = table.get(gets);
            for (Result result : results) {
                // A Get for a missing row yields an empty Result, so check isEmpty() as well as null
                if (result != null && !result.isEmpty()) {
                    // do something
                }
            }
        }
    }
}
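
The getter above has to issue one Get per prefix because the salt is chosen independently of the key. A common variant, which is not what the code above does, derives the prefix from a hash of the key, so that a reader can recompute the single salted key. A minimal sketch under that assumption, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DeterministicSalter {
    private static final String[] PREFIXES = {"a", "b", "c", "d"};

    /** Derives the prefix from the key itself, so the same key always gets the same prefix. */
    static String saltedKey(String originalKey) {
        int hash = Arrays.hashCode(originalKey.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the index non-negative for negative hash values
        return PREFIXES[Math.floorMod(hash, PREFIXES.length)] + "-" + originalKey;
    }

    public static void main(String[] args) {
        for (String key : Arrays.asList("boo0001", "boo0002", "boo0003", "boo0004")) {
            System.out.println(key + " -> " + saltedKey(key));
        }
    }
}
```

The trade-off is that a given key always lands in the same region, so this spreads load across different keys but does nothing for a single very hot key.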


2. Hashing

create 'test_hash', 'f', { NUMREGIONS => 4, SPLITALGO => 'HexStringSplit' }

Original rowkeys:

boo0001

boo0002

boo0003

boo0004

MD5-hashed rowkeys:

4b5cdf065e1ada3dbc8fb7a65f6850c4

b31e7da79decd47f0372a59dd6418ba4

d88bf133cf242e30e1b1ae69335d5812

f6f6457b333c93ed1e260dc5e22d8afa

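import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

/** Turns an original key into its MD5 hex digest, which is used as the rowkey. */
public class KeyHasher {
    public static String getRowKey(String originalKey) {
        // Bytes.toBytes encodes as UTF-8, so the hash does not depend on the platform default charset
        return MD5Hash.getMD5AsHex(Bytes.toBytes(originalKey));
    }
}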

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Writes four rows to test_hash under MD5-hashed rowkeys. */
public class HashingTest {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_hash"))) {
            List<String> rowkeys = Arrays.asList("boo0001", "boo0002", "boo0003", "boo0004");
            List<Put> puts = new ArrayList<>();
            for (String key : rowkeys) {
                // The hash of the original key becomes the rowkey
                Put put = new Put(Bytes.toBytes(KeyHasher.getRowKey(key)));
                put.addColumn(Bytes.toBytes("f"), null, Bytes.toBytes("value" + key));
                puts.add(put);
            }
            table.put(puts);
        }
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

/** Reads one row from test_hash by recomputing the hash of the original key. */
public class HashingGetter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_hash"))) {
            // The hash is deterministic, so a single Get is enough
            Get get = new Get(Bytes.toBytes(KeyHasher.getRowKey("boo0001")));
            Result result = table.get(get);
            if (result.isEmpty()) {
                // listCells() returns null for an empty Result, so stop here
                System.out.println("row not found");
                return;
            }
            // process result...
            for (Cell cell : result.listCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "===> " +
                        Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                        Bytes.toString(CellUtil.cloneQualifier(cell)) + "{" +
                        Bytes.toString(CellUtil.cloneValue(cell)) + "}");
            }
        }
    }
}
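
The hashing step can also be reproduced with the JDK alone, which makes it easy to check how hashed keys spread over the pre-split regions. The sketch below is a stand-in for KeyHasher that uses java.security.MessageDigest instead of HBase's MD5Hash. The regionOf method assumes that HexStringSplit with NUMREGIONS => 4 places the split points at 40000000, 80000000 and c0000000; that is an assumption about the split algorithm, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedKeyDemo {
    /** Returns the MD5 digest of the key as 32 lowercase hex characters. */
    static String md5Hex(String originalKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(originalKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Region index 0-3, assuming split points 40000000, 80000000 and c0000000. */
    static int regionOf(String hexKey) {
        return Character.digit(hexKey.charAt(0), 16) / 4;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"boo0001", "boo0002", "boo0003", "boo0004"}) {
            String hashed = md5Hex(key);
            System.out.println(key + " -> " + hashed + " (region " + regionOf(hashed) + ")");
        }
    }
}
```

Because the hash is deterministic, a reader recomputes the same rowkey and needs a single Get. The cost is that range scans over the original key order are no longer possible.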


3. Reversing the rowkey

create 'test_reverse', 'f', SPLITS => ['0','1','2','3','4','5','6','7','8','9']

Timestamp rowkeys:

1524536830360

1524536830362

1524536830376

Reversed rowkeys:

0630386354251

2630386354251

6730386354251
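
A minimal sketch of the reversal for timestamp rowkeys, using only the Java standard library. The class and method names are hypothetical.

```java
final class ReversedKey {
    /** Reverses the characters of a rowkey. */
    static String reverse(String rowKey) {
        return new StringBuilder(rowKey).reverse().toString();
    }

    /** Builds a reversed rowkey from a millisecond timestamp. */
    static String fromTimestamp(long millis) {
        return reverse(Long.toString(millis));
    }

    public static void main(String[] args) {
        for (long ts : new long[] {1524536830360L, 1524536830362L, 1524536830376L}) {
            System.out.println(ts + " -> " + fromTimestamp(ts));
        }
    }
}
```

After reversal the fastest-changing digit comes first, so consecutive timestamps fall between different split points instead of piling up in one region. The ten split points above create eleven regions, and the one before '0' receives no keys that start with a digit.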

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

/** Scans the 'sound' table for one user's rows in a date range, filtered by name and category. */
public class DataFilter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        // Add any necessary configuration files (hbase-site.xml, core-site.xml)
        config.addResource(new Path("src/main/resources/hbase-site.xml"));
        config.addResource(new Path("src/main/resources/core-site.xml"));
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("sound"))) {
            Scan scan = new Scan();
            // The rowkey starts with the 6-digit user ID followed by the date, so this range
            // covers user 000001 from 20120901 (inclusive) to 20121001 (exclusive).
            // setStartRow/setStopRow are the HBase 1.x calls; 2.x offers withStartRow/withStopRow.
            scan.setStartRow(Bytes.toBytes("00000120120901"));
            scan.setStopRow(Bytes.toBytes("00000120121001"));
            // Substring match on column f:n (name)
            SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(Bytes.toBytes("f"), Bytes.toBytes("n"),
                    CompareFilter.CompareOp.EQUAL, new SubstringComparator("中国好声音"));
            // Substring match on column f:c (category)
            SingleColumnValueFilter categoryFilter = new SingleColumnValueFilter(Bytes.toBytes("f"), Bytes.toBytes("c"),
                    CompareFilter.CompareOp.EQUAL, new SubstringComparator("综艺"));
            // A row must pass both filters
            FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            filterList.addFilter(nameFilter);
            filterList.addFilter(categoryFilter);
            scan.setFilter(filterList);
            // try-with-resources always closes the ResultScanner
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result r : rs) {
                    // process result...
                    for (Cell cell : r.listCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "===> " +
                                Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                                Bytes.toString(CellUtil.cloneQualifier(cell)) + "{" +
                                Bytes.toString(CellUtil.cloneValue(cell)) + "}");
                    }
                }
            }
        }
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * create 'sound',
 */
public class DataPrepare {
    public static void main(String[] args) throws IOException {
        InputStream ins = DataPrepare.class.getClassLoader().getResourceAsStream("sound.txt");
        if (ins == null) {
            throw new IOException("sound.txt not found on the classpath");
        }
        List<SoundInfo> soundInfos = new ArrayList<>();
        // Assumes sound.txt is UTF-8 encoded; the reader is closed automatically
        try (BufferedReader br = new BufferedReader(new InputStreamReader(ins, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                SoundInfo soundInfo = new SoundInfo();
                String[] arr = line.split("\\|");
                // Rowkey = field 4 padded to 6 digits + field 1 + field 0 padded to 6 digits
                String rowkey = format(arr[4], 6) + arr[1] + format(arr[0], 6);
                soundInfo.setRowkey(rowkey);
                soundInfo.setName(arr[2]);
                soundInfo.setCategory(arr[3]);
                soundInfos.add(soundInfo);
            }
        }
        Configuration config = HBaseConfiguration.create();
        // Add any necessary configuration files (hbase-site.xml, core-site.xml)
        config.addResource(new Path("src/main/resources/hbase-site.xml"));
        config.addResource(new Path("src/main/resources/core-site.xml"));
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("sound"))) {
            List<Put> puts = new ArrayList<>();
            for (SoundInfo soundInfo : soundInfos) {
                Put put = new Put(Bytes.toBytes(soundInfo.getRowkey()));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("n"), Bytes.toBytes(soundInfo.getName()));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), Bytes.toBytes(soundInfo.getCategory()));
                puts.add(put);
            }
            table.put(puts);
        }
    }

    /** Left-pads a numeric string with zeros to the given width. */
    public static String format(String str, int num) {
        return String.format("%0" + num + "d", Integer.parseInt(str));
    }
}

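After creating the Scan object, DataFilter calls setStartRow("00000120120901") and setStopRow("00000120121001").

The scan therefore reads only the rows with userID = 1 whose date falls within that range, with the stop row excluded. This covers filtering by user and by time range, and because the matching rows are stored contiguously, the scan performs well.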
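
The composite rowkey and the scan bounds can be sketched in plain Java as follows. The layout follows DataPrepare: a 6-digit zero-padded user ID, then the date, then a 6-digit zero-padded ID. The class name, the method names and the sample ID 42 are hypothetical.

```java
final class SoundRowKeys {
    /** Left-pads a non-negative number with zeros to the given width. */
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    /** Full rowkey: user ID (6 digits) + date (yyyyMMdd) + ID (6 digits). */
    static String rowKey(int userId, String date, int id) {
        return pad(userId, 6) + date + pad(id, 6);
    }

    /** Scan bound for one user and date: the rowkey prefix without the trailing ID. */
    static String scanBound(int userId, String date) {
        return pad(userId, 6) + date;
    }

    public static void main(String[] args) {
        String start = scanBound(1, "20120901");
        String stop = scanBound(1, "20121001");
        String key = rowKey(1, "20120915", 42);
        System.out.println("scan [" + start + ", " + stop + ") contains " + key + ": "
                + (key.compareTo(start) >= 0 && key.compareTo(stop) < 0));
    }
}
```

Zero-padding matters here: without fixed-width fields, lexicographic order would no longer match numeric order, and the start and stop rows would not bound the intended range.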

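The scan then applies SingleColumnValueFilter (org.apache.hadoop.hbase.filter.SingleColumnValueFilter) to the name and category columns. DataFilter uses two of them with a SubstringComparator, combined in a MUST_PASS_ALL FilterList, so a row must match on both name and category. A prefix match would instead need four filters, constraining the lower and upper bounds of name and of category.

(Note: SingleColumnValueFilter hurts query performance. On truly large data sets it consumes a lot of resources and takes a long time.)

If paging is needed, a PageFilter can be added to limit the number of rows returned.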
