Using HBase on the Microsoft Azure cloud platform
In this article
- What is HBase?
- Prerequisites
- Provision an HBase cluster using the Azure Management portal
- Create an HBase sample table from the HBase shell
- Check cluster status in the HBase WebUI
- Bulk load a sample table
- Use Hive to query an HBase table
- Use the Microsoft HBase REST client library to manage HBase tables
- What's next
What is HBase?
HBase is a low-latency NoSQL database that allows online transactional processing of big data. HBase is offered as a managed cluster integrated into the Azure environment. The clusters are configured to store data directly in Azure Blob storage, which provides low latency and increased elasticity in performance/cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. For more information on HBase and the scenarios it can be used for, see HDInsight HBase overview.
NOTE:
HBase (version 0.98.0) is only available with HDInsight 3.1 clusters (based on Apache Hadoop and YARN 2.4.0). For version information, see What's new in the Hadoop cluster versions provided by HDInsight?
Prerequisites
Before you begin this tutorial, you must have the following:
- An Azure subscription. For more information about obtaining a subscription, see Purchase Options, Member Offers, or Free Trial.
- An Azure storage account. For instructions, see How To Create a Storage Account.
- A workstation with Visual Studio 2013 installed. For instructions, see Installing Visual Studio.
Provision an HBase cluster using the Azure Management portal
This section describes how to provision an HBase cluster using the Azure Management portal.
NOTE:
The steps in this article create an HDInsight cluster using basic configuration settings. For information on other cluster configuration settings, such as using Azure Virtual Network or a metastore for Hive and Oozie, see Provision an HDInsight cluster.
To provision an HDInsight cluster in the Azure Management portal
- Sign in to the Azure Management Portal.
- Click NEW on the lower left, and then click DATA SERVICES, HDINSIGHT, HBASE.
- Enter CLUSTER NAME, CLUSTER SIZE, CLUSTER USER PASSWORD, and STORAGE ACCOUNT.
- Click the check icon on the lower left to create the HBase cluster.
Create an HBase sample table from the HBase shell
This section describes how to enable Remote Desktop Protocol (RDP) access to the cluster, open the HBase shell, and use it to create a sample HBase table, add rows, and list the rows in the table.
It assumes you have completed the procedure in the previous section and have already successfully created an HBase cluster.
To enable the RDP connection to the HBase cluster
- From the Management portal, click HDINSIGHT on the left to view the list of existing clusters.
- Click the HBase cluster where you want to open the HBase shell.
- Click CONFIGURATION from the top.
- Click ENABLE REMOTE from the bottom.
- Enter the RDP user name and password. The user name must be different from the cluster user name you used when provisioning the cluster. The EXPIRES ON date can be up to seven days from today.
- Click the check on the lower right to enable remote desktop.
- After RDP is enabled, click CONNECT from the bottom of the CONFIGURATION tab, and follow the instructions.
To open the HBase Shell
Within your RDP session, click on the Hadoop Command Line shortcut located on the desktop.
Change the folder to the HBase home directory:
cd %HBASE_HOME%\bin
Open the HBase shell:
hbase shell
To create a sample table, add data and retrieve the data
Create a sample table:
create 'sampletable', 'cf1'
Add a row to the sample table:
put 'sampletable', 'row1', 'cf1:col1', 'value1'
List the rows in the sample table:
scan 'sampletable'
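Optionally, you can read the same row back from client code instead of the shell by using the Microsoft HBase REST client library for .NET, which is covered in detail later in this article. The following is a minimal sketch, not part of the original walkthrough; it assumes the Microsoft.HBase.Client NuGet package is installed and that the cluster URL, username, and password placeholders are replaced with your own values:
using System;
using System.Text;
using Microsoft.HBase.Client;

class ReadSampleRow
{
    static void Main()
    {
        // Placeholders: replace with your cluster URL and Hadoop user credentials.
        var creds = new ClusterCredentials(
            new Uri("https://<yourHBaseClusterName>.azurehdinsight.net"),
            "<yourHadoopUsername>",
            "<yourHadoopUserPassword>");
        var client = new HBaseClient(creds);

        // Read back the row created in the HBase shell: 'row1' in 'sampletable'.
        var cellSet = client.GetCells("sampletable", "row1");
        foreach (var cell in cellSet.rows[0].values)
        {
            // Expected output: cf1:col1 = value1
            Console.WriteLine(Encoding.UTF8.GetString(cell.column) + " = " +
                              Encoding.UTF8.GetString(cell.data));
        }
    }
}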
Check cluster status in the HBase WebUI
HBase also ships with a WebUI that helps you monitor your cluster, for example by providing request statistics and information about regions. On the HBase cluster, the WebUI is available at the ZooKeeper node address:
http://zookeepernode:60010/master-status
In a high-availability (HA) cluster, you will find a link there to the currently active HBase master node, which hosts the WebUI.
Bulk load a sample table
Create a file named samplefile1.txt that contains the following data, and upload it to Azure Blob storage as /tmp/samplefile1.txt:
row1 c1 c2
row2 c1 c2
row3 c1 c2
row4 c1 c2
row5 c1 c2
row6 c1 c2
row7 c1 c2
row8 c1 c2
row9 c1 c2
row10 c1 c2
Change the folder to the HBase home directory:
cd %HBASE_HOME%\bin
Execute ImportTsv:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,a:b,a:c" -Dimporttsv.bulk.output=/tmpOutput sampletable2 /tmp/samplefile1.txt
Load the output of the prior command into HBase:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmpOutput sampletable2
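As a quick check that the bulk load worked, you can run scan 'sampletable2' from the HBase shell, or scan the table from client code with the same REST client library used later in this article. The sketch below is illustrative only and not part of the original walkthrough; it assumes the Microsoft.HBase.Client NuGet package is installed and the placeholder cluster URL and credentials are replaced:
using System;
using System.Text;
using Microsoft.HBase.Client;
using org.apache.hadoop.hbase.rest.protobuf.generated;

class ScanBulkLoadedTable
{
    static void Main()
    {
        // Placeholders: replace with your cluster URL and Hadoop user credentials.
        var creds = new ClusterCredentials(
            new Uri("https://<yourHBaseClusterName>.azurehdinsight.net"),
            "<yourHadoopUsername>",
            "<yourHadoopUserPassword>");
        var client = new HBaseClient(creds);

        // A scanner with no startRow or endRow walks the entire table.
        Scanner scanSettings = new Scanner { batch = 10 };
        ScannerInformation scannerInfo = client.CreateScanner("sampletable2", scanSettings);

        int rowCount = 0;
        CellSet next;
        while ((next = client.ScannerGetNext(scannerInfo)) != null)
        {
            foreach (CellSet.Row row in next.rows)
            {
                rowCount++;
                // Row keys and values were loaded as plain text by ImportTsv.
                Console.WriteLine(Encoding.UTF8.GetString(row.key) + " : " +
                                  Encoding.UTF8.GetString(row.values[0].data));
            }
        }
        // Expect 10 rows (row1 through row10) after the bulk load.
        Console.WriteLine("Rows scanned: " + rowCount);
    }
}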
Use Hive to query an HBase table
Now that you have provisioned an HBase cluster and created an HBase table, you can query it using Hive. This section creates a Hive table that maps to the HBase table and uses it to query the data in your HBase table.
To open the cluster dashboard
- Sign in to the Azure Management Portal.
- Click HDINSIGHT in the left pane. You will see a list of clusters, including the one you created in the last section.
- Click the cluster name where you want to run the Hive job.
- Click QUERY CONSOLE at the bottom of the page to open the cluster dashboard. It opens a web page in a different browser tab.
- Enter the Hadoop user account username and password. The default username is admin; the password is the one you entered during the provisioning process. A new browser tab opens.
Click Hive Editor at the top.
To run Hive queries
Enter the HiveQL script below into the Hive Editor and click SUBMIT to create a Hive table that maps to the HBase table. Make sure that you have already created the sampletable table in HBase by using the HBase shell before executing this statement.
CREATE EXTERNAL TABLE hbasesampletable(rowkey STRING, col1 STRING, col2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:col1,cf1:col2')
TBLPROPERTIES ('hbase.table.name' = 'sampletable');
The hbase.columns.mapping property maps the Hive rowkey column to the HBase row key (:key) and the col1 and col2 columns to cf1:col1 and cf1:col2. Wait until the status is updated to Completed.
Enter the HiveQL script below into the Hive Editor, and then click SUBMIT. The query counts the rows in the HBase table:
SELECT count(*) FROM hbasesampletable;
To retrieve the results of the Hive query, click the View Details link in the Job Session window when the job finishes executing. The Job Output is 1 because you put only one record into the HBase table.
To browse the output file
- From the Query Console, click File Browser at the top.
- Click the Azure Storage account used as the default file system for the HBase cluster.
- Click the HBase cluster name. The default Azure storage account container uses the cluster name.
- Click user.
- Click admin. This is the Hadoop user name.
- Click the job name with the Last Modified time matching the time when the SELECT Hive query ran.
Click stdout. Save the file and open it with Notepad. The output is 1.
Use the HBase REST client library for .NET to create an HBase table and retrieve data from the table
The Microsoft HBase REST Client Library for .NET is developed on GitHub and distributed as a NuGet package. The following procedure shows how to install the package in a C# project and use it to create an HBase table and retrieve data from the table.
- Create a new C# Visual Studio Windows Desktop Console application.
- Open the NuGet Package Manager Console by clicking the TOOLS menu, NuGet Package Manager, Package Manager Console.
Run the following NuGet command in the console:
Install-Package Microsoft.HBase.Client
Add the following using statements at the top of the file (the System and System.Text namespaces are needed for Uri, Encoding, and BitConverter, and may already be present in the default console template):
using System;
using System.Text;
using Microsoft.HBase.Client;
using org.apache.hadoop.hbase.rest.protobuf.generated;
Replace the Main function with the following:
static void Main(string[] args)
{
    string clusterURL = "https://<yourHBaseClusterName>.azurehdinsight.net";
    string hadoopUsername = "<yourHadoopUsername>";
    string hadoopUserPassword = "<yourHadoopUserPassword>";
    string hbaseTableName = "sampleHbaseTable";

    // Create a new instance of an HBase client.
    ClusterCredentials creds = new ClusterCredentials(new Uri(clusterURL), hadoopUsername, hadoopUserPassword);
    HBaseClient hbaseClient = new HBaseClient(creds);

    // Retrieve the cluster version.
    var version = hbaseClient.GetVersion();
    Console.WriteLine("The HBase cluster version is " + version);

    // Create a new HBase table with two column families, "d" and "f".
    TableSchema testTableSchema = new TableSchema();
    testTableSchema.name = hbaseTableName;
    testTableSchema.columns.Add(new ColumnSchema() { name = "d" });
    testTableSchema.columns.Add(new ColumnSchema() { name = "f" });
    hbaseClient.CreateTable(testTableSchema);

    // Insert data into the HBase table.
    string testKey = "content";
    string testValue = "the force is strong in this column";
    CellSet cellSet = new CellSet();
    CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(testKey) };
    cellSet.rows.Add(cellSetRow);
    Cell value = new Cell { column = Encoding.UTF8.GetBytes("d:starwars"), data = Encoding.UTF8.GetBytes(testValue) };
    cellSetRow.values.Add(value);
    hbaseClient.StoreCells(hbaseTableName, cellSet);

    // Retrieve a cell by its key.
    cellSet = hbaseClient.GetCells(hbaseTableName, testKey);
    Console.WriteLine("The data with the key '" + testKey + "' is: " + Encoding.UTF8.GetString(cellSet.rows[0].values[0].data));
    // With the previous insert, it should yield: "the force is strong in this column".

    // Scan over rows in a table. Assume the table has integer keys and you want data between keys 25 and 35.
    Scanner scanSettings = new Scanner()
    {
        batch = 10,
        startRow = BitConverter.GetBytes(25),
        endRow = BitConverter.GetBytes(35)
    };
    ScannerInformation scannerInfo = hbaseClient.CreateScanner(hbaseTableName, scanSettings);
    CellSet next = null;
    Console.WriteLine("Scan results");
    while ((next = hbaseClient.ScannerGetNext(scannerInfo)) != null)
    {
        foreach (CellSet.Row row in next.rows)
        {
            // The keys in this scan example are integer-encoded, so decode them with BitConverter.
            Console.WriteLine(BitConverter.ToInt32(row.key, 0) + " : " + Encoding.UTF8.GetString(row.values[0].data));
        }
    }

    Console.WriteLine("Press ENTER to continue ...");
    Console.ReadLine();
}
Set the first three variables (clusterURL, hadoopUsername, and hadoopUserPassword) in the Main function.
Press F5 to run the application.
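When you are finished experimenting, you may want to remove the sample table from the same client. The optional sketch below assumes that your version of the library exposes ListTables() and DeleteTable() methods (as in the GitHub releases of Microsoft.HBase.Client); the exact member names may differ, so treat this as a hypothetical cleanup step rather than part of the original walkthrough:
// Optional cleanup, appended at the end of Main.
// Assumption: ListTables() and DeleteTable() exist in your version of Microsoft.HBase.Client.
var tables = hbaseClient.ListTables();
foreach (string tableName in tables.name)
{
    Console.WriteLine("Table on cluster: " + tableName);
}
hbaseClient.DeleteTable(hbaseTableName);   // Removes sampleHbaseTable and its data.
Console.WriteLine(hbaseTableName + " deleted.");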
What's Next?
In this tutorial, you learned how to provision an HBase cluster, create tables, and view the data in those tables from the HBase shell. You also learned how to use Hive to query the data in HBase tables, and how to use the HBase REST client library for .NET to create an HBase table and retrieve data from it.
To learn more, see:
- HDInsight HBase overview: HBase is an Apache open source NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data.
- Provision HBase clusters on Azure Virtual Network: With the virtual network integration, HBase clusters can be deployed to the same virtual network as your applications so that applications can communicate with HBase directly.
- Analyze Twitter sentiment with HBase in HDInsight: Learn how to do real-time sentiment analysis of big data by using HBase in a Hadoop cluster in HDInsight.