The Apache HBase™ Reference Guide
The following is excerpted and adapted from http://hbase.apache.org/book.html#getting_started.
Environment: hadoop-1.0.4, hbase-0.94.22, jdk1.7.0_65
Chapter 1. Getting Started
This quick start shows how to:
create a table in HBase using the hbase shell CLI,
insert rows into the table,
perform put and scan operations against the table,
enable or disable the table,
and start and stop HBase.
Local Filesystem and Durability
Using HBase 0.98.2 and earlier releases with a local filesystem does not guarantee durability. The HDFS local filesystem implementation will lose edits if files are not properly closed. This is very likely to happen when you are experimenting with new software, starting and stopping the daemons often and not always cleanly. You need to run HBase on HDFS to ensure all writes are preserved. Despite this bug, we use the local filesystem here anyway, because the goal is to get familiar with HBase quickly and conveniently.
Loopback IP - HBase 0.94.x and earlier
Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See "Why does HBase care about /etc/hosts?" for details.
The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
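The loopback check described above can be scripted. The following is a minimal sketch, not part of HBase: the function names find_bad_loopback and loopback_ok are hypothetical, and the check only looks for entries whose address field is exactly 127.0.1.1.

```shell
# Print every active (non-comment) line of a hosts file whose address is 127.0.1.1.
# Usage: find_bad_loopback [hosts-file]   (defaults to /etc/hosts)
find_bad_loopback() {
    hosts_file="${1:-/etc/hosts}"
    # Field 1 is the address; commented-out lines start with '#', so they never match.
    awk '$1 == "127.0.1.1" { print }' "$hosts_file"
}

# Succeed (exit status 0) only when the file contains no 127.0.1.1 entries.
loopback_ok() {
    [ -z "$(find_bad_loopback "$1")" ]
}
```

Running loopback_ok with no argument checks /etc/hosts; if it fails, find_bad_loopback prints the lines to change to 127.0.0.1.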
HBase requires that a JDK be installed. See Table 2.1, “Java” for information about supported JDK versions.
1.2.2. Get Started with HBase
For HBase 0.98.5 and later, you are required to set the JAVA_HOME environment variable before starting HBase. Prior to 0.98.5, HBase attempted to detect the location of Java if the variable was not set. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.
JAVA_HOME=/usr
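Before editing conf/hbase-env.sh, it can help to confirm that a candidate directory really contains bin/java. This is a small sketch under that single assumption; java_home_ok is a hypothetical helper name, not an HBase command.

```shell
# Succeed only if DIR looks like a usable JAVA_HOME,
# i.e. DIR/bin/java exists, is executable, and is not a directory.
# Symbolic links (such as those managed by alternatives) are followed.
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ] && [ ! -d "$1/bin/java" ]
}
```

For example, `java_home_ok /usr && echo "JAVA_HOME=/usr is usable"` prints the message only when /usr/bin/java is present and executable.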
Edit the main HBase configuration file, conf/hbase-site.xml. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
Example hbase-site.xml for Standalone HBase
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- hbase is a directory the user created to store the data -->
    <value>file:///home/testuser/hbase</value>
  </property>
  <!-- I have not installed ZooKeeper yet, so my own configuration does not include this property -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>
The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon.
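To double-check which data directory the standalone configuration above points at, the property value can be read back from the file. This is only a sketch: get_hbase_property is a hypothetical helper, and it assumes each <name> and <value> tag sits on its own line, as in the example. It is not a general XML parser.

```shell
# Print the <value> of the named property from an hbase-site.xml-style file.
# Usage: get_hbase_property PROPERTY FILE
get_hbase_property() {
    prop="$1"
    file="$2"
    awk -v want="<name>$prop</name>" '
        # Remember that the wanted <name> line has been seen.
        index($0, want) { found = 1; next }
        # The first <value> line after it holds the setting: strip the tags and print.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}
```

With the example file, `get_hbase_property hbase.rootdir conf/hbase-site.xml` prints file:///home/testuser/hbase. An empty result means the property is not set, so HBase would fall back to a directory under /tmp.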
Procedure 1.2. Use HBase For the First Time
Connect to HBase.
$ ./bin/hbase shell
hbase(main):001:0>
Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands.
Create a table.
You must specify the table name and the ColumnFamily name.
hbase> create 'test', 'cf'
0 row(s) in 1.2200 seconds
List Information About your Table
hbase> list 'test'
TABLE
test
1 row(s) in 0.0350 seconds
=> ["test"]
Put data into your table.
hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
hbase> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds
hbase> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0260 seconds
Scan the table for all data at once.
hbase> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1403759475114, value=value1
row2 column=cf:b, timestamp=1403759492807, value=value2
row3 column=cf:c, timestamp=1403759503155, value=value3
3 row(s) in 0.0440 seconds
Get a single row of data.
hbase> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1403759475114, value=value1
1 row(s) in 0.0230 seconds
Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.
hbase> disable 'test'
0 row(s) in 1.6270 seconds
hbase> enable 'test'
0 row(s) in 0.4500 seconds
Disable the table again if you tested the enable command above:
hbase> disable 'test'
0 row(s) in 1.6270 seconds
Drop the table.
hbase> drop 'test'
0 row(s) in 0.2900 seconds
Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the quit command.
Stop HBase
$ ./bin/stop-hbase.sh
stopping hbase....................
$
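The shell commands from Procedure 1.2 can also be collected into a file so the whole walk-through is repeatable. The sketch below only writes that file; write_quickstart_commands is a hypothetical helper name. Feeding the file to the HBase Shell as a script argument is an assumption about your HBase version, so verify it before relying on it.

```shell
# Write the quick-start commands from Procedure 1.2 to FILE, one per line.
# The final "exit" is there so a non-interactive run does not leave the shell open.
write_quickstart_commands() {
    cat > "$1" <<'EOF'
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
exit
EOF
}
```

A possible use, assuming the shell accepts a script path as its argument, is `write_quickstart_commands /tmp/quickstart.txt` followed by `./bin/hbase shell /tmp/quickstart.txt`.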
1.2.3. Intermediate - Pseudo-Distributed Local Install
You can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and Zookeeper) runs as a separate process. By default, unless you configure the hbase.rootdir property as described in Section 1.2, “Quick Start - Standalone HBase”, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available.
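Note: after I started HBase in standalone mode, jps showed only HMaster and no HRegionServer. Why? This matches the description above: in standalone mode all daemons run inside the single HMaster JVM. (Today I started HBase in pseudo-distributed mode, and the HRegionServer daemon appeared as a separate process.)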
Hadoop Configuration
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. Currently, the documentation on the Hadoop website does not include a quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide is a good starting point.
Configure HBase.
Edit the hbase-site.xml configuration. First, add the following property, which tells HBase to run in distributed mode:
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs:// URI syntax.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
You do not need to create the directory in HDFS. HBase will do this for you.
Start HBase.
Use the bin/start-hbase.sh command to start HBase.
Check the HBase directory in HDFS.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.
Note: the first time I ran this command, it reported a connection error. A web search suggested reformatting the NameNode, which did fix the error, but afterwards HDFS contained only a /tmp directory and no /hbase. I have not yet found out why.
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
Create a table and populate it with data.
You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in Procedure 1.2, “Use HBase For the First Time”.
Start and stop a backup HBase Master (HMaster) server.
Note
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
$ ./bin/local-master-backup.sh start 2 3 5
To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only contents of the file are the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
Start and stop additional RegionServers
The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not a HMaster or backup HMaster, on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).
$ ./bin/local-regionservers.sh start 2 3 4 5
To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.
$ ./bin/local-regionservers.sh stop 3
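The port-offset arithmetic used by local-master-backup.sh and local-regionservers.sh can be worked through with a short sketch. The function names are hypothetical, and the base ports are simply the ones quoted in the text above (16010/16020/16030 for masters, 16200/16300 for additional RegionServers); older releases use different defaults.

```shell
# Print the three ports a backup HMaster with the given offset would use.
backup_master_ports() {
    offset="$1"
    echo "$((16010 + offset)) $((16020 + offset)) $((16030 + offset))"
}

# Print the two ports an additional RegionServer with the given offset would use.
extra_regionserver_ports() {
    offset="$1"
    echo "$((16200 + offset)) $((16300 + offset))"
}
```

For example, `backup_master_ports 2` prints 16012 16022 16032, and `extra_regionserver_ports 2` prints 16202 16302, matching the figures in the text.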
Stop HBase.
You can stop HBase the same way as in the Section 1.2, “Quick Start - Standalone HBase” procedure, using the bin/stop-hbase.sh command.
1.2.4. Advanced - Fully Distributed
In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:
Table 1.1. Distributed Cluster Demo Architecture
| Node Name | Master | ZooKeeper | RegionServer |
|---|---|---|---|
| node-a.example.com | yes | yes | no |
| node-b.example.com | backup | yes | yes |
| node-c.example.com | no | yes | yes |
This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Section 1.2.3, “Intermediate - Pseudo-Distributed Local Install”, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.
Note
Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
Procedure 1.4. Configure Password-Less SSH Access
node-a needs to be able to log into node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from node-a to each of the others.
On node-a, generate a key pair.
While logged in as the user who will run HBase, generate an SSH key pair, using the following command:
$ ssh-keygen -t rsa
If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.
Create the directory that will hold the shared keys on the other nodes.
On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.
Copy the public key to the other nodes.
Securely copy the public key from node-a to each of the nodes, by using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.
$ cat id_rsa.pub >> ~/.ssh/authorized_keys
Test password-less login.
If you performed the procedure correctly, you should not be prompted for a password when you SSH from node-a to either of the other nodes using the same username.
Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files, but concatenate the new key onto the existing file using the >> operator rather than the > operator.
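Because the procedure is repeated for node-b, it is easy to append the same key twice. The sketch below shows one way to make the append step idempotent. add_authorized_key is a hypothetical helper, not an OpenSSH tool, and tightening the file mode to 600 is my own precaution rather than something the guide requires.

```shell
# Append the public key in PUBKEY_FILE to AUTH_FILE unless that exact line is already there.
# Usage: add_authorized_key PUBKEY_FILE AUTH_FILE
add_authorized_key() {
    pubkey_file="$1"
    auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # Keep the file private to its owner.
    chmod 600 "$auth_file"
    # -F: fixed strings, -x: match whole lines, -q: quiet, -f: read the pattern from the key file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        # Append with >> so existing keys are never overwritten.
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On each target node this would be called as `add_authorized_key id_rsa.pub ~/.ssh/authorized_keys`; running it a second time leaves the file unchanged.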
Procedure 1.5. Prepare node-a
node-a will run your primary master and ZooKeeper processes, but no RegionServers.
Stop the RegionServer from starting on node-a.
Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c. Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.
Configure HBase to use node-b as a backup master.
Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.
Configure ZooKeeper
In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in Chapter 20, ZooKeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the following properties.
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>
Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.
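The two host-list files from Procedure 1.5 are plain text with one hostname per line, so they can be generated rather than edited by hand. This is a sketch for the demo architecture only; write_cluster_conf is a hypothetical helper, and it overwrites any existing regionservers and backup-masters files in the given directory.

```shell
# Write conf/regionservers and conf/backup-masters for the three-node demo cluster.
# Usage: write_cluster_conf CONF_DIR
write_cluster_conf() {
    conf_dir="$1"
    # RegionServers run on node-b and node-c only. Overwriting the file drops the localhost line.
    printf '%s\n' node-b.example.com node-c.example.com > "$conf_dir/regionservers"
    # node-b is the single backup master.
    printf '%s\n' node-b.example.com > "$conf_dir/backup-masters"
}
```

Calling `write_cluster_conf conf` from the HBase installation directory on node-a produces both files before the configuration is copied to the other nodes.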
Procedure 1.6. Prepare node-b and node-c
node-b will run a backup master server and a ZooKeeper instance.
Download and unpack HBase.
Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.
Copy the configuration files from node-a to node-b and node-c.
Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.
Procedure 1.7. Start and Test Your Cluster
Be sure HBase is not running on any node.
If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.
Start the cluster.
On node-a, issue the start-hbase.sh command. Your output will be similar to that below.
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-b.example.com.out
ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
Verify that the processes are running.
On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.
Example 1.3. node-a jps Output
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
Example 1.4. node-b jps Output
$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
Example 1.5. node-c jps Output
$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
ZooKeeper Process Name
The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node, and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeerMain. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see Chapter 20, ZooKeeper.
Browse to the Web UI.
Web UI Port Changes
In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.
If everything is set up correctly, you should be able to connect to the UI for the Master at http://node-a.example.com:60010/ or for the secondary master at http://node-b.example.com:60010/, using a web browser (use port 16010 on releases newer than 0.98.x). If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 60030 of their IP addresses (16030 on newer releases), or by clicking their links in the web UI for the Master.
Test what happens when nodes or services disappear.
With a three-node cluster like you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.
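The jps verification step in Procedure 1.7 can be automated per node. The following is a sketch: jps_has_all is a hypothetical helper that reads jps output on standard input and checks the second column against the process names you expect on that node.

```shell
# Read jps output on stdin and succeed only if every process name given as an argument is listed.
# jps prints one "<pid> <name>" pair per line, as in Examples 1.3 to 1.5.
jps_has_all() {
    output="$(cat)"
    for name in "$@"; do
        # awk exits 0 only if some line has the wanted name in its second field.
        printf '%s\n' "$output" |
            awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' || return 1
    done
}
```

On node-a the check would be `jps | jps_has_all HQuorumPeer HMaster`, on node-b `jps | jps_has_all HQuorumPeer HMaster HRegionServer`, and on node-c `jps | jps_has_all HQuorumPeer HRegionServer`. The same helper is handy after killing a Master or RegionServer to confirm which daemons are still up.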
1.2.5. Where to go next
The next chapter, Chapter 2, Apache HBase Configuration, gives more information about the different HBase run modes, system requirements for running HBase, and critical configuration areas for setting up a distributed HBase cluster.
Chapter 2. Apache HBase Configuration
centos7.6环境下编译安装tengine-2.2.2的编译安装 .获取tengine2..2的源码包 http://tengine.taobao.org/download/tengine-2.2 ...