Home

Gabriel Reid edited this page on 7 Aug 2013 · 8 revisions

Introduction

The HBase Indexer project provides indexing (via Solr) for content stored in HBase. It provides a flexible and extensible way of defining indexing rules, and is designed to scale.

Indexing is performed asynchronously, so it does not impact write throughput on HBase. SolrCloud is used for storing the actual index in order to ensure scalability of the indexing.

Getting started with the HBase Indexer

Make sure you've got the required software installed, as detailed on the Requirements page.
Follow the Tutorial to get a feel for how to use the HBase Indexer.
Customize your indexing setup as needed using the other reference documentation provided here.

How it works

The HBase Indexer works by acting as an HBase replication sink. As updates are written to HBase region servers, they are "replicated" asynchronously to the HBase Indexer processes.

The indexer analyzes incoming HBase mutation events, and where applicable it creates Solr documents and pushes them to SolrCloud servers.

The indexed documents in Solr contain enough information to uniquely identify the HBase row that they are based on, allowing you to use Solr to search for content that is stored in HBase.

HBase replication is based on reading the HBase log files, which are the precise source of truth of the what is stored in HBase: there are no missing or no extra events. In various cases, the log also contains all the information needed to index, so that no expensive random-read on HBase is necessary (see the read-row attribute in the Indexer Configuration).

HBase replication delivers (small) batches of events. HBase-indexer exploits this by avoiding double-indexing of the same row if it would have been updated twice in a short time frame, and as well will batch/buffer the updates towards Solr, which gives important performance gains. The updates are applied to Solr before confirming the processing of the events to HBase, so that no event loss is possible.

Horizontal scalability

All information about indexers is stored in ZooKeeper. New indexer hosts can always be added to a cluster, in the same way that HBase regionservers can be added to to an HBase cluster.

All indexing work for a single configured indexer is shared over all machines in the cluster. In this way, adding additional indexer nodes allows horizontal scaling.

Automatic failure handling

The HBase replication system upon which the HBase Indexer is based is designed to handle hardware failures. Because the HBase Indexer is based on this system, it also benefits from the same ability to handle failures.

In general, indexing nodes going down or Solr nodes going down will not result in any lost data in the HBase Indexer.

hbase-indexer官网wiki的更多相关文章

eclipse p2更新官网wiki的例子
官网的cvs好像没了,不过在github上找到一份,可用. https://github.com/anthonydahanne/make-p2-buildable-with-tycho/tree/ma ...
hbases索引技术：Lily HBase Indexer介绍
Lily HBase Indexer 为hbase提供快速查询,他允许不写代码,快速容易的把hbase行索引到solr.Lily HBase Indexer drives HBase indexing ...
卸载 Cloudera Manager 5.1.x.和相关软件【官网翻译】
问题导读: 1.不同的安装方式,卸载方法存在什么区别?2.不同的操作系统,卸载 Cloudera Manager Server and 数据库有什么区别? 重新安装不完整如果你来到这里,因为你的安装没 ...
dubbo系列一：dubbo介绍、dubbo架构、dubbo的官网入门使用demo
一.dubbo介绍 Dubbo是阿里巴巴公司开源的一个高性能优秀的服务框架,使得应用可通过高性能的RPC实现服务的输出和输入功能,可以和Spring框架无缝集成.简单地说,dubbo是一个基于Spri ...
OpenTSDB（时序数据库官网）
官网地址:http://opentsdb.net/ 下载地址:https://github.com/OpenTSDB/opentsdb/releases ----------------------- ...
Java微信扫描支付模式二Demo ,整合官网直接运行版本
概述场景介绍用户使用微信“扫一扫”扫描二维码后,获取商品支付信息,引导用户完成支付. 详细代码下载:http://www.demodashi.com/demo/13880.html 一.相关配置 ...
caffe官网的部分翻译及NG的教程
Caffe原来叫:Convolutional Architecture for Fast Feature Embedding 官网的个人翻译:http://blog.csdn.net/fengbing ...
【原】Zookeeper 概述 + 官网 Overview 翻译
分布式应用分布式应用 distributed application可以在给定时间(同时)在网络中的多个系统上运行,通过协调它们以快速有效的方式完成特定任务. (a), (b): a distrib ...
几个比較好的IT站和开发库官网
几个比較好的IT站和开发库官网 1.IT技术.项目类站点 (1)首推CodeProject,一个国外的IT站点,官网地址为:http://www.codeproject.com,这个站点为程序开发人员 ...

随机推荐

Python字符编码和转码
一:Python2 python2默认编码格式是ascii码,解释器解释代码时会将代码以及代码中的字符串等转换成ascii码再执行.这样会导致字符串输出或传输时,与当前环境编码格式不同的话会显示乱码. ...
交叉编译支持SVE ACLE的gcc
最近在学习AArch64的SVE技术时,发现目前可以在网上找到的gcc版本都不支持SVE intrinsic方式调用,在看文档时发现,GCC要到2020年的GCC10时才会支持: 在github上看到 ...
hdu1801 01翻转贪心
题目描述: 对于给出的一个n*m的矩形,它由1和0构成,现在给你一个r*c的矩形空间可以选择,且可以选择无数次(被选中的范围内01翻转),要求问将这个01矩阵全部变成0的最少需要翻多少次,且如果无法实 ...
scapy 中sniff指定的数据包并打印指定信息
在理解这篇文章前可以先看看这两篇文章: https://www.cnblogs.com/liyuanhong/p/10925582.html https://www.cnblogs.com/liyua ...
Tomcat8 访问 manager App 失败
Tomcat8 访问 manager App 失败进入 tomcat 8 的下面路径修改上面的 context.xml 注释了下面的框框保存退出.重启tomcat
Flex弹性盒模型（新老版本完整）--移动端开发整理笔记（二）
Flex布局 Flex即Flexible Box,写法为:display:flex(旧版:display: -webkit-box) 在Webkit内核下,需要加-webkit前缀: .box{ di ...
【电脑】xshell报：需要Xmanager软件来处理X11转发请求
https://www.netsarang.com/zh/xmanager/ 下载了就好了我的图片出不来,下了就好了.
GrowingIO配置
一.什么是UTM参数UTM是一套标准的跟踪渠道流量的参数,你可以通过它来跟踪访问你网站的流量来自于哪些渠道.哪些媒介等. 二.高级用法 UTM参数包括了utm_source在内的5个参数,分别是: 参 ...
题解洛谷 P2010 【回文日期】
By:Soroak 洛谷博客知识点:模拟+暴力枚举思路:题目中有提到闰年然后很多人就认为,闰年是需要判断的其实,含有2月29号的回文串,前四位是一个闰年那么我们就可以直接进行暴力枚举一些小细节: ...
Web项目中使用Log4net 案例
简介: 几乎所有的大型应用都会有自己的用于跟踪调试的API.因为一旦程序被部署以后,就不太可能再利用专门的调试工具了.然而一个管理员可能需要有一套强大的日志系统来诊断和修复配置上的问题. 经验表明,日 ...

hbase-indexer官网wiki