Home

 
Gabriel Reid edited this page on 7 Aug 2013 · 8 revisions

Introduction

The HBase Indexer project provides indexing (via Solr) for content stored in HBase. It provides a flexible and extensible way of defining indexing rules, and is designed to scale.

Indexing is performed asynchronously, so it does not impact write throughput on HBase. SolrCloud is used for storing the actual index in order to ensure scalability of the indexing.

Getting started with the HBase Indexer

  1. Make sure you've got the required software installed, as detailed on the Requirements page.
  2. Follow the Tutorial to get a feel for how to use the HBase Indexer.
  3. Customize your indexing setup as needed using the other reference documentation provided here.

How it works

The HBase Indexer works by acting as an HBase replication sink. As updates are written to HBase region servers, they are "replicated" asynchronously to the HBase Indexer processes.

The indexer analyzes incoming HBase mutation events, and where applicable it creates Solr documents and pushes them to SolrCloud servers.

The indexed documents in Solr contain enough information to uniquely identify the HBase row that they are based on, allowing you to use Solr to search for content that is stored in HBase.

HBase replication is based on reading the HBase log files, which are the precise source of truth of the what is stored in HBase: there are no missing or no extra events. In various cases, the log also contains all the information needed to index, so that no expensive random-read on HBase is necessary (see the read-row attribute in the Indexer Configuration).

HBase replication delivers (small) batches of events. HBase-indexer exploits this by avoiding double-indexing of the same row if it would have been updated twice in a short time frame, and as well will batch/buffer the updates towards Solr, which gives important performance gains. The updates are applied to Solr before confirming the processing of the events to HBase, so that no event loss is possible.

Horizontal scalability

All information about indexers is stored in ZooKeeper. New indexer hosts can always be added to a cluster, in the same way that HBase regionservers can be added to to an HBase cluster.

All indexing work for a single configured indexer is shared over all machines in the cluster. In this way, adding additional indexer nodes allows horizontal scaling.

Automatic failure handling

The HBase replication system upon which the HBase Indexer is based is designed to handle hardware failures. Because the HBase Indexer is based on this system, it also benefits from the same ability to handle failures.

In general, indexing nodes going down or Solr nodes going down will not result in any lost data in the HBase Indexer.

hbase-indexer官网wiki的更多相关文章

  1. eclipse p2更新官网wiki的例子

    官网的cvs好像没了,不过在github上找到一份,可用. https://github.com/anthonydahanne/make-p2-buildable-with-tycho/tree/ma ...

  2. hbases索引技术:Lily HBase Indexer介绍

    Lily HBase Indexer 为hbase提供快速查询,他允许不写代码,快速容易的把hbase行索引到solr.Lily HBase Indexer drives HBase indexing ...

  3. 卸载 Cloudera Manager 5.1.x.和 相关软件【官网翻译】

    问题导读: 1.不同的安装方式,卸载方法存在什么区别?2.不同的操作系统,卸载 Cloudera Manager Server and 数据库有什么区别? 重新安装不完整如果你来到这里,因为你的安装没 ...

  4. dubbo系列一:dubbo介绍、dubbo架构、dubbo的官网入门使用demo

    一.dubbo介绍 Dubbo是阿里巴巴公司开源的一个高性能优秀的服务框架,使得应用可通过高性能的RPC实现服务的输出和输入功能,可以和Spring框架无缝集成.简单地说,dubbo是一个基于Spri ...

  5. OpenTSDB(时序数据库官网)

    官网地址:http://opentsdb.net/ 下载地址:https://github.com/OpenTSDB/opentsdb/releases ----------------------- ...

  6. Java微信扫描支付模式二Demo ,整合官网直接运行版本

    概述 场景介绍 用户使用微信“扫一扫”扫描二维码后,获取商品支付信息,引导用户完成支付. 详细 代码下载:http://www.demodashi.com/demo/13880.html 一.相关配置 ...

  7. caffe官网的部分翻译及NG的教程

    Caffe原来叫:Convolutional Architecture for Fast Feature Embedding 官网的个人翻译:http://blog.csdn.net/fengbing ...

  8. 【原】Zookeeper 概述 + 官网 Overview 翻译

    分布式应用 分布式应用 distributed application可以在给定时间(同时)在网络中的多个系统上运行,通过协调它们以快速有效的方式完成特定任务. (a), (b): a distrib ...

  9. 几个比較好的IT站和开发库官网

    几个比較好的IT站和开发库官网 1.IT技术.项目类站点 (1)首推CodeProject,一个国外的IT站点,官网地址为:http://www.codeproject.com,这个站点为程序开发人员 ...

随机推荐

  1. django rest_framework vue 实现用户列表分页

    django rest_framework vue 实现用户列表分页 后端 配置urls # 导入view from api.appview.userListView import userListV ...

  2. Codeforces J. Soldier and Number Game(素数筛)

    题目描述: Soldier and Number Game time limit per test 3 seconds memory limit per test 256 megabytes inpu ...

  3. Acesrc and Travel(2019年杭电多校第八场06+HDU6662+换根dp)

    题目链接 传送门 题意 两个绝顶聪明的人在树上玩博弈,规则是轮流选择下一个要到达的点,每达到一个点时,先手和后手分别获得\(a_i,b_i\)(到达这个点时两个人都会获得)的权值,已经经过的点无法再次 ...

  4. linux 以导入文件形式添加定时任务(crontab)时需要注意的坑

    在实际操作过程中发现,使用导入文件形式添加定时任务时,会将用户已有的定时任务全部覆盖清理(先清空,再重新导入),所以在使用文件导入定时任务时,需要先将已有定时任务导出,然后将新任务进行追加到已有定时任 ...

  5. Tomcat8 访问 manager App 失败

    Tomcat8 访问 manager App 失败 进入 tomcat 8 的下面路径 修改 上面 的 context.xml 注释了下面的框框 保存退出.重启tomcat

  6. stm32的flash操作注意事项

    从STM32编程手册中,可以知道:在进行写或擦除操作时,不能进行代码或数据的读取操作. 比如:你在写Flash期间有接收串口数据,很有可能会丢串口数据. 因为比较耗时,所以,在写数据时,CPU不会执行 ...

  7. 安装Vyos

      Vyos是一个开源的网络操作系统,基于Debian,相对于ROS需要购买license,Vyos就更加开放的多. 下载Vyos wget http://vyos.hecint.com/iso/re ...

  8. 【转】编写高质量代码改善C#程序的157个建议——建议56:使用继承ISerializable接口更灵活地控制序列化过程

    建议56:使用继承ISerializable接口更灵活地控制序列化过程 接口ISerializable的意义在于,如果特性Serializable,以及与其像配套的OnDeserializedAttr ...

  9. Q1094

    一,看题 1,字符串确实是我的弱项. 2, 二,看题解 #include<iostream> #include<string> using namespace std; int ...

  10. zzulioj - 2597: 角谷猜想2

    题目链接: http://acm.zzuli.edu.cn/problem.php?id=2597 题目描述 大家想必都知道角谷猜想,即任何一个自然数,如果是偶数,就除以2,如果是奇数,就乘以3再加1 ...