https://github.com/Stratio/cassandra-lucene-index

Stratio’s Cassandra Lucene Index

Stratio’s Cassandra Lucene Index, derived from Stratio Cassandra, is a plugin for Apache Cassandra that extends its index functionality to provide near real time search such as ElasticSearch or Solr, including full text search capabilities and free multivariable, geospatial and bitemporal search. It is achieved through an Apache Lucene based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data. Stratio’s Cassandra indexes are one of the core modules on which Stratio’s BigData platform is based.

Index relevance searches allow you to retrieve the n more relevant results satisfying a search. The coordinator node sends the search to each node in the cluster, each node returns its n best results and then the coordinator combines these partial results and gives you the n best of them, avoiding full scan. You can also base the sorting in a combination of fields.

Any cell in the tables can be indexed, including those in the primary key as well as collections. Wide rows are also supported. You can scan token/key ranges, apply additional CQL3 clauses and page on the filtered results.

Index filtered searches are a powerful help when analyzing the data stored in Cassandra with MapReduce frameworks as Apache Hadoop or, even better, Apache Spark. Adding Lucene filters in the jobs input can dramatically reduce the amount of data to be processed, avoiding full scan.

The following benchmark result can give you an idea about the expected performance when combining Lucene indexes with Spark. We do successive queries requesting from the 1% to 100% of the stored data. We can see a high performance for the index for the queries requesting strongly filtered data. However, the performance decays in less restrictive queries. As the number of records returned by the query increases, we reach a point where the index becomes slower than the full scan. So, the decision to use indexes in your Spark jobs depends on the query selectivity. The trade-off between both approaches depends on the particular use case. Generally, combining Lucene indexes with Spark is recommended for jobs retrieving no more than the 25% of the stored data.

This project is not intended to replace Apache Cassandra denormalized tables, inverted indexes, and/or secondary indexes. It is just a tool to perform some kind of queries which are really hard to be addressed using Apache Cassandra out of the box features, filling the gap between real-time and analytics.

More detailed information is available at Stratio’s Cassandra Lucene Index documentation.

Features

Lucene search technology integration into Cassandra provides:

Stratio’s Cassandra Lucene Index and its integration with Lucene search technology provides:

  • Full text search (language-aware analysis, wildcard, fuzzy, regexp)
  • Boolean search (and, or, not)
  • Sorting by relevance, column value, and distance
  • Geospatial indexing (points, lines, polygons and their multiparts)
  • Geospatial transformations (bounding box, buffer, centroid, convex hull, union, difference, intersection)
  • Geospatial operations (intersects, contains, is within)
  • Bitemporal search (valid and transaction time durations)
  • CQL complex types (list, set, map, tuple and UDT)
  • CQL user defined functions (UDF)
  • CQL paging, even with sorted searches
  • Columns with TTL
  • Third-party CQL-based drivers compatibility
  • Spark and Hadoop compatibility

Not yet supported:

  • Thrift API
  • Legacy compact storage option
  • Indexing counter columns
  • Indexing static columns
  • Other partitioners than Murmur3

Requirements

  • Cassandra (identified by the three first numbers of the plugin version)
  • Java >= 1.8 (OpenJDK and Sun have been tested)
  • Maven >= 3.0

cassandra的全文检索插件的更多相关文章

  1. 大约SQL/NoSQL数据库搜索/思考查询

    转载请注明出处:jiq•钦's technical Blog Hbase特征: 近期在学习Hbase.Hbase基于行健是建立了索引的,查询速度会很快,全然实时. 可是Hbase要基于行健之外的字段进 ...

  2. Spring cloud gateway

    ==================================为什么需要API gateway?==================================企业后台微服务互联互通, 因为 ...

  3. Postgresql-模糊匹配大杀器

    # Postgresql-模糊匹配大杀器 ## 问题背景 随着pg越来越强大,abase目前已经升级到5.0(postgresql10.4),目前abase5.0继承了全文检索插件(zhparser) ...

  4. mysql字段约束-索引-外键---3

    本节所讲内容: 字段修饰符 清空表记录 索引 外键 视图 一:字段修饰符 (约束) 1:null和not null修饰符   我们通过这个例子来看看 mysql> create table wo ...

  5. MySql5.7InnoDB全文索引(针对中文搜索)

    1.ngram and MeCab full-text parser plugins 全文检索在MySQL里面很早就支持了,只不过一直以来只支持英文.缘由是他从来都使用空格来作为分词的分隔符,而对于中 ...

  6. Mysql常见索引介绍

    索引是一种特殊的文件,包含了对数据表中所有记录的引用指针.InnoDB引擎的数据库,其上的索引是表空间的一个组成部分. (1).索引的优缺点 优点:加快搜索速度,减少查询时间 缺点:索引是以文件的形式 ...

  7. search(9)- elastic4s logback-appender

    前面写了个cassandra-appender,一个基于cassandra的logback插件.正是cassandra的分布式数据库属性才合适作为akka-cluster-sharding分布式应用的 ...

  8. 老司机带你玩转面试(1):缓存中间件 Redis 基础知识以及数据持久化

    引言 今天周末,我在家坐着掐指一算,马上又要到一年一度的金九银十招聘季了,国内今年上半年受到 YQ 冲击,金三银四泡汤了,这就直接导致很多今年毕业的同学会和明年毕业的同学一起参加今年下半年的秋招,这个 ...

  9. mysql使用全文索引实现大字段的模糊查询

    0.场景说明 centos7 mysql5.7 InnoDB引擎 0.1创建表 DROP TABLE IF EXISTS tbl_article_content; CREATE TABLE tbl_a ...

随机推荐

  1. 笔试算法题(48):简介 - A*搜索算法(A Star Search Algorithm)

    A*搜索算法(A Star Search Algorithm) A*算法主要用于在二维平面上寻找两个点之间的最短路径.在从起始点到目标点的过程中有很多个状态空间,DFS和BFS没有任何启发策略所以穷举 ...

  2. 零基础入门学习Python(33)--异常处理:你不可能总是对的2

    知识点 异常处理 捕捉异常可以使用try/except语句. try/except语句用来检测try语句块中的错误,从而让except语句捕获异常信息并处理. 如果你不想在异常发生时结束你的程序,只需 ...

  3. 树莓派 - gpio-led platform driver 控制LED

    树莓派3b板上有两个LED, pwr (power) 和 act (activity).是platform_driver gpio-led驱动. 可以通过设备树和gpio-led来额外控制一个LED. ...

  4. PHP:Mysql判断KEY是否存在 如果存在走修改 如果不存在走添加

    文章来源:http://www.cnblogs.com/hello-tl/p/7738113.html 0.PHP代码 <?php /** * POST 传参 * * 例子 添加修改 使用同一个 ...

  5. prometheus监控mysql

    创建一个用于mysqld_exporter连接到MySQL的用户并赋予所需的权限 mysql> GRANT REPLICATION CLIENT, PROCESS ON *.* TO '; my ...

  6. 最详细的JavaWeb开发基础之java环境搭建(Mac版)

    阅读文本大概需要 5 分钟. 我之前分享过在 Windows 下面配置 Java 环境,这次给大家带来的是 Mac 下面安装配置 Java 环境.首先 Mac 系统已经带有默认的 Java,但是由于使 ...

  7. <转> 二分图多重匹配问题

    在二分图最大匹配中,每个点(不管是X方点还是Y方点)最多只能和一条匹配边相关联,然而,我们经常遇到这种问题,即二分图匹配中一个点可以和多条匹配边相关联,但有上限,或者说,Li表示点i最多可以和多少条匹 ...

  8. 九度oj 题目1023:EXCEL排序

    题目1023:EXCEL排序 时间限制:1 秒 内存限制:32 兆 特殊判题:否 提交:20699 解决:4649 题目描述:     Excel可以对一组纪录按任意指定列排序.现请你编写程序实现类似 ...

  9. MVC系统学习5——验证

    其实关于Mvc的验证在上一篇已经有讲过一些了,可以通过在我们定义的Model上面添加相应的System.ComponentModel.DataAnnotations空间下的验证属性.在服务器端通过Mo ...

  10. hdu 5029树链剖分

    /* 解:标记区间端点,按深度标记上+下-. 然后用线段树维护求出最小的,再将它映射回来 */ #pragma comment(linker, "/STACK:102400000,10240 ...