Lucene Search: How Facet Queries Work, with a Facet Query Example

Reposted from: http://www.lai18.com/content/7084969.html

About Facets

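When browsing websites we often need to filter results by some category of condition, and e-commerce sites are the most common case. Take Tmall (天猫商城) as an example: when we select a brand, the system displays the products that belong to that brand.

In that example the aspects we care about are things like brand and popular shopping topics. A feature like this can of course be built with Lucene term queries, but on very large data sets doing it with ordinary queries is inefficient: each candidate value needs its own query to obtain a count, so the work grows with the number of values, and code that repeats expensive operations such as FSDirectory.open on every request adds further overhead.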

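Lucene provides facet queries to group documents of the same kind. A search can then focus on one aspect first, which narrows the scope of the query and in turn improves query efficiency.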

The facet module provides several methods for computing facet counts and for processing facet values.
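
To make the idea concrete, here is a minimal plain-Java sketch of what a facet count and a drill-down compute. It does not use Lucene: the class FacetCountSketch, its method names, and the sample products are all hypothetical, and a real index would get these numbers from the facet module rather than from a loop over maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of faceting (an illustration only, not Lucene code).
// Each "document" is a map from dimension name to a single value.
final class FacetCountSketch {

    // Counts how many documents carry each value of the given dimension.
    static Map<String, Integer> countByDim(List<Map<String, String>> docs, String dim) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(dim);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // "Drill down": keeps only the documents whose dimension has the chosen value.
    static List<Map<String, String>> drillDown(List<Map<String, String>> docs,
                                               String dim, String value) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(dim))) {
                kept.add(doc);
            }
        }
        return kept;
    }

    // Builds one hypothetical product document.
    private static Map<String, String> doc(String brand, String category) {
        Map<String, String> d = new HashMap<>();
        d.put("brand", brand);
        d.put("category", category);
        return d;
    }

    // Hypothetical sample data: three products with a brand and a category.
    static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("Apple", "Phone"));
        docs.add(doc("Apple", "Laptop"));
        docs.add(doc("Lenovo", "Laptop"));
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        System.out.println(countByDim(docs, "brand"));
        System.out.println(countByDim(drillDown(docs, "brand", "Apple"), "category"));
    }
}
```

Running main prints {Apple=2, Lenovo=1} and then {Phone=1, Laptop=1}: the first line is the brand facet over all products, the second is the category facet after the user has picked the brand Apple.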

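To implement faceting we first need to understand FacetField. A FacetField defines a dim (the dimension) and the path of the field under that dimension. One point needs special attention: when indexing a FacetField, the document must first be passed through FacetsConfig.build(Document) before it is added to the index. For a taxonomy-based FacetField, the overload that also takes the TaxonomyWriter, FacetsConfig.build(TaxonomyWriter, Document), is the one to call.

The indexOptions of FacetField is set to DOCS_AND_FREQS_AND_POSITIONS, meaning the field is indexed together with term frequencies and positions. This is done mainly to make querying and counting easier.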

Correspondingly, when storing the data we need FacetsConfig and DirectoryTaxonomyWriter.

DirectoryTaxonomyWriter uses a Directory to persist the taxonomy information to disk.
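
As a rough mental model of the taxonomy information being stored, the sketch below keeps a mapping from category paths to integer ordinals in memory. It is plain Java, not Lucene code: the class TaxonomySketch and its methods are hypothetical, and it only mirrors two properties, namely that the root category has id 0 and that a category's ancestors are added before the category itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of a facet taxonomy (an illustration only, not Lucene code).
final class TaxonomySketch {
    // Maps a category path, e.g. [brand, Apple], to its ordinal.
    private final Map<List<String>, Integer> ordinals = new HashMap<>();
    // parents.get(ordinal) is the ordinal of that category's parent.
    private final List<Integer> parents = new ArrayList<>();

    TaxonomySketch() {
        // The root (empty path) always exists and gets ordinal 0.
        ordinals.put(new ArrayList<String>(), 0);
        parents.add(-1); // -1 marks "no parent" in this sketch
    }

    // Returns the ordinal for a path, adding it and any missing ancestors first.
    int addCategory(String... path) {
        return add(Arrays.asList(path));
    }

    private int add(List<String> path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        // Ancestors are added before the child, so a parent's ordinal is always smaller.
        int parent = add(path.subList(0, path.size() - 1));
        int ordinal = parents.size();
        ordinals.put(new ArrayList<>(path), ordinal);
        parents.add(parent);
        return ordinal;
    }

    int parentOf(int ordinal) {
        return parents.get(ordinal);
    }

    int size() {
        return parents.size();
    }
}
```

With this model, addCategory("brand", "Apple") on a fresh instance first creates "brand" as ordinal 1 and then returns 2 for "brand/Apple"; adding the same path again returns 2 without creating anything new.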

The constructor of DirectoryTaxonomyWriter is as follows:

public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode,
      TaxonomyWriterCache cache) throws IOException {
    dir = directory;
    IndexWriterConfig config = createIndexWriterConfig(openMode);
    indexWriter = openIndexWriter(dir, config);
    // verify (to some extent) that merge policy in effect would preserve category docids
    assert !(indexWriter.getConfig().getMergePolicy() instanceof TieredMergePolicy) :
      "for preserving category docids, merging none-adjacent segments is not allowed";
    // after we opened the writer, and the index is locked, it's safe to check
    // the commit data and read the index epoch
    openMode = config.getOpenMode();
    if (!DirectoryReader.indexExists(directory)) {
      indexEpoch = 1;
    } else {
      String epochStr = null;
      Map<String, String> commitData = readCommitData(directory);
      if (commitData != null) {
        epochStr = commitData.get(INDEX_EPOCH);
      }
      // no commit data, or no epoch in it means an old taxonomy, so set its epoch to 1, for lack
      // of a better value.
      indexEpoch = epochStr == null ? 1 : Long.parseLong(epochStr, 16);
    }
    if (openMode == OpenMode.CREATE) {
      ++indexEpoch;
    }
    FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
    ft.setOmitNorms(true);
    parentStreamField = new Field(Consts.FIELD_PAYLOADS, parentStream, ft);
    fullPathField = new StringField(Consts.FULL, "", Field.Store.YES);
    nextID = indexWriter.maxDoc();
    if (cache == null) {
      cache = defaultTaxonomyWriterCache();
    }
    this.cache = cache;
    if (nextID == 0) {
      cacheIsComplete = true;
      // Make sure that the taxonomy always contain the root category
      // with category id 0.
      addCategory(new FacetLabel());
    } else {
      // There are some categories on the disk, which we have not yet
      // read into the cache, and therefore the cache is incomplete.
      // We choose not to read all the categories into the cache now,
      // to avoid terrible performance when a taxonomy index is opened
      // to add just a single category. We will do it later, after we
      // notice a few cache misses.
      cacheIsComplete = false;
    }
}

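As the code above shows, DirectoryTaxonomyWriter first opens an IndexWriter. Once the writer is open and the index is locked, it reads the commit data stored in the directory's segments to obtain the index epoch. If the index does not exist yet, or the commit data is missing or contains no epoch (which indicates an old taxonomy), indexEpoch is set to 1; otherwise the stored value is parsed as a hexadecimal number. If the open mode is CREATE, indexEpoch is then incremented. Next the cache is set up, falling back to defaultTaxonomyWriterCache() when none was passed in. Finally the constructor checks whether the directory already contains documents (nextID = indexWriter.maxDoc()): if it does, cacheIsComplete is set to false; if it is empty, cacheIsComplete is set to true and the root category is added with category id 0.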
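
The epoch and cache decisions in the constructor can be isolated into two small pure functions. The following is a sketch under stated assumptions, not the library's code: the class EpochSketch and its method names are hypothetical, the boolean parameters stand in for DirectoryReader.indexExists and the open-mode check, and the key string "index.epoch" is an assumed value for the INDEX_EPOCH constant that the listing only refers to by name.

```java
import java.util.Map;

// Stand-alone model of the epoch and cache decisions (not Lucene code).
final class EpochSketch {
    // Assumed key name for illustration; the listing only names the constant INDEX_EPOCH.
    static final String INDEX_EPOCH = "index.epoch";

    // Mirrors the branch structure of the constructor:
    // no index, or no stored epoch, means epoch 1; CREATE mode bumps the epoch by one.
    static long resolveEpoch(boolean indexExists, Map<String, String> commitData,
                             boolean createMode) {
        long epoch;
        if (!indexExists) {
            epoch = 1;
        } else {
            String epochStr = commitData == null ? null : commitData.get(INDEX_EPOCH);
            // The stored epoch is a hexadecimal string, hence radix 16.
            epoch = epochStr == null ? 1 : Long.parseLong(epochStr, 16);
        }
        if (createMode) {
            ++epoch;
        }
        return epoch;
    }

    // An empty taxonomy index (maxDoc == 0) means the cache trivially holds everything.
    static boolean cacheIsComplete(int maxDoc) {
        return maxDoc == 0;
    }
}
```

For example, an existing taxonomy whose commit data stores the epoch "1f" resolves to 31 when opened for appending and to 32 when opened in CREATE mode.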
