Lucene Source Code Analysis (5): lucene-group

1. Basic query usage

org.apache.lucene.search.IndexSearcher

public void search(Query query, Collector results)

Here, Collector is documented in its Javadoc as follows:

/**
 * <p>Expert: Collectors are primarily meant to be used to
 * gather raw results from a search, and implement sorting
 * or custom result filtering, collation, etc. </p>
 *
 * <p>Lucene's core collectors are derived from {@link Collector}
 * and {@link SimpleCollector}. Likely your application can
 * use one of these classes, or subclass {@link TopDocsCollector},
 * instead of implementing Collector directly:
 *
 * <ul>
 *
 *   <li>{@link TopDocsCollector} is an abstract base class
 *   that assumes you will retrieve the top N docs,
 *   according to some criteria, after collection is
 *   done.  </li>
 *
 *   <li>{@link TopScoreDocCollector} is a concrete subclass
 *   {@link TopDocsCollector} and sorts according to score +
 *   docID.  This is used internally by the {@link
 *   IndexSearcher} search methods that do not take an
 *   explicit {@link Sort}. It is likely the most frequently
 *   used collector.</li>
 *
 *   <li>{@link TopFieldCollector} subclasses {@link
 *   TopDocsCollector} and sorts according to a specified
 *   {@link Sort} object (sort by field).  This is used
 *   internally by the {@link IndexSearcher} search methods
 *   that take an explicit {@link Sort}.
 *
 *   <li>{@link TimeLimitingCollector}, which wraps any other
 *   Collector and aborts the search if it's taken too much
 *   time.</li>
 *
 *   <li>{@link PositiveScoresOnlyCollector} wraps any other
 *   Collector and prevents collection of hits whose score
 *   is &lt;= 0.0</li>
 *
 * </ul>
 *
 * @lucene.experimental
 */
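
The Javadoc above says TopScoreDocCollector sorts by score plus docID and keeps the top N hits. The sketch below illustrates that idea with plain JDK classes only. It is not Lucene's implementation: SketchHit and TopNSketchCollector are hypothetical names invented for this illustration, and the tie-break (lower docID wins) is an assumption about what "score + docID" means.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a (docID, score) hit; not a Lucene class.
final class SketchHit {
    final int doc;
    final float score;
    SketchHit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Keeps the best N hits: higher score first, ties broken by lower docID (assumed).
final class TopNSketchCollector {
    private static final Comparator<SketchHit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    private final int n;
    // The head of this heap is the WORST retained hit, so eviction is cheap.
    private final PriorityQueue<SketchHit> heap = new PriorityQueue<>(BEST_FIRST.reversed());

    TopNSketchCollector(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    // Called once per matching document, in the spirit of a collect(doc) callback.
    void collect(int doc, float score) {
        SketchHit hit = new SketchHit(doc, score);
        if (heap.size() < n) {
            heap.add(hit);
        } else if (BEST_FIRST.compare(hit, heap.peek()) < 0) {
            heap.poll();   // drop the current worst hit
            heap.add(hit);
        }
    }

    // Returns the retained hits, best first.
    List<SketchHit> topDocs() {
        List<SketchHit> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }
}
```

Because only N hits are retained at any time, memory stays bounded no matter how many documents match, which is presumably why a top-N collector suits the default search path.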

Collector class hierarchy
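
Two of the collectors listed in the Javadoc, TimeLimitingCollector and PositiveScoresOnlyCollector, wrap another collector instead of producing results themselves. The following is a minimal sketch of that wrapping pattern using only the JDK. DocSink, PositiveOnlySink and RecordingSink are hypothetical names for this illustration, and Lucene's real Collector API is richer than this single callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal callback interface; not Lucene's Collector.
interface DocSink {
    void collect(int doc, float score);
}

// Forwards only hits whose score is > 0, in the spirit of PositiveScoresOnlyCollector.
final class PositiveOnlySink implements DocSink {
    private final DocSink delegate;

    PositiveOnlySink(DocSink delegate) { this.delegate = delegate; }

    @Override
    public void collect(int doc, float score) {
        if (score > 0.0f) {
            delegate.collect(doc, score);
        }
    }
}

// Terminal sink that simply records the docIDs it receives.
final class RecordingSink implements DocSink {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int doc, float score) { docs.add(doc); }
}
```

Wrapping keeps the filtering rule independent of how results are ultimately gathered, so the same wrapper can sit in front of any other sink.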

2. lucene-group

The lucene-group (grouping) module provides GroupingSearch for grouped queries, together with the collectors that implement it.

3. Example:

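import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.search.CachingCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.grouping.*;

// Method of a class that holds an IndexSearcher field named indexSearcher.
// Groups the hits of `query` by `field` and returns groupValue -> hit count.
// Note: the constructor signatures are kept as in the original post; they differ
// between Lucene versions, so check them against the release you compile with.
public Map<String, Integer> groupBy(Query query, String field, int topCount) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    long begin = System.currentTimeMillis();
    int topNGroups = topCount;     // maximum number of groups
    int groupOffset = 0;           // paging offset across groups
    int maxDocsPerGroup = 100;     // maximum hits kept per group
    int withinGroupOffset = 0;     // paging offset inside each group
    try {
        // First pass: determine the top groups.
        FirstPassGroupingCollector c1 = new FirstPassGroupingCollector(field, Sort.RELEVANCE, topNGroups);
        boolean cacheScores = true;
        double maxCacheRAMMB = 4.0;
        // Cache the hits (and scores) so the second pass can avoid re-running the query.
        CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
        indexSearcher.search(query, cachedCollector);
        Collection<SearchGroup<String>> topGroups = c1.getTopGroups(groupOffset, true);
        if (topGroups == null) {
            // No groups matched the query.
            return null;
        }
        // Second pass: collect the documents of each top group.
        // The three trailing flags are getScores, getMaxScores, fillSortFields.
        SecondPassGroupingCollector c2 = new SecondPassGroupingCollector(field, topGroups, Sort.RELEVANCE, Sort.RELEVANCE, maxDocsPerGroup, true, true, true);
        if (cachedCollector.isCached()) {
            // Cache fit within maxCacheRAMMB, so we can replay it:
            cachedCollector.replay(c2);
        } else {
            // Cache was too large; must re-execute query:
            indexSearcher.search(query, c2);
        }
        TopGroups<String> tg = c2.getTopGroups(withinGroupOffset);
        GroupDocs<String>[] gds = tg.groups;
        for (GroupDocs<String> gd : gds) {
            map.put(gd.groupValue, gd.totalHits);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    System.out.println("group by time: " + (end - begin) + "ms");
    return map;
}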
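
The cache-then-replay step in the example (isCached() followed by replay(...), with a fallback to running the query again) can be pictured with a small in-memory analogue. This is only a sketch of the idea, not how CachingCollector is implemented: ReplayCacheSketch is a hypothetical class, and a simple cap on the number of cached docIDs stands in for the maxCacheRAMMB budget.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical analogue of caching collected docIDs for a second pass.
final class ReplayCacheSketch implements IntConsumer {
    private final IntConsumer firstPass;
    private final int maxCachedDocs;          // stand-in for a RAM budget
    private List<Integer> cache = new ArrayList<>();

    ReplayCacheSketch(IntConsumer firstPass, int maxCachedDocs) {
        this.firstPass = firstPass;
        this.maxCachedDocs = maxCachedDocs;
    }

    // Forwards every docID to the first pass and remembers it while the budget allows.
    @Override
    public void accept(int doc) {
        firstPass.accept(doc);
        if (cache != null) {
            if (cache.size() < maxCachedDocs) {
                cache.add(doc);
            } else {
                cache = null;                 // budget exceeded: stop caching for good
            }
        }
    }

    // True only if every docID seen so far is still in the cache.
    boolean isCached() { return cache != null; }

    // Feeds the cached docIDs, in their original order, to the second pass.
    void replay(IntConsumer secondPass) {
        if (cache == null) {
            throw new IllegalStateException("cache was discarded; re-run the query instead");
        }
        for (int doc : cache) {
            secondPass.accept(doc);
        }
    }
}
```

When isCached() returns false the caller has to execute the query a second time, which mirrors the else branch of the example.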

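Notes on the parameters:

groupField: the field to group by
groupSort: sort order of the groups
topNGroups: maximum number of groups to return
groupOffset: offset into the list of groups, used for paging through groups
withinGroupSort: sort order of the documents within each group
maxDocsPerGroup: maximum number of documents returned per group
withinGroupOffset: offset within each group, used for paging inside a group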
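
To make the roles of topNGroups, groupOffset, maxDocsPerGroup and withinGroupOffset concrete, here is a rough in-memory sketch of two-pass grouping written against the JDK only. It is an illustration under stated assumptions, not Lucene's algorithm: GroupedDoc and TwoPassGroupingSketch are hypothetical names, groups are ranked by their best score (loosely mirroring Sort.RELEVANCE), and the whole hit list is held in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a matching document: id, group key, relevance score.
final class GroupedDoc {
    final int id;
    final String group;
    final float score;
    GroupedDoc(int id, String group, float score) {
        this.id = id;
        this.group = group;
        this.score = score;
    }
}

final class TwoPassGroupingSketch {
    // Pass 1: rank groups by their best score, then page with groupOffset/topNGroups.
    static List<String> firstPass(List<GroupedDoc> hits, int groupOffset, int topNGroups) {
        Map<String, Float> best = new LinkedHashMap<>();
        for (GroupedDoc d : hits) {
            best.merge(d.group, d.score, Float::max);
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        int from = Math.min(groupOffset, groups.size());
        int to = (int) Math.min((long) from + topNGroups, groups.size());
        return new ArrayList<>(groups.subList(from, to));
    }

    // Pass 2: for each selected group, sort its docs by score and page
    // with withinGroupOffset/maxDocsPerGroup.
    static Map<String, List<GroupedDoc>> secondPass(List<GroupedDoc> hits, List<String> topGroups,
                                                    int withinGroupOffset, int maxDocsPerGroup) {
        Map<String, List<GroupedDoc>> result = new LinkedHashMap<>();
        for (String g : topGroups) {
            result.put(g, new ArrayList<>());
        }
        for (GroupedDoc d : hits) {
            List<GroupedDoc> bucket = result.get(d.group);
            if (bucket != null) {
                bucket.add(d);              // docs of non-selected groups are skipped
            }
        }
        for (Map.Entry<String, List<GroupedDoc>> e : result.entrySet()) {
            List<GroupedDoc> docs = e.getValue();
            docs.sort((a, b) -> Float.compare(b.score, a.score));
            int from = Math.min(withinGroupOffset, docs.size());
            int to = (int) Math.min((long) from + maxDocsPerGroup, docs.size());
            e.setValue(new ArrayList<>(docs.subList(from, to)));
        }
        return result;
    }
}
```

Calling firstPass(hits, 0, topNGroups) and then secondPass(hits, groups, 0, maxDocsPerGroup) corresponds to the offsets of 0 used in the example. Unlike gd.totalHits in the example, this sketch does not track the total number of hits per group.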

References

https://blog.csdn.net/wyyl1/article/details/7388241
