A custom field-based search extension for Lucene

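I recently had to change our company's product search: results should be sorted first by whether the product is in stock, and then by sales volume. The stock level is itself an integer, so simply sorting on it in descending order does not meet the requirement, because a large stock count would always outrank a small one regardless of sales. Lucene has no built-in support for this kind of business rule, but it can be added by writing an extension.

Reference: http://blog.csdn.net/cctcc/article/details/45672247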
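To make the requirement concrete, here is a minimal plain-Java sketch of the target ordering, independent of Lucene. The `Product` class, the comparator names, and the sample values are all hypothetical and exist only for illustration. It contrasts sorting on the raw stock count with sorting on an in-stock flag followed by sales volume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class StockFirstOrderSketch {

    // Hypothetical product record; field names and values are illustrative only.
    static final class Product {
        final String name;
        final long stock;
        final int salesVolume;

        Product(String name, long stock, int salesVolume) {
            this.name = name;
            this.stock = stock;
            this.salesVolume = salesVolume;
        }
    }

    // Naive ordering: raw stock count, descending.
    static final Comparator<Product> RAW_STOCK_DESC =
            Comparator.comparingLong((Product p) -> p.stock).reversed();

    // Desired ordering: in-stock products first, then higher sales volume first.
    static final Comparator<Product> STOCK_FLAG_THEN_SALES =
            Comparator.comparingInt((Product p) -> p.stock > 0 ? 0 : 1)
                    .thenComparing(Comparator.comparingInt((Product p) -> p.salesVolume).reversed());

    // Returns the product names in the order produced by the given comparator.
    static List<String> namesSortedBy(List<Product> products, Comparator<Product> order) {
        List<Product> copy = new ArrayList<>(products);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Product p : copy) {
            names.add(p.name);
        }
        return names;
    }

    static List<Product> sample() {
        return Arrays.asList(
                new Product("A", 5, 100),
                new Product("B", 9000, 3),
                new Product("C", 0, 500),
                new Product("D", 40, 20));
    }

    public static void main(String[] args) {
        System.out.println(namesSortedBy(sample(), RAW_STOCK_DESC));        // [B, D, A, C]
        System.out.println(namesSortedBy(sample(), STOCK_FLAG_THEN_SALES)); // [A, D, B, C]
    }
}
```

With the raw stock count, product B wins purely because of its large inventory; with the flag, all in-stock products tie on the first key and sales volume decides.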

import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;
import org.apache.lucene.util.Bits;

// Sorts on "in stock or not" instead of the raw stock count (Lucene 5.x API).
public class EmptyStockComparatorSource extends FieldComparatorSource {

    @Override
    public FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
            throws IOException {
        // Documents without a stock value are treated as stock 0.
        return new LongComparator(numHits, fieldname, 0L);
    }

    public static class LongComparator extends FieldComparator.NumericComparator<Long> {
        // Per-slot flags: 1 = in stock, 0 = out of stock.
        private final long[] values;
        private long bottom;
        private long topValue;

        /**
         * Creates a new comparator based on {@link Long#compare} for {@code numHits}.
         * When a document has no value for the field, {@code missingValue} is substituted.
         */
        public LongComparator(int numHits, String field, Long missingValue) {
            super(field, missingValue);
            values = new long[numHits];
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
            currentReaderValues = getNumericDocValues(context, field);
            if (missingValue != null) {
                docsWithField = getDocsWithValue(context, field);
                // optimization to remove unneeded checks on the bit interface:
                if (docsWithField instanceof Bits.MatchAllBits) {
                    docsWithField = null;
                }
            } else {
                docsWithField = null;
            }
        }

        // Reads the stock of a document and collapses it to 1 (in stock) or 0 (no stock).
        private long stockFlag(int doc) {
            long v = currentReaderValues.get(doc);
            // Test for v == 0 to save the Bits.get call for the common case
            // (doc has a value and the value is non-zero):
            if (docsWithField != null && v == 0 && !docsWithField.get(doc)) {
                v = missingValue;
            }
            return v > 0L ? 1L : 0L;
        }

        @Override
        public int compare(int slot1, int slot2) {
            return Long.compare(values[slot1], values[slot2]);
        }

        @Override
        public int compareBottom(int doc) {
            // bottom is a 0/1 flag, so the document must be compared on the same scale.
            return Long.compare(bottom, stockFlag(doc));
        }

        @Override
        public void copy(int slot, int doc) {
            values[slot] = stockFlag(doc);
        }

        @Override
        public void setBottom(final int bottom) {
            this.bottom = values[bottom];
        }

        @Override
        public void setTopValue(Long value) {
            topValue = value;
        }

        @Override
        public Long value(int slot) {
            return Long.valueOf(values[slot]);
        }

        @Override
        public int compareTop(int doc) {
            // topValue comes from value(slot), so it is a 0/1 flag as well.
            return Long.compare(topValue, stockFlag(doc));
        }
    }
}

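LongComparator is copied straight from the Lucene source and needs only minor changes. The key change is in copy(int slot, int doc): while the comparison value is being copied, every document that has stock is treated as 1 and every other document as 0, so the sort produces the order we want.

The test case we used: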
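One detail worth checking in a comparator like this: the slot values written by copy are 0/1 flags, so compareBottom and compareTop should compare against the same flag rather than the raw stock count. The plain-Java sketch below involves no Lucene classes, and the names `flag`, `compareBottomRaw`, and `compareBottomFlag` are made up for illustration. It shows what happens when the scales are mixed: two in-stock documents no longer compare as equal, so the secondary sort on sales volume never gets to break the tie.

```java
final class FlagScaleSketch {

    // Same collapse as in copy(): any positive stock becomes 1, everything else 0.
    static long flag(long stock) {
        return stock > 0L ? 1L : 0L;
    }

    // Mixed scales: 'bottom' is a stored 0/1 flag, 'docStock' is a raw stock count.
    static int compareBottomRaw(long bottom, long docStock) {
        return Long.compare(bottom, docStock);
    }

    // Consistent scales: both sides are 0/1 flags.
    static int compareBottomFlag(long bottom, long docStock) {
        return Long.compare(bottom, flag(docStock));
    }

    public static void main(String[] args) {
        long bottom = flag(3);   // weakest queued hit is in stock, stored as 1
        long incoming = 500;     // incoming document is in stock too

        System.out.println(compareBottomRaw(bottom, incoming));  // -1: looks like a different rank
        System.out.println(compareBottomFlag(bottom, incoming)); // 0: a tie, as intended
    }
}
```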

Directory directory1 = FSDirectory.open(Paths.get(
        "/Users/xxx/develop/tools/solr-5.5.0/server/solr/product/data/index"));
DirectoryReader directoryReader1 = DirectoryReader.open(directory1);
IndexSearcher searcher1 = new IndexSearcher(directoryReader1);
// Primary key: in-stock flag from the custom comparator; secondary key: sales volume. Both descending.
Sort sort1 = new Sort(new SortField("psfixstock", new EmptyStockComparatorSource(), true),
        new SortField("salesVolume", SortField.Type.INT, true));
TopFieldDocs topDocs1 = searcher1.search(new TermQuery(new Term("gender_text", "女士")), 10, sort1);
for (ScoreDoc scoreDoc : topDocs1.scoreDocs) {
    int doc = scoreDoc.doc;
    Document document = searcher1.doc(doc);
    System.out.println(String.format("docId=%s, psfixstock=%s, salesVolumn=%s", doc, document.get("psfixstock"), document.get("salesVolume")));
}

For sorting, the comparator has to be added to the Sort object. Running the test, however, failed with an error saying the docvalues type was wrong:

Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'psfixstock' (expected=NUMERIC). Use UninvertingReader or index with docvalues.
    at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
    at org.apache.lucene.index.DocValues.getNumeric(DocValues.java:227)
    at org.apache.lucene.search.FieldComparator$NumericComparator.getNumericDocValues(FieldComparator.java:167)
    at com.zp.solr.handler.component.EmptyStockComparatorSource$LongComparator.doSetNextReader(EmptyStockComparatorSource.java:36)
    at org.apache.lucene.search.SimpleFieldComparator.getLeafComparator(SimpleFieldComparator.java:36)
    at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
    at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:164)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:812)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:744)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:729)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:671)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:627)
    at com.zp.solr.handler.component.EmptyStockSortingTest.main(EmptyStockSortingTest.java:57)

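After some digging I found the cause (reference: http://qindongliang.iteye.com/blog/2297280): the field we sort on did not have docValues enabled. In Solr, a field that needs this kind of custom sorting must be declared with docValues="true" and then reindexed (using full-import):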

<field name="psfixstock" type="tint" indexed="true" stored="true" multiValued="false" docValues="true" />

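The index must be rebuilt before this works. Note that Solr and Elasticsearch take different approaches here: Solr (5.5.0 in this setup) defaults to docValues=false, whereas ES does the opposite. Enabling docValues has some performance cost, so use it with care.

With plain Lucene, a numeric NumericDocValuesField has to be added to each document at index time; otherwise the error above occurs: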

document.add(new NumericDocValuesField("stock", stock));

The final results are now sorted the way we need:

docId=2629, psfixstock=98391, salesVolumn=4685
docId=305, psfixstock=991, salesVolumn=14
docId=16762, psfixstock=3, salesVolumn=12
docId=22350, psfixstock=993, salesVolumn=10
docId=29021, psfixstock=11076, salesVolumn=10
docId=3635, psfixstock=61, salesVolumn=6
docId=4111, psfixstock=1104, salesVolumn=5
docId=10608, psfixstock=4395, salesVolumn=5
docId=4874, psfixstock=4975, salesVolumn=4
docId=4911, psfixstock=6, salesVolumn=4
docId=15071, psfixstock=998, salesVolumn=4
docId=4837, psfixstock=9, salesVolumn=3
docId=4860, psfixstock=1002, salesVolumn=3
docId=3749, psfixstock=2240, salesVolumn=2
docId=4109, psfixstock=1493, salesVolumn=2
docId=15068, psfixstock=1000, salesVolumn=2
docId=25901, psfixstock=11110, salesVolumn=2
docId=3688, psfixstock=21, salesVolumn=1
docId=4912, psfixstock=17, salesVolumn=1
docId=5035, psfixstock=2, salesVolumn=1
docId=11835, psfixstock=8, salesVolumn=1
docId=12044, psfixstock=1, salesVolumn=1
docId=13508, psfixstock=2, salesVolumn=1
docId=20019, psfixstock=1, salesVolumn=1
docId=20884, psfixstock=100000, salesVolumn=1
docId=22620, psfixstock=1, salesVolumn=1
docId=24128, psfixstock=1, salesVolumn=1
docId=0, psfixstock=2, salesVolumn=0
docId=9, psfixstock=1, salesVolumn=0
docId=11, psfixstock=4, salesVolumn=0
docId=15, psfixstock=3, salesVolumn=0
docId=20, psfixstock=4, salesVolumn=0
docId=23, psfixstock=2, salesVolumn=0
docId=24, psfixstock=5, salesVolumn=0
docId=25, psfixstock=7, salesVolumn=0
docId=35, psfixstock=2, salesVolumn=0
docId=53, psfixstock=2, salesVolumn=0
