Sometime back, I described how I built (among other things) a custom Solr QParser plugin to handle Payload Term Queries. Looking back on this recently, I realized how lame it was - all it could handle were single Payload Term Queries, and a one level deep AND and OR combinations of these queries. More to the point, I discovered that I had to support queries of this form:

1
+concepts:123456 +(concepts:234567 concepts:345678 ...)

The original parsing code simply split up the query by whitespace, then by colon, and depending on whether the key was preceded by a "+" sign, either added it to the Boolean Query as an Occur.MUST or Occur.SHOULD. Obviously, this would not be able to parse the form of the query above.

Coencidentally, a few days ago, I was hunting around for something completely different on my laptop, and I came across the QueryParser Lucene contrib module that replaces the original Lucene JavaCC based QueryParser with a nice little framework that splits the query parsing into 3 phases - syntax parsing, query processing and query building. It has been available since Lucene 2.9.0, and on the version I am using (Lucene 2.9.3/Solr 1.4.1) both QueryParser implementations are supported.

In my case, my Payload Query syntax is identical to the Term Query syntax, so all I really needed to do was to return a PayloadTermQuery instead of a TermQuery in the query building phase. So all I needed to do to build a robust Payload QueryParser was to just implement a custom QueryBuilder and call it from within this framework.

There is not much documentation available on how to use the framework though, apart from the Javadocs, and the advice in there is to take a look at the StandardQueryParser and use that as a template to design your own. So thats what I did. I ended up building a few more classes in order to integrate it into my custom QParser plugin, but it was really quite simple.

Here is the updated code for my QParser plugin. Apart from this code change, all I had to do was add the lucene-queryparser-2.9.3.jar to the Solr classpath. There is no change in its configuration and the associated Solr request handler I used it from - both these are described in my previous post I referred to above.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
// $Source: src/java/org/apache/solr/search/ext/PayloadQParserPlugin.java
package org.apache.solr.search.ext; import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.core.QueryNodeException;
import org.apache.lucene.queryParser.core.QueryParserHelper;
import org.apache.lucene.queryParser.core.nodes.FieldQueryNode;
import org.apache.lucene.queryParser.core.nodes.QueryNode;
import org.apache.lucene.queryParser.standard.builders.StandardQueryBuilder;
import org.apache.lucene.queryParser.standard.builders.StandardQueryTreeBuilder;
import org.apache.lucene.queryParser.standard.config.StandardQueryConfigHandler;
import org.apache.lucene.queryParser.standard.parser.StandardSyntaxParser;
import org.apache.lucene.queryParser.standard.processors.StandardQueryNodeProcessorPipeline;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin; /**
* Parser plugin to parse payload queries.
*/
public class PayloadQParserPlugin extends QParserPlugin { @Override
public QParser createParser(String qstr, SolrParams localParams,
SolrParams params, SolrQueryRequest req) {
return new PayloadQParser(qstr, localParams, params, req);
} public void init(NamedList args) {
// do nothing
}
} class PayloadQParser extends QParser { public PayloadQParser(String qstr, SolrParams localParams,
SolrParams params, SolrQueryRequest req) {
super(qstr, localParams, params, req);
} @Override
public Query parse() throws ParseException {
PayloadQueryParser parser = new PayloadQueryParser();
try {
Query q = (Query) parser.parse(qstr, "concepts");
return q;
} catch (QueryNodeException e) {
throw new ParseException(e.getMessage());
}
}
} class PayloadQueryParser extends QueryParserHelper { public PayloadQueryParser() {
super(new StandardQueryConfigHandler(), new StandardSyntaxParser(),
new StandardQueryNodeProcessorPipeline(null),
new PayloadQueryTreeBuilder());
}
} class PayloadQueryTreeBuilder extends StandardQueryTreeBuilder { public PayloadQueryTreeBuilder() {
super();
setBuilder(FieldQueryNode.class, new PayloadQueryNodeBuilder());
}
} class PayloadQueryNodeBuilder implements StandardQueryBuilder { @Override
public PayloadTermQuery build(QueryNode queryNode) throws QueryNodeException {
FieldQueryNode node = (FieldQueryNode) queryNode;
return new PayloadTermQuery(
new Term(node.getFieldAsString(), node.getTextAsString()),
new AveragePayloadFunction(), false);
}
}

As you can see, in my QParser.parse() method, I instantiate PayloadQueryParser, which is a subclass of QueryParserHelper. I reuse the same constructor code as StandardQueryParser (another subclass of QueryParserHelper and my template), except I pass in a custom QueryBuilder - the PayloadQueryTreeBuilder. The PayloadQueryTreeBuilder subclasses StandardQueryTreeBuilder, except it redefines what builder to use for FieldQueryNode types - the StandardQueryTreeBuilder is sort of a factory and delegates to the appropriate QueryBuilder depending on the type of the node. Finally, the PayloadQueryNodeBuilder implements the StandardQueryBuilder (similar to the FieldQueryNodeBuilder), and redefines the build() method to produce a PayloadTermQuery instead of a TermQuery as FieldQueryNodeBuilder does.

And thats pretty much it. I tested this by hitting the /concept-search URL and verified that the queries are correctly parsed and returned by printing the queries in the log.

Hopefully this post was useful, if for nothing else that people find out about the new QueryParser framework and begin to use it. The customization I did here is pretty trivial in terms of code, but it saved me a lot of work.

http://sujitpal.blogspot.com/2011/03/using-lucenes-new-queryparser-framework.html

Using Lucene's new QueryParser framework in Solr的更多相关文章

  1. Lucene自定义扩展QueryParser

    Lucene版本:4.10.2 在使用lucene的时候,不可避免的需要扩展lucene的相关功能来实现业务的需要,比如搜索时,需要在满足一个特定范围内的document进行搜索,如年龄在20和30岁 ...

  2. solr调用lucene底层实现倒排索引源码解析

    1.什么是Lucene? 作为一个开放源代码项目,Lucene从问世之后,引发了开放源代码社群的巨大反响,程序员们不仅使用它构建具体的全文检索应用,而且将之集成到各种系统软件中去,以及构建Web应用, ...

  3. Lucene&Solr框架之第二篇

    2.1.开发环境准备 2.1.1.数据库jar包 我们这里可以尝试着从数据库中采集数据,因此需要连接数据库,我们一直用MySQL,所以这里需要MySQL的jar包 2.1.2.MyBatis的jar包 ...

  4. apache lucene solr 官网历史版本下载地址

    官网上一般只提供最新版本的下载,下面两个链接为所有历史版本的下载地址: lucene地址:archive.apache.org/dist/lucene/java/ solr地址:archive.apa ...

  5. Lucene/Solr开发经验

    1.开篇语2.概述3.渊源4.初识Solr5.Solr的安装6.Solr分词顺序7.Solr中文应用的一个实例8.Solr的检索运算符 [开篇语]按照惯例应该写一篇技术文章了,这次结合Lucene/S ...

  6. 在Lucene或Solr中实现高亮的策略

    一:功能背景 近期要做个高亮的搜索需求,曾经也搞过.所以没啥难度.仅仅只是原来用的是Lucene,如今要换成Solr而已,在Lucene4.x的时候,散仙在曾经的文章中也分析过怎样在搜索的时候实现高亮 ...

  7. lucene和solr

    我们为什么要用solr呢? 1.solr已经将整个索引操作功能封装好了的搜索引擎系统(企业级搜索引擎产品) 2.solr可以部署到单独的服务器上(WEB服务),它可以提供服务,我们的业务系统就只要发送 ...

  8. Elasticsearch、Solr、Lucene、Hermes区别

    Elasticsearch简介 Elasticsearch是一个实时分布式搜索和分析引擎.它让你以前所未有的速度处理大数据成为可能.它用于全文搜索.结构化搜索.分析以及将这三者混合使用:维基百科使用E ...

  9. Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler--转载

    原文地址:https://gist.github.com/maxivak/3e3ee1fca32f3949f052 Install Solr download and install Solr fro ...

随机推荐

  1. 脱壳系列(五) - MEW 壳

    先用 PEiD 看一下 MEW 11 1.2 的壳 用 OD 载入程序 按 F8 进行跳转 往下拉 找到这个 retn 指令,并下断点 然后 F9 运行 停在该断点处后再按 F8 右键 -> 分 ...

  2. Julia - 复数

    全局变量 im 即复数 i ,为复数的虚数单位,表示 -1 的正平方根 Julia 允许数值作为代数系数,这也适用于复数 julia> 1 + 2im 1 + 2im 复数的运算 julia&g ...

  3. ceph---luminous 块存储(RBD)搭建

    1. 创建pool 创建存储池: ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-n ...

  4. Resetting the Root Password Using rd.break for RHEL7

    Start the system and, on the GRUB 2 boot screen, press the e key for edit. Remove the rhgb and quiet ...

  5. django -- url (模版语言 {% url 'test1' param1=5 param2=6 %})

    如果想让form表单提交的url是类似 action="/index-5-6.html" 这样的,可以在html模版语言中使用{% url 'test1' param1=5 par ...

  6. JavaScript 修改元素值

    document.getElementById('yybz').value=jsdata.toLocaleString(); document.getElementById('yybz').inner ...

  7. C++提高编译与链接速度的资料

    1,https://blog.csdn.net/lihao21/article/details/47610309 2,https://www.zhihu.com/question/37330979 3 ...

  8. STL - Vector迭代器简单应用之计算元素和

    Description 用vector向量容器装入10个整数,然后,使用迭代器iterator和accumulate算法统计出这10个元素的和 Solution #include "stda ...

  9. JSTL中EL表达式无法直接取size的处理

    jsp中使用${list.size }会编译成list.getSize()方法,并不能获取list的长度,因为程序回去找List对象中的getSize()方法,所以只能想别的办法, 一种方法是在后台程 ...

  10. avalon最佳实践

    最近从angular的QQ群与新浪微博拉了许多人来用我的avalon,成为第一批登上方舟,脱离DOM苦海的人.短短三个月内,5群的朋友也搞出几个切实实行的案例了.为应对粉丝们高益高涨的热情,遂放出此文 ...