I've been building some custom search components for SOLR lately, so I wanted to share a couple of things I learned in the process. This is most likely old hat to people who have been doing this for a while, but I thought I'd share it in case it benefits someone.

Passing State

In a previous post, I described a custom SOLR search handler that returns layered search results for a given query term (and optional filters). As I went further, though, I realized that I needed to return information relating to facets and category clusters as well. Of course, I could have added this to the handler itself, but splitting the logic across a chain of search components seemed preferable for readability and reusability, so I went that route.

So the first step was to refactor my custom SearchHandler into a SearchComponent. There was not much to do there, except to subclass SearchComponent instead of RequestHandlerBase and move the body of handleRequestBody(SolrQueryRequest,SolrQueryResponse) into a process(ResponseBuilder) method. The request and response objects are accessible from the ResponseBuilder as public fields, i.e., ResponseBuilder.req and ResponseBuilder.rsp. I then declared this component and an enclosing handler in solrconfig.xml, something like this:

<!-- this used to be my search handler -->
<searchComponent name="component1"
    class="org.apache.solr.handler.component.ext.MyComponent1">
  <str name="prop1">value1</str>
  <str name="prop2">value2</str>
</searchComponent>
<searchComponent name="component2"
    class="org.apache.solr.handler.component.ext.MyComponent2">
  <lst name="facets">
    <str name="prop1">1</str>
    <str name="prop2">2</str>
  </lst>
</searchComponent>
<requestHandler name="/mysearch2"
    class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score,id</str>
    <str name="wt">xml</str>
  </lst>
  <arr name="components">
    <str>component1</str>
    <str>component2</str>
    <!-- ... more components as needed ... -->
  </arr>
</requestHandler>

I've also added a second component to the chain above (just so I don't have to show this snippet again later); I hope it's not too confusing. Obviously there can be multiple components before and after my search handler turned search component, but for the purposes of this discussion, I'll keep things simple and just concentrate on this one other component and pretend that it has multiple unique (and pertinent) requirements.

Now, assume that the second component needs data that is already available in, or can easily be generated by, component1. That's actually true in my case: I needed a BitSet of the document ids in the search results in my second component, which I could easily get by collecting them while looping through the SolrDocumentList of results in my first component. Computing it again seemed wasteful, so I updated this snippet of code in component1's process() method (what used to be my handleRequestBody() method):

public void process(ResponseBuilder rb) throws IOException {
  ...
  // build and write response
  ...
  // one bit per document in the index; set bits mark the documents that matched
  OpenBitSet bits = new OpenBitSet(searcher.maxDoc());
  List<SolrDocument> slice = new ArrayList<SolrDocument>();
  for (Iterator<SolrDocument> it = results.iterator(); it.hasNext(); ) {
    SolrDocument sdoc = it.next();
    ...
    // record every matching id, not just the ones on the current page
    bits.set(((Integer) sdoc.get("id")).longValue());
    if (numFound >= start && numFound < start + rows) {
      slice.add(sdoc);
    }
    numFound++;
  }
  ...
  rsp.add("response", results);
  // temporary entry for component2, which removes it again
  rsp.add("_bits", bits);
}
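
For readers who want to try the single-pass idea without a SOLR install, here is a minimal stand-alone sketch. It is not the component code: java.util.BitSet stands in for OpenBitSet, plain integer ids stand in for SolrDocument objects, and the names PageAndBitsSketch, Result and collect are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Stand-alone sketch: record every matching doc id while keeping only one page of results. */
final class PageAndBitsSketch {

    /** Holds the two things the loop produces. */
    static final class Result {
        final BitSet bits;          // ids of ALL matching documents
        final List<Integer> slice;  // ids on the requested page only

        Result(BitSet bits, List<Integer> slice) {
            this.bits = bits;
            this.slice = slice;
        }
    }

    /** docIds plays the role of the full result list; start and rows define the page. */
    static Result collect(List<Integer> docIds, int start, int rows) {
        BitSet bits = new BitSet();
        List<Integer> slice = new ArrayList<>();
        int numFound = 0;
        for (int id : docIds) {
            bits.set(id); // every hit goes into the bit set
            if (numFound >= start && numFound < start + rows) {
                slice.add(id); // only the current page is kept
            }
            numFound++;
        }
        return new Result(bits, slice);
    }

    public static void main(String[] args) {
        Result r = collect(Arrays.asList(7, 3, 12, 5, 9), 1, 2);
        System.out.println("all ids: " + r.bits + ", page: " + r.slice);
    }
}
```

The point is simply that the bit set costs almost nothing extra when the loop over the results is happening anyway.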

In my next component (component2), I simply grab the OpenBitSet data structure by name from the NamedList, use it to generate the result for this component, stick the result back into the response, and discard the temporary data. The last step is so that the data does not appear in the response XML (for both aesthetic and performance reasons).

public void process(ResponseBuilder rb) throws IOException {
  Map<String,Object> cres = new HashMap<String,Object>();
  NamedList nl = rb.rsp.getValues();
  OpenBitSet bits = (OpenBitSet) nl.get("_bits");
  if (bits == null) {
    logger.warn("Component 1 must write _bits into response");
    rb.rsp.add(COMPONENT_NAME, cres);
    return;
  }
  // do something with bits and generate component response
  doSomething(bits, cres);
  // stick the result into the response and delete temp data
  rb.rsp.add("component2_result", cres);
  rb.rsp.getValues().remove("_bits");
}
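
The hand-off between the two components can also be modelled in plain Java, which makes it easy to see what the client ends up with. This is only a sketch under assumptions: a LinkedHashMap stands in for the response NamedList, java.util.BitSet for OpenBitSet, and the Component interface and run() helper are hypothetical, not SOLR APIs.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-alone model of two chained components sharing a temporary response entry. */
final class StatePassingSketch {

    static final String BITS_KEY = "_bits"; // temporary, must not reach the client

    /** Hypothetical stand-in for a search component. */
    interface Component {
        void process(Map<String, Object> response);
    }

    /** Producer: writes its own result plus the temporary bit set. */
    static final class Component1 implements Component {
        @Override
        public void process(Map<String, Object> response) {
            BitSet bits = new BitSet();
            bits.set(3);
            bits.set(7);
            response.put("response", "docs 3 and 7");
            response.put(BITS_KEY, bits);
        }
    }

    /** Consumer: takes the bit set out of the response and adds its own result. */
    static final class Component2 implements Component {
        @Override
        public void process(Map<String, Object> response) {
            // remove() both reads and deletes, so the temp data cannot leak
            Object bits = response.remove(BITS_KEY);
            if (!(bits instanceof BitSet)) {
                // complain instead of failing when the upstream component is missing
                response.put("component2_result", "unavailable: no " + BITS_KEY + " upstream");
                return;
            }
            response.put("component2_result", ((BitSet) bits).cardinality());
        }
    }

    /** Runs the components in order against one shared response. */
    static Map<String, Object> run(Component... chain) {
        Map<String, Object> response = new LinkedHashMap<>();
        for (Component c : chain) {
            c.process(response);
        }
        return response;
    }

    public static void main(String[] args) {
        System.out.println(run(new Component1(), new Component2()));
        System.out.println(run(new Component2())); // misconfigured chain
    }
}
```

Running the second chain shows the trade-off: the consumer only works when something upstream has produced the data, so it has to degrade gracefully when that is not the case.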

Before I did this, I investigated if I could subclass the XmlResponseWriter to ignore NamedLists with "hidden" names (ie names prefixed with underscore), but the XmlResponseWriter calls XMLWriter which does the actual XML generation, and XMLWriter is final (at least in SOLR 1.4.1). Good thing too, forced me to look for and find a simpler solution :-).

So there you have it - a simple way to pass data between components in a SOLR Search RequestHandler. Note that it does mean that component2 is always dependent on component1 (or some other component that produces the same data) upstream to it, so these components are no longer truly reusable pieces of code. But this can be useful if you really need it and you document the requirement (or complain about it if not met, as I've done here).

Reacting to a COMMIT

The second thing I needed to do in component2 was to give it some reference data that it would need to compute its results. The reference data is generated from the contents of the index, and the generation is fairly heavyweight, so you don't want to do this on every request.

Now one of the cool things about SOLR is its built-in incremental indexing feature (one of the main reasons we considered using SOLR in the first place), so you can POST data to a running SOLR instance followed by a COMMIT, and voila: your searcher re-opens with the new data.

Of course, this also means that if we want to provide accurate information, the reference data should be regenerated whenever the searcher is reopened. The way I went about doing this is mostly derived from how the SpellCheckComponent regenerates its dictionaries: by hooking into the SOLR event framework.

To do this, my component2 implements SolrCoreAware in addition to extending SearchComponent. This requires me to implement the inform(SolrCore) method, which is invoked by SOLR after the init(NamedList) but before prepare(ResponseBuilder) and process(ResponseBuilder). In the inform(SolrCore) method, I register a listener for the firstSearcher and newSearcher events (described in more detail here).

I then build the inner listener class, which implements SolrEventListener, which requires me to provide implementations for newSearcher() and postCommit() methods. Since my listener is a query-side listener, I provide an empty implementation for postCommit(). The newSearcher() method contains the code to generate the reference sets. Here is the relevant snippet of code from the component.

public class MyComponent2 extends SearchComponent implements SolrCoreAware {

  private RefData refdata; // this needs to be regenerated on COMMIT

  @Override
  public void init(NamedList args) {
    ...
  }

  @Override
  public void inform(SolrCore core) {
    // one listener instance handles both startup and searcher reopen
    SolrEventListener listener = new MyComponent2Listener();
    core.registerFirstSearcherListener(listener);
    core.registerNewSearcherListener(listener);
  }

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    ...
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    ...
    // do something with refdata
    ...
  }

  private class MyComponent2Listener implements SolrEventListener {

    @Override
    public void init(NamedList args) { /* NOOP */ }

    @Override
    public void newSearcher(SolrIndexSearcher newSearcher,
        SolrIndexSearcher currentSearcher) {
      // build the new data first, then repopulate refdata from it
      RefData copy = generateRefData(newSearcher);
      refdata.clear();
      refdata.addAll(copy);
    }

    @Override
    public void postCommit() { /* NOOP */ }
  }
  ...
}

Notice that I have registered the listener to listen on both firstSearcher and newSearcher events. This way, it gets called on SOLR startup (reacting to a firstSearcher event), and again each time the searcher is reopened (reacting to a newSearcher event).
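
To make the effect of the two registrations concrete, here is a toy model in plain Java with no SOLR classes involved. ToyCore, Listener, openSearcher and countRebuilds are hypothetical names, and the model assumes only what is described above: the firstSearcher listeners fire once at startup and the newSearcher listeners fire on every reopen after that.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of firstSearcher / newSearcher event dispatch. */
final class SearcherEventsSketch {

    /** Hypothetical stand-in for a searcher event listener. */
    interface Listener {
        void newSearcher(String newSearcher, String currentSearcher);
    }

    /** Hypothetical stand-in for the core: one listener list per event type. */
    static final class ToyCore {
        private final List<Listener> firstSearcherListeners = new ArrayList<>();
        private final List<Listener> newSearcherListeners = new ArrayList<>();
        private String current; // null until the first searcher is opened

        void registerFirstSearcherListener(Listener l) {
            firstSearcherListeners.add(l);
        }

        void registerNewSearcherListener(Listener l) {
            newSearcherListeners.add(l);
        }

        /** The very first open notifies the firstSearcher list, later opens the newSearcher list. */
        void openSearcher(String name) {
            List<Listener> targets = (current == null) ? firstSearcherListeners : newSearcherListeners;
            for (Listener l : targets) {
                l.newSearcher(name, current);
            }
            current = name;
        }
    }

    /** Counts how often the reference data would be regenerated. */
    static int countRebuilds(boolean alsoOnFirstSearcher, int searchersOpened) {
        final int[] rebuilds = {0};
        Listener listener = (newSearcher, currentSearcher) -> rebuilds[0]++;
        ToyCore core = new ToyCore();
        if (alsoOnFirstSearcher) {
            core.registerFirstSearcherListener(listener);
        }
        core.registerNewSearcherListener(listener);
        for (int i = 0; i < searchersOpened; i++) {
            core.openSearcher("searcher-" + i);
        }
        return rebuilds[0];
    }

    public static void main(String[] args) {
        System.out.println("both events: " + countRebuilds(true, 3));
        System.out.println("newSearcher only: " + countRebuilds(false, 3));
    }
}
```

In this model, dropping the firstSearcher registration means the reference data is not built until the first reopen after startup, which is exactly the gap the double registration closes.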

One other thing: since generating RefData takes some time, it's best to have the listener's newSearcher method build a copy and then repopulate the refdata variable from the copy. That way the component continues to use the old data until the new data is available.
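
A possible variation on this, which is not what my component does, is to swap a reference instead of calling clear() and addAll(). With clear() followed by addAll() there is a short window in which a concurrent request could see empty or partial data; publishing a finished snapshot through a volatile field avoids that. The sketch below assumes the reference data can be modelled as a Set of strings, and the names RefDataSwapSketch, onNewSearcher and isKnown are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Stand-alone sketch: rebuild reference data off to the side, then publish it in one step. */
final class RefDataSwapSketch {

    // Readers see either the old snapshot or the new one, never a half-filled one.
    private volatile Set<String> refdata = Collections.emptySet();

    /** Stand-in for the listener's newSearcher() callback. */
    void onNewSearcher(String... indexTerms) {
        // the slow part: build the new data without touching the live field
        Set<String> copy = new HashSet<>(Arrays.asList(indexTerms));
        // the fast part: publish it with a single reference assignment
        refdata = Collections.unmodifiableSet(copy);
    }

    /** Stand-in for process(): read the current snapshot once and work with it. */
    boolean isKnown(String term) {
        Set<String> snapshot = refdata;
        return snapshot.contains(term);
    }

    public static void main(String[] args) {
        RefDataSwapSketch component = new RefDataSwapSketch();
        component.onNewSearcher("solr", "lucene");
        System.out.println(component.isKnown("solr"));
        component.onNewSearcher("lucene", "tika");
        System.out.println(component.isKnown("solr"));
    }
}
```

Requests that started before the swap keep working with the snapshot they already read, and the old data becomes garbage once nobody holds it any more.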

And that's pretty much it for today. Till next time.

http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html
