Integrating Apache Solr and Carrot2

1. Environment

Download the following software:

Name	URL
solr-integration-strategies-gh-pages.zip	https://github.com/carrot2/solr-integration-strategies
solr-4.7.1
apache-tomcat-6.0.39
carrot2-webapp-3.9.2.war

2. Start Solr

Start Solr with the bundled Jetty, pointing solr.solr.home at the Carrot2 solr-home directory:

F:\solr\solr-4.7.1\example>java -Dsolr.solr.home=../../carrot2-3.8.0-4.7.1/solr-home -jar start.jar

Then open http://localhost:8983/solr/#/ to check that Solr is running.

3. Import data

Use post.jar from the solr-docs directory to import the 20newsgroups data into Solr:

F:\solr\solr-integration-strategies-gh-pages\solr-docs>java -jar post.jar 20newsgroups
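
To make clearer what an import like this sends to Solr, here is a minimal Python sketch that builds a Solr XML `<add>` update message. It assumes the documents carry the `title` and `content` fields used by the handlers later in this post; the `id` field and the sample values are hypothetical, and the actual files under 20newsgroups may be structured differently.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> update message from a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample document; "id" is an assumed unique key field.
sample = [{"id": "doc-1", "title": "Memory upgrade", "content": "How much RAM do I need?"}]
message = build_add_message(sample)
print(message)
```

A message of this shape is what gets POSTed to the core's update handler, followed by a commit.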

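4. Integrating clustering into Solr

The solrconfig.xml supplied by Carrot2 already configures clustering of the search results. It loads the clustering contrib jars and defines a clustering search component with a default engine that uses the Lingo algorithm: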

<lib dir="../../../solr-4.7.1/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../solr-4.7.1/dist/" regex=".*solr-clustering-.*\.jar" />

<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
</searchComponent>
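
A quick way to check which clustering engines a solrconfig.xml declares is to parse it and list each engine's parameters. The Python sketch below does this on an inline sample; the `<str>` values are the ones shown above, while the `<config>` root, the `searchComponent` attributes, and the `<lst name="engine">` wrapper are assumptions based on the usual Solr layout.

```python
import xml.etree.ElementTree as ET

# Inline sample; wrapper elements and attributes are assumed, <str> values come from the text.
SAMPLE = """
<config>
  <searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">default</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
      <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
    </lst>
  </searchComponent>
</config>
"""


def clustering_engines(config_xml):
    """Return {engine name: {parameter: value}} for every declared engine."""
    root = ET.fromstring(config_xml)
    engines = {}
    for component in root.iter("searchComponent"):
        for engine in component.findall("lst[@name='engine']"):
            params = {s.get("name"): (s.text or "").strip() for s in engine.findall("str")}
            engines[params.get("name", "")] = params
    return engines


engines = clustering_engines(SAMPLE)
for name, params in engines.items():
    print(name, "->", params["carrot.algorithm"])
```

The engine name found here ("default") is the value that the request handlers reference through `clustering.engine`.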

Request handler configurations

config1_1 (search handler):

<requestHandler name="/config1_1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^1.5 content^1.0</str>
    <str name="fl">*,score</str>
    <str name="clustering.engine">default</str>
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">content</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

config1_2 (search handler returning a subset of fields)

<requestHandler name="/config1_2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^1.5 content^1.0</str>
    <str name="fl">name,title,score</str>
    <str name="clustering.engine">default</str>
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">content</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

config1_3 (search handler returning contextual snippets)

<requestHandler name="/config1_3" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^1.5 content^1.0</str>
    <str name="fl">name,title,score</str>
    <!-- Highlight the content field -->
    <str name="hl.fl">content</str>
    <str name="clustering.engine">default</str>
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">content</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

config2_1(search handler clustering query-in-context snippets)

<requestHandler name="/config2_1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^1.5 content^1.0</str>
    <str name="fl">name,title,score</str>
    <str name="clustering.engine">default</str>
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">content</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
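
Because the handlers differ only in a few default parameters, it can help to compare them programmatically. The following Python sketch extracts the defaults of each handler from an inline sample and reports the parameters that differ. The `<str>` values are taken from config1_1 and config1_2 above (abbreviated); the handler names, the `<config>` root, and the `<lst name="defaults">` wrapper are assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated inline sample; wrapper elements and handler names are assumed.
SAMPLE = """
<config>
  <requestHandler name="/config1_1" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="qf">title^1.5 content^1.0</str>
      <str name="fl">*,score</str>
      <str name="carrot.snippet">content</str>
    </lst>
  </requestHandler>
  <requestHandler name="/config1_2" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="qf">title^1.5 content^1.0</str>
      <str name="fl">name,title,score</str>
      <str name="carrot.snippet">content</str>
    </lst>
  </requestHandler>
</config>
"""


def handler_defaults(config_xml):
    """Return {handler name: {parameter: value}} read from each handler's defaults."""
    root = ET.fromstring(config_xml)
    result = {}
    for handler in root.iter("requestHandler"):
        defaults = handler.find("lst[@name='defaults']")
        params = {}
        if defaults is not None:
            for s in defaults.findall("str"):
                params[s.get("name")] = s.text or ""
        result[handler.get("name")] = params
    return result


def diff_defaults(a, b):
    """Return {parameter: (value in a, value in b)} for parameters that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


handlers = handler_defaults(SAMPLE)
changes = diff_defaults(handlers["/config1_1"], handlers["/config1_2"])
print(changes)
```

In this sample the only difference is `fl`: config1_1 returns every stored field plus the score, while config1_2 returns a subset.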

Query a handler, for example config1_1 with the query "memory":

http://localhost:8983/solr/example/config1_1?q=memory&wt=xml&indent=true
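
The same request can be built and its clusters read from a script. The Python sketch below builds the URL shown above and parses the `clusters` section of an XML response. The sample response is hypothetical and only mirrors the general shape of Solr's clustering output (a list of clusters, each with labels and document ids); the labels and ids in it are invented for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def clustering_url(handler, query, base="http://localhost:8983/solr/example"):
    """Build the request URL for one of the clustering handlers."""
    params = urlencode({"q": query, "wt": "xml", "indent": "true"})
    return f"{base}/{handler}?{params}"


def parse_clusters(response_xml):
    """Return a list of (labels, document ids) pairs from a Solr XML response."""
    root = ET.fromstring(response_xml)
    clusters = []
    for cluster in root.findall("arr[@name='clusters']/lst"):
        labels = [s.text for s in cluster.findall("arr[@name='labels']/str")]
        docs = [s.text for s in cluster.findall("arr[@name='docs']/str")]
        clusters.append((labels, docs))
    return clusters


# Hypothetical response body; real output also contains the header and result list.
SAMPLE_RESPONSE = """
<response>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>Virtual Memory</str></arr>
      <arr name="docs"><str>doc-1</str><str>doc-7</str></arr>
    </lst>
    <lst>
      <arr name="labels"><str>Other Topics</str></arr>
      <arr name="docs"><str>doc-3</str></arr>
    </lst>
  </arr>
</response>
"""

url = clustering_url("config1_1", "memory")
clusters = parse_clusters(SAMPLE_RESPONSE)
print(url)
for labels, docs in clusters:
    print(labels, len(docs))
```

To run it against a live instance, fetch `url` with `urllib.request.urlopen` and pass the decoded body to `parse_clusters`. If a response contains no `clusters` section, the handler may not enable the component by default; in that case adding `clustering=true` to the request is worth trying.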

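5. Integrating Solr into the Carrot2 web application

Prepare Tomcat and carrot2-webapp-3.9.2.

Unpack carrot2-webapp-3.9.2.war, then edit suite-webapp.xml under F:\solr\apache-tomcat-6.0.39\webapps\carrot2-webapp-3.9.2\WEB-INF\suites so that it registers Solr as a document source: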

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>
</component-suite>
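
After editing the suite file by hand, a well-formedness check catches mismatched tags before Tomcat is started. The Python sketch below parses a suite definition and lists each source with the attribute file it references. The inline sample repeats the source entry above; the `<sources>` wrapper is assumed, and the helper name `list_sources` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Inline copy of the suite entry; the <sources> wrapper element is an assumption.
SUITE = """
<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>
</component-suite>
"""


def list_sources(suite_xml):
    """Parse a suite definition; raises ET.ParseError if the XML is not well-formed."""
    root = ET.fromstring(suite_xml)
    sources = []
    for source in root.iter("source"):
        sources.append({
            "id": source.get("id"),
            "class": source.get("component-class"),
            "attributes": source.get("attribute-sets-resource"),
            "examples": [q.text for q in source.iter("example-query")],
        })
    return sources


sources = list_sources(SUITE)
for entry in sources:
    print(entry["id"], entry["attributes"], entry["examples"])
```

To check the real file, read suite-webapp.xml from the WEB-INF\suites directory and pass its contents to `list_sources`.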

Then edit source-solr-attributes.xml, which holds the attribute values that override the Solr source's defaults:

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <!-- six <attribute> entries followed here; only their closing tags survive in this copy -->
    </value-set>
  </attribute-set>
</attribute-sets>

6. Start Tomcat

Place carrot2-webapp-3.9.2 in Tomcat's webapps directory and start Tomcat.

Open http://localhost:8080/carrot2-webapp-3.9.2

Search for "memory".

The Carrot2 graphical interface