solr 自聚类实现

　　参考官网：https://lucene.apache.org/solr/guide/6_6/result-clustering.html

　　最近用到solr自聚类的，先简单介绍如下：

　　1、配置文件

　　　　主要配置文件必须配置如下内容：

<lib dir="${solr.install.dir:../../..}/contrib/clustering/lib/" regex=".*\.jar" />

<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-clustering-\d.*\.jar" />

<searchComponent name="clustering" enable="${solr.clustering.enabled:true}" class="solr.clustering.ClusteringComponent">

    <!-- Lingo clustering algorithm -->

    <lst name="engine">

      <str name="name">lingo</str>

      <!--<bool name="optional">true</bool>-->

      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

      <str name="carrot.resourcesDir">clustering/carrot2</str>

    </lst>

    <!-- An example definition for the STC clustering algorithm. -->

    <lst name="engine">

      <str name="name">stc</str>

      <bool name="optional">true</bool>

      <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>

      <str name="carrot.resourcesDir">clustering/carrot2</str>

    </lst>

    <lst name="engine">

      <str name="name">kmeans</str>

      <!--<bool name="optional">true</bool>-->

      <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>

      <str name="carrot.resourcesDir">clustering/carrot2</str>

    </lst>

  </searchComponent>

　　　　下面的配置文件根据自己的实际情况进行修改：

 <requestHandler name="/clustering"

                  startup="lazy"

                  class="solr.SearchHandler">

    <lst name="defaults">

      <bool name="clustering">true</bool>

      <bool name="clustering.results">true</bool>

      <!-- Field name with the logical "title" of a each document (optional) -->

      <str name="carrot.title">keyword</str>

      <!-- Logical field to physical field mapping. -->

      <str name="carrot.url">id</str>

      <!-- Field name with the logical "content" of a each document (optional) -->

      <str name="carrot.snippet">summary</str>

      <!-- Apply highlighter to the title/ content and use this for clustering. -->

      <bool name="carrot.produceSummary">true</bool>

      <!-- the maximum number of labels per cluster -->

      <!--<int name="carrot.numDescriptions">5</int>-->

      <!-- produce sub clusters -->

      <bool name="carrot.outputSubClusters">false</bool>

      <!-- Configure any other request handler parameters. We will cluster the

         top 100 search results so bump up the 'rows' parameter. -->

      <!--<str name="defType">edismax</str>

      <str name="qf">

        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4

      </str>

      <str name="q.alt">*:*</str>-->

      <str name="defType">edismax</str>

      <!--<str name="qf">

        summary^0.5 category^1.2  id^10.0

      </str>-->

      <str name="qf">keyword^0.5 title^1.2  id^10.0</str>

      <str name="rows">100</str>

      <str name="fl">*,score</str>

    </lst>

    <!-- Append clustering at the end of the list of search components. -->

    <arr name="last-components">

      <str>clustering</str>

    </arr>

  </requestHandler>

　　　　managed-schema配置文件包含以下内：

 <fieldType name="text_ik" class="solr.TextField">

    <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer"/>

    <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer"/>

  </fieldType>

  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>

  <field name="text" type="text_ik" multiValued="false" indexed="true" stored="true" termVectors ="true"/>

  <field name="title" type="text_ik" multiValued="false" indexed="true" stored="true" />

  <field name="snippet" type="text_ik" multiValued="false" indexed="true" stored="true" />

  <field name="keyword" type="text_ik" multiValued="false" indexed="true" stored="true" />

  <field name="category" type="text_ik" multiValued="false" indexed="true" stored="true" />

  <field name="summary" type="text_ik" multiValued="false" indexed="true" stored="true"/>

  <field name="path" type="string" multiValued="false" indexed="true" stored="true"/>

　　　　注意：text_ik对应的分词组件，要引用对应的jar包，具体参见：http://www.cnblogs.com/shaosks/p/8204615.html

　　2、测试索引的文件

　　　　启动solr服务，在浏览器输入：http://localhost:8983/solr/mycore/clustering?q=*:*&rows=10

　　　　结果如下：

　　3、java查询代码

import org.apache.solr.client.solrj.SolrClient;

import org.apache.solr.client.solrj.SolrQuery;

import org.apache.solr.client.solrj.SolrServerException;

import org.apache.solr.client.solrj.impl.HttpSolrClient;

import org.apache.solr.client.solrj.response.Cluster;

import org.apache.solr.client.solrj.response.QueryResponse;

import org.apache.solr.client.solrj.response.ClusteringResponse;

import org.apache.solr.common.SolrDocument;

import java.io.IOException;

import java.util.List;

/**

 * @Author：sks

 * @Description：

 * @Date：Created in 9:41 2018/1/18

 * @Modified by：

 **/

public class AutoCluster {

    private static SolrClient solr;

    /**

     * @Author：sks

     * @Description：初始化solr客户端

     * @Date：

     */

    public static void Init(String urlString){

        solr = new HttpSolrClient.Builder(urlString).build();

    }

    public static void main(String[] args) throws SolrServerException,IOException {

        String urlString = "http://localhost:8983/solr/mycore";

        String path = "D:/work/Solr/ImportData";

        Init(urlString);

        getAutoClusterInfo();

        System.exit(0);

    }

    /**

     * @Author：sks

     * @Description：获取聚类数据

     * @Date：

     */

    private static void getAutoClusterInfo() throws SolrServerException,IOException {

        //使用这个对象做查询

        SolrQuery params = new SolrQuery();

        //查询所有数据

        params.set("qt", "/clustering");

        params.setQuery("*:*");

        params.setStart(0);

        params.setRows(30);

        QueryResponse queryResponse = solr.query(params);

        ClusteringResponse clr = queryResponse.getClusteringResponse();

        List<Cluster> list = clr.getClusters();

        //拿到聚类数据集合,返回查询结果

        String  txt = "";

        for(Cluster c :list){

            //类别标签

            List<String> lblist = c.getLabels();

            for(String lb:lblist){

                System.out.println(lb);

            }

            //聚类文档ID

            List<String> doclist  = c.getDocs();

            for(String doc:doclist){

                System.out.println("        " + doc);

            }

        }

    }

}

　　　　查询结果如下：

solr 自聚类实现的更多相关文章

Solr调研总结
http://wiki.apache.org/solr/ Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍solr的功能使用及相关注意事项;主要包括以下内容:环境 ...
solr教程，值得刚接触搜索开发人员一看
http://blog.csdn.net/awj3584/article/details/16963525 Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍sol ...
Solr总结
http://www.cnblogs.com/guozk/p/3498831.html Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍solr的功能使用及相关注 ...
【转载】solr教程，值得刚接触搜索开发人员一看
转载:http://blog.csdn.net/awj3584/article/details/16963525 Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍 ...
Solr调研总结(转)
Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍solr的功能使用及相关注意事项;主要包括以下内容:环境搭建及调试.两个核心配置文件介绍.中文分词器配置.维护索引 ...
Solr调研总结（很详细很全面）
Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍solr的功能使用及相关注意事项;主要包括以下内容:环境搭建及调试;两个核心配置文件介绍;维护索引;查询索引,和在 ...
solr入门教程-较详细
Solr调研总结开发类型全文检索相关开发 Solr版本 4.2 文件内容本文介绍solr的功能使用及相关注意事项;主要包括以下内容:环境搭建及调试;两个核心配置文件介绍;维护索引;查询索引,和在 ...
Lucene/Solr搜索引擎开发笔记 - 第1章 Solr安装与部署（Jetty篇）
一.为何开博客写<Lucene/Solr搜索引擎开发笔记> 本人毕业于2011年,2011-2014的三年时间里,在深圳前50强企业工作,从事工业控制领域的机器视觉方向,主要使用语言为C/ ...
1.7.1 solr Searching概述
1. Overview of Searching in Solr 在用户运行一个solr搜索时,搜索查询会被request handler处理.一个request handler就是一个请求处理插件, ...

随机推荐

Graph Cut
转自:http://blog.csdn.net/zouxy09/article/details/8532111 Graph Cut,下一个博文我们再学习下Grab Cut,两者都是基于图论的分割方法. ...
linux下环境变量设置的问题
在当前环境变量前新增加一个路径 export PATH=/your/bin/path:$PATH export LD_LIBRARY_PATH=/your/lib/path:$LD_LIBRARY_P ...
树莓派使用opencv
安装 reference1 reference2 注意安装顺利,但是使用的时候提示 you need install libgtk2.0-dev xxx ,这时候说明你安装的库的顺序不对,你应该先安 ...
Laravel通过Swoole提升性能
1.安装配置laravel 1.1.composer下载laravel composer create-project --prefer-dist laravel/laravel blog " ...
python webpy 框架环境架设
前几年使用过 webpy做个些小东西,今天有个东西从拾webpy.但是基本上都忘记了,还是那句古话“好记性不如烂笔头”.这里把相应的步骤梳理下. 前提: 操作系统 windows 一.webpy 方面 ...
CF1027C Minimum Value Rectangle【贪心/公式化简】
https://www.luogu.org/problemnew/show/CF1027C #include<cstdio> #include<string> #include ...
洛谷P2224 [HNOI2001] 产品加工 [DP补完计划，背包]
题目传送门产品加工题目描述某加工厂有A.B两台机器,来加工的产品可以由其中任何一台机器完成,或者两台机器共同完成.由于受到机器性能和产品特性的限制,不同的机器加工同一产品所需的时间会不同,若同时 ...
Eclipse内嵌的webservice客户端
概述 Eclipse内嵌的webservice客户端,可用于发起请求,查看结果,展示请求和响应的报文. 详情在Java EE视图,可以看到内嵌的webservice客户端浏览器登陆按钮点击打开浏览 ...
Python安装scrapy提示 error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools&quo ...
Flask实战第46天：完成前台登录功能
后台逻辑首先进行表单验证, 编辑front.froms.py ... class SignInForm(BaseForm): telephone = StringField(validators=[ ...

solr 自聚类实现

1、配置文件

2、测试索引的文件

solr 自聚类实现的更多相关文章

随机推荐

热门专题

　　1、配置文件

　　2、测试索引的文件