How to keep Solr updated with MySQL data in real time

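Step 1: Create a core

A core is a Solr-specific concept: each core bundles an index, its query data, and the related configuration, so you can think of it as an independent database. Here we create a new core named core1.

Open a Linux shell, change to Solr's bin directory, and run the following commands: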

cd /usr/local/solr/bin

./solr create -c core1 -force   # -c names the core to create; -force is required when running as root

Step 2: Prepare the data import configuration

1. Edit solrconfig.xml in /usr/local/solr/server/solr/core1/conf

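Add the following. Do not nest it inside another requestHandler; place it right after an existing </requestHandler> closing tag.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">

<lst name="defaults">

<str name="config">data-config.xml</str>

</lst>

</requestHandler>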

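A requestHandler is a request processor; it defines how indexing and search are accessed.

The /dataimport handler performs data imports, in this case from a MySQL database into Solr.

data-config.xml is a data source description file that you write yourself. The file name is arbitrary, as long as it matches the name given in solrconfig.xml.

2. Create data-config.xml in /usr/local/solr/server/solr/core1/conf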

</entity>

</document>

</dataConfig>
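For reference, a data-config.xml that supports both full and delta imports generally has the shape below. This is a minimal sketch, not the author's actual file: the database testdb, the credentials, the driver class, the table product, and its columns id, name, and update_time are all hypothetical. The ${dataimporter.last_index_time} and ${dataimporter.delta.id} variables are supplied by the DataImportHandler at run time.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; adjust driver, URL, user and password -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/testdb"
              user="solr_user"
              password="change_me"/>
  <document>
    <!-- query: used by full-import
         deltaQuery: returns the primary keys of rows changed since the last import
         deltaImportQuery: fetches each changed row by its primary key -->
    <entity name="product" pk="id"
            query="SELECT id, name, update_time FROM product"
            deltaQuery="SELECT id FROM product WHERE update_time &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name, update_time FROM product WHERE id = '${dataimporter.delta.id}'">
      <!-- Map each database column to a Solr field defined in managed-schema -->
      <field column="id" name="id"/>
      <field column="name" name="product_name"/>
      <field column="update_time" name="product_update_time"/>
    </entity>
  </document>
</dataConfig>
```

The delta-import command depends on deltaQuery and deltaImportQuery, so the table needs a timestamp column that changes whenever a row is modified.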

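3. Add the following to the managed-schema file in /usr/local/solr/server/solr/core1/conf:

field attributes:

- name: the name of the field

- type: the type of the field

- indexed: whether the field is indexed

- stored: whether the field value is stored

- multiValued: whether the field holds multiple values. Set it to true when storing several values; Solr allows one field to hold many values, for example a user's friend ids or a product's images (large and small).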
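The attributes above combine as in the following sketch. The field names are hypothetical and only illustrate the attributes; the types string and text_general are assumed to be present, as they are in Solr's default configset. The definitions go inside the <schema> element of managed-schema.

```xml
<!-- Hypothetical single-valued text field: searchable and returned in results -->
<field name="product_name" type="text_general" indexed="true" stored="true"/>

<!-- Hypothetical multi-valued field: several image URLs per document,
     stored for display but not searchable -->
<field name="product_images" type="string" indexed="false" stored="true" multiValued="true"/>
```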

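copyField (copy field):

Several fields can be copied into a single field so that they can be searched together. When the index is built, Solr automatically copies the content of each source field into the destination field.

- source: the source field

- dest: the destination field. Making the destination field the default search field at query time can improve query efficiency.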
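A copyField setup could look like the sketch below. All field names are hypothetical. The destination field is declared multiValued because it receives values from more than one source field.

```xml
<!-- Hypothetical source fields -->
<field name="product_name" type="text_general" indexed="true" stored="true"/>
<field name="product_desc" type="text_general" indexed="true" stored="true"/>

<!-- Hypothetical destination field: indexed for searching, not stored,
     since the original values are already stored in the source fields -->
<field name="product_keywords" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy both source fields into the destination at index time -->
<copyField source="product_name" dest="product_keywords"/>
<copyField source="product_desc" dest="product_keywords"/>
```

A query can then target product_keywords alone, for example by passing it as the df (default field) request parameter, instead of searching each source field separately.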

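Step 3: Install a Chinese tokenizer

Solr ships with built-in Chinese tokenization, but the results are poor, so we add the IKAnalyzer Chinese tokenizer for querying Chinese text. Its source is at https://github.com/EugenePig/ik-analyzer-solr5

1. Download the IKAnalyzer for solr5 source package and build it with Maven to get IKAnalyzer-5.0.jar, then put that file in /usr/local/solr/server/solr-webapp/webapp/WEB-INF/lib

2. Edit managed-schema and add the following at the end, before the closing </schema> tag: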

</fieldType>
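A field type backed by IKAnalyzer is usually declared along the lines of the sketch below. The type name text_ik and the field title_cn are hypothetical, and the analyzer class name is assumed from the IK Analyzer project's package layout, so check it against the jar you built.

```xml
<!-- Hypothetical field type using IKAnalyzer for both indexing and querying -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

<!-- Hypothetical field that uses the new type for Chinese text -->
<field name="title_cn" type="text_ik" indexed="true" stored="true"/>
```

After restarting Solr, the Analysis screen of the admin UI can be used to confirm that Chinese text in such a field is split into words rather than single characters.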

Step 4: Upload the required libraries

1. Importing MySQL data requires the MySQL driver library:

mysql-connector-java-bin.jar

2. The import also requires two jars from the /usr/local/solr/dist directory:

solr-dataimporthandler-7.2.1.jar

solr-dataimporthandler-extras-7.2.1.jar

3. Upload the three jars to /usr/local/solr/server/solr-webapp/webapp/WEB-INF/lib

4. Restart Solr

cd /usr/local/solr/bin

./solr restart -force   # -force is required when running as root

Step 5: Add the scheduled-update configuration file

Create a conf folder under solr-6.6.0/server/solr

Create the file dataimport.properties in solr-6.6.0/server/solr/conf

Configuration:

#################################################

#                                               #

#       dataimport scheduler properties         #

#                                               #

#################################################

#  to sync or not to sync

#  1 - active; anything else - inactive

syncEnabled=1

#  which cores to schedule

#  in a multi-core environment you can decide which cores you want synchronized

#  leave empty or comment it out if using single-core deployment

#syncCores=game,resource

syncCores=search_node

#  solr server name or IP address

#  [defaults to localhost if empty]

server=localhost

#  solr server port

#  [defaults to 80 if empty]

port=9090

#  application name/context

#  [defaults to current ServletContextListener's context (app) name]

webapp=solr

#  URL params [mandatory]

#  remainder of URL

params=/dataimport?command=delta-import&clean=false&commit=true

#  schedule interval

#  number of minutes between two runs

#  [defaults to 30 if empty]

interval=2

#  interval between full index rebuilds, in minutes; default 7200, i.e. 5 days

#  empty, 0, or commented out: never rebuild the index

reBuildIndexInterval=7200

#  URL params for the index rebuild

reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  time from which the rebuild interval is counted;

#  first actual run = reBuildIndexBeginTime + reBuildIndexInterval*60*1000

#  two formats: 2012-04-11 03:10:00 or 03:10:00; the latter is completed with the date on which the service started

reBuildIndexBeginTime=03:10:00

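If reBuildIndexBeginTime is not configured, an error is reported: Unable to convert 'interval' to number 00:00:00

This is a bug in the scheduler source code. When reBuildIndexBeginTime is missing, the default value 00:00:00 is assigned to interval instead of reBuildIndexBeginTime, so interval can no longer be parsed as a number: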

if ((this.reBuildIndexBeginTime == null) || (this.reBuildIndexBeginTime.isEmpty()))

      this.interval = "00:00:00";

Start: solr-6.6.0/bin/solr start -port 9090

Restart: solr-6.6.0/bin/solr restart -port 9090