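Compiling, installing, and testing Sphinx Chinese word segmentation on CentOS: CoreSeek

To support Chinese word segmentation you also need to download Coreseek. You can search for it on the official site and download it; I use version 4.1 here.

Baidu Cloud download link: https://pan.baidu.com/s/1slNIyHf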

tar -zxvf coreseek-4.1-beta.tar.gz

cd coreseek-4.1-beta

cd mmseg-3.2.14/

./bootstrap   # check the build environment

libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.

libtoolize: copying file `config/ltmain.sh'

libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and

libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.

libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.

+ autoheader

+ automake --add-missing --copy

+ autoconf

./configure --prefix=/usr/local/mmseg3

------------------------------------------------------------------------

Configuration:

  Source code location:       .

  Compiler:                   gcc

  Compiler flags:             -g -O2

  Host System Type:           x86_64-redhat-linux-gnu

  Install path:               /usr/local/mmseg3

  See config.h for further configuration information.

------------------------------------------------------------------------

make && make install

Create a text file in the source directory to test the segmenter

cd /usr/local/mmseg3

cd /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src

vim test.txt

山东省德州市

北京朝阳市 

中国北京 

中国德州 

中国山东德州

cd /usr/local/mmseg3/bin

./mmseg -d /usr/local/mmseg3/etc/ /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src/test.txt

山东省/x 德州市/x  /x  /x 

北京/x 朝阳市/x 

中国/x 北京/x 

中国/x 德州/x 

中国/x 山东/x 德州/x 

Word Splite took: 0 ms.
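
The mmseg output above tags every token with /x. To read the segmentation one token per line, the tags can be stripped with standard tools. This is only a minimal sketch: the function name mmseg_tokens is made up for this example, and it assumes the output format shown above (tokens separated by spaces, each ending in /x, followed by a "Word Splite took" timing line).

```shell
#!/bin/sh
# Hypothetical helper: turn mmseg output ("word/x word/x ...") into one token per line.
mmseg_tokens() {
    # 1) drop the timing line that mmseg prints at the end
    # 2) put each space-separated field on its own line
    # 3) strip the trailing /x tag
    # 4) delete lines left empty (bare "/x" fields and doubled spaces)
    grep -v '^Word Splite took:' | tr ' ' '\n' | sed -e 's#/x$##' -e '/^$/d'
}

# Example with one line copied from the output above
printf '中国/x 山东/x 德州/x \n' | mmseg_tokens
```

In practice the function would sit at the end of the pipe, for example ./mmseg -d /usr/local/mmseg3/etc/ test.txt | mmseg_tokens.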

cd /usr/local/src/coreseek-4.1-beta/csft-4.1  # from here on, csft can be treated as Sphinx

sh buildconf.sh    # run the script as a check; if it finishes without errors, the build can proceed

./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql

You can now run 'make install' to build and install Sphinx binaries.

On a multi-core machine, try 'make -j4 install' to speed up the build.

Updates, articles, help forum, and commercial support, consulting, training,

and development services are available at http://sphinxsearch.com/

Thank you for choosing Sphinx!
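
The configure options above point at two directories inside the mmseg install, so a wrong or mistyped prefix only shows up as a build failure later. A small pre-flight check can catch that earlier. This is a sketch, not part of Coreseek: the function name check_mmseg_prefix is invented here, and the two directories are simply the ones passed to --with-mmseg-includes and --with-mmseg-libs in this article.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm the mmseg install contains the
# directories that the csft configure options point at.
check_mmseg_prefix() {
    prefix=$1
    status=0
    for dir in "$prefix/include/mmseg" "$prefix/lib"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Prefix assumed from the mmseg ./configure step earlier in this article
if check_mmseg_prefix /usr/local/mmseg3; then
    echo "mmseg paths look fine"
else
    echo "fix the mmseg install before configuring csft"
fi
```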

make && make install

make[3]: Entering directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'

mkdir -p /usr/local/coreseek/var/data && mkdir -p /usr/local/coreseek/var/log

make[3]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'

make[2]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'

make[1]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'

Next, open the mysql client and create a table to test with

create table kecheng(id int primary key auto_increment,name varchar(50),info varchar(50))charset utf8;

insert into kecheng(name,info) values('java','java是一门很牛的语言，性能整体来说比PHP要强，但是不如php开发速度快');

insert into kecheng(name,info) values('redis','redis是一种内存缓存数据库，比memcache支持的数据格式多');

insert into kecheng(name,info) values('memcache','memcache支持简单的key value形式，不像redis支持持久化');

insert into kecheng(name,info) values('jquery','jquery是一种前端脚本，结合php和java可以做web开发');

cd /usr/local/coreseek/    # this is effectively the Sphinx directory

cd bin

ls  # the layout is similar to stock Sphinx

cd /usr/local/coreseek/etc

cp sphinx.conf.dist csft.conf

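-- Run in the mysql client. index_table stores the id of the last indexed row,
-- so later runs do not have to reindex the whole table.
CREATE TABLE index_table(
    Counter_id int unsigned not null primary key auto_increment,
    Max_id int unsigned not null comment 'largest id that has already been indexed'
);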

Edit the configuration file csft.conf

source src1
{
        # data source type. mandatory, no default value
        # known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
        # database type
        type                    = mysql

        #####################################################################
        ## SQL settings (for 'mysql' and 'pgsql' types)
        #####################################################################

        # some straightforward parameters for SQL source types
        # connection settings (self-explanatory)
        sql_host                = localhost
        sql_user                = root
        sql_pass                =
        sql_db                  = test
        sql_port                = 3306  # optional, default is 3306

.....

        # set the connection character set
        sql_query_pre           = SET NAMES utf8
        # turn off the MySQL query cache for this session
        sql_query_pre           = SET SESSION query_cache_type=OFF

        # mandatory, integer document ID field MUST be the first selected column
        # the default query is commented out:
        #sql_query              = \
        #       SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        #       FROM documents

        # the query to index; if the primary key is not named id, alias it to id,
        # e.g. SELECT tid AS id FROM tableName
        sql_query               = SELECT id,name,info FROM kecheng

        # SQL run after the main query: index_table stores the last indexed
        # primary key, so later runs only index the newest rows instead of the whole table
        sql_query_post          = REPLACE INTO index_table SELECT 1,MAX(id) FROM kecheng

.....

        # columns returned for each match by the search tool; all of them here (testing only)
        sql_query_info          = SELECT * FROM kecheng WHERE id=$id

.....

index test1
{

.....

        # where the index files are created
        path                    = /usr/local/coreseek/var/data/test1

        # document attribute values (docinfo) storage mode

        # switch to Chinese
        charset_type            = zh_cn.utf-8
        # dictionary directory
        charset_dictpath        = /usr/local/mmseg3/etc/

 

#---------------- 

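# zengliangsuoyin = incremental (delta) index
source zengliangsuoyin : src1
{
        # fetch only the rows that have not been indexed yet
        sql_query               = SELECT id,name,info FROM kecheng WHERE id > (SELECT max_id FROM index_table)

        # sql_query_post, which writes the last id back to index_table, is not
        # repeated here because it is inherited from src1
}

# an index inherits from another index, so the parent is test1 rather than the source src1
index zengliangsuoyin : test1
{
        source                  = zengliangsuoyin
        # the delta index needs its own path; reusing test1's path would overwrite the main index
        path                    = /usr/local/coreseek/var/data/zengliangsuoyin
}

Save and exit.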

cd /usr/local/coreseek/bin/

./indexer --all

using config file '/usr/local/coreseek/etc/csft.conf'...        -- the config file in use; the name matches the copy made earlier

indexing index 'test1'...

WARNING: attribute 'group_id' not found - IGNORING

WARNING: attribute 'date_added' not found - IGNORING

WARNING: Attribute count is 0: switching to none docinfo

collected 5 docs, 0.0 MB

sorted 0.0 Mhits, 100.0% done

total 5 docs, 351 bytes

total 0.178 sec, 1971 bytes/sec, 28.07 docs/sec

indexing index 'test1stemmed'...

WARNING: attribute 'group_id' not found - IGNORING

WARNING: attribute 'date_added' not found - IGNORING

WARNING: Attribute count is 0: switching to none docinfo

collected 5 docs, 0.0 MB                  -- five documents found, i.e. five MySQL rows, so the database connection works

sorted 0.0 Mhits, 100.0% done

total 5 docs, 351 bytes

total 0.007 sec, 47677 bytes/sec, 679.16 docs/sec

skipping non-plain index 'dist1'...

skipping non-plain index 'rt'...

total 4 reads, 0.000 sec, 0.3 kb/call avg, 0.0 msec/call avg

total 12 writes, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg
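
Running indexer --all rebuilds every index. With a main index plus an incremental one, routine updates can instead rebuild only the incremental index and then merge it into the main one. The following is a hedged sketch of that routine, assuming a working main index test1 and an incremental index zengliangsuoyin in csft.conf. The run wrapper and the DRY_RUN variable are invented for this example; --config and --merge are indexer options, and --rotate would need to be appended to both commands if searchd is already serving these indexes.

```shell
#!/bin/sh
# Paths and index names taken from this article; adjust to your install.
INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/csft.conf
MAIN_INDEX=test1
DELTA_INDEX=zengliangsuoyin

# Hypothetical dry-run switch: by default the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

refresh_delta() {
    # Rebuild only the incremental index (rows newer than index_table.max_id)
    run "$INDEXER" --config "$CONF" "$DELTA_INDEX" || return 1
    # Fold the incremental index into the main one: indexer --merge DST SRC
    run "$INDEXER" --config "$CONF" --merge "$MAIN_INDEX" "$DELTA_INDEX"
}

refresh_delta
```

Setting DRY_RUN=0 executes the two commands instead of printing them; the function could then be called from cron at whatever interval suits the data.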

./search php

Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]

Copyright (c) 2007-2011,

Beijing Choice Software Technologies Inc (http://www.coreseek.com)

 using config file '/usr/local/coreseek/etc/csft.conf'...

index 'test1': query 'php ': returned 3 matches of 3 total in 0.000 sec

displaying matches:

1. document=1, weight=2500

id=1

group_id=1

group_id2=5

date_added=2017-02-08 06:22:36

title=test one

content=this is my test document number one. also checking search within phrases.

2. document=2, weight=1500

id=2

group_id=1

group_id2=6

date_added=2017-02-08 06:22:36

title=test two

content=this is my test document number two

3. document=5, weight=1500

(document not found in db)

words:

1. 'php': 3 documents, 5 hits --- number of occurrences

index 'test1stemmed': query 'php ': returned 3 matches of 3 total in 0.000 sec

displaying matches:

1. document=1, weight=2500

id=1

group_id=1

group_id2=5

date_added=2017-02-08 06:22:36

title=test one

content=this is my test document number one. also checking search within phrases.

2. document=2, weight=1500

id=2

group_id=1

group_id2=6

date_added=2017-02-08 06:22:36

title=test two

content=this is my test document number two

3. document=5, weight=1500

(document not found in db)

words:

1. 'php': 3 documents, 5 hits

Testing is complete; the next step is installing the PHP extension.
