在MYSQL中运用全文索引(FULLTEXT index)
在MYSQL中使用全文索引(FULLTEXT index)
MYSQL的一个很有用的特性是使用全文索引(FULLTEXT index)查找文本的能力.目前只有使用MyISAM类型表的时候有效(MyISAM是默认的表类型,如果你不知道使用的是什么类型的表,那很可能就是 MyISAM).全文索引可以建立在TEXT,CHAR或者VARCHAR类型的字段,或者字段组合上.我们将建立一个简单的表用来解释各种特性.
简单用法(MATCH()函数)对3.23.23以后的版本有效,复杂的用法(IN BOOLEAN MODE修饰语)对4以后的版本有效,本文的第一部分着重简单用法,第二部分讲复杂用法.
一个简单的表
我们将在整个过程中使用下面的表.CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy)) TYPE=MyISAM;
如果你没有把默认的表类型设置成MyISAM以外的类型那么TYPE=MyISAM可以省略.建表之后,向其中填充一些数据,例如:INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');
如果你已经建立好了一个表,你可以使用ALTER TABLE(就像CREATE INDEX语句一样)语句添加一个全文索引,例如:ALTER TABLE fulltext_sample ADD FULLTEXT(copy)
查找文本
全文索引搜索的语法很简单,你只要MATCH字段,AGAINST你要查找的文本,例如:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
在全文索引上进行搜索是不区分大小写的,因此下面的语句也可以正常运行:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('LOVE');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
全文索引通常用来搜索自然语言文本,例如报纸文章,网页内容等等.因此MySQL为这类搜索添加了很多特性.MySQL不索引任何长度小于等于3的文本, 也不索引有50%机会出现的单词.这意味着如果你的表少于2条记录,基于全文索引的搜索不会返回任何东西.将来,MySQL会使这项功能更灵活,但是现在 它应该可以适合大部分自然语言的使用.如果你的数据库中的大部分记录都包含”music”,你很可能不希望返回这些记录,你可以使用IN BOOLEAN MODE修饰符来获得50%左右的阀值,见本文第二部分.
结果将按照关联性从高到底的顺序返回.
主要特性
下面是标准的全文索引搜索的主要特性:
1.排除重复词语
2.排除长度小于4的词语
3.排除在多于一半记录中出现的词语(就是说只要要有3条记录)
4.带连字符的词语被认为两个词语
5.结果按照关联度降序返回
6.忽略列表中的词语也被从搜索结果中排除.忽略列表基于普通的英文单词,因此如果你的数据用作不同的目的,你可能希望改变忽略列表.不幸的是,这样作并 不容易.你需要编辑文件myisam/ft_static.c,重新编辑MySQL,并重建索引!这里有一个忽略列表.注意,这些在不同的版本里有所更 改.
忽略列表"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z", "zero",
让我们看一下其中的一些词.如果你懒的输入,但是想查找”love”这个词,象下面这样:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('lov');
Empty set (0.00 sec)
什么都没返回,因为全文索引只包含完整的单词,不是部分单词.如果想得到返回,你必须把单词写完整,就像第一个例子里一样.
就像我们提过的,连字符单词在全文索引中被排除(它们被作为单独的单词索引),因此下面的语句什么都不返回:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('all-out');
Empty set (0.00 sec)
很不幸,两个单词都小于4个字符,因此单独搜索时也不会出现,而且通常的搜索中也不会出现.本文的第二部分中使用BOOLEAN MODE搜索可以搜索部分的或者包含连字符的单词.
你也可以一次搜索多个单词,用逗号分隔.下面的例子查找包含”here”和”appears”的记录:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here','appears');
Empty set (0.01 sec)
出乎意料这个语句没有返回.但是仔细看看忽略列表,这个词被列在其中,因此被从索引中排除了.忽略列表可能是人们解释MySQL全文索引没有生效的通常原因.如果你的查询返回了一个结果,那么你的版本的MySQL的忽略列表不包含”here”这个词.
关联度
下面的例子说明记录返回的优先级mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+
| copy |
+---------------------------+
| A good alert |
| It appears good from here |
| An all-out alert |
+---------------------------+
记录”A good alert”首先出现,因为它同时包含要搜索的两个词.你不必相信我-只需要看看MySQL在结果中显示的优先级.简单的在字段列表中重复MATCH()函数,例如:mysql> SELECT copy,MATCH(copy) AGAINST('good,alert') AS relevance
FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+------------------+
| copy | relevance |
+---------------------------+------------------+
| A good alert | 1.3551264824316 |
| An all-out alert | 0.68526663197496 |
| It appears good from hear | 0.67003110026735 |
+---------------------------+------------------+
关联度的计算非常复杂,它基于索引中单词的数量,记录中不同单词的个数,索引和返回结果中单词的总数,以及单词的重要程度.这个数字可能在你的MySQL版本中有所不同,MySQL偶尔会强化计算逻辑.
对大多数应用来说标准的全文索引搜索非常有用而充分,MySQL 4让它更加强大.
在MYSQL中运用全文索引(FULLTEXT index)的更多相关文章
- MySQL使用全文索引(fulltext index)---高性能
转载地址:https://blog.csdn.net/u011734144/article/details/52817766/ 1.创建全文索引(FullText index) 旧版的MySQL的全文 ...
- MySQL使用全文索引(fulltext index)
1.创建全文索引(FullText index) 旧版的MySQL的全文索引只能用在MyISAM表格的char.varchar和text的字段上. 不过新版的MySQL5.6.24上InnoDB引擎也 ...
- MySQL中的索引提示Index Hint
MySQL数据库支持索引提示(INDEX HINT)显式的高速优化器使用了哪个索引.以下是可能需要用到INDEX HINT的情况 a)MySQL数据库的优化器错误的选择了某个索引,导致SQL运行很慢. ...
- MySQL中的全文索引
之前曾经发表了一篇关于SQL Server全文索引的文章.现在将MySQL全文索引的配置过程记录一下. Step1:创建Student表 CREATE TABLE `student` ( `id` I ...
- CNS数据库网站只用mysql自带的fulltext index功能就可实现了。
1)编辑脚本script.sql如下 ALTER TABLE `table_name` ADD FULLTEXT (`column_name`) 2)在mysql console下输入命令 SOURC ...
- MYSQL中的普通索引,主健,唯一,全文索引区别
MYSQL索引用来快速地寻找那些具有特定值的记录,所有MySQL索引都以B-树的形式保存.如果没有索引,执行查询时MySQL必须从第一个记录开始扫描整个表的所有记录,直至找到符合要求的记录.表里面的记 ...
- mysql中,主键与普通索引
一.什么是索引?索引用来快速地寻找那些具有特定值的记录,所有MySQL索引都以B-树的形式保存.如果没有索引,执行查询时MySQL必须从第一个记录开始扫描整个表的所有记录,直至找到符合要求的记录.表里 ...
- mysql中主外键关系
一.外键: 1.什么是外键 2.外键语法 3.外键的条件 4.添加外键 5.删除外键 1.什么是外键: 主键:是唯一标识一条记录,不能有重复的,不允许为空,用来保证数据完整性 外键:是另一表的主键, ...
- mysql|中主外键关系(转)
http://my.oschina.net/liting/blog/356150 一.外键: 1.什么是外键 2.外键语法 3.外键的条件 4.添加外键 5.删除外键 1.什么是外键: 主键:是唯一标 ...
随机推荐
- Java 数据类型及转换
整形: byte(1个字节) 范围:-128~127 short(2个字节) 范围:-215~215-1 (-32768~32767) int(4个字节) 范围:-231~231-1 (-214748 ...
- linux用rdate命令实现同步时间
用rdate命令实现同步时间 前两天说到用ntp时间服务器和ntpdate命令同步时间,今天简单记录下用rdate同步时间 http://blog.csdn.net/wyzxg/archive/201 ...
- RHEL6 64位ASM方式安装oracle 11gR2(二)
本文转载自:http://vnimos.blog.51cto.com/2014866/1221377 三.安装数据库软件 1 2 3 4 5 6 7 8 # unzip -d /stage/ linu ...
- S3C2440 SPI驱动框架
S3C2440 SPI驱动代码详细解读: https://www.linuxidc.com/Linux/2012-08/68402p4.htm 一.platform device and board_ ...
- 使用VS2017 编写Linux系统上的Opencv程序
背景 之前写图像算法的程序都是在window10下使用VS编写,VS这个IDE结合“ImageWatch.vsix“插件,用于调试opencv相关的图像算法程序十分方便.后因项目需要,需将相关程序移植 ...
- 【转】使用ant来调用Jmeter,并定制运行时参数
为了应对不同的运行需求(主要是不同的线程数),以及可能的变化(host ip),在nongui运行时我对ant build.xml进行了一些修改 1. log目录备份与运行前清除 <tstamp ...
- 解决在“Resources”参数中指定了项“obj\Debug\KaiShiHID.Form1.resources”多次。“Resources”参数不支持重复项
错误截图: 发生原因描述: 窗体Form1, 有一个分部类; 叫做FormPartial.cs; 双击了FormPartial这个窗体, 然后生成了一个Load事件, 即时就报了编译错误, 然后就删除 ...
- Rest之路 - 第一个Rest程序
在 Eclipse 里新建一个 Dynamic project 将 Jersey 的 jar 包,拷贝到 WebContent -> WEB-INF -> lib 文件夹 Add jars ...
- Py修行路 python基础(二)变量 字符 列表
变量 容器 变量名 标记 数据的作用 字符编码 二进制位 = bit1个二进制位是计算机里的最小表示单元 1个字节是计算机里最小的存储单位 8bits = 1Byte =1字节1024Bytes = ...
- 转 -ALSA 配置
原文地址:-ALSA 配置">转 -ALSA 配置作者:超级大苹果 alsa 音频路径的问题: 在sound/soc/codecs目录中有很多音频codec的codec驱动,我使用的是 ...