From http://lucene.apache.org/solr/guide/7_1/overview-of-documents-fields-and-schema-design.html

The fundamental premise of Solr is simple. You give it a lot of information, then later you can ask it questions and find the piece of information you want. The part where you feed in all the information is called indexing or updating. When you ask a question, it’s called a query.

One way to understand how Solr works is to think of a loose-leaf book of recipes. Every time you add a recipe to the book, you update the index at the back. You list each ingredient and the page number of the recipe you just added. Suppose you add one hundred recipes. Using the index, you can very quickly find all the recipes that use garbanzo beans, or artichokes, or coffee, as an ingredient. Using the index is much faster than looking through each recipe one by one. Imagine a book of one thousand recipes, or one million.

Solr allows you to build an index with many different fields, or types of entries. The example above shows how to build an index with just one field, ingredients. You could have other fields in the index for the recipe’s cooking style, like Asian, Cajun, or vegan, and you could have an index field for preparation times. Solr can answer questions like "What Cajun-style recipes that have blood oranges as an ingredient can be prepared in fewer than 30 minutes?"

The schema is the place where you tell Solr how it should build indexes from input documents.

How Solr Sees the World

Solr’s basic unit of information is a document, which is a set of data that describes something. A recipe document would contain the ingredients, the instructions, the preparation time, the cooking time, the tools needed, and so on. A document about a person, for example, might contain the person’s name, biography, favorite color, and shoe size. A document about a book could contain the title, author, year of publication, number of pages, and so on.

In the Solr universe, documents are composed of fields, which are more specific pieces of information. Shoe size could be a field. First name and last name could be fields.

Fields can contain different kinds of data. A name field, for example, is text (character data). A shoe size field might be a floating point number so that it could contain values like 6 and 9.5. Obviously, the definition of fields is flexible (you could define a shoe size field as a text field rather than a floating point number, for example), but if you define your fields correctly, Solr will be able to interpret them correctly and your users will get better results when they perform a query.

You can tell Solr about the kind of data a field contains by specifying its field type. The field type tells Solr how to interpret the field and how it can be queried.

When you add a document, Solr takes the information in the document’s fields and adds that information to an index. When you perform a query, Solr can quickly consult the index and return the matching documents.

Field Analysis

Field analysis tells Solr what to do with incoming data when building an index. A more accurate name for this process would be processing or even digestion, but the official name is analysis.

Consider, for example, a biography field in a person document. Every word of the biography must be indexed so that you can quickly find people whose lives have had anything to do with ketchup, or dragonflies, or cryptography.

However, a biography will likely contains lots of words you don’t care about and don’t want clogging up your index—words like "the", "a", "to", and so forth. Furthermore, suppose the biography contains the word "Ketchup", capitalized at the beginning of a sentence. If a user makes a query for "ketchup", you want Solr to tell you about the person even though the biography contains the capitalized word.

The solution to both these problems is field analysis. For the biography field, you can tell Solr how to break apart the biography into words. You can tell Solr that you want to make all the words lower case, and you can tell Solr to remove accents marks.

Field analysis is an important part of a field type. Understanding Analyzers, Tokenizers, and Filters is a detailed description of field analysis.

Solr’s Schema File

Solr stores details about the field types and fields it is expected to understand in a schema file. The name and location of this file may vary depending on how you initially configured Solr or if you modified it later.

  • managed-schema is the name for the schema file Solr uses by default to support making Schema changes at runtime via the Schema API, or Schemaless Mode features. You may explicitly configure the managed schema features to use an alternative filename if you choose, but the contents of the files are still updated automatically by Solr.

  • schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.

  • If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI’s Cloud Screens.

Whichever name of the file in use in your installation, the structure of the file is not changed. However, the way you interact with the file will change. If you are using the managed schema, it is expected that you only interact with the file with the Schema API, and never make manual edits. If you do not use the managed schema, you will only be able to make manual edits to the file, the Schema API will not support any modifications.

Note that if you are not using the Schema API yet you do use SolrCloud, you will need to interact with schema.xml through ZooKeeper using upconfig and downconfig commands to make a local copy and upload your changes. The options for doing this are described in Solr Control Script Reference and Using ZooKeeper to Manage Configuration Files.

Solr Principal - 工作原理/机制的更多相关文章

  1. solr查询工作原理深入内幕

    1.什么是Lucene? 作为一个开放源代码项目,Lucene从问世之后,引发了开放源代码社群的巨大反响,程序员们不仅使用它构建具体的全文检索应用,而且将之集成到各种系统软件中去,以及构建Web应用, ...

  2. springMVC 的工作原理和机制

    工作原理上面的是springMVC的工作原理图: 1.客户端发出一个http请求给web服务器,web服务器对http请求进行解析,如果匹配DispatcherServlet的请求映射路径(在web. ...

  3. Android消息机制之ThreadLocal的工作原理

    来源: http://blog.csdn.net/singwhatiwanna/article/details/48350919 很多人认为Handler的作用是更新UI,这说的的确没错,但是更新UI ...

  4. springMVC 的工作原理和机制(转)

    工作原理上面的是springMVC的工作原理图: 1.客户端发出一个http请求给web服务器,web服务器对http请求进行解析,如果匹配DispatcherServlet的请求映射路径(在web. ...

  5. 1 weekend110的NN元数据管理机制 + NN工作机制 + DN工作原理

    第一天的笔记,是伪分布hadoop集群搭建, 后面是hadoop Ha的分布式集群搭建 第一天,是HDFS的shell操作 NN工作机制 里面是二进制 DN工作原理 上传完了之后,在hdfs的虚拟路径 ...

  6. Java垃圾回收机制的工作原理

    Java垃圾回收机制的工作原理 [博主]高瑞林 [博客地址]http://www.cnblogs.com/grl214 获取更多内容,请关注小编个人微信公众平台: 一.Java中引入垃圾回收机制的作用 ...

  7. 170529、springMVC 的工作原理和机制

    工作原理上面的是springMVC的工作原理图: 1.客户端发出一个http请求给web服务器,web服务器对http请求进行解析,如果匹配DispatcherServlet的请求映射路径(在web. ...

  8. LVS负载均衡机制之LVS-DR模式工作原理以及简单配置

    本博文主要简单介绍一下LVS负载均衡集群的一个基本负载均衡机制:LVS-DR:如有汇总不当之处,请各位在评论中多多指出. LVS-DR原理: LVS的英文全称是Linux Virtual Server ...

  9. Apache Lucene评分机制的内部工作原理

    Apache Lucene评分机制的内部工作原理' 第5章

随机推荐

  1. python 爬取bilibili 视频信息

    抓包时发现子菜单请求数据时一般需要rid,但的确存在一些如游戏->游戏赛事不使用rid,对于这种未进行处理,此外rid一般在主菜单的响应中,但有的如番剧这种,rid在子菜单的url中,此外返回的 ...

  2. BZOJ1010玩具装箱 - 斜率优化dp

    传送门 题目分析: 设\(f[i]\)表示装前i个玩具的花费. 列出转移方程:\[f[i] = max\{f[j] + ((i - (j + 1)) + sum[i] - sum[j] - L))^2 ...

  3. scala 中的修饰符

    package cn.scala_base.oop.scalaclass import scala.beans.BeanProperty; /** * scala中的field,类中定义的是方法,函数 ...

  4. CentOS 配置远程主机ssh免密登录

    ssh针对的是用户不是机器,同一机器不同用户需要单独配置ssh,才能实现该用户的免密登录 cd ~ cd ./.ssh 在./ssh目录下生成公钥与私钥(如果没有.ssh先使用ssh命令连接到一台远程 ...

  5. 从文件 I/O 看 Linux 的虚拟文件系统

    1 引言 Linux 中允许众多不同的文件系统共存,如 ext2, ext3, vfat 等.通过使用同一套文件 I/O 系统 调用即可对 Linux 中的任意文件进行操作而无需考虑其所在的具体文件系 ...

  6. 利用WPF建立自己的3d gis软件(非axhost方式)(十二)SDK中的导航系统

    原文:利用WPF建立自己的3d gis软件(非axhost方式)(十二)SDK中的导航系统 先下载SDK:https://pan.baidu.com/s/1M9kBS6ouUwLfrt0zV0bPew ...

  7. 解决无法定位程序输入点SymEnumSymbols于动态链接库dbghelp.dll

    作者:朱金灿 来源:http://blog.csdn.net/clever101 下载一个源码,使用VS2008编译链接无问题,运行时出现一个错误:无法定位程序输入点SymEnumSymbols于动态 ...

  8. so文件成品评论【整理】

    这是我的 @布加迪20 AZ在一篇文章中写道:<汉化so文件的心得>中的技术附件做的简洁性整理.原来的看起来不是非常方便.一起分享学习.. 正文 SO文件汉化心得 --By布加迪20   ...

  9. adb删除系统软件

    ZTE V970Android OS 4.1.2OS version: LeWa_13.04.03系统内存划分很小,才500M. 幸好开发者设置里面有一项:ROOT 授权管理adb root // 没 ...

  10. C# WPF 左侧菜单右侧内容布局效果实现

    原文:C# WPF 左侧菜单右侧内容布局效果实现 我们要做的效果是这样的,左侧是可折叠的菜单栏,右侧是内容区域,点击左侧的菜单项右侧内容区域则相应地切换. wpf实现的话,我的办法是用一个tabcon ...