Lucene's DocFieldProcessor class

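What the DocFieldProcessor class does
1 Stores all fields and their corresponding FieldInfo in order of appearance
2 Builds a hash index, keyed by field name, over the fields of the current doc
3 Calls the (abstract) InvertedDocConsumer class to tokenize the field contents and build the in-memory index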

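Other classes that DocFieldProcessor mainly relies on
A DocFieldProcessorPerField object holds one FieldInfo and the fields that belong to it (there may be several, all sharing that FieldInfo). Its next member can point to another DocFieldProcessorPerField object, forming a linked list that resolves collisions in fieldHash.
DocInverterPerField is a class used inside DocFieldProcessorPerField. It processes the fields that share one FieldInfo and builds the index for them.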
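
To make the collision chain concrete, here is a minimal sketch rather than Lucene's actual code. PerFieldNode and FieldChainSketch are hypothetical stand-ins for DocFieldProcessorPerField and the fieldHash table; only the name, the next pointer, and the mask-based bucket computation come from the description above.

```java
// Hypothetical stand-in for DocFieldProcessorPerField: one node per field name.
final class PerFieldNode {
    final String name;   // plays the role of fieldInfo.name
    PerFieldNode next;   // next node in the same bucket (collision chain)

    PerFieldNode(String name) {
        this.name = name;
    }
}

// Hypothetical fixed-size table. The size must be a power of two so that
// (hashCode & mask) always yields a valid bucket index, even for negative hashes.
final class FieldChainSketch {
    private final PerFieldNode[] buckets;
    private final int mask;

    FieldChainSketch(int powerOfTwoSize) {
        buckets = new PerFieldNode[powerOfTwoSize];
        mask = powerOfTwoSize - 1;
    }

    // Walk the chain in the bucket; return the matching node, or null if absent.
    PerFieldNode find(String name) {
        PerFieldNode node = buckets[name.hashCode() & mask];
        while (node != null && !node.name.equals(name)) {
            node = node.next;
        }
        return node;
    }

    // Return the existing node, or push a new one onto the head of its chain.
    PerFieldNode findOrAdd(String name) {
        PerFieldNode node = find(name);
        if (node == null) {
            int pos = name.hashCode() & mask;
            node = new PerFieldNode(name);
            node.next = buckets[pos];
            buckets[pos] = node;
        }
        return node;
    }
}
```

With a table of size 2, any three distinct field names must share at least one bucket, so looking them up again exercises the next pointer.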

final class DocFieldProcessor extends DocConsumer {

  final DocFieldConsumer consumer;

  final StoredFieldsConsumer storedConsumer;

  final Codec codec;

  // All DocFieldProcessorPerField objects of the current doc (each can hold several fields that share a FieldInfo), stored in order of first appearance, one per slot

  DocFieldProcessorPerField[] fields = new DocFieldProcessorPerField[1];

  int fieldCount;

  // The same DocFieldProcessorPerField objects, placed by the hash of the field name; one slot may match several of them, which are kept as a linked list

  DocFieldProcessorPerField[] fieldHash = new DocFieldProcessorPerField[2];

  int hashMask = 1;

  int totalFieldCount;

  int fieldGen;

  // Shallow copy (a shared reference) of the state of the document being analyzed

  final DocumentsWriterPerThread.DocState docState;

  final Counter bytesUsed;
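
The role of fieldGen is not obvious from its declaration. In processDocument it is used as a per-document stamp: each DocFieldProcessorPerField remembers the stamp of the last doc that touched it (lastGen), so "first time seen in this doc" can be detected without clearing any table between docs. The sketch below illustrates only that idea. GenerationStampSketch, Entry, and countOf are hypothetical names, and a java.util.HashMap stands in for the hand-rolled fieldHash table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GenerationStampSketch {
    // Hypothetical long-lived per-field entry, reused across documents.
    static final class Entry {
        final String name;
        int lastGen = -1;   // stamp of the last document that touched this entry
        int count;          // number of values seen in that document

        Entry(String name) {
            this.name = name;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private int gen;   // plays the role of fieldGen

    // Process one document; return its distinct field names in first-seen order.
    List<String> process(String... fieldNames) {
        final int thisGen = gen++;
        List<String> seen = new ArrayList<>();
        for (String name : fieldNames) {
            Entry e = entries.computeIfAbsent(name, Entry::new);
            if (e.lastGen != thisGen) {
                // First occurrence in this document: reset the per-doc state.
                e.count = 0;
                e.lastGen = thisGen;
                seen.add(name);
            }
            e.count++;
        }
        return seen;
    }

    // Count recorded for the last document that contained this field name.
    int countOf(String name) {
        Entry e = entries.get(name);
        return e == null ? 0 : e.count;
    }
}
```

Entries for fields that are absent from a later doc keep their stale counts. That is harmless as long as only the names returned for the current doc are consulted, which is the job the fields array does in the real class.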

DocFieldProcessor::processDocument(FieldInfos.Builder fieldInfos) is the method that implements the main work of the DocFieldProcessor class.

  public void processDocument(FieldInfos.Builder fieldInfos) throws IOException {

    consumer.startDocument();

    storedConsumer.startDocument();

    fieldCount = 0;

    final int thisFieldGen = fieldGen++;

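    // Loop over the doc's fields, merging fields that share a FieldInfo into one DocFieldProcessorPerField. A newly created DocFieldProcessorPerField goes into the hash table, and each one is appended to the ordered array the first time it is seen in this doc.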

    for(IndexableField field : docState.doc) {

      final String fieldName = field.name();

      // Compute the field name's position in the hash array

      final int hashPos = fieldName.hashCode() & hashMask;

      DocFieldProcessorPerField fp = fieldHash[hashPos];

      // Walk the chain at that position to find the DocFieldProcessorPerField whose field name matches the current field

      while(fp != null && !fp.fieldInfo.name.equals(fieldName)) {

        fp = fp.next;

      }

      if (fp == null) {

        // This FieldInfo has not been seen before: add the field's info to fieldInfos, which creates the corresponding FieldInfo object

        FieldInfo fi = fieldInfos.addOrUpdate(fieldName, field.fieldType());

        // Create a DocFieldProcessorPerField for it

        fp = new DocFieldProcessorPerField(this, fi);

        // Link it in at the head of the hash chain

        fp.next = fieldHash[hashPos];

        fieldHash[hashPos] = fp;

        totalFieldCount++;

        // The hash table has reached half of its capacity: rehash

        if (totalFieldCount >= fieldHash.length/2) {

          rehash();

        }

      } else {

        // Already known: update the existing FieldInfo object

        FieldInfo fi = fieldInfos.addOrUpdate(fieldName, field.fieldType());

        assert fi == fp.fieldInfo : "should only have updated an existing FieldInfo instance";

      }

      if (thisFieldGen != fp.lastGen) {

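        // First occurrence of this field name in the current doc (fp was either created just now or is left over from an earlier doc): reset its per-doc field count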

        fp.fieldCount = 0;

        // If the ordered array is full, double its size

        if (fieldCount == fields.length) {

          final int newSize = fields.length*2;

          DocFieldProcessorPerField newArray[] = new DocFieldProcessorPerField[newSize];

          System.arraycopy(fields, 0, newArray, 0, fieldCount);

          fields = newArray;

        }

        fields[fieldCount++] = fp;

        fp.lastGen = thisFieldGen;

      }

      // Add the field to fp; fields that share a FieldInfo all end up in the same fp

      fp.addField(field);

      storedConsumer.addField(docState.docID, field, fp.fieldInfo);

    }

    // Sort the DocFieldProcessorPerField entries collected for this doc, then iterate over them and have DocInverterPerField tokenize their fields and build the index

    ArrayUtil.introSort(fields, 0, fieldCount, fieldsComp);

    for(int i=0;i<fieldCount;i++) {

      final DocFieldProcessorPerField perField = fields[i];

      perField.consumer.processFields(perField.fields, perField.fieldCount);

    }

  }
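
The listing calls rehash() without showing it. One plausible shape, assuming the table doubles in size and every node is relinked under the new mask, is sketched below. RehashSketch and its members are hypothetical names, not Lucene's source; the half-full trigger and the head insertion mirror the listing above.

```java
// Hypothetical chained hash table that doubles once it is half full.
final class RehashSketch {
    static final class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    Node[] table = new Node[2];   // same initial size as fieldHash above
    int mask = 1;                 // always table.length - 1
    int total;                    // number of nodes stored

    Node findOrAdd(String name) {
        int pos = name.hashCode() & mask;
        Node n = table[pos];
        while (n != null && !n.name.equals(name)) {
            n = n.next;
        }
        if (n == null) {
            n = new Node(name);
            n.next = table[pos];
            table[pos] = n;
            total++;
            if (total >= table.length / 2) {
                rehash();
            }
        }
        return n;
    }

    // Double the table and move every node to its bucket under the new mask.
    private void rehash() {
        Node[] newTable = new Node[table.length * 2];
        int newMask = newTable.length - 1;
        for (Node head : table) {
            Node n = head;
            while (n != null) {
                Node next = n.next;   // save before relinking
                int pos = n.name.hashCode() & newMask;
                n.next = newTable[pos];
                newTable[pos] = n;
                n = next;
            }
        }
        table = newTable;
        mask = newMask;
    }
}
```

Because the check runs after every insertion, the table stays less than half full, which keeps the chains short. Nodes are moved rather than copied, so references held elsewhere (such as the ordered fields array in the listing) stay valid.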
