Enterprise Search Engine Development: Connectors (Part 20)

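Inside the connector, the object that links the data source to the component that pushes data is a QueryTraverser, a class that implements the Traverser interface: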

/**
 * Interface presented by a Traverser.  Used by the Scheduler.
 */
public interface Traverser {

  /**
   * Interval to wait after a transient error before retrying a traversal.
   */
  public static final int ERROR_WAIT_MILLIS = 15 * 60 * 1000;

  /**
   * Runs a batch of documents. The Traversal method may be hard (impossible?)
   * to interrupt while it is executing runBatch(). It is expected that a
   * thread loop running a traversal method would call runBatch(), then check
   * for InterruptedException, then decide whether it wants to stop of itself,
   * for scheduling reasons, or for a clean shutdown. It could then re-adjust
   * the batch hint if desired, then repeat.
   *
   * @param  batchSize A {@link BatchSize} instructs the traversal method to
   *         process approximately {@code batchSize.getHint()}, but no more
   *         than {@code batchSize.getMaximum()} number of documents in this
   *         batch.
   * @return A {@link BatchResult} containing the actual number of documents
   *         from this batch given to the feed and a possible policy to delay
   *         before requesting another batch.
   */
  public BatchResult runBatch(BatchSize batchSize);

  /**
   * Cancel the Batch in progress.  Discard the batch.  This might be called
   * when the workItem times out, connector deletion or reconfiguration, or
   * during shutdown.
   */
  public void cancelBatch();
}

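The method that matters is BatchResult runBatch(BatchSize batchSize) above. Its BatchSize batchSize parameter describes the size of the batch: a hint for roughly how many documents to process, and a maximum that must not be exceeded.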
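To make the contract in the Javadoc concrete, here is a minimal sketch of the kind of loop a scheduler thread might run around runBatch(). Every type below (DelayPolicy, SketchBatchSize, SketchBatchResult, SketchTraverser, TraversalLoopSketch) is a simplified stand-in invented for illustration, not one of the connector manager's own classes. The real scheduler also waits between batches and may adjust the batch hint, which this sketch leaves out.

```java
// Hypothetical, simplified stand-ins for the connector manager types.
enum DelayPolicy { IMMEDIATE, POLL, ERROR }

final class SketchBatchSize {
  private final int hint;
  private final int maximum;

  SketchBatchSize(int hint, int maximum) {
    this.hint = hint;
    this.maximum = maximum;
  }

  int getHint() { return hint; }
  int getMaximum() { return maximum; }
}

final class SketchBatchResult {
  final DelayPolicy policy;
  final int count;

  SketchBatchResult(DelayPolicy policy, int count) {
    this.policy = policy;
    this.count = count;
  }
}

interface SketchTraverser {
  SketchBatchResult runBatch(SketchBatchSize batchSize);
}

final class TraversalLoopSketch {
  /**
   * Calls runBatch() repeatedly until the traverser asks for a delay
   * (POLL or ERROR) or the current thread is interrupted.
   * Returns the total number of documents reported as fed.
   */
  static int runUntilIdle(SketchTraverser traverser, SketchBatchSize batchSize) {
    int total = 0;
    while (!Thread.currentThread().isInterrupted()) {
      SketchBatchResult result = traverser.runBatch(batchSize);
      total += result.count;
      if (result.policy != DelayPolicy.IMMEDIATE) {
        // A real scheduler would sleep here before trying again.
        break;
      }
    }
    return total;
  }
}
```

With a fake traverser that holds 25 documents and honours a hint of 10, the loop runs batches of 10, 10 and 5, then stops on the POLL result of the fourth call.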

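A QueryTraverser reads data from the source through its TraversalManager queryTraversalManager reference, and uses its PusherFactory pusherFactory reference to create the DocPusher instance that sends the Document objects. The member variable TraversalStateStore stateStore loads and saves the traversal state, so that feeding can resume from the last checkpoint.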

  @Override
  public BatchResult runBatch(BatchSize batchSize) {
    // Start time
    final long startTime = clock.getTimeMillis();
    // Deadline after which this batch stops
    final long timeoutTime = startTime
      + traversalContext.traversalTimeLimitSeconds() * 1000;

    // Already cancelled
    if (isCancelled()) {
      LOGGER.warning("Attempting to run a cancelled QueryTraverser");
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }

    try {
      // Batch size hint
      queryTraversalManager.setBatchHint(batchSize.getHint());
    } catch (RepositoryException e) {
      LOGGER.log(Level.WARNING, "Unable to set batch hint", e);
    }

    String connectorState;
    try {
      if (stateStore != null) {
        // Load the saved checkpoint state
        connectorState = stateStore.getTraversalState();
      } else {
        throw new IllegalStateException("null TraversalStateStore");
      }
    } catch (IllegalStateException ise) {
      // We get here if the store for the connector is disabled.
      // That happens if the connector was deleted while we were asleep.
      // Our connector seems to have been deleted.  Don't process a batch.
      LOGGER.fine("Halting traversal for connector " + connectorName
                  + ": " + ise.getMessage());
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }

    DocumentList resultSet = null;
    if (connectorState == null) {
      try {
        LOGGER.fine("START TRAVERSAL: Starting traversal for connector "
                    + connectorName);
        resultSet = queryTraversalManager.startTraversal();
      } catch (Exception e) {
        LOGGER.log(Level.WARNING, "startTraversal threw exception: ", e);
        return new BatchResult(TraversalDelayPolicy.ERROR);
      }
    } else {
      try {
        LOGGER.fine("RESUME TRAVERSAL: Resuming traversal for connector "
            + connectorName + " from checkpoint " + connectorState);
        resultSet = queryTraversalManager.resumeTraversal(connectorState);
      } catch (Exception e) {
        LOGGER.log(Level.WARNING, "resumeTraversal threw exception: ", e);
        return new BatchResult(TraversalDelayPolicy.ERROR);
      }
    }

    // If the traversal returns null, that means that the repository has
    // no new content to traverse.
    if (resultSet == null) {
      LOGGER.fine("Result set from connector " + connectorName
                  + " is NULL, no documents returned for traversal.");
      return new BatchResult(TraversalDelayPolicy.POLL, 0);
    }

    Pusher pusher = null;
    // Result reported back to the caller
    BatchResult result = null;
    int counter = 0;
    try {
      // One Pusher instance is shared by the whole batch.
      // Get a Pusher for feeding the returned Documents.
      pusher = pusherFactory.newPusher(connectorName);

      while (true) {
        if (Thread.currentThread().isInterrupted() || isCancelled()) {
          LOGGER.fine("Traversal for connector " + connectorName
                      + " has been interrupted; breaking out of batch run.");
          break;
        }
        if (clock.getTimeMillis() >= timeoutTime) {
          LOGGER.fine("Traversal batch for connector " + connectorName
              + " is completing due to time limit.");
          break;
        }

        String docid = null;
        try {
          LOGGER.finer("Pulling next document from connector " + connectorName);
          Document nextDocument = resultSet.nextDocument();
          // Every document in this resultSet batch has been sent
          if (nextDocument == null) {
            LOGGER.finer("Traversal batch for connector " + connectorName
                + " at end after processing " + counter + " documents.");
            break;
          } else {
            // Since there are a couple of places below that could throw
            // exceptions but not exit the while loop, the counter should be
            // incremented here to ensure it represents documents returned from
            // the list.  Note the call to nextDocument() could also throw a
            // RepositoryDocumentException signaling a skipped document in which
            // case the call will not be counted against the batch maximum.
            counter++;
            // Fetch DocId to use in messages.
            try {
              docid = Value.getSingleValueString(nextDocument,
                                                 SpiConstants.PROPNAME_DOCID);
            } catch (IllegalArgumentException e1) {
                LOGGER.finer("Unable to get document id for document ("
                             + nextDocument + "): " + e1.getMessage());
            } catch (RepositoryException e1) {
                LOGGER.finer("Unable to get document id for document ("
                             + nextDocument + "): " + e1.getMessage());
            }
          }
          LOGGER.finer("Sending document (" + docid + ") from connector "
              + connectorName + " to Pusher");

          // Feed the document
          if (pusher.take(nextDocument) != PusherStatus.OK) {
            LOGGER.fine("Traversal batch for connector " + connectorName
                + " is completing at the request of the Pusher,"
                + " after processing " + counter + " documents.");
            break;
          }
        } catch (SkippedDocumentException e) {
          /* TODO (bmj): This is a temporary solution and should be replaced.
           * It uses Exceptions for non-exceptional cases.
           */
          // Skip this document.  Proceed on to the next one.
          logSkippedDocument(docid, e);
        } catch (RepositoryDocumentException e) {
          // Skip individual documents that fail.  Proceed on to the next one.
          logSkippedDocument(docid, e);
        } catch (RuntimeException e) {
          // Skip individual documents that fail.  Proceed on to the next one.
          logSkippedDocument(docid, e);
        }
      }
      // No more documents. Wrap up any accumulated feed data and send it off.
      if (!isCancelled()) {
        pusher.flush();
      }
    } catch (OutOfMemoryError e) {
      pusher.cancel();
      System.runFinalization();
      System.gc();
      result = new BatchResult(TraversalDelayPolicy.ERROR);
      try {
        LOGGER.severe("Out of JVM Heap Space.  Will retry later.");
        LOGGER.log(Level.FINEST, e.getMessage(), e);
      } catch (Throwable t) {
        // OutOfMemory state may prevent us from logging the error.
        // Don't make matters worse by rethrowing something meaningless.
      }
    } catch (RepositoryException e) {
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      LOGGER.log(Level.SEVERE, "Repository Exception during traversal.", e);
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (PushException e) {
      LOGGER.log(Level.SEVERE, "Push Exception during traversal.", e);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (FeedException e) {
      LOGGER.log(Level.SEVERE, "Feed Exception during traversal.", e);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } catch (Throwable t) {
      LOGGER.log(Level.SEVERE, "Uncaught Exception during traversal.", t);
      // Drop the entire batch on the floor.  Do not call checkpoint
      // (as there is a discrepancy between what the Connector thinks
      // it has fed, and what actually has been pushed).
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    } finally {
      // If we have cancelled the work, abandon the batch.
      if (isCancelled()) {
        result = new BatchResult(TraversalDelayPolicy.ERROR);
      }

      // Update the checkpoint state.
      // Checkpoint completed work as well as skip past troublesome documents
      // (e.g. documents that are too large and will always fail).
      if ((result == null) && (checkpointAndSave(resultSet) == null)) {
        // Unable to get a checkpoint, so wait a while, then retry batch.
        result = new BatchResult(TraversalDelayPolicy.ERROR);
      }
    }

    if (result == null) {
      result = new BatchResult(TraversalDelayPolicy.IMMEDIATE, counter,
                               startTime, clock.getTimeMillis());
    } else if (pusher != null) {
      // We are returning an error from this batch. Cancel any feed that
      // might be in progress.
      pusher.cancel();
    }
    return result;
  }

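I have added comments to the key parts of the code. The method iterates over the batch's result set and hands each Document to the DocPusher. Once the iteration finishes, it updates the checkpoint state, which the next batch uses when it fetches data from the source: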

  /**
   * Saves the checkpoint state.
   *
   * @param pm the DocumentList to obtain the checkpoint from
   * @return the checkpoint string, or null if no checkpoint was obtained
   *         or it could not be stored
   */
  private String checkpointAndSave(DocumentList pm) {
    String connectorState = null;
    LOGGER.fine("CHECKPOINT: Generating checkpoint for connector "
                + connectorName);
    try {
      connectorState = pm.checkpoint();
    } catch (RepositoryException re) {
      // If checkpoint() throws RepositoryException, it means there is no
      // new checkpoint.
      LOGGER.log(Level.FINE, "Failed to obtain checkpoint for connector "
                 + connectorName, re);
      return null;
    } catch (Exception e) {
      LOGGER.log(Level.INFO, "Failed to obtain checkpoint for connector "
                 + connectorName, e);
      return null;
    }
    try {
      if (connectorState != null) {
        if (stateStore != null) {
          stateStore.storeTraversalState(connectorState);
        } else {
          throw new IllegalStateException("null TraversalStateStore");
        }
        LOGGER.fine("CHECKPOINT: " + connectorState);
      }
      return connectorState;
    } catch (IllegalStateException ise) {
      // We get here if the store for the connector is disabled.
      // That happens if the connector was deleted while we were working.
      // Our connector seems to have been deleted.  Don't save a checkpoint.
      LOGGER.fine("Checkpoint discarded: " + connectorState);
    }
    return null;
  }
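The checkpoint-and-resume idea can be reduced to a few lines. The sketch below is only an illustration under my own assumptions: MemoryStateStore is a hypothetical in-memory stand-in for TraversalStateStore, the "repository" is a plain list, and the checkpoint is simply the index of the next item to feed. A real connector defines its own checkpoint format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for TraversalStateStore. */
final class MemoryStateStore {
  private String state; // null means "no checkpoint yet"

  String getTraversalState() { return state; }

  void storeTraversalState(String newState) { state = newState; }
}

final class CheckpointSketch {
  /**
   * Feeds up to batchSize items, starting at the stored checkpoint,
   * then saves the new checkpoint. Returns the items fed in this batch.
   */
  static List<String> runBatch(List<String> repository,
                               MemoryStateStore store, int batchSize) {
    String checkpoint = store.getTraversalState();
    // No checkpoint: start from the beginning. Otherwise resume from it.
    int start = (checkpoint == null) ? 0 : Integer.parseInt(checkpoint);
    int end = Math.min(start + batchSize, repository.size());
    List<String> fed = new ArrayList<>(repository.subList(start, end));
    // Save the position only after the batch has been handled.
    store.storeTraversalState(Integer.toString(end));
    return fed;
  }
}
```

With a five-item repository and a batch size of 2, three consecutive calls feed items 0 to 1, items 2 to 3, and item 4; a fourth call feeds nothing. Because the position lives in the store and not in the method, the sequence is the same whether or not the traverser object is recreated between calls.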

The cancel method works by setting a boolean flag. Note that access to the flag has to be synchronized:

  /**
   * Cancels the batch in progress.
   */
  @Override
  public void cancelBatch() {
    synchronized(cancelLock) {
      cancelWork = true;
    }
    LOGGER.fine("Cancelling traversal for connector " + connectorName);
  }
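The article does not show the reading side of the flag, so the following is a guess at the overall pattern rather than the actual QueryTraverser code: a flag written under a private lock must also be read under that same lock, otherwise the worker thread is not guaranteed to see the update. CancelFlagSketch and its demo() method are hypothetical names; only cancelLock, cancelWork, cancelBatch() and isCancelled() echo the names used above.

```java
/** Hypothetical sketch of a cancel flag guarded by a private lock. */
final class CancelFlagSketch {
  private final Object cancelLock = new Object();
  private boolean cancelWork = false;

  /** Writer side: called from the thread that wants to stop the batch. */
  void cancelBatch() {
    synchronized (cancelLock) {
      cancelWork = true;
    }
  }

  /** Reader side: polled by the worker; uses the same lock as the writer. */
  boolean isCancelled() {
    synchronized (cancelLock) {
      return cancelWork;
    }
  }

  /**
   * Starts a worker that spins until cancelled, cancels it, and reports
   * whether the worker terminated within five seconds.
   */
  static boolean demo() {
    CancelFlagSketch flag = new CancelFlagSketch();
    Thread worker = new Thread(() -> {
      while (!flag.isCancelled()) {
        Thread.yield(); // stand-in for processing one document
      }
    });
    worker.start();
    flag.cancelBatch();
    try {
      worker.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !worker.isAlive();
  }
}
```

For a single boolean, a volatile field or java.util.concurrent.atomic.AtomicBoolean would give the same visibility guarantee with less ceremony; an explicit lock is mainly useful when the flag has to change together with other state.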

---------------------------------------------------------------------------

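This series, Enterprise Search Engine Development: Connectors, is my original work.

If you repost it, please credit the source: 刺猬的温驯 on cnblogs (博客园).

My email: chenying998179@163#com (replace # with .)

Link to this article: http://www.cnblogs.com/chenying99/p/3775534.html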
