Enterprise Search Engine Development: The Connector (Part 26)
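The connector uses a monitor object, DocumentSnapshotRepositoryMonitor, to iterate over the data held by the repository object SnapshotRepository described in the previous article (the database-backed repository is DBSnapshotRepository).
The monitor class DocumentSnapshotRepositoryMonitor initializes its member variables in the constructor; all of these members are objects involved in fetching and processing the data.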
/** This connector instance's current traversal schedule. */
private volatile TraversalSchedule traversalSchedule;

/** Directory that contains snapshots. */
private final SnapshotStore snapshotStore;

/** The root of the repository to monitor */
private final SnapshotRepository<? extends DocumentSnapshot> query;

/** Reader for the current snapshot. */
private SnapshotReader snapshotReader;

/** Callback to invoke when a change is detected. */
private final Callback callback;

/** Current record from the snapshot. */
private DocumentSnapshot current;

/** The snapshot we are currently writing */
private OrderedSnapshotWriter snapshotWriter;

private final String name;

private final DocumentSnapshotFactory documentSnapshotFactory;

private final DocumentSink documentSink;

/* Contains a checkpoint confirmation from CM. */
private MonitorCheckpoint guaranteeCheckpoint;

/* The monitor should exit voluntarily if set to false */
private volatile boolean isRunning = true;

/**
 * Creates a DocumentSnapshotRepositoryMonitor that monitors the
 * Repository rooted at {@code root}.
 *
 * @param name the name of this monitor (a hash of the start path)
 * @param query query for files
 * @param snapshotStore where snapshots are stored
 * @param callback client callback
 * @param documentSink destination for filtered out file info
 * @param initialCp checkpoint when system initiated, could be {@code null}
 * @param documentSnapshotFactory for un-serializing
 *        {@link DocumentSnapshot} objects.
 */
public DocumentSnapshotRepositoryMonitor(String name,
    SnapshotRepository<? extends DocumentSnapshot> query,
    SnapshotStore snapshotStore, Callback callback,
    DocumentSink documentSink, MonitorCheckpoint initialCp,
    DocumentSnapshotFactory documentSnapshotFactory) {
  this.name = name;
  this.query = query;
  this.snapshotStore = snapshotStore;
  this.callback = callback;
  this.documentSnapshotFactory = documentSnapshotFactory;
  this.documentSink = documentSink;
  guaranteeCheckpoint = initialCp;
}
The class also implements the Runnable interface, and the overridden run method drives the data-processing logic:
@Override
public void run() {
  // Call NDC.push() via reflection, if possible.
  invoke(ndcPush, "Monitor " + name);
  try {
    while (true) {
      tryToRunForever();
      // TODO: Remove items from this monitor that are in queues.
      // Watch out for race conditions. The queues are potentially
      // giving docs to CM as bad things happen in monitor.
      // This TODO would be mitigated by a reconciliation with GSA.
      performExceptionRecovery();
    }
  } catch (InterruptedException ie) {
    LOG.info("Repository Monitor " + name + " received stop signal. " + this);
  } finally {
    // Call NDC.remove() via reflection, if possible.
    invoke(ndcRemove);
  }
}
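To make the control flow easier to see, here is a minimal sketch of the same two-level loop: an inner method that keeps working until a recoverable failure, and an outer run() that recovers and retries until a stop signal arrives. RetryingWorker, stopAfterPasses and the simulated failure on every third pass are all hypothetical and do not come from the connector source; the stop signal is simulated by throwing InterruptedException directly.

```java
/** Hypothetical worker that mimics the run()/tryToRunForever() split. */
class RetryingWorker implements Runnable {
    // Assumption: stands in for an external stop signal (thread interruption).
    private final int stopAfterPasses;
    private int passes = 0;
    private int recoveries = 0;

    RetryingWorker(int stopAfterPasses) {
        this.stopAfterPasses = stopAfterPasses;
    }

    /** Inner loop: keeps doing passes until a recoverable failure occurs. */
    private void tryToRunForever() throws InterruptedException {
        try {
            while (true) {
                if (passes >= stopAfterPasses) {
                    throw new InterruptedException("stop requested");
                }
                passes++;
                if (passes % 3 == 0) {
                    // Simulated recoverable failure on every third pass.
                    throw new IllegalStateException("simulated snapshot failure");
                }
            }
        } catch (IllegalStateException e) {
            // Swallow the failure and return so the outer loop can recover.
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                tryToRunForever();
                recoveries++; // stands in for performExceptionRecovery()
            }
        } catch (InterruptedException ie) {
            // Only a stop signal ends the outer loop.
        }
    }

    int getPasses() {
        return passes;
    }

    int getRecoveries() {
        return recoveries;
    }
}
```

Calling new RetryingWorker(7).run() finishes after 7 passes and 2 recoveries: the failures on passes 3 and 6 are absorbed, and only the simulated stop signal ends the loop.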
run in turn calls tryToRunForever():
private void tryToRunForever() throws InterruptedException {
  try {
    while (true) {
      if (traversalSchedule == null || traversalSchedule.shouldRun()) {
        // Start traversal
        doOnePass();
      } else {
        LOG.finest("Currently out of traversal window. "
            + "Sleeping for 15 minutes.");
        // TODO(nashi): Calculate when it should wake up while
        // handling TraversalScheduleAware events properly.
        // Outside the traversal window, so sleep.
        callback.passPausing(15*60*1000);
      }
    }
  } catch (SnapshotWriterException e) {
    String msg = "Failed to write to snapshot file: " + snapshotWriter.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotReaderException e) {
    String msg = "Failed to read snapshot file: " + snapshotReader.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotStoreException e) {
    String msg = "Problem with snapshot store.";
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotRepositoryRuntimeException e) {
    String msg = "Failed reading repository.";
    LOG.log(Level.SEVERE, msg, e);
  }
}
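doOnePass() fetches the data from the SnapshotRepository, persists the document snapshots to a snapshot file, and decides how each document should be handled (as new, deleted, or updated).
The resulting changes are finally added, through the Callback interface, to the blocking queue inside the ChangeQueue object.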
/**
 * Creates a separate snapshot reader and writer for each pass.
 * Makes one pass through the repository, notifying {@code visitor} of any
 * changes.
 *
 * @throws InterruptedException
 */
private void doOnePass() throws SnapshotStoreException,
    InterruptedException {
  callback.passBegin();
  try {
    // Snapshot reader
    // Open the most recent snapshot and read the first record.
    this.snapshotReader = snapshotStore.openMostRecentSnapshot();
    current = snapshotReader.read();

    // Snapshot writer
    // Create a snapshot writer for this pass.
    this.snapshotWriter =
        new OrderedSnapshotWriter(snapshotStore.openNewSnapshotWriter());

    // Fetch the data from the repository.
    for (DocumentSnapshot ss : query) {
      // Check whether the monitor has been asked to stop.
      if (false == isRunning) {
        LOG.log(Level.INFO, "Exiting the monitor thread " + name
            + " " + this);
        throw new InterruptedException();
      }
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException();
      }
      processDeletes(ss);
      safelyProcessDocumentSnapshot(ss);
    }
    // After the iteration, treat whatever is left in the snapshot reader as
    // deleted (the data source may have removed trailing documents).
    // Take care of any trailing paths in the snapshot.
    processDeletes(null);
  } finally {
    try {
      snapshotStore.close(snapshotReader, snapshotWriter);
    } catch (IOException e) {
      LOG.log(Level.WARNING, "Failed closing snapshot reader and writer.", e);
      // Try to proceed anyway. Weird they are not closing.
    }
  }
  if (current != null) {
    throw new IllegalStateException(
        "Should not finish pass until entire read snapshot is consumed.");
  }
  // The pass is finished; rest.
  callback.passComplete(getCheckpoint(-1));
  snapshotStore.deleteOldSnapshots();
  if (!callback.hasEnqueuedAtLeastOneChangeThisPass()) {
    // No monitor checkpoints from this pass went to queue because
    // there were no changes, so we can delete the snapshot we just wrote.
    new java.io.File(snapshotWriter.getPath()).delete();
    // TODO: Check return value; log trouble.
  }
  snapshotWriter = null;
  snapshotReader = null;
}
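The pass above is essentially a merge of two sorted sequences: the previous snapshot and the current repository contents. The following is a simplified sketch of that idea, not the connector's code. It assumes both sides are sorted by document id and that each document is reduced to an id and a content hash; SnapshotDiff and diff are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sorted-merge diff; keys are document ids, values are content hashes. */
class SnapshotDiff {
    static List<String> diff(TreeMap<String, String> previous,
                             TreeMap<String, String> repository) {
        List<String> changes = new ArrayList<>();
        Iterator<Map.Entry<String, String>> reader = previous.entrySet().iterator();
        // 'current' plays the role of the record last read from the old snapshot.
        Map.Entry<String, String> current = reader.hasNext() ? reader.next() : null;

        for (Map.Entry<String, String> doc : repository.entrySet()) {
            // Old records that sort before this document no longer exist.
            while (current != null && doc.getKey().compareTo(current.getKey()) > 0) {
                changes.add("DELETE " + current.getKey());
                current = reader.hasNext() ? reader.next() : null;
            }
            if (current != null && doc.getKey().equals(current.getKey())) {
                // Present in both: report a change only if the hash differs.
                if (!doc.getValue().equals(current.getValue())) {
                    changes.add("CHANGE " + doc.getKey());
                }
                current = reader.hasNext() ? reader.next() : null;
            } else {
                // Not in the previous snapshot.
                changes.add("NEW " + doc.getKey());
            }
        }
        // Trailing old records were removed from the repository.
        while (current != null) {
            changes.add("DELETE " + current.getKey());
            current = reader.hasNext() ? reader.next() : null;
        }
        return changes;
    }
}
```

With a previous snapshot {a=1, b=2, d=4, e=5} and a repository {a=1, b=9, c=3, d=4}, diff returns [CHANGE b, NEW c, DELETE e]. Because both sides are sorted, each record is visited once and neither side has to be held in memory in full, which is presumably why the real monitor streams the old snapshot through a reader.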
The processDeletes method handles the deletion logic:
/**
 * Process snapshot entries as deletes until {@code current} catches up with
 * {@code documentSnapshot}. Or, if {@code documentSnapshot} is {@code null},
 * process all remaining snapshot entries as deletes.
 *
 * @param documentSnapshot where to stop
 * @throws SnapshotReaderException
 * @throws InterruptedException
 */
private void processDeletes(DocumentSnapshot documentSnapshot)
    throws SnapshotReaderException, InterruptedException {
  // While the documentSnapshot argument sorts after current, report current
  // as deleted, then move on to the next record in the previous snapshot.
  while (current != null
      && (documentSnapshot == null
          || COMPARATOR.compare(documentSnapshot, current) > 0)) {
    callback.deletedDocument(
        new DeleteDocumentHandle(current.getDocumentId()), getCheckpoint());
    current = snapshotReader.read();
  }
}
Next, follow the safelyProcessDocumentSnapshot method:
private void safelyProcessDocumentSnapshot(DocumentSnapshot snapshot)
    throws InterruptedException, SnapshotReaderException,
    SnapshotWriterException {
  try {
    processDocument(snapshot);
  } catch (RepositoryException re) {
    // TODO Log the exception or its message? in document sink perhaps.
    // Record the snapshot that failed in the document sink.
    documentSink.add(snapshot.getDocumentId(), FilterReason.IO_EXCEPTION);
  }
}
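The point of this wrapper is that one unreadable document should not abort the whole pass. Below is a small sketch of the same pattern under stated assumptions: SafeProcessing, process and processAll are hypothetical names, a plain Map stands in for the DocumentSink, and any id starting with "bad" is made to fail artificially.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of isolating per-document failures. */
class SafeProcessing {
    /** Stand-in for the real per-document work; ids starting with "bad" fail. */
    static void process(String documentId) throws IOException {
        if (documentId.startsWith("bad")) {
            throw new IOException("cannot read " + documentId);
        }
    }

    /**
     * Processes every id and returns the ones that succeeded. Failures are
     * recorded in the sink with a reason instead of stopping the loop.
     */
    static List<String> processAll(List<String> documentIds, Map<String, String> sink) {
        List<String> processed = new ArrayList<>();
        for (String id : documentIds) {
            try {
                process(id);
                processed.add(id);
            } catch (IOException e) {
                sink.put(id, "IO_EXCEPTION");
            }
        }
        return processed;
    }
}
```

For the ids a, bad-b and c, processAll returns [a, c] and the sink ends up holding bad-b with the reason IO_EXCEPTION.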
This in turn calls processDocument, which contains the logic for updated and new documents:
/**
 * Processes a document found in the document repository.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotReaderException
 * @throws SnapshotWriterException
 */
private void processDocument(DocumentSnapshot documentSnapshot)
    throws InterruptedException, RepositoryException, SnapshotReaderException,
    SnapshotWriterException {
  // At this point 'current' >= 'file', or possibly current == null if
  // we've processed the previous snapshot entirely.
  if (current != null
      && COMPARATOR.compare(documentSnapshot, current) == 0) {
    // The document also appears in the previous snapshot: handle a possible
    // change and advance current.
    processPossibleChange(documentSnapshot);
  } else {
    // This file didn't exist during the previous scan.
    // The previous snapshot has no such documentSnapshot.
    DocumentHandle documentHandle = documentSnapshot.getUpdate(null);
    snapshotWriter.write(documentSnapshot);

    // Null if filtered due to mime-type.
    if (documentHandle != null) {
      callback.newDocument(documentHandle, getCheckpoint(-1));
    }
  }
}
Handling the update case:
/**
 * Processes a document found in the document repository that also appeared
 * in the previous scan. Determines whether the document has changed,
 * propagates changes to the client and writes the snapshot record.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotWriterException
 * @throws SnapshotReaderException
 */
private void processPossibleChange(DocumentSnapshot documentSnapshot)
    throws RepositoryException, InterruptedException, SnapshotWriterException,
    SnapshotReaderException {
  // Presumably compares hash values.
  DocumentHandle documentHandle = documentSnapshot.getUpdate(current);
  // Write to the snapshot file.
  snapshotWriter.write(documentSnapshot);
  if (documentHandle == null) {
    // No change.
    // Nothing is sent when the document is unchanged.
  } else {
    // Normal change - send the gsa an update.
    callback.changedDocument(documentHandle, getCheckpoint());
  }
  current = snapshotReader.read();
}
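I have only guessed that getUpdate compares hash values, so the following is a sketch of what such a comparison could look like, not the actual DocumentSnapshot implementation. HashedSnapshot is a hypothetical class, the choice of SHA-256 is an assumption, and a plain String id stands in for the DocumentHandle.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical snapshot that detects changes by comparing content hashes. */
class HashedSnapshot {
    private final String documentId;
    private final byte[] contentHash;

    HashedSnapshot(String documentId, String content) {
        this.documentId = documentId;
        this.contentHash = sha256(content);
    }

    private static byte[] sha256(String content) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the id to send, or null when nothing needs to be sent.
     * A null 'previous' means the document is new, so it is always sent.
     */
    String getUpdate(HashedSnapshot previous) {
        if (previous != null && Arrays.equals(contentHash, previous.contentHash)) {
            return null;
        }
        return documentId;
    }
}
```

Under these assumptions, comparing a snapshot against one with identical content yields null (nothing to send), while a null or different previous snapshot yields the document id, which matches how processPossibleChange and processDocument treat the return value.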
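Snapshots of both updated and new documents are first persisted to the newest snapshot file.
The data is then submitted by calling the corresponding methods on the callback member, which ultimately places it on the ChangeQueue object.
The Callback interface defines the methods used for this: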
/**
 * Callback interface.
 * The client provides an implementation of this interface to receive
 * notification of changes to the repository.
 */
public static interface Callback {
  public void passBegin() throws InterruptedException;

  public void newDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void deletedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void changedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException;

  public boolean hasEnqueuedAtLeastOneChangeThisPass();

  public void passPausing(int sleepms) throws InterruptedException;
}
The ChangeQueue class defines an inner class, Callback, that implements this interface; its methods add the submitted data to the blocking queue held as a member of ChangeQueue.
/**
 * Callback implementation: adds Change elements to the blocking queue
 * pendingChanges.
 * Adds {@link Change Changes} to this queue.
 */
private class Callback implements DocumentSnapshotRepositoryMonitor.Callback {
  private int changeCount = 0;

  public void passBegin() {
    changeCount = 0;
    activityLogger.scanBeginAt(new Timestamp(System.currentTimeMillis()));
  }

  /* @Override */
  public void changedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotChangedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void deletedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.INTERNAL, dh, mcp));
    activityLogger.gotDeletedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void newDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotNewDocument(dh.getDocumentId());
  }

  /* @Override */
  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException {
    activityLogger.scanEndAt(new Timestamp(System.currentTimeMillis()));
    if (introduceDelayAfterEveryScan || changeCount == 0) {
      Thread.sleep(sleepInterval);
    }
  }

  public boolean hasEnqueuedAtLeastOneChangeThisPass() {
    return changeCount > 0;
  }

  /* @Override */
  public void passPausing(int sleepms) throws InterruptedException {
    Thread.sleep(sleepms);
  }
}
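Stripped of logging and checkpoints, the callback is a thin producer in front of a bounded blocking queue. The sketch below shows that hand-off in isolation. QueueingCallback and its demo method are hypothetical, plain strings stand in for Change objects, and the capacity of 10 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical callback that forwards changes to a bounded blocking queue. */
class QueueingCallback {
    private final BlockingQueue<String> pendingChanges;
    private int changeCount = 0;

    QueueingCallback(BlockingQueue<String> pendingChanges) {
        this.pendingChanges = pendingChanges;
    }

    void passBegin() {
        changeCount = 0;
    }

    // put() blocks when the queue is full, which throttles the monitor
    // whenever the consumer falls behind.
    void newDocument(String id) throws InterruptedException {
        ++changeCount;
        pendingChanges.put("NEW " + id);
    }

    void changedDocument(String id) throws InterruptedException {
        ++changeCount;
        pendingChanges.put("CHANGE " + id);
    }

    void deletedDocument(String id) throws InterruptedException {
        ++changeCount;
        pendingChanges.put("DELETE " + id);
    }

    boolean hasEnqueuedAtLeastOneChangeThisPass() {
        return changeCount > 0;
    }

    /** Runs one tiny pass and returns what a consumer would drain from the queue. */
    static List<String> demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
        QueueingCallback callback = new QueueingCallback(queue);
        try {
            callback.passBegin();
            callback.newDocument("doc-1");
            callback.deletedDocument("doc-2");
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
        }
        List<String> drained = new ArrayList<>();
        queue.drainTo(drained);
        drained.add("enqueued=" + callback.hasEnqueuedAtLeastOneChangeThisPass());
        return drained;
    }
}
```

QueueingCallback.demo() returns [NEW doc-1, DELETE doc-2, enqueued=true]. The per-pass counter is what lets doOnePass decide whether the snapshot file it just wrote can be deleted.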
---------------------------------------------------------------------------
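This series, Enterprise Search Engine Development: The Connector, is my own original work.
When reposting, please credit the source: 博客园 刺猬的温驯
My email: chenying998179@163#com (replace # with .)
Link to this article: http://www.cnblogs.com/chenying99/p/3789505.html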