Solr4.8.0源码分析(3)之index的线程池管理
Solr4.8.0源码分析(3)之index的线程池管理
Solr建索引时候是有最大的线程数限制的,它由solrconfig.xml的<maxIndexingThreads>8</maxIndexingThreads>控制的,该值等于8就是说Solr最多只能用8个线程来进行updatedocument。
那么Solr建索引时候是怎么管理线程池的呢,主要是通过ThreadAffinityDocumentsWriterThreadPool来进行管理的,它继承了DocumentsWriterPerThreadPool类。ThreadAffinityDocumentsWriterThreadPool的结构并不复杂,主要的一个函数是getAndLock()。
在建索引时候即updatedocuments时候,Solr先要调用getAndLock去获取ThreadState这个锁。而ThreadState这个锁就是存放在ThreadAffinityDocumentsWriterThreadPool的threadBings这个线程池里面。
首先先看下什么是ThreadState锁,源码如下:
ThreadState是DocumentsWriterPerThreadPool的一个内部类。它包含了一个DocumentsWriterPerThread类的实例以及状态控制,DocumentsWriterPerThread是线程池的一个线程,主要作用是索引的建立。该类比较简单就不详细介绍了。
/**
* {@link ThreadState} references and guards a
* {@link DocumentsWriterPerThread} instance that is used during indexing to
* build a in-memory index segment. {@link ThreadState} also holds all flush
* related per-thread data controlled by {@link DocumentsWriterFlushControl}.
* <p>
* A {@link ThreadState}, its methods and members should only accessed by one
* thread a time. Users must acquire the lock via {@link ThreadState#lock()}
* and release the lock in a finally block via {@link ThreadState#unlock()}
* before accessing the state.
*/
@SuppressWarnings("serial")
final static class ThreadState extends ReentrantLock {
DocumentsWriterPerThread dwpt;
// TODO this should really be part of DocumentsWriterFlushControl
// write access guarded by DocumentsWriterFlushControl
volatile boolean flushPending = false;
// TODO this should really be part of DocumentsWriterFlushControl
// write access guarded by DocumentsWriterFlushControl
long bytesUsed = 0;
// guarded by Reentrant lock
private boolean isActive = true; ThreadState(DocumentsWriterPerThread dpwt) {
this.dwpt = dpwt;
} /**
* Resets the internal {@link DocumentsWriterPerThread} with the given one.
* if the given DWPT is <code>null</code> this ThreadState is marked as inactive and should not be used
* for indexing anymore.
* @see #isActive()
*/ private void deactivate() {
assert this.isHeldByCurrentThread();
isActive = false;
reset();
} private void reset() {
assert this.isHeldByCurrentThread();
this.dwpt = null;
this.bytesUsed = 0;
this.flushPending = false;
} /**
* Returns <code>true</code> if this ThreadState is still open. This will
* only return <code>false</code> iff the DW has been closed and this
* ThreadState is already checked out for flush.
*/
boolean isActive() {
assert this.isHeldByCurrentThread();
return isActive;
} boolean isInitialized() {
assert this.isHeldByCurrentThread();
return isActive() && dwpt != null;
} /**
* Returns the number of currently active bytes in this ThreadState's
* {@link DocumentsWriterPerThread}
*/
public long getBytesUsedPerThread() {
assert this.isHeldByCurrentThread();
// public for FlushPolicy
return bytesUsed;
} /**
* Returns this {@link ThreadState}s {@link DocumentsWriterPerThread}
*/
public DocumentsWriterPerThread getDocumentsWriterPerThread() {
assert this.isHeldByCurrentThread();
// public for FlushPolicy
return dwpt;
} /**
* Returns <code>true</code> iff this {@link ThreadState} is marked as flush
* pending otherwise <code>false</code>
*/
public boolean isFlushPending() {
return flushPending;
}
}
/**
* A {@link DocumentsWriterPerThreadPool} implementation that tries to assign an
* indexing thread to the same {@link ThreadState} each time the thread tries to
* obtain a {@link ThreadState}. Once a new {@link ThreadState} is created it is
* associated with the creating thread. Subsequently, if the threads associated
* {@link ThreadState} is not in use it will be associated with the requesting
* thread. Otherwise, if the {@link ThreadState} is used by another thread
* {@link ThreadAffinityDocumentsWriterThreadPool} tries to find the currently
* minimal contended {@link ThreadState}.
*/
class ThreadAffinityDocumentsWriterThreadPool extends DocumentsWriterPerThreadPool {
private Map<Thread, ThreadState> threadBindings = new ConcurrentHashMap<>(); /**
* Creates a new {@link ThreadAffinityDocumentsWriterThreadPool} with a given maximum of {@link ThreadState}s.
*/
public ThreadAffinityDocumentsWriterThreadPool(int maxNumPerThreads) {
super(maxNumPerThreads);
assert getMaxThreadStates() >= 1;
} @Override
public ThreadState getAndLock(Thread requestingThread, DocumentsWriter documentsWriter) {
ThreadState threadState = threadBindings.get(requestingThread);
if (threadState != null && threadState.tryLock()) {
return threadState;
}
ThreadState minThreadState = null; /* TODO -- another thread could lock the minThreadState we just got while
we should somehow prevent this. */
// Find the state that has minimum number of threads waiting
minThreadState = minContendedThreadState();
if (minThreadState == null || minThreadState.hasQueuedThreads()) {
final ThreadState newState = newThreadState(); // state is already locked if non-null
if (newState != null) {
assert newState.isHeldByCurrentThread();
threadBindings.put(requestingThread, newState);
return newState;
} else if (minThreadState == null) {
/*
* no new threadState available we just take the minContented one
* This must return a valid thread state since we accessed the
* synced context in newThreadState() above.
*/
minThreadState = minContendedThreadState();
}
}
assert minThreadState != null: "ThreadState is null"; minThreadState.lock();
return minThreadState;
} @Override
public ThreadAffinityDocumentsWriterThreadPool clone() {
ThreadAffinityDocumentsWriterThreadPool clone = (ThreadAffinityDocumentsWriterThreadPool) super.clone();
clone.threadBindings = new ConcurrentHashMap<>();
return clone;
}
}
再回到ThreadAffinityDocumentWriterThreadPool类。getAndLock的主要流程如下:
1. 请求线程requestingThread需要进行updatedocument操作,它首先会尝试从线程池threadBings获取自身线程的ThreadState锁并尝试去锁它即trylock。如果锁成功了,那么它就能再度获取到自身线程的ThreadState,这是最好的一种情况。
2. 如果自身线程的trylock失败,说明该ThreadState已经被别的requestingThread线程抢去,那么请求线程requestingThread只能去线程池threadBings获取别的线程。获取的规则是minContendedThreadState(),源码如下所示.
minContendedThreadState的规则就是遍历所有活跃的ThreadState,如果ThreadState的队列内元素个数最少(即等待这个ThreadState的线程最少),那么这个ThreadState就是返回的那个ThreadState,即minThreadState.
/**
* Returns the ThreadState with the minimum estimated number of threads
* waiting to acquire its lock or <code>null</code> if no {@link ThreadState}
* is yet visible to the calling thread.
*/
ThreadState minContendedThreadState() {
ThreadState minThreadState = null;
final int limit = numThreadStatesActive;
for (int i = 0; i < limit; i++) {
final ThreadState state = threadStates[i];
if (minThreadState == null || state.getQueueLength() < minThreadState.getQueueLength()) {
minThreadState = state;
}
}
return minThreadState;
}
3. 如果minThreadState==null(一般是第一个获取ThreadState这种情况)或者minThreadState有其他线程在等待(正常情况下都会有线程在等的),那么requestingThread会去申请新的ThreadState,即从maxIndexingThreads的线程里申请,源码如下。
threadStates是一个ThreadStates的数组,当需要threadBings的ThreadState个数(也就是活跃的线程)小于threadStates的元素个数(maxIndexingThreads)时就能申请到新的ThreadState。
/**
* Returns a new {@link ThreadState} iff any new state is available otherwise
* <code>null</code>.
* <p>
* NOTE: the returned {@link ThreadState} is already locked iff non-
* <code>null</code>.
*
* @return a new {@link ThreadState} iff any new state is available otherwise
* <code>null</code>
*/
synchronized ThreadState newThreadState() {
if (numThreadStatesActive < threadStates.length) {
final ThreadState threadState = threadStates[numThreadStatesActive];
threadState.lock(); // lock so nobody else will get this ThreadState
boolean unlock = true;
try {
if (threadState.isActive()) {
// unreleased thread states are deactivated during DW#close()
numThreadStatesActive++; // increment will publish the ThreadState
assert threadState.dwpt == null;
unlock = false;
return threadState;
}
// unlock since the threadstate is not active anymore - we are closed!
assert assertUnreleasedThreadStatesInactive();
return null;
} finally {
if (unlock) {
// in any case make sure we unlock if we fail
threadState.unlock();
}
}
}
return null;
}
4. 如果minContentedThreadState获取成功,那么threadBings的线程池就会得到更新。如果minContentedThreadState获取失败,那么说明threadStates数组以及分配完全,那么请求线程会再去取获取minContentedThreadState。
5. 最后请求线程会去lock minThreadState,如果lock失败就进入休眠,一直等到lock成功。这是最不好的一种结果。
最后在源码说道,请求线程在获取minThreadState时候别的线程也有可能获取到该minThreadState,目前来说这是一种缺陷。
<maxIndexingThreads>8</maxIndexingThreads>这个配置对建索引的性能有较大影响,如果太小那么建索引时候等待情况就会较多。如果太大又增加服务器的负荷,所以要综合选择。
Solr4.8.0源码分析(3)之index的线程池管理的更多相关文章
- Solr4.8.0源码分析(25)之SolrCloud的Split流程
		
Solr4.8.0源码分析(25)之SolrCloud的Split流程(一) 题记:昨天有位网友问我SolrCloud的split的机制是如何的,这个还真不知道,所以今天抽空去看了Split的原理,大 ...
 - Solr4.8.0源码分析(24)之SolrCloud的Recovery策略(五)
		
Solr4.8.0源码分析(24)之SolrCloud的Recovery策略(五) 题记:关于SolrCloud的Recovery策略已经写了四篇了,这篇应该是系统介绍Recovery策略的最后一篇了 ...
 - Solr4.8.0源码分析(23)之SolrCloud的Recovery策略(四)
		
Solr4.8.0源码分析(23)之SolrCloud的Recovery策略(四) 题记:本来计划的SolrCloud的Recovery策略的文章是3篇的,但是没想到Recovery的内容蛮多的,前面 ...
 - Solr4.8.0源码分析(22)之SolrCloud的Recovery策略(三)
		
Solr4.8.0源码分析(22)之SolrCloud的Recovery策略(三) 本文是SolrCloud的Recovery策略系列的第三篇文章,前面两篇主要介绍了Recovery的总体流程,以及P ...
 - Solr4.8.0源码分析(21)之SolrCloud的Recovery策略(二)
		
Solr4.8.0源码分析(21)之SolrCloud的Recovery策略(二) 题记: 前文<Solr4.8.0源码分析(20)之SolrCloud的Recovery策略(一)>中提 ...
 - Solr4.8.0源码分析(20)之SolrCloud的Recovery策略(一)
		
Solr4.8.0源码分析(20)之SolrCloud的Recovery策略(一) 题记: 我们在使用SolrCloud中会经常发现会有备份的shard出现状态Recoverying,这就表明Solr ...
 - Solr4.8.0源码分析(14)之SolrCloud索引深入(1)
		
Solr4.8.0源码分析(14) 之 SolrCloud索引深入(1) 上一章节<Solr In Action 笔记(4) 之 SolrCloud分布式索引基础>简要学习了SolrClo ...
 - Solr4.8.0源码分析(15) 之 SolrCloud索引深入(2)
		
Solr4.8.0源码分析(15) 之 SolrCloud索引深入(2) 上一节主要介绍了SolrCloud分布式索引的整体流程图以及索引链的实现,那么本节开始将分别介绍三个索引过程即LogUpdat ...
 - Solr4.8.0源码分析(19)之缓存机制(二)
		
Solr4.8.0源码分析(19)之缓存机制(二) 前文<Solr4.8.0源码分析(18)之缓存机制(一)>介绍了Solr缓存的生命周期,重点介绍了Solr缓存的warn过程.本节将更深 ...
 
随机推荐
- M - 昂贵的聘礼 - poj1062
			
Description 年轻的探险家来到了一个印第安部落里.在那里他和酋长的女儿相爱了,于是便向酋长去求亲.酋长要他用10000个金币作为聘礼才答应把女儿嫁给他.探险家拿不出这么多金币,便请求酋长降低 ...
 - C#中的 IList, ICollection ,IEnumerable 和 IEnumerator
			
IList, ICollection ,IEnumerable 很显然,这些都是集合接口的定义,先看看定义: // 摘要: // 表示可按照索引单独访问的对象的非泛型集合. [ComVisible(t ...
 - Redis 实现用户积分排行榜
			
排行榜功能是一个很普遍的需求.使用 Redis 中有序集合的特性来实现排行榜是又好又快的选择. 一般排行榜都是有实效性的,比如“用户积分榜”.如果没有实效性一直按照总榜来排,可能榜首总是几个老用户,对 ...
 - [转] Nginx模块开发入门
			
前言 Nginx是当前最流行的HTTP Server之一,根据W3Techs的统计,目前世界排名(根据Alexa)前100万的网站中,Nginx的占有率为6.8%.与Apache相比,Nginx在高并 ...
 - TCP/IP协议原理与应用笔记11:TCP/IP中地址与层次关系
			
1. 网络中常用的地址: 2. TCP/IP中地址与层次关系 :
 - Android(java)学习笔记240:多媒体之图形颜色的变化
			
1.相信大家都用过美图秀秀中如下的功能,调整颜色: 2. 下面通过案例说明Android中如何调色: 颜色矩阵 ColorMatrix cm = new ColorMatrix(); paint.se ...
 - SPOJ 4053 - Card Sorting 最长不下降子序列
			
我们的男主现在手中有n*c张牌,其中有c(<=4)种颜色,每种颜色有n(<=100)张,现在他要排序,首先把相同的颜色的牌放在一起,颜色相同的按照序号从小到大排序.现在他想要让牌的移动次数 ...
 - ACM vim配置
			
ACM现场赛时用的,比较简短,但是主要的功能都有了. 直接打开终端输入gedit ~/.vimrc 把下面的东西复制到里面就行了. filetype plugin indent on colo eve ...
 - <html:form>、 <html:text>、<html:password>、<html:submit> 标签
			
用Struts标签来写表单元件, 引用: <%@ taglib uri="/tags/struts-html" prefix="html" %> 例 ...
 - C#截取字符串的方法小结
			
1.根据单个分隔字符用split截取 string st="GT123_1"; string[] sArray=st.split("_"); 输出:sArray ...