Solr 4.8.0 Source Code Analysis (3): Thread Pool Management for Indexing

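Solr caps the number of threads that can build the index concurrently. The cap is set by <maxIndexingThreads>8</maxIndexingThreads> in solrconfig.xml; a value of 8 means that at most 8 threads can run updateDocument at the same time.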

So how does Solr manage this pool while indexing? The work is done by ThreadAffinityDocumentsWriterThreadPool, which extends DocumentsWriterPerThreadPool. The class is not complicated; its one important method is getAndLock().

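When indexing (that is, during updateDocuments), Solr first calls getAndLock() to obtain a ThreadState, which is itself a lock. The association between each indexing thread and its ThreadState is recorded in the threadBindings map of ThreadAffinityDocumentsWriterThreadPool.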

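First, what is a ThreadState? It is an inner class of DocumentsWriterPerThreadPool that extends ReentrantLock. It holds a DocumentsWriterPerThread instance together with some state flags. A DocumentsWriterPerThread is not a thread itself; it is the per-thread writer object that builds an in-memory index segment. The class is simple, so it is not covered in detail here. The source is as follows: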

    /**
     * {@link ThreadState} references and guards a
     * {@link DocumentsWriterPerThread} instance that is used during indexing to
     * build a in-memory index segment. {@link ThreadState} also holds all flush
     * related per-thread data controlled by {@link DocumentsWriterFlushControl}.
     * <p>
     * A {@link ThreadState}, its methods and members should only accessed by one
     * thread a time. Users must acquire the lock via {@link ThreadState#lock()}
     * and release the lock in a finally block via {@link ThreadState#unlock()}
     * before accessing the state.
     */
    @SuppressWarnings("serial")
    final static class ThreadState extends ReentrantLock {
      DocumentsWriterPerThread dwpt;
      // TODO this should really be part of DocumentsWriterFlushControl
      // write access guarded by DocumentsWriterFlushControl
      volatile boolean flushPending = false;
      // TODO this should really be part of DocumentsWriterFlushControl
      // write access guarded by DocumentsWriterFlushControl
      long bytesUsed = 0;
      // guarded by Reentrant lock
      private boolean isActive = true;

      ThreadState(DocumentsWriterPerThread dpwt) {
        this.dwpt = dpwt;
      }

      /**
       * Resets the internal {@link DocumentsWriterPerThread} with the given one.
       * if the given DWPT is <code>null</code> this ThreadState is marked as inactive and should not be used
       * for indexing anymore.
       * @see #isActive()
       */
      private void deactivate() {
        assert this.isHeldByCurrentThread();
        isActive = false;
        reset();
      }

      private void reset() {
        assert this.isHeldByCurrentThread();
        this.dwpt = null;
        this.bytesUsed = 0;
        this.flushPending = false;
      }

      /**
       * Returns <code>true</code> if this ThreadState is still open. This will
       * only return <code>false</code> iff the DW has been closed and this
       * ThreadState is already checked out for flush.
       */
      boolean isActive() {
        assert this.isHeldByCurrentThread();
        return isActive;
      }

      boolean isInitialized() {
        assert this.isHeldByCurrentThread();
        return isActive() && dwpt != null;
      }

      /**
       * Returns the number of currently active bytes in this ThreadState's
       * {@link DocumentsWriterPerThread}
       */
      public long getBytesUsedPerThread() {
        assert this.isHeldByCurrentThread();
        // public for FlushPolicy
        return bytesUsed;
      }

      /**
       * Returns this {@link ThreadState}s {@link DocumentsWriterPerThread}
       */
      public DocumentsWriterPerThread getDocumentsWriterPerThread() {
        assert this.isHeldByCurrentThread();
        // public for FlushPolicy
        return dwpt;
      }

      /**
       * Returns <code>true</code> iff this {@link ThreadState} is marked as flush
       * pending otherwise <code>false</code>
       */
      public boolean isFlushPending() {
        return flushPending;
      }
    }
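The Javadoc above requires callers to take the lock and release it in a finally block before touching the state. The following is a minimal sketch of that discipline, not Lucene code: GuardedState and GuardedStateDemo are hypothetical names, and the single bytesUsed field stands in for the data a real ThreadState guards.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for ThreadState: the lock and the data it guards
// live in the same object, because the class extends ReentrantLock.
class GuardedState extends ReentrantLock {
    private long bytesUsed = 0; // guarded by this lock

    void addBytes(long n) {
        // Same defensive check as in the quoted source (only active with -ea).
        assert isHeldByCurrentThread();
        bytesUsed += n;
    }

    long getBytesUsed() {
        assert isHeldByCurrentThread();
        return bytesUsed;
    }
}

class GuardedStateDemo {
    // Acquire the state, mutate it, and always release it in finally.
    static long record(GuardedState state, long n) {
        state.lock();
        try {
            state.addBytes(n);
            return state.getBytesUsed();
        } finally {
            state.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedState state = new GuardedState();
        record(state, 100);
        System.out.println(record(state, 28)); // prints 128
    }
}
```

Because the lock and the data are one object, a caller that obtains the state from a pool already holds everything it needs to work with it safely.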

The source of ThreadAffinityDocumentsWriterThreadPool is as follows:

    /**
     * A {@link DocumentsWriterPerThreadPool} implementation that tries to assign an
     * indexing thread to the same {@link ThreadState} each time the thread tries to
     * obtain a {@link ThreadState}. Once a new {@link ThreadState} is created it is
     * associated with the creating thread. Subsequently, if the threads associated
     * {@link ThreadState} is not in use it will be associated with the requesting
     * thread. Otherwise, if the {@link ThreadState} is used by another thread
     * {@link ThreadAffinityDocumentsWriterThreadPool} tries to find the currently
     * minimal contended {@link ThreadState}.
     */
    class ThreadAffinityDocumentsWriterThreadPool extends DocumentsWriterPerThreadPool {
      private Map<Thread, ThreadState> threadBindings = new ConcurrentHashMap<>();

      /**
       * Creates a new {@link ThreadAffinityDocumentsWriterThreadPool} with a given maximum of {@link ThreadState}s.
       */
      public ThreadAffinityDocumentsWriterThreadPool(int maxNumPerThreads) {
        super(maxNumPerThreads);
        assert getMaxThreadStates() >= 1;
      }

      @Override
      public ThreadState getAndLock(Thread requestingThread, DocumentsWriter documentsWriter) {
        ThreadState threadState = threadBindings.get(requestingThread);
        if (threadState != null && threadState.tryLock()) {
          return threadState;
        }
        ThreadState minThreadState = null;
        /* TODO -- another thread could lock the minThreadState we just got while
         we should somehow prevent this. */
        // Find the state that has minimum number of threads waiting
        minThreadState = minContendedThreadState();
        if (minThreadState == null || minThreadState.hasQueuedThreads()) {
          final ThreadState newState = newThreadState(); // state is already locked if non-null
          if (newState != null) {
            assert newState.isHeldByCurrentThread();
            threadBindings.put(requestingThread, newState);
            return newState;
          } else if (minThreadState == null) {
            /*
             * no new threadState available we just take the minContented one
             * This must return a valid thread state since we accessed the
             * synced context in newThreadState() above.
             */
            minThreadState = minContendedThreadState();
          }
        }
        assert minThreadState != null: "ThreadState is null";
        minThreadState.lock();
        return minThreadState;
      }

      @Override
      public ThreadAffinityDocumentsWriterThreadPool clone() {
        ThreadAffinityDocumentsWriterThreadPool clone = (ThreadAffinityDocumentsWriterThreadPool) super.clone();
        clone.threadBindings = new ConcurrentHashMap<>();
        return clone;
      }
    }

Now back to the ThreadAffinityDocumentsWriterThreadPool class. The main flow of getAndLock() is as follows:

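1. The requesting thread (requestingThread) wants to perform an updateDocument operation. It first looks up its own ThreadState in threadBindings and calls tryLock() on it. If the lock succeeds, the thread gets back the same ThreadState it used before. This is the best case.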

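2. If the thread has no binding yet, or tryLock() fails because its ThreadState is currently held by another thread, the requesting thread has to pick a different ThreadState from the pool. The selection rule is minContendedThreadState(), whose source is shown below.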

minContendedThreadState() iterates over all active ThreadStates and returns the one with the shortest wait queue, that is, the one with the fewest threads waiting to acquire it. This is the minThreadState.

    /**
     * Returns the ThreadState with the minimum estimated number of threads
     * waiting to acquire its lock or <code>null</code> if no {@link ThreadState}
     * is yet visible to the calling thread.
     */
    ThreadState minContendedThreadState() {
      ThreadState minThreadState = null;
      final int limit = numThreadStatesActive;
      for (int i = 0; i < limit; i++) {
        final ThreadState state = threadStates[i];
        if (minThreadState == null || state.getQueueLength() < minThreadState.getQueueLength()) {
          minThreadState = state;
        }
      }
      return minThreadState;
    }
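To make the selection rule concrete, here is a small sketch that applies the same comparison to plain ReentrantLock objects. It is an illustration only: MinContendedDemo and its methods are hypothetical names, and the demo forces one waiter onto lock a so that the outcome is deterministic.

```java
import java.util.concurrent.locks.ReentrantLock;

class MinContendedDemo {
    // Pick the lock with the fewest queued waiters.
    // getQueueLength() is only an estimate under concurrency.
    static ReentrantLock minContended(ReentrantLock[] locks) {
        ReentrantLock min = null;
        for (ReentrantLock lock : locks) {
            if (min == null || lock.getQueueLength() < min.getQueueLength()) {
                min = lock;
            }
        }
        return min; // null if the array is empty
    }

    // Lock a has one waiter and b has none, so b should be chosen.
    static boolean demo() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        a.lock(); // the calling thread holds a
        Thread waiter = new Thread(() -> { a.lock(); a.unlock(); });
        waiter.start();
        try {
            // Spin until the waiter is actually queued on a.
            while (!a.hasQueuedThread(waiter)) {
                Thread.yield();
            }
            return minContended(new ReentrantLock[] {a, b}) == b;
        } finally {
            a.unlock(); // lets the waiter acquire a and finish
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

Note that a lock which is held but has no waiters still reports a queue length of 0, so the selected lock may be busy; the caller then simply blocks in lock().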

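3. If minThreadState == null (typically when the very first ThreadState is being requested) or minThreadState already has other threads waiting on it (common under load), the requesting thread tries to allocate a new ThreadState from the pool, which holds at most maxIndexingThreads of them. The source is shown below.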

threadStates is an array of ThreadState objects. A new ThreadState can be handed out as long as the number of active ThreadStates (numThreadStatesActive) is smaller than the length of threadStates (maxIndexingThreads).

    /**
     * Returns a new {@link ThreadState} iff any new state is available otherwise
     * <code>null</code>.
     * <p>
     * NOTE: the returned {@link ThreadState} is already locked iff non-
     * <code>null</code>.
     *
     * @return a new {@link ThreadState} iff any new state is available otherwise
     *         <code>null</code>
     */
    synchronized ThreadState newThreadState() {
      if (numThreadStatesActive < threadStates.length) {
        final ThreadState threadState = threadStates[numThreadStatesActive];
        threadState.lock(); // lock so nobody else will get this ThreadState
        boolean unlock = true;
        try {
          if (threadState.isActive()) {
            // unreleased thread states are deactivated during DW#close()
            numThreadStatesActive++; // increment will publish the ThreadState
            assert threadState.dwpt == null;
            unlock = false;
            return threadState;
          }
          // unlock since the threadstate is not active anymore - we are closed!
          assert assertUnreleasedThreadStatesInactive();
          return null;
        } finally {
          if (unlock) {
            // in any case make sure we unlock if we fail
            threadState.unlock();
          }
        }
      }
      return null;
    }

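4. If newThreadState() succeeds, the new ThreadState (already locked) is recorded in threadBindings for the requesting thread and returned. If it returns null, every slot in the threadStates array has already been handed out; in that case, if minThreadState was null, the requesting thread calls minContendedThreadState() again.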

5. Finally, the requesting thread calls lock() on minThreadState. If the lock is not available, the thread blocks until it can acquire it. This is the worst case.
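The five steps above can be condensed into a self-contained sketch. This is a simplified model under stated assumptions, not the Lucene class: AffinityPoolSketch, newSlot, minContended and activeSlots are hypothetical names, plain ReentrantLock objects replace ThreadState, and the isActive/close handling of the real code is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified, hypothetical model of the affinity pool (not the Lucene class).
class AffinityPoolSketch {
    private final ReentrantLock[] slots;
    private volatile int active = 0; // slots published so far
    private final Map<Thread, ReentrantLock> bindings = new ConcurrentHashMap<>();

    AffinityPoolSketch(int maxSlots) {
        if (maxSlots < 1) throw new IllegalArgumentException("maxSlots must be >= 1");
        slots = new ReentrantLock[maxSlots];
        for (int i = 0; i < maxSlots; i++) slots[i] = new ReentrantLock();
    }

    int activeSlots() { return active; }

    // Returns a newly published slot, already locked, or null if none is left.
    private synchronized ReentrantLock newSlot() {
        if (active < slots.length) {
            ReentrantLock slot = slots[active];
            slot.lock();   // lock before publishing so nobody else grabs it
            active++;      // the increment publishes the slot
            return slot;
        }
        return null;
    }

    private ReentrantLock minContended() {
        ReentrantLock min = null;
        final int limit = active;
        for (int i = 0; i < limit; i++) {
            if (min == null || slots[i].getQueueLength() < min.getQueueLength()) {
                min = slots[i];
            }
        }
        return min;
    }

    ReentrantLock getAndLock() {
        Thread me = Thread.currentThread();
        ReentrantLock bound = bindings.get(me);
        if (bound != null && bound.tryLock()) return bound; // step 1
        ReentrantLock min = minContended();                  // step 2
        if (min == null || min.hasQueuedThreads()) {         // step 3
            ReentrantLock fresh = newSlot();
            if (fresh != null) {                             // step 4
                bindings.put(me, fresh);
                return fresh;
            } else if (min == null) {
                min = minContended(); // all slots published, so non-null now
            }
        }
        min.lock();                                          // step 5, may block
        return min;
    }
}

class AffinityPoolSketchDemo {
    public static void main(String[] args) {
        AffinityPoolSketch pool = new AffinityPoolSketch(2);
        ReentrantLock first = pool.getAndLock();
        first.unlock();
        ReentrantLock second = pool.getAndLock();
        second.unlock();
        System.out.println(first == second); // prints true: same thread, same slot
    }
}
```

One consequence visible in the sketch: a new slot is only allocated when the least-contended existing slot already has waiters, so a lightly loaded pool keeps reusing a small number of slots.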

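Finally, as the TODO comment in the source points out, between the moment the requesting thread selects minThreadState and the moment it calls lock(), another thread may lock that same minThreadState. For now this is a known shortcoming.

The <maxIndexingThreads>8</maxIndexingThreads> setting has a considerable effect on indexing performance. If the value is too small, indexing threads spend more time waiting for a ThreadState. If it is too large, it increases the load on the server. The value should therefore be chosen with both effects in mind.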
