原创博文,转载请注明出处

今天在学习python进程与线程时,无意间发现了线程池threadpool模块,见官方文档

模块使用非常简单,前提是得需要熟悉线程池的工作原理。

我们知道系统处理任务时,需要为每个请求创建和销毁对象。当有大量并发任务需要处理时,再使用传统的多线程就会造成大量的资源创建销毁导致服务器效率的下降。这时候,线程池就派上用场了。线程池技术为线程创建、销毁的开销问题和系统资源不足问题提供了很好的解决方案。

优点:

(1)可以控制产生线程的数量。通过预先创建一定数量的工作线程并限制其数量,控制线程对象的内存消耗。(2)降低系统开销和资源消耗。通过对多个请求重用线程,线程创建、销毁的开销被分摊到了多个请求上。另外通过限制线程数量,降低虚拟机在垃圾回收方面的开销。(3)提高系统响应速度。线程事先已被创建,请求到达时可直接进行处理,消除了因线程创建所带来的延迟,另外多个线程可并发处理。

 线程池的基本实现方法:
  (1)线程池管理器。创建并维护线程池,根据需要调整池的大小,并监控线程泄漏现象。
  (2)工作线程。它是一个可以循环执行任务的线程,没有任务时处于 Wait 状态,新任务到达时可被唤醒。
  (3)任务队列。它提供一种缓冲机制,用以临时存放待处理的任务,同时作为并发线程的 monitor 对象。
  (4)任务接口。它是每个任务必须实现的接口,工作线程通过该接口调度任务的执行。
      构建线程池管理器时,首先初始化任务队列(Queue),运行时通过调用添加任务的方法将任务添加到任务队列中。之后创建并启动一定数量的工作线程,将这些线程保存在线程队列中。线程池管理器在运行时可根据需要增加或减少工作线程数量。工作线程运行时首先锁定任务队列,以保证多线程对任务队列的正确并发访问,如队列中有待处理的任务,工作线程取走一个任务并释放对任务队列的锁定,以便其他线程实现对任务队列的访问和处理。在获取任务之后工作线程调用任务接口完成对任务的处理。当任务队列为空时,工作线程加入到任务队列的等待线程列表中,此时工作线程处于 Wait 状态,几乎不占 CPU 资源。一旦新的任务到达,通过调用任务列表对象的notify方法,从等待线程列表中唤醒一个工作线程以对任务进行处理。通过这种协作模式,既节省了线程创建、销毁的开销,又保证了对任务的并发处理,提高了系统的响应速度。

简而言之:就是把并发执行的任务传递给一个线程池,来替代为每个并发执行的任务都启动一个新的线程。只要池里有空闲的线程,任务就会分配给一个线程执行。

 pool = ThreadPool(poolsize)
requests = makeRequests(some_callable,list_of_args,callback)
[pool.putRequest(req) for req in requests]
pool.wait()

第一行的意思是创建一个可存放poolsize个数目的线程的线程池。

第二行的意思是调用makeRequests创建请求。 some_callable是需要开启多线程处理的函数,list_of_args是函数参数,callback是可选参数回调,默认是无。

第三行的意思是把运行多线程的函数放入线程池中。

最后一行的意思是等待所有的线程完成工作后退出。

通过分析源代码,其实发现里面的内容很简单。

 import sys
import threading
import Queue
import traceback # exceptions
class NoResultsPending(Exception):
"""All work requests have been processed."""
pass class NoWorkersAvailable(Exception):
"""No worker threads available to process remaining requests."""
pass # internal module helper functions
def _handle_thread_exception(request, exc_info):
"""Default exception handler callback function. This just prints the exception info via ``traceback.print_exception``. """
traceback.print_exception(*exc_info) # utility functions
def makeRequests(callable_, args_list, callback=None, #用来创建多个任务请求 callback是回调函数处理结果,exc_callback是用来处理发生的异常
exc_callback=_handle_thread_exception):
"""Create several work requests for same callable with different arguments. Convenience function for creating several work requests for the same
callable where each invocation of the callable receives different values
for its arguments. ``args_list`` contains the parameters for each invocation of callable.
Each item in ``args_list`` should be either a 2-item tuple of the list of
positional arguments and a dictionary of keyword arguments or a single,
non-tuple argument. See docstring for ``WorkRequest`` for info on ``callback`` and
``exc_callback``. """
requests = []
for item in args_list:
if isinstance(item, tuple):
requests.append(
WorkRequest(callable_, item[0], item[1], callback=callback,
exc_callback=exc_callback)
)
else:
requests.append(
WorkRequest(callable_, [item], None, callback=callback,
exc_callback=exc_callback)
)
return requests # classes
class WorkerThread(threading.Thread): #工作线程
"""Background thread connected to the requests/results queues. A worker thread sits in the background and picks up work requests from
one queue and puts the results in another until it is dismissed. """ def __init__(self, requests_queue, results_queue, poll_timeout=5, **kwds):
"""Set up thread in daemonic mode and start it immediatedly. ``requests_queue`` and ``results_queue`` are instances of
``Queue.Queue`` passed by the ``ThreadPool`` class when it creates a new
worker thread. """
threading.Thread.__init__(self, **kwds)
self.setDaemon(1)
self._requests_queue = requests_queue #任务队列
self._results_queue = results_queue #结果队列
self._poll_timeout = poll_timeout
self._dismissed = threading.Event()
self.start() def run(self):
"""Repeatedly process the job queue until told to exit."""
while True:
if self._dismissed.isSet(): #如果标识位设为True,则表示线程非阻塞
# we are dismissed, break out of loop
break
# get next work request. If we don't get a new request from the
# queue after self._poll_timout seconds, we jump to the start of
# the while loop again, to give the thread a chance to exit.
try:
request = self._requests_queue.get(True, self._poll_timeout)#获取待处理任务,block设为True,标识线程同步 ,并设置超时时间
except Queue.Empty:
continue
else:
if self._dismissed.isSet():再次判断,因为在取任务期间,线程有可能被挂起
# we are dismissed, put back request in queue and exit loop
self._requests_queue.put(request) #添加任务到任务队列
break
try:
result = request.callable(*request.args, **request.kwds)
self._results_queue.put((request, result))
except:
request.exception = True
self._results_queue.put((request, sys.exc_info())) def dismiss(self):
"""Sets a flag to tell the thread to exit when done with current job."""
self._dismissed.set() class WorkRequest: #创建单个任务请求
"""A request to execute a callable for putting in the request queue later. See the module function ``makeRequests`` for the common case
where you want to build several ``WorkRequest`` objects for the same
callable but with different arguments for each call. """ def __init__(self, callable_, args=None, kwds=None, requestID=None,
callback=None, exc_callback=_handle_thread_exception):
"""Create a work request for a callable and attach callbacks. A work request consists of the a callable to be executed by a
worker thread, a list of positional arguments, a dictionary
of keyword arguments. A ``callback`` function can be specified, that is called when the
results of the request are picked up from the result queue. It must
accept two anonymous arguments, the ``WorkRequest`` object and the
results of the callable, in that order. If you want to pass additional
information to the callback, just stick it on the request object. You can also give custom callback for when an exception occurs with
the ``exc_callback`` keyword parameter. It should also accept two
anonymous arguments, the ``WorkRequest`` and a tuple with the exception
details as returned by ``sys.exc_info()``. The default implementation
of this callback just prints the exception info via
``traceback.print_exception``. If you want no exception handler
callback, just pass in ``None``. ``requestID``, if given, must be hashable since it is used by
``ThreadPool`` object to store the results of that work request in a
dictionary. It defaults to the return value of ``id(self)``. """
if requestID is None:
self.requestID = id(self) #id返回对象的内存地址
else:
try:
self.requestID = hash(requestID) #哈希处理
except TypeError:
raise TypeError("requestID must be hashable.")
self.exception = False
self.callback = callback
self.exc_callback = exc_callback
self.callable = callable_
self.args = args or []
self.kwds = kwds or {} def __str__(self):
return "<WorkRequest id=%s args=%r kwargs=%r exception=%s>" % \
(self.requestID, self.args, self.kwds, self.exception) class ThreadPool: #线程池管理器
"""A thread pool, distributing work requests and collecting results. See the module docstring for more information. """ def __init__(self, num_workers, q_size=0, resq_size=0, poll_timeout=5):
"""Set up the thread pool and start num_workers worker threads. ``num_workers`` is the number of worker threads to start initially. If ``q_size > 0`` the size of the work *request queue* is limited and
the thread pool blocks when the queue is full and it tries to put
more work requests in it (see ``putRequest`` method), unless you also
use a positive ``timeout`` value for ``putRequest``. If ``resq_size > 0`` the size of the *results queue* is limited and the
worker threads will block when the queue is full and they try to put
new results in it. .. warning:
If you set both ``q_size`` and ``resq_size`` to ``!= 0`` there is
the possibilty of a deadlock, when the results queue is not pulled
regularly and too many jobs are put in the work requests queue.
To prevent this, always set ``timeout > 0`` when calling
``ThreadPool.putRequest()`` and catch ``Queue.Full`` exceptions. """
self._requests_queue = Queue.Queue(q_size) #任务队列
self._results_queue = Queue.Queue(resq_size) #结果队列
self.workers = [] #工作线程
self.dismissedWorkers = [] #睡眠线程
self.workRequests = {} #一个字典 键是id 值是request
self.createWorkers(num_workers, poll_timeout) def createWorkers(self, num_workers, poll_timeout=5):
"""Add num_workers worker threads to the pool. ``poll_timout`` sets the interval in seconds (int or float) for how
ofte threads should check whether they are dismissed, while waiting for
requests. """
for i in range(num_workers):
self.workers.append(WorkerThread(self._requests_queue,
self._results_queue, poll_timeout=poll_timeout)) def dismissWorkers(self, num_workers, do_join=False):
"""Tell num_workers worker threads to quit after their current task."""
dismiss_list = []
for i in range(min(num_workers, len(self.workers))):
worker = self.workers.pop()
worker.dismiss()
dismiss_list.append(worker) if do_join:
for worker in dismiss_list:
worker.join()
else:
self.dismissedWorkers.extend(dismiss_list) def joinAllDismissedWorkers(self):
"""Perform Thread.join() on all worker threads that have been dismissed.
"""
for worker in self.dismissedWorkers:
worker.join()
self.dismissedWorkers = [] def putRequest(self, request, block=True, timeout=None):
"""Put work request into work queue and save its id for later."""
assert isinstance(request, WorkRequest)
# don't reuse old work requests
assert not getattr(request, 'exception', None)
self._requests_queue.put(request, block, timeout)
self.workRequests[request.requestID] = request #确立一对一对应关系 一个id对应一个request def poll(self, block=False):#处理任务,
"""Process any new results in the queue."""
while True:
# still results pending?
if not self.workRequests: #没有任务
raise NoResultsPending
# are there still workers to process remaining requests?
elif block and not self.workers:#无工作线程
raise NoWorkersAvailable
try:
# get back next results
request, result = self._results_queue.get(block=block)
# has an exception occured?
if request.exception and request.exc_callback:
request.exc_callback(request, result)
# hand results to callback, if any
if request.callback and not \
(request.exception and request.exc_callback):
request.callback(request, result)
del self.workRequests[request.requestID]
except Queue.Empty:
break def wait(self):
"""Wait for results, blocking until all have arrived."""
while 1:
try:
self.poll(True)
except NoResultsPending:
break

有三个类 ThreadPool,workRequest,workThread,

第一步我们需要建立一个线程池调度ThreadPool实例(根据参数而产生多个线程works),然后再通过makeRequests创建具有多个不同参数的任务请求workRequest,然后把任务请求用putRequest放入线程池中的任务队列中,此时线程workThread就会得到任务callable,然后进行处理后得到结果,存入结果队列。如果存在callback就对结果调用函数。

注意:结果队列中的元素是元组(request,result)这样就一一对应了。

在我的下一篇文章关于爬虫方面的,我将尝试使用线程池来加强爬虫的爬取效率 。

线程池python的更多相关文章

  1. [python] ThreadPoolExecutor线程池 python 线程池

    初识 Python中已经有了threading模块,为什么还需要线程池呢,线程池又是什么东西呢?在介绍线程同步的信号量机制的时候,举得例子是爬虫的例子,需要控制同时爬取的线程数,例子中创建了20个线程 ...

  2. python自带的进程池及线程池

    进程池 """ python自带的进程池 """ from multiprocessing import Pool from time im ...

  3. Python并发编程之进程池与线程池

    一.进程池与线程池 python标准模块concurrent.futures(并发未来) 1.concurrent.futures模块是用来创建并行的任务,提供了更高级别的接口,为了异步执行调用 2. ...

  4. python day 20: 线程池与协程,多进程TCP服务器

    目录 python day 20: 线程池与协程 2. 线程 3. 进程 4. 协程:gevent模块,又叫微线程 5. 扩展 6. 自定义线程池 7. 实现多进程TCP服务器 8. 实现多线程TCP ...

  5. python线程池实现

    python 的线程池主要有threadpool,不过它并不是内置的库,每次使用都需要安装,而且使用起来也不是那么好用,所以自己写了一个线程池实现,每次需要使用直接import即可.其中还可以根据传入 ...

  6. python——有一种线程池叫做自己写的线程池

    这周的作业是写一个线程池,python的线程一直被称为鸡肋,所以它也没有亲生的线程池,但是竟然被我发现了野生的线程池,简直不能更幸运~~~于是,我开始啃源码,实在是虐心,在啃源码的过程中,我简略的了解 ...

  7. 《转》python线程池

    线程池的概念是什么? 在IBM文档库中这样的一段描写:“在面向对象编程中,创建和销毁对象是很费时间的,因为创建一个对象要获取内存资源或者其它更多资源.在Java中更是 如此,虚拟机将试图跟踪每一个对象 ...

  8. 一个简单的python线程池框架

    初学python,实现了一个简单的线程池框架,线程池中除Wokers(工作线程)外,还单独创建了一个日志线程,用于日志的输出.线程间采用Queue方式进行通信. 代码如下:(不足之处,还请高手指正) ...

  9. 一个python线程池的源码解析

    python为了方便人们编程高度封装了很多东西,比如进程里的进程池,大大方便了人们编程的效率,但是默认却没有线程池,本人前段时间整理出一个线程池,并进行了简单的解析和注释,本人水平有限,如有错误希望高 ...

随机推荐

  1. 用Markdown来写作

    Markdown 是一种简单的.轻量级的标记语法.github上面很多的README就是用markdonw语法写的. Markdown 的语法十分简单,常用的标记符号也不超过十个,且一旦熟悉这种语法规 ...

  2. OS X升级到10.10使用后pod故障解决方案出现

    最新的mac 10.10强大的好奇心,所以,你的系统升级到10.10.结果表明,使用pod出现下述问题: /System/Library/Frameworks/Ruby.framework/Versi ...

  3. Android自己定义组件系列【8】——面膜文字动画

    我们掩盖文字动画Flash中非经货共同体共同,由于Android应用程序开发人员做你想要做这个动画在应用程序中去?本文中,我们看的是如何自己的定义ImageView来实现让一张文字图片实现文字的遮罩闪 ...

  4. hdu Lowest Bit

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=1196 大水题   lowbit   的应用    (可以取出一个数二进制中的最后一个1.树状数组常用, ...

  5. 蓝桥杯 BASIC 27 矩阵乘法(矩阵、二维数组)

    [思路]:注意0次幂是单位矩阵. [AC代码]: #include <iostream> #include <algorithm> #include <iomanip&g ...

  6. UIAppDelegate介绍

    #import "GLAppDelegate.h" @implementation GLAppDelegate // 当应用程序启动完毕的时候就会调用(系统自动调用) - (BOO ...

  7. 6、Cocos2dx 3.0游戏开发的基本概念找个小三场比赛

    重开发人员的劳动成果,转载的时候请务必注明出处:http://blog.csdn.net/haomengzhu/article/details/27689713 郝萌主友情提示: 人是习惯的产物,当你 ...

  8. IBM BigInsights 3.0.0.2 集群环境搭建

    1. 改动hosts文件和永久主机名 由于BigInsights 3.0版本号不像之前的版本号能够直接用IP来添加节点,因此我们须要更改每台server的hosts文件和主机名: vim/etc/ho ...

  9. linux 在系统启动过程

    从学习<鸟哥linux私人厨房> 用于在计算机系统启动,计算机硬件和软件由(它包含的操作系统软件)包括.对于操作系统在同一台计算机硬件方面的表现,该系统配备有硬件是公用,不同的系统是 的操 ...

  10. ASP.NET4.0新特性

    原文:ASP.NET4.0新特性 在以前试用VS2010的时候已经关注到它在Web开发支持上的一些变化了,为此我还专门做了一个ppt,当初是计划在4月12日那天讲的,结果因为莫名其妙的原因导致没有语音 ...