Multithreaded file download in Python

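Read image URLs and file names from a text file, then download the file each URL points to. Every line of the input file holds one URL and one file name, separated by a tab character.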

1. Request the URL with requests and download the file

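import os
from contextlib import closing
import requests

def download(img_url, img_name):
    # Stream the body in 1 KB chunks; out_dir is defined in the full script below.
    with closing(requests.get(img_url, stream=True)) as r:
        with open(os.path.join(out_dir, img_name), 'wb') as f:
            for data in r.iter_content(1024):
                f.write(data)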
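
The chunked write loop can be tried without touching the network. The sketch below moves it into a hypothetical helper, `write_chunks` (not part of the original script), that accepts any iterable of byte strings, such as `r.iter_content(1024)`, and returns the number of bytes written. The fake chunks and the temporary path are made up for illustration.

```python
import os
import tempfile


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
    return written


# Stand-in for r.iter_content(1024): any iterable of bytes works.
fake_chunks = [b'a' * 1024, b'b' * 1024, b'c' * 10]
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
total = write_chunks(fake_chunks, demo_path)
print(total)
```

Returning the byte count makes it possible to compare the result with the Content-Length header when the server sends one.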

2. Read the URLs from the file. Because the file may be large, read it lazily with a generator.

def get_imgurl_generate():
    with open('./example.txt', 'r') as f:
        for line in f:
            line = line.strip()
            # Each line is "<url>\t<file name>"
            imgs = line.split('\t')
            yield imgs
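
To see the lazy behaviour in isolation, here is a minimal sketch that runs the same idea over an in-memory file. The helper name `iter_url_pairs` and the sample URLs are assumptions made for this example; `io.StringIO` stands in for the real input file.

```python
import io


def iter_url_pairs(lines):
    """Yield (url, name) tuples from tab-separated lines, skipping bad ones."""
    for line in lines:
        parts = line.strip().split('\t')
        if len(parts) == 2 and all(parts):
            yield tuple(parts)


# Hypothetical sample input; io.StringIO stands in for an open file.
sample = io.StringIO(
    'http://example.com/a.jpg\ta.jpg\n'
    '\n'
    'no-tab-here\n'
    'http://example.com/b.jpg\tb.jpg\n'
)
pairs = iter_url_pairs(sample)
first = next(pairs)   # only the first line has been consumed so far
rest = list(pairs)    # drains the remaining lines, dropping the two bad ones
```

Nothing is parsed until `next()` is called, so memory use stays flat no matter how long the file is.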

3. Download with multiple threads

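import threading

lock = threading.Lock()

def loop(imgs):
    while True:
        try:
            # A generator is not thread-safe, so guard next() with the lock
            with lock:
                img_url, img_name = next(imgs)
        except StopIteration:
            break
        download(img_url, img_name)

# All threads share one generator; thread_num is defined in the full script below
img_gen = get_imgurl_generate()
for i in range(0, thread_num):
    t = threading.Thread(target=loop, args=(img_gen,))
    t.start()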
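
The lock matters because a generator cannot be advanced by two threads at once: if a second thread calls `next()` while the first call is still running, Python raises `ValueError: generator already executing`. The sketch below shows the same pattern with stand-ins so that it runs without any downloads. `numbers` replaces the URL generator and `results.append` replaces `download`; both are assumptions for illustration only. It also joins the threads, which the snippet above leaves out.

```python
import threading


def numbers(n):
    # Stand-in for the URL generator.
    for i in range(n):
        yield i


def worker(gen, lock, results):
    while True:
        try:
            with lock:           # only one thread may advance the generator
                item = next(gen)
        except StopIteration:
            break
        results.append(item)     # stand-in for download(); append is atomic in CPython


gen = numbers(1000)
lock = threading.Lock()
results = []
threads = [threading.Thread(target=worker, args=(gen, lock, results))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait until every worker has finished
print(len(results))
```

Each item is handed to exactly one thread, so the work is split without duplicates and without a separate queue.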

Full code, with exception handling added

# -*- coding: utf-8 -*-
import os
import threading
import time
from contextlib import closing

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
}

# Output directory
out_dir = './output'
# Number of threads
thread_num = 20
# HTTP request timeout, in seconds
timeout = 5

if not os.path.exists(out_dir):
    os.mkdir(out_dir)

def download(img_url, img_name):
    # Skip files that already exist in the output directory
    if os.path.isfile(os.path.join(out_dir, img_name)):
        return
    with closing(requests.get(img_url, stream=True, headers=headers, timeout=timeout)) as r:
        rc = r.status_code
        if 299 < rc or rc < 200:
            print('returnCode%s\t%s' % (rc, img_url))
            return
        # Content-Length may be missing (e.g. chunked responses); reject only an explicit 0
        if r.headers.get('content-length') == '0':
            print('size0\t%s' % img_url)
            return
        try:
            with open(os.path.join(out_dir, img_name), 'wb') as f:
                for data in r.iter_content(1024):
                    f.write(data)
        except Exception:
            print('savefail\t%s' % img_url)

def get_imgurl_generate():
    with open('./final.scp', 'r') as f:
        index = 0
        for line in f:
            index += 1
            if index % 500 == 0:
                print('execute %s line at %s' % (index, time.time()))
            # Strip first: a raw line still ends with '\n', so it is never empty
            line = line.strip()
            if not line:
                print('line %s is empty' % index)
                continue
            imgs = line.split('\t')
            if len(imgs) != 2:
                print('line %s split error' % index)
                continue
            if not imgs[0] or not imgs[1]:
                print('line %s img is empty' % index)
                continue
            yield imgs

lock = threading.Lock()

def loop(imgs):
    print('thread %s is running...' % threading.current_thread().name)
    while True:
        try:
            # A generator is not thread-safe, so guard next() with the lock
            with lock:
                img_url, img_name = next(imgs)
        except StopIteration:
            break
        try:
            download(img_url, img_name)
        except Exception:
            print('exceptfail\t%s' % img_url)
    print('thread %s is end...' % threading.current_thread().name)

img_gen = get_imgurl_generate()
for i in range(0, thread_num):
    t = threading.Thread(target=loop, name='LoopThread %s' % i, args=(img_gen,))
    t.start()
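
One thing to watch in the full script: `download` skips any file name that already exists in the output directory, so a file left truncated by a timeout or a write error would be skipped on the next run. A possible remedy, sketched here as an assumption and not as part of the original script, is to write to a temporary `.part` file and rename it only after the last chunk has been written. The helper `save_atomically` and the simulated failure are hypothetical.

```python
import os
import tempfile


def save_atomically(chunks, final_path):
    """Write chunks to final_path + '.part', then rename on success."""
    part_path = final_path + '.part'
    try:
        with open(part_path, 'wb') as f:
            for chunk in chunks:
                f.write(chunk)
        # Replaces the destination in one step (atomic on POSIX when both
        # paths are on the same filesystem).
        os.replace(part_path, final_path)
    except Exception:
        if os.path.exists(part_path):
            os.remove(part_path)   # do not leave a partial file behind
        raise


def failing_chunks():
    yield b'partial'
    raise IOError('connection dropped')   # simulated mid-download failure


work_dir = tempfile.mkdtemp()
ok_path = os.path.join(work_dir, 'ok.jpg')
bad_path = os.path.join(work_dir, 'bad.jpg')

save_atomically([b'abc', b'def'], ok_path)
try:
    save_atomically(failing_chunks(), bad_path)
except IOError:
    pass
```

With this approach the final file name only ever appears for complete downloads, so the existence check at the top of `download` stays reliable across reruns.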
