Python: Batch-Downloading Images

Preface

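Batch-downloading the images on web pages takes three steps:

Get the URLs of the web pages
Get the URLs of the images on each page
Download the images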

Example

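```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request
import os
import uuid


# Step 1: get the URLs of the web pages
class PageLinkParser(HTMLParser):
    """Collects the unique href values of all <a> tags."""

    def __init__(self):
        # Python 3.5+ no longer accepts a `strict` argument here
        HTMLParser.__init__(self)
        self.all = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value not in self.all:
                    self.all.append(value)


def getPageLinks(url):
    doing = [url]
    done = []
    while doing:
        x = doing.pop()
        done.append(x)
        print(x)
        try:
            with urllib.request.urlopen(x) as f:
                parser = PageLinkParser()
                parser.feed(f.read().decode('utf-8'))
            for i in parser.all:
                i = urljoin(x, i)  # make relative links absolute
                if i not in done:
                    # doing.insert(0, i)  # recursive traversal is skipped here
                    done.append(i)
        except Exception:
            continue
    return done


# Step 2: get the URLs of the images on a page
class ImgLinkParser(HTMLParser):
    """Collects the unique src values of all <img> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.all = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value and value not in self.all:
                    self.all.append(value)


def getImgLinks(url):
    parser = ImgLinkParser()
    try:
        with urllib.request.urlopen(url) as f:
            # Decoding: adjust to the page's actual encoding
            parser.feed(f.read().decode('utf-8'))
    except Exception:
        print('error:', url)
    # Resolve relative src values against the page URL
    return [urljoin(url, src) for src in parser.all]


# Step 3: download the images
def loadImg(l):
    for i in l:
        i = i.strip()
        print(i)
        try:
            # Download first, so a failed request leaves no empty file behind
            with urllib.request.urlopen(i) as resp:
                data = resp.read()
            # Use a UUID as the file name to avoid duplicates
            # (the extension is always .jpg, whatever the real format)
            path = os.path.join(os.getcwd(), uuid.uuid4().hex + '.jpg')
            with open(path, 'wb') as f:
                f.write(data)
        except Exception:
            print('error:', i)


# Usage
if __name__ == '__main__':
    for i in getPageLinks('http://www.cnblogs.com/'):
        loadImg(getImgLinks(i))
```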

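Ideas for further work

Write a function that detects the encoding of a web page.
Add controls to the page traversal, for example only following links within the same website.
Use multiple threads for the downloads.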
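The next sketch detects a page's encoding under one assumption. The HTTP Content-Type header is trusted first. Failing that, the code looks for a `<meta charset>` declaration near the top of the page, and otherwise it falls back to UTF-8. The names `detect_encoding` and `fetch_text` and the 2048-byte search window are illustrative choices, not part of the original code.

```python
import re
import urllib.request

# Matches <meta charset="gbk"> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def detect_encoding(raw, header_charset=None, default='utf-8'):
    """Guess the text encoding of an HTML page (hypothetical helper)."""
    # 1. Trust the Content-Type header if the server sent a charset.
    if header_charset:
        return header_charset.lower()
    # 2. Otherwise look for a <meta> charset near the top of the page.
    match = _META_CHARSET.search(raw[:2048])
    if match:
        return match.group(1).decode('ascii').lower()
    # 3. Fall back to the default.
    return default


def fetch_text(url):
    """Download a page and decode it with the detected encoding."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        charset = response.headers.get_content_charset()
    encoding = detect_encoding(raw, charset)
    try:
        return raw.decode(encoding, errors='replace')
    except LookupError:  # the page named a codec Python does not know
        return raw.decode('utf-8', errors='replace')
```

In `getImgLinks`, `parser.feed(fetch_text(url))` could then replace the hard-coded `.decode('utf-8')`.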
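One way to restrict the traversal to a single website is to resolve each `href` against the page it came from and compare host names. The sketch below also drops non-HTTP links (`javascript:`, `mailto:`) and `#fragment` parts. It caps the crawl with a breadth-first queue. The names `same_site_links`, `crawl`, `get_hrefs`, and `max_pages` are hypothetical. `get_hrefs` is assumed to be any function that returns the raw `href` values of a page.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(page_url, hrefs):
    """Return absolute http(s) links from hrefs on the same host as page_url."""
    host = urlparse(page_url).netloc.lower()
    result = []
    for href in hrefs:
        # Resolve relative links and strip any "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ('http', 'https') and parts.netloc.lower() == host:
            if absolute not in result:
                result.append(absolute)
    return result


def crawl(start_url, get_hrefs, max_pages=50):
    """Breadth-first traversal limited to one site and at most max_pages pages."""
    visited = []
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        try:
            hrefs = get_hrefs(url)
        except Exception:
            continue  # skip pages that fail to download or parse
        for link in same_site_links(url, hrefs):
            if link not in visited:
                queue.append(link)
    return visited
```

With the parser from step 1, `get_hrefs` could download a page, feed it to a fresh `PageLinkParser`, and return its `all` list.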
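Image downloads are I/O-bound, so a thread pool from the standard library's `concurrent.futures` is a simple way to parallelize them. This sketch keeps the UUID file naming used by `loadImg`. The `fetch` parameter is a hypothetical hook that defaults to a plain `urllib.request.urlopen` download and makes the function easy to test. The value `max_workers=8` is an arbitrary choice.

```python
import os
import urllib.request
import uuid
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url):
    """Download one URL and return its body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_one(url, out_dir, fetch):
    """Download one image; return the saved path, or None on failure."""
    try:
        data = fetch(url.strip())
    except Exception as exc:
        print('error:', url, exc)
        return None
    # A UUID file name avoids collisions between threads.
    path = os.path.join(out_dir, uuid.uuid4().hex + '.jpg')
    with open(path, 'wb') as f:
        f.write(data)
    return path


def load_imgs_threaded(urls, out_dir=None, fetch=fetch_bytes, max_workers=8):
    """Threaded variant of loadImg: download all URLs concurrently."""
    out_dir = out_dir or os.getcwd()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: save_one(u, out_dir, fetch), urls)
        return [path for path in results if path is not None]
```

In the usage section, `load_imgs_threaded(getImgLinks(i))` could stand in for `loadImg(getImgLinks(i))`.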
