Batch-downloading images from a website with Python

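While "accidentally" browsing pictures of pretty girls today, I wandered onto a site with plenty of them. Annoyingly, each page shows only one or two pictures, and my home WiFi is slow, so I put my programmer instincts to work and wrote a small script to pull the pictures down for offline viewing. Plenty of experts have already built similar tools, but in the spirit of learning by doing I worked through it myself and picked up a few things along the way!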

First, a look at the result:

The Python code:

# -*- coding: utf-8 -*-
# Python 2.7 script; lxml is a third-party package.
import os
import urllib2

from lxml import etree

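def check_save_path(save_path):
    # Create the target directory (and any missing parents) if it does not exist yet.
    if not os.path.isdir(save_path):
        os.makedirs(save_path)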

def get_image_name(image_link):
    # Use the last path segment of the URL as the local file name.
    return os.path.basename(image_link)

def save_image(image_link, save_path):
    file_name = get_image_name(image_link)
    file_path = os.path.join(save_path, file_name)
    print("Downloading %s" % image_link)
    try:
        # Fetch the data first so a failed request does not leave an empty file behind.
        image_data = urllib2.urlopen(url=image_link, timeout=5).read()
        with open(file_path, "wb") as file_handler:
            file_handler.write(image_data)
    except Exception as ex:
        print(ex)

def get_image_link_from_web_page(web_page_link):
    image_link_list = []
    print(web_page_link)
    try:
        html_content = urllib2.urlopen(url=web_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        link_list = html_tree.xpath('//p/img/@src')
        for link in link_list:
            # Keep only the links that contain "uploadfile".
            if "uploadfile" in link:
                image_link_list.append("http://www.xgyw.cc/" + link)
    except Exception as ex:
        print(ex)
    return image_link_list

def get_page_link_list_from_index_page(base_page_link):
    try:
        html_content = urllib2.urlopen(url=base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        # Collect the pagination links from the div with class "page".
        link_tmp_list = html_tree.xpath('//div[@class="page"]/a/@href')
        page_link_list = []
        for link_tmp in link_tmp_list:
            page_link_list.append("http://www.xgyw.cc/" + link_tmp)
        return page_link_list
    except Exception as ex:
        print(ex)
        return []

def get_page_title_from_index_page(base_page_link):
    try:
        html_content = urllib2.urlopen(url=base_page_link, timeout=5).read()
        html_tree = etree.HTML(html_content)
        page_title_list = html_tree.xpath('//td/div[@class="title"]')
        # Take the text of the first matching title element.
        page_title_tmp = page_title_list[0].text.strip()
        print(page_title_tmp)
        return page_title_tmp
    except Exception as ex:
        print(ex)
        return ""

def get_image_from_web(base_page_link, save_path):
    check_save_path(save_path)
    page_link_list = get_page_link_list_from_index_page(base_page_link)
    for page_link in page_link_list:
        image_link_list = get_image_link_from_web_page(page_link)
        for image_link in image_link_list:
            save_image(image_link, save_path)

base_page_link = "http://www.xgyw.cc/tuigirl/tuigirl1346.html"
page_title = get_page_title_from_index_page(base_page_link)
# Save into a folder named after the page title, or fall back to "other".
if page_title:
    save_path = "N:\\PIC\\" + page_title
else:
    save_path = "N:\\PIC\\other\\"
get_image_from_web(base_page_link, save_path)
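
For readers on Python 3, where urllib2 was merged into urllib.request, the download step might look roughly like the sketch below. This is an illustration and not the script's own code: the function name download_file and its parameters are made up for this example, and it uses only the standard library.

```python
import os
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen


def download_file(url, save_dir, timeout=5):
    """Download url into save_dir and return the local path (hypothetical helper)."""
    # Create the target directory, including any missing parents.
    os.makedirs(save_dir, exist_ok=True)
    # Use the last path segment of the URL as the file name.
    file_name = os.path.basename(urlsplit(url).path)
    file_path = os.path.join(save_dir, file_name)
    # Open the connection first so a failed request does not leave an empty file.
    with urlopen(url, timeout=timeout) as response:
        with open(file_path, "wb") as out_file:
            # Stream the body to disk in chunks instead of reading it all at once.
            shutil.copyfileobj(response, out_file)
    return file_path
```

Calling download_file(image_link, save_path) inside the innermost loop would play the same role as save_image in the script.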

How the code works:

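The script fetches each page with urllib2.urlopen(url).read() and parses it into an element tree with etree.HTML(), so that XPath expressions can pick out the values of specific nodes. It first collects the pagination links from the starting page, then visits every page to gather the image links, and finally saves each image to a local folder.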
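
The extraction step can be tried offline on a small snippet. The sketch below is an illustration under assumptions: the HTML is a made-up, well-formed sample (the real site's markup may differ), and it is written for Python 3 with the standard library's xml.etree.ElementTree, which supports only a subset of XPath, standing in for lxml. Real pages are rarely well-formed, which is why the script relies on lxml's more forgiving etree.HTML().

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Made-up sample page for illustration only.
SAMPLE_PAGE = """
<html><body>
  <p><img src="/uploadfile/201701/a.jpg"/></p>
  <p><img src="/images/logo.gif"/></p>
  <div class="page">
    <a href="/tuigirl/tuigirl1346.html">1</a>
    <a href="/tuigirl/tuigirl1346_2.html">2</a>
  </div>
</body></html>
"""

BASE_URL = "http://www.xgyw.cc/"


def extract_links(page_source, base_url=BASE_URL):
    """Return (image_links, page_links) as absolute URLs."""
    root = ET.fromstring(page_source.strip())
    # ElementTree cannot select attributes with /@src, so read them with .get().
    image_links = [
        urljoin(base_url, img.get("src"))
        for img in root.findall(".//p/img")
        if "uploadfile" in img.get("src", "")
    ]
    page_links = [
        urljoin(base_url, a.get("href"))
        for a in root.findall(".//div[@class='page']/a")
        if a.get("href")
    ]
    return image_links, page_links


images, pages = extract_links(SAMPLE_PAGE)
print(images)
print(pages)
```

Using urljoin rather than plain string concatenation avoids a doubled slash when the link already starts with "/".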

--=========================================================

Installing Python packages:

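Many Python packages ship without a Windows installer, or without an x64 build, which makes it hard for beginners to get started quickly. In that case the packages can be installed with pip or easy_install; for setup instructions see https://pypi.python.org/pypi/setuptools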

I went with easy_install. Python 2.7 is installed on my machine at C:\Python27\python.exe. After downloading ez_setup.py and saving it to the root of the C: drive, open cmd and run:

C:\Python27\python.exe "c:\ez_setup.py"

This installs easy_install; once it finishes, easy_install-2.7.exe appears under C:\Python27\Scripts. To install the requests package locally, for example, run:

"C:\Python27\Scripts\easy_install-2.7.exe" requests

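After installing, one quick way to confirm that a package can be imported is to ask Python itself. The snippet below is a small sketch for Python 3 using the standard library's importlib.util.find_spec; the helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be imported (hypothetical helper)."""
    # find_spec returns None when no top-level module of that name is found.
    return [name for name in names if importlib.util.find_spec(name) is None]


# lxml is imported by the script; requests is the package installed above.
print(missing_packages(["lxml", "requests"]))
```

An empty list means every listed package was found.
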
--==========================================================

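As usual, a picture of a girl to close out the post: Tuigirl issue 68. If you want the pictures, search Baidu for them yourself.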
