Using a Proxy in a Python Crawler

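The code below fetches a page with a GET request sent through a proxy. The proxy addresses are scraped from the free proxy list at http://www.xicidaili.com/nn/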

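# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import chardet
import requests

try:
    import urllib2  # Python 2
except ImportError:
    # Python 3: urllib.request provides the same ProxyHandler/build_opener/urlopen names
    import urllib.request as urllib2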

ip_list = []  # module-level cache of scraped proxy URLs

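def get_ip_list(url, headers):
    """Scrape the proxy table at `url` and return a list of 'http://ip:port' strings."""
    web_data = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(web_data.text, 'lxml')
    rows = soup.find_all('tr')
    ip_list = []
    for row in rows[1:]:  # skip the header row
        tds = row.find_all('td')
        if len(tds) < 3:  # skip rows that lack IP and port cells
            continue
        # Cell 1 holds the IP address and cell 2 the port.
        ip_list.append('http://' + tds[1].text.strip() + ':' + tds[2].text.strip())
    return ip_list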

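def get_random_ip(ip_list):
    # Despite the name, this always returns the first proxy in the list;
    # deleteip() relies on that to drop a proxy that has just failed.
    proxies = {'http': ip_list[0]}
    return proxies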

def getip():
    """Return a proxies dict, refilling the cached proxy list when it is empty."""
    global ip_list
    url = 'http://www.xicidaili.com/nn/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    if not ip_list:
        ip_list = get_ip_list(url, headers=headers)
    print(ip_list)
    proxies = get_random_ip(ip_list)
    return proxies

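def deleteip():
    """Drop the proxy that was just tried (always the first entry)."""
    global ip_list
    if ip_list:  # the list can be empty when scraping the proxy site itself failed
        ip_list.pop(0)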
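
The helpers above always hand out the first proxy in the list, even though one of them is called get_random_ip. If random rotation is what you want, a minimal sketch could look like the following. It is written for Python 3 with only the standard library; ProxyPool, pick and discard are hypothetical names that do not appear in the original code, and the addresses in the usage lines are placeholders.

```python
import random


class ProxyPool(object):
    """Hypothetical helper: a pool of proxy URLs with random selection."""

    def __init__(self, proxy_urls):
        self.proxy_urls = list(proxy_urls)

    def pick(self):
        """Return a proxies dict for one randomly chosen proxy."""
        if not self.proxy_urls:
            raise LookupError('proxy pool is empty')
        return {'http': random.choice(self.proxy_urls)}

    def discard(self, proxies):
        """Remove the proxy referenced by a dict that pick() returned."""
        url = proxies.get('http')
        if url in self.proxy_urls:
            self.proxy_urls.remove(url)


# Placeholder addresses from the documentation range 192.0.2.0/24.
pool = ProxyPool(['http://192.0.2.1:8080', 'http://192.0.2.2:3128'])
chosen = pool.pick()
pool.discard(chosen)  # pretend the request through `chosen` failed
print(len(pool.proxy_urls))  # prints 1
```

Removing a proxy by value rather than by position means a failed proxy can be dropped no matter which one was picked.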

def urllink(link):  # Fetch the page HTML and convert it to UTF-8
    html_1 = None
    for i in range(12):  # try up to 12 proxies
        try:
            ip = getip()
            print(ip)
            proxy_support = urllib2.ProxyHandler(ip)
            opener = urllib2.build_opener(proxy_support)
            urllib2.install_opener(opener)
            html_1 = urllib2.urlopen(link, timeout=10).read()
            break
        except Exception as e:
            deleteip()  # discard the proxy that just failed
            print('Error %d: %s' % (i, e))
    if html_1 is None:  # every attempt failed
        return ''
    encoding_dict = chardet.detect(html_1)
    web_encoding = encoding_dict['encoding']
    if web_encoding is not None and web_encoding.lower() == 'utf-8':
        html = html_1
    else:
        # Anything not detected as UTF-8 is assumed to be GBK.
        html = html_1.decode('gbk', 'ignore').encode('utf-8')
    return html
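
The conversion step above treats every page that chardet does not report as UTF-8 as GBK. A simpler alternative, sketched here for Python 3 with the standard library only, is to try a strict UTF-8 decode first and fall back to GBK. This swaps detection for trial decoding, so it is not what the original code does, and to_text is a hypothetical name.

```python
def to_text(raw, fallback='gbk'):
    """Decode raw bytes: strict UTF-8 first, then `fallback`, ignoring bad bytes."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode(fallback, 'ignore')


utf8_page = u'代理'.encode('utf-8')
gbk_page = u'代理'.encode('gbk')
print(to_text(utf8_page) == to_text(gbk_page))  # prints True
```

Short GBK inputs can occasionally be valid UTF-8 by coincidence, so a detector such as chardet may still be the better choice when the sources are mixed.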

# Example: fetch one page through the scraped proxies.
print(urllink("http://ccdas.ipmph.com/pc/clinicalExam/getClinicalExamDetail?articleId=8165"))
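
The loop in urllink installs a global opener on every attempt, which changes how every later urlopen call in the same process behaves. A hedged sketch of the same retry idea without that side effect, for Python 3, might look like this. fetch_with_proxies, open_via_proxy and fake_fetch are hypothetical names, and the fetch parameter exists only so the retry logic can be exercised without network access.

```python
import urllib.request


def open_via_proxy(url, proxy_url, timeout=10):
    """Fetch `url` through one HTTP proxy using a private opener."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    opener = urllib.request.build_opener(handler)
    return opener.open(url, timeout=timeout).read()


def fetch_with_proxies(url, proxy_urls, fetch=open_via_proxy):
    """Try each proxy in order; return the first body, or None if all fail."""
    for proxy_url in proxy_urls:
        try:
            return fetch(url, proxy_url)
        except Exception as exc:  # broad on purpose: free proxies fail in many ways
            print('proxy %s failed: %s' % (proxy_url, exc))
    return None


# Offline demonstration with a stand-in fetcher; the addresses are placeholders.
def fake_fetch(url, proxy_url):
    if proxy_url.endswith(':1'):
        raise IOError('connection refused')
    return b'<html>ok</html>'


body = fetch_with_proxies('http://example.com/',
                          ['http://192.0.2.1:1', 'http://192.0.2.2:8080'],
                          fetch=fake_fetch)
print(body)
```

Returning None when every proxy fails lets the caller tell "no proxy worked" apart from an empty page.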
