Scraping proxy IPs with Python

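 # -*- coding:utf-8 -*-
 # author : willowj
 # Python 2 script (print statements, urllib2, reload(sys)).
 import urllib
 import urllib2
 from bs4 import BeautifulSoup  # imported by the original; not used below
 import re
 import bs4  # imported by the original; not used below
 import sys

 # Python 2 only: make utf8 the default string encoding.
 reload(sys)
 sys.setdefaultencoding('utf8')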

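 def ip_test(ip, url="http://www.baidu.com"):
     # Test whether the proxy `ip` can be used to fetch `url`.
     # Alternative test URL: http://ip.chinaz.com/getip.aspx
     # The proxies dict is keyed by URL scheme, so the key must be 'http'
     # (not 'http:'), and the test URL must be an http:// URL; with an
     # https:// URL the 'http' entry is ignored and the request goes direct.
     ip1 = "http://" + ip
     try:
         res = urllib.urlopen(url, proxies={'http': ip1}).read()  # try to fetch through the proxy
         print 'ok', ip1  # , res
         return True
     except Exception as e:
         print "failed", e
         return False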

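 def get_iphtml_inyoudaili():
     # Find the link to the latest HTTP proxy list on the youdaili.net home page.
     url = 'http://www.youdaili.net'
     html = urllib2.urlopen(url)
     code = html.read()
     # Example of the link being matched:
     # href="http://www.youdaili.net/Daili/http/26672.html" title="12月27号 最新代理http服务器ip地址"
     regexp = 'href="(.*?)" .*?最新代理http服务器ip地址'
     pat = re.compile(regexp)
     met = re.findall(pat, code)
     if not met:
         raise ValueError('no link to the latest HTTP proxy list found on ' + url)
     print met[0]
     # URL of the page listing the latest HTTP proxy IPs
     return met[0]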

 def getIps(url):
     # Fetch the proxy list page, test each entry, save the available
     # proxies to 'ips.txt' and return them as a list.
     htmlip = urllib2.urlopen(url)
     codeip = htmlip.read()
     regexpip = r'([1-9][0-9]{0,2}\.\S*?)@HTTP#'  # IP pattern: text up to "@HTTP#"
     pat_ip = re.compile(regexpip)
     met_ip = re.findall(pat_ip, codeip)
     ips = []
     with open('ips.txt', 'w') as file_open:
         for x in met_ip:
             print x
             if ip_test(x):
                 ips.append(x)
                 file_open.write(x + '\n')
     # print ips, 'youdaili'
     return ips

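 def saveIps(ip_list):
     # Write one IP per line to 'ips.txt', overwriting the file.
     # (Parameter renamed from `list` so it no longer shadows the builtin.)
     with open('ips.txt', 'w') as file_open:
         for ip in ip_list:
             file_open.write(ip + '\n')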

 def read_ips(file='ips.txt'):
     '''Read the IPs from the file and return them as a list.'''
     with open(file) as file_open:  # close the file once it has been read
         lines = file_open.readlines()
     ips = []
     for line in lines:
         ip = line.strip("\n")
         ips.append(ip)
     print ips
     return ips

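 if __name__ == "__main__":
     # getIps() already writes the working proxies to 'ips.txt';
     # saveIps() rewrites the same file from the returned list.
     ips = getIps(get_iphtml_inyoudaili())
     saveIps(ips)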
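
The pattern in getIps accepts any run of non-space characters after the first octet, so malformed entries can slip through to ip_test. Below is a minimal Python 3 sketch of a stricter extraction, assuming the list page writes each entry as `ip:port@HTTP#...` (which is what the original pattern and the `"http://" + ip` concatenation suggest). The name `extract_proxies` and the sample text are hypothetical and not part of the original script.

```python
import re

# Matches "a.b.c.d:port" immediately followed by "@HTTP#". The lookbehind keeps
# the match from starting in the middle of a longer number; numeric ranges are
# checked separately in extract_proxies().
PROXY_RE = re.compile(r'(?<![\d.])((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})@HTTP#')


def extract_proxies(text):
    """Return the 'ip:port' strings found in text, skipping out-of-range values."""
    proxies = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split('.'))
        port_ok = 0 < int(port) <= 65535
        if octets_ok and port_ok:
            proxies.append('%s:%s' % (ip, port))
    return proxies


# Hypothetical sample in the assumed page format.
sample = ('1.2.3.4:8080@HTTP#Somewhere<br />'
          '999.1.1.1:80@HTTP#Bad<br />'
          '5.6.7.8:3128@HTTP#Other')
print(extract_proxies(sample))
```

The second sample entry is dropped because 999 is not a valid octet; the other two are returned in page order.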

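ip_test relies on Python 2's `urllib.urlopen(url, proxies=...)`, which has no direct counterpart in Python 3's `urllib.request.urlopen`. Below is a rough Python 3 sketch of the same check using `urllib.request.ProxyHandler`. The names `make_opener` and `proxy_works`, the 5-second timeout, and the optional `opener` parameter are assumptions added for illustration, not part of the original script.

```python
import urllib.request


def make_opener(proxy):
    """Build an opener that sends http and https requests through proxy ('host:port')."""
    proxy_url = 'http://' + proxy
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


def proxy_works(proxy, url='http://www.baidu.com', timeout=5, opener=None):
    """Return True if url can be fetched through proxy within timeout seconds."""
    if opener is None:
        opener = make_opener(proxy)
    try:
        response = opener.open(url, timeout=timeout)
        response.read()
        return True
    except Exception:
        # Any failure (refused connection, timeout, bad response) marks the proxy unusable.
        return False
```

Calling `proxy_works('1.2.3.4:8080')` performs a real network request. The `opener` parameter exists only so that a stand-in object can be passed when trying the function without network access. The timeout is worth keeping: free proxies often hang rather than refuse the connection.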