Writing an information-gathering tool in Python by scraping Chinaz (站长之家)

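Preface:
I wasn't sure what to write about, and after going around in circles I ended up back at web scraping.

I have already scraped all the earlier targets once, so this time I'm scraping a site that is a bit more useful.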

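0x01:

Install the requests module yourself beforehand (the code also imports bs4, i.e. BeautifulSoup).

Target site: http://tool.chinaz.com/ (the code queries its mobile pages at m.tool.chinaz.com)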

0x02:

Code:

import optparse
import re
import sys

import requests
from bs4 import BeautifulSoup

def main():
    # Only one lookup runs per invocation: -z is checked first, then -p, then -x.
    usage="%prog [-z domain] [-p domain] [-x host]"
    parser=optparse.OptionParser(usage)
    parser.add_option('-z',dest="Subdomain",help="Subdomain mining")
    parser.add_option('-p',dest='Side',help='Side of the station inquiries')
    parser.add_option('-x',dest='http',help='http status query')
    (options,args)=parser.parse_args()
    if options.Subdomain:
        Subdomain(options.Subdomain)
    elif options.Side:
        Side(options.Side)
    elif options.http:
        Http(options.http)
    else:
        # No option given: show the help text and exit.
        parser.print_help()
        sys.exit()

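def Subdomain(subdomain):
    # Query the chinaz subdomain page and print every hostname found in a <td> cell.
    print('-----------Subdomains quickly tap-----------')
    url="http://m.tool.chinaz.com/subdomain/?domain={}".format(subdomain)
    header={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    # Decode the body so the regex runs on text rather than on the repr of a bytes object.
    r=requests.get(url,headers=header).content.decode('utf-8')
    # Raw string so \D and \. reach the regex engine unchanged; \D matches any non-digit character.
    g = re.finditer(r'<td>\D[a-zA-Z0-9][-a-zA-Z0-9]{0,62}\D(\.[a-zA-Z0-9]\D[-a-zA-Z0-9]{0,62})+\.?</td>', r)
    for x in g:
        # x.group(0) is the matched '<td>...</td>' text; str(x) would only give the Match object's repr.
        lik=x.group(0)
        opg=BeautifulSoup(lik,'html.parser')
        for link in opg.find_all('td'):
            lops=link.get_text()
            print(lops)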

def Side(side):
    # Query the chinaz same-server page and print the URL of every site listed for the host.
    print('--------Side of the station inquiries--------')
    url="http://m.tool.chinaz.com/same/?s={}".format(side)
    header={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    r=requests.get(url,headers=header).content
    g=r.decode('utf-8')
    ksd=re.finditer(r'<a href=.*?>[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?</a>',g)
    for l in ksd:
        # l.group(0) is the matched '<a ...>...</a>' text; str(l) would only give the Match object's repr.
        ops=l.group(0)
        pods=BeautifulSoup(ops,'html.parser')
        for xsd in pods.find_all('a'):
            # Pull the scheme://... URL out of the tag, stopping at whitespace, quotes, or angle brackets.
            sde=re.findall(r'[a-zA-Z]+://[^\s"\'<>]*',str(xsd))
            low="".join(sde)
            if low:
                print(low)

def Http(http):
    # Request http://<host> and print each response header as "name : value".
    print('--------Http status query--------')
    url="http://{}".format(http)
    header={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    r=requests.get(url,headers=header)
    b=r.headers
    for sdw in b:
        print(sdw,':',b[sdw])

if __name__ == '__main__':
    main()
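The hostname matching in Subdomain() can also be tried offline, without hitting the site. The sketch below is not the script's own code: it is a stdlib-only variant that assumes each hostname sits alone inside a <td> cell. The sample markup, HOST_IN_TD, and extract_hostnames are made-up names for illustration, and the real page layout may differ.

```python
import re

# Hostname inside a <td> cell: dot-separated labels of letters, digits and hyphens.
# The inner group is non-capturing, so findall() returns just the captured hostname.
HOST_IN_TD = re.compile(
    r"<td>\s*([a-zA-Z0-9][-a-zA-Z0-9]{0,62}(?:\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+)\.?\s*</td>"
)


def extract_hostnames(html):
    """Return the hostnames found in <td> cells, in page order, without duplicates."""
    seen = []
    for host in HOST_IN_TD.findall(html):
        if host not in seen:
            seen.append(host)
    return seen


# Hypothetical sample markup, not copied from the real page.
sample = "<tr><td>1</td><td>www.example.com</td></tr><tr><td>mail.example.com</td></tr>"
print(extract_hostnames(sample))
```

Working with match groups directly like this avoids the round trip through BeautifulSoup for every match, at the cost of being more sensitive to extra markup inside the cell.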

Screenshots of it running:

-h  help

-z  subdomain mining

-p  same-server (side) site lookup

-x  HTTP status query (prints the response headers)
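For comparison, here is a minimal sketch of the same three options declared with argparse, the standard library's newer option parser, instead of optparse. It is only an illustration under my own assumptions: build_parser, the dest names, and the metavar labels are invented here and are not part of the script above.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one of -z, -p, or -x."""
    parser = argparse.ArgumentParser(
        description="Information gathering via tool.chinaz.com"
    )
    # The script acts on only one option per run; a mutually exclusive
    # group states that rule in the parser itself.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-z", dest="subdomain", metavar="DOMAIN", help="subdomain mining")
    group.add_argument("-p", dest="side", metavar="DOMAIN", help="same-server site lookup")
    group.add_argument("-x", dest="http", metavar="HOST", help="HTTP status query")
    return parser


# Parse a fixed argument list so the example runs without a real command line.
args = build_parser().parse_args(["-z", "example.com"])
print(args.subdomain)
```

One difference from the script: with required=True, argparse exits with a usage error when no option is given rather than printing the full help, and passing two options at once is rejected instead of silently ignoring the second.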

[screenshot: -z output]

[screenshot: -p output]

[screenshot: -x output]

Five days until school starts. Aaaaaaaaaaah
