# coding=UTF-8

# HTML outputer (html_outputer.py)

class htmlOutputer():

    def __init__(self):
        # Collected records; each is a dict with 'url', 'title' and 'summary' keys
        self.data = []

    def collect_data(self, data):
        # Ignore pages that produced no data
        if data is None:
            return
        self.data.append(data)

    def output(self):
        try:
            # The with block closes the file even if a write fails
            with open('output.html', 'w', encoding='utf-8') as file:
                file.write('<html>')
                file.write('<body>')
                file.write('<table>')
                for data in self.data:
                    file.write('<tr>')
                    file.write('<td>%s</td>' % data['url'])
                    # The values are str already; the file's encoding handles UTF-8
                    file.write('<td>%s</td>' % data['title'])
                    file.write('<td>%s</td>' % data['summary'])
                    file.write('</tr>')
                file.write('</table>')
                file.write('</body>')
                file.write('</html>')
        except IOError as e:
            print(str(e))
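
The outputer writes each `url`, `title` and `summary` value straight into a table cell, so a value containing `<` or `&` would end up as markup in `output.html`. Below is a minimal sketch of one way to guard against that, using the standard library's `html.escape`. The helper name `render_rows` and the sample record are assumptions made for illustration; they are not part of the original class.

```python
import html


def render_rows(records):
    """Build the <tr> rows for a list of dicts with url/title/summary keys.

    render_rows is a hypothetical helper, not part of the original class.
    """
    rows = []
    for record in records:
        # Escape every value so that <, > and & are shown as text, not markup
        cells = ''.join(
            '<td>%s</td>' % html.escape(str(record[key]))
            for key in ('url', 'title', 'summary')
        )
        rows.append('<tr>%s</tr>' % cells)
    return ''.join(rows)


if __name__ == '__main__':
    # Made-up sample record, for illustration only
    sample = [{'url': 'https://example.com/a?x=1&y=2',
               'title': 'A <b>bold</b> title',
               'summary': 'Plain text'}]
    print(render_rows(sample))
```

The string returned by such a helper could be written between the `<table>` and `</table>` tags in `output()` in place of the per-cell writes.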


