Python Basic Tutorial Summary 15: 1. Instant Markup

1. Test document:

# test_input.txt

Welcome to World Wide Spam. Inc.

These are the corporate web pages of *World Wide Spam*, Inc. We hope you find your stay enjoyable, and that you will sample many of our products.

A short history of the company

World Wide Spam was started in the summer of 2000. The business concept was to ride the dot-com wave and to make money both through bulk email and by selling canned meat online.

After receiving several complaints from customers who weren't satisfied by their bulk email, World Wide Spam altered their profile, and focused 100% on canned goods. Today, they rank as the world's 13,892 online supplier of SPAM.

Destinations

From this page you may visit several of our interesting web pages:

 - What is SPAM?(http://wwspam.fu/whatisspam)

 - How do they make it?(http://wwspam.fu/howtomakeit)

 - Why should I eat it?(http://wwspam.fu/whyeatit)

How to get in touch with us

You can get in touch with us in *many* ways: By phone (555-1234), by email (wwspam@wwspam.fu) or by visiting our customer feedback page (http://wwspam.fu/feedback)

2. Implementation

2.1 Finding blocks of text

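Collect all lines you encounter until you reach a blank line, then return the lines collected so far;

Don't collect blank lines, and don't return empty blocks (which would happen when several blank lines appear in a row);

Make sure the last line is blank; otherwise there is no way to tell when the last block ends.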

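# Block generator (util.py)

def lines(file):
    # Yield every line of the file, then one extra blank line
    # so that the last block is always terminated
    for line in file: yield line
    yield '\n'

def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():        # non-blank line: add it to the current block
            block.append(line)
        elif block:             # blank line after a non-empty block: join its lines and yield them
            yield ' '.join(block).strip()
            block = []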
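To see how the two generators behave, here is a small Python 3 check. It feeds them an in-memory file through io.StringIO instead of sys.stdin (an assumption made only for testing), and the sample text is made up: it contains two consecutive blank lines and has no newline after its last line.

```python
import io


def lines(file):
    """Yield every line of the file, then one extra blank line."""
    for line in file:
        yield line
    yield '\n'  # guarantees that the final block is terminated


def blocks(file):
    """Yield each run of non-blank lines as a single stripped string."""
    block = []
    for line in lines(file):
        if line.strip():          # non-blank line: collect it
            block.append(line)
        elif block:               # blank line after a non-empty block
            yield ' '.join(block).strip()
            block = []


# Made-up sample: two blank lines in a row, no newline after the last line
sample = io.StringIO("Title\n\n\nFirst line\nsecond line\n\n- item")
result = list(blocks(sample))
print(result)  # ['Title', 'First line\n second line', '- item']
```

The doubled blank line produces no empty block, and '- item' is still emitted even though the input does not end with a newline, because lines() appends one. Joining with a space leaves a space after each embedded newline; joining with an empty string would avoid that.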

2.2 Adding some markup

Print some opening markup;

Print each block wrapped in paragraph tags;

Print some closing markup.

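# A simple markup program (simple_markup.py)
# Usage: python simple_markup.py < test_input.txt > test_output.html

import sys, re
from util import *

print('<html><head><title>...</title></head><body>')

title = True
for block in blocks(sys.stdin):
    # Turn *text* into <em>text</em>
    block = re.sub(r'\*(.+?)\*', r'<em>\1</em>', block)
    if title:
        # The first block becomes the page heading
        print('<h1>')
        print(block)
        print('</h1>')
        title = False
    else:
        print('<p>')
        print(block)
        print('</p>')

# Close the document once, after all blocks have been printed
print('</body></html>')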
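The emphasis pattern uses the non-greedy `.+?` rather than `.+`. A short Python 3 comparison on a made-up sentence shows why: the greedy version runs from the first asterisk to the last one and swallows everything in between.

```python
import re

text = 'You can reach us in *many* ways, *fast*.'

# Non-greedy: .+? stops at the first closing asterisk
lazy = re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)

# Greedy: .+ runs on to the last asterisk in the string
greedy = re.sub(r'\*(.+)\*', r'<em>\1</em>', text)

print(lazy)    # You can reach us in <em>many</em> ways, <em>fast</em>.
print(greedy)  # You can reach us in <em>many* ways, *fast</em>.
```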

3. Modularization

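Parser: reads the text and manages the objects of the other classes;

Rules: one rule for each kind of block; a rule can detect whether it applies to a block and format that block appropriately;

Filters: wrap the regular expressions that handle inline elements;

Handlers: the parser uses a handler to produce output. Each handler can produce a different kind of markup.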

3.1 Handlers

Supplementary notes:

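1) getattr() is one of the core functions of Python introspection. getattr(object, name[, default]) returns the attribute or method of object called name; if there is no such attribute, it returns default when one is given and raises AttributeError otherwise.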

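class A:
    def __init__(self):
        self.name = 'zhangjing'
        #self.age = '24'
    def method(self):
        print("method print")

Instance = A()

print(getattr(Instance, 'name', 'not find'))  # Instance has an attribute name, so this prints the value of self.name; otherwise it would print 'not find'

print(getattr(Instance, 'age', 'not find'))   # Instance has no attribute age (the line is commented out), so this prints 'not find'

print(getattr(Instance, 'method', 'default'))
# method exists, so this prints the bound method object (including its address); otherwise it would print 'default'

print(getattr(Instance, 'method', 'default')())
# method exists, so it is called (printing "method print") and then its return value, None, is printed.
# If it did not exist, 'default' would be returned, and calling a string raises TypeError.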
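Two getattr behaviours matter for the handler below: a missing attribute without a default raises AttributeError, and the attribute name can be assembled at run time. The following Python 3 sketch shows both; the start_paragraph method on A is hypothetical and added only for this illustration.

```python
class A:
    def __init__(self):
        self.name = 'zhangjing'

    def start_paragraph(self):  # hypothetical method, for illustration only
        return '<p>'


instance = A()

# With a default, a missing attribute just yields the default
print(getattr(instance, 'age', 'not find'))   # not find

# Without a default, a missing attribute raises AttributeError
try:
    getattr(instance, 'age')
    raised = False
except AttributeError:
    raised = True
print(raised)                                  # True

# The attribute name can be built at run time from a prefix and a name
method = getattr(instance, 'start_' + 'paragraph', None)
if callable(method):
    print(method())                            # <p>
```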

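2) callable(object)

Description: checks whether object is callable. If it returns True, calling object may still fail; if it returns False, calling object will never succeed.

Note: classes are callable; an instance of a class is callable only if its class implements a __call__() method.

Version: the function is available throughout Python 2.x. It was removed in Python 3.0 and restored in Python 3.2.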

>>> callable(0)
False
>>> callable("mystring")
False
>>> def add(a, b):
...     return a + b
...
>>> callable(add)
True
>>> class A:
...     def method(self):
...         return 0
...
>>> callable(A)
True
>>> a = A()
>>> callable(a)
False
>>> class B:
...     def __call__(self):
...         return 0
...
>>> callable(B)
True
>>> b = B()
>>> callable(b)
True
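The claim that True does not guarantee a successful call is easy to check: a function is always callable, yet calling it with the wrong number of arguments still fails. A small Python 3 sketch:

```python
def add(a, b):
    return a + b


print(callable(add))   # True

try:
    add(1)             # callable, but this call fails: missing argument b
    call_failed = False
except TypeError:
    call_failed = True
print(call_failed)     # True

print(callable(None))  # False: calling None can never succeed
```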

The handlers (handlers.py):

class Handler:
    '''
    Handles method calls from the Parser. The Parser calls start() and
    end() around each block, with the block type as the argument; sub()
    returns a substitution function for use with re.sub.
    '''
    def callback(self, prefix, name, *args):
        # Look up e.g. 'start_' + 'paragraph'; None if no such method exists
        method = getattr(self, prefix + name, None)
        if callable(method): return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            # No sub_<name> method: leave the matched text unchanged
            if result is None: result = match.group(0)
            return result
        return substitution

class HTMLRenderer(Handler):
    '''
    A handler that renders HTML. Its methods are called through
    the start(), end() and sub() methods of Handler.
    '''
    def start_document(self):
        print('<html><head><title>...</title></head><body>')

    def end_document(self):
        print('</body></html>')

    def start_paragraph(self):
        print('<p>')

    def end_paragraph(self):
        print('</p>')

    def start_heading(self):
        print('<h2>')

    def end_heading(self):
        print('</h2>')

    def start_list(self):
        print('<ul>')

    def end_list(self):
        print('</ul>')

    def start_listitem(self):
        print('<li>')

    def end_listitem(self):
        print('</li>')

    def start_title(self):
        print('<h1>')

    def end_title(self):
        print('</h1>')

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)

    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))

    def sub_mail(self, match):
        return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))

    def feed(self, data):
        print(data)

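The Handler class:

1) The callback method looks up the correct method (such as start_paragraph) given a prefix (such as 'start_') and a name (such as 'paragraph'), using getattr with None as the default. If the object returned by getattr is callable, it is called with any extra arguments supplied. For example, if the corresponding method exists, handler.callback('start_', 'paragraph') calls handler.start_paragraph with no arguments.

2) start and end are helper methods that call callback with their respective prefixes, start_ and end_.

3) The sub method returns a new function, which is used as the replacement function in re.sub.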
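The callback/sub mechanism can be exercised on its own. This Python 3 sketch keeps Handler's logic but uses a hypothetical DemoRenderer that returns strings instead of printing, and a made-up filter name 'bold' that has no sub_ method, to show the substitution function falling back to the matched text.

```python
import re


class Handler:
    """Dispatches prefix + name to a method, if one exists."""

    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            if result is None:          # no sub_<name> method: keep the text
                result = match.group(0)
            return result
        return substitution


class DemoRenderer(Handler):
    """Hypothetical renderer that only knows how to render emphasis."""

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


handler = DemoRenderer()
text = 'in *many* ways'
emphasized = re.sub(r'\*(.+?)\*', handler.sub('emphasis'), text)
unchanged = re.sub(r'\*(.+?)\*', handler.sub('bold'), text)

print(emphasized)                               # in <em>many</em> ways
print(unchanged)                                # in *many* ways
print(handler.callback('start_', 'paragraph'))  # None: no start_paragraph here
```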

3.2 Rules

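A rule can recognize which kind of block it applies to (the condition): the condition method;

A rule can transform the block (the action): the action method.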

# rules.py

class Rule:
    """
    Base class for all rules.
    """
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True

class HeadingRule(Rule):
    """
    A heading is a single line that is at most 70 characters long
    and does not end with a colon.
    """
    type = 'heading'
    def condition(self, block):
        return not '\n' in block and len(block) <= 70 and not block[-1] == ':'

class TitleRule(HeadingRule):
    """
    The title is the first block in the document, provided that
    it qualifies as a heading.
    """
    type = 'title'
    first = True
    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)

class ListItemRule(Rule):
    """
    A list item is a paragraph that begins with a hyphen. The hyphen
    is removed as part of the formatting.
    """
    type = 'listitem'
    def condition(self, block):
        return block[0] == '-'
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True

class ListRule(ListItemRule):
    """
    A list begins between a block that is not a list item and a
    following list item. It ends after the last consecutive list item.
    """
    type = 'list'
    inside = False
    def condition(self, block):
        return True
    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        # Return False so the block is still passed on to the other rules
        return False

class ParagraphRule(Rule):
    """
    A paragraph is simply a block that isn't covered by any of the other rules.
    """
    type = 'paragraph'
    def condition(self, block):
        return True
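To see which rule fires for which block, the rules can be run by hand in the same order BasicTextParser adds them. This Python 3 sketch restates the rules above and swaps in a hypothetical Recorder handler that logs events instead of printing HTML; the blocks are taken from the test document.

```python
class Recorder:
    """Hypothetical handler that records events instead of printing HTML."""
    def __init__(self):
        self.events = []
    def start(self, name):
        self.events.append('start ' + name)
    def end(self, name):
        self.events.append('end ' + name)
    def feed(self, data):
        self.events.append('feed ' + data)


class Rule:
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True


class HeadingRule(Rule):
    type = 'heading'
    def condition(self, block):
        return '\n' not in block and len(block) <= 70 and block[-1] != ':'


class TitleRule(HeadingRule):
    type = 'title'
    first = True
    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)


class ListItemRule(Rule):
    type = 'listitem'
    def condition(self, block):
        return block[0] == '-'
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())   # drop the leading hyphen
        handler.end(self.type)
        return True


class ListRule(ListItemRule):
    type = 'list'
    inside = False
    def condition(self, block):
        return True
    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        return False                      # never consumes the block


class ParagraphRule(Rule):
    type = 'paragraph'
    def condition(self, block):
        return True


rules = [ListRule(), ListItemRule(), TitleRule(), HeadingRule(), ParagraphRule()]
handler = Recorder()
for block in ['Welcome', 'Destinations', '- What is SPAM?', '- Why should I eat it?',
              'From this page you may visit several of our interesting web pages:']:
    for rule in rules:
        # Same loop as Parser.parse: stop at the first action returning True
        if rule.condition(block) and rule.action(block, handler):
            break
print(handler.events)
```

Because ListRule's action returns False, each block still reaches ListItemRule and the later rules, which is why ListRule must come first. With this design, a list that is the very last block of the document is never closed, since no non-item block follows it.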

3.3 Filters

There are three filters: one for emphasized text, one for URLs, and one for email addresses.

self.addFilter(r'\*(.+?)\*', 'emphasis')

self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')

self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)','mail')
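A quick way to check what each filter pattern picks up is to run it with re.findall over the last paragraph of the test document. This is a Python 3 sketch; the patterns are copied unchanged from the filters above.

```python
import re

# The three filter patterns, unchanged
patterns = {
    'emphasis': r'\*(.+?)\*',
    'url': r'(http://[\.a-z0-9A-Z/]+)',
    'mail': r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)',
}

line = ('You can get in touch with us in *many* ways: By phone (555-1234), '
        'by email (wwspam@wwspam.fu) or by visiting our customer feedback '
        'page (http://wwspam.fu/feedback)')

# findall returns the contents of the single capture group for each match
found = {name: re.findall(pattern, line) for name, pattern in patterns.items()}
for name, matches in found.items():
    print(name, matches)
# emphasis ['many']
# url ['http://wwspam.fu/feedback']
# mail ['wwspam@wwspam.fu']
```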

3.4 The parser

import sys, re
from handlers import *
from util import *
from rules import *

class Parser:
    '''
    Reads a text file, applies the rules and filters, and
    uses a handler to produce the output.
    '''
    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    def addRule(self, rule):
        self.rules.append(rule)

    def addFilter(self, pattern, name):
        # Wrap re.sub in a function that uses the handler's sub_<name> method
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)
        self.filters.append(filter)

    def parse(self, file):
        self.handler.start('document')
        for block in blocks(file):
            # Apply all inline filters (emphasis, URLs, email addresses) first
            for filter in self.filters:
                block = filter(block, self.handler)
            # Then try the rules in order; stop at the first action that returns True
            for rule in self.rules:
                if rule.condition(block):
                    last = rule.action(block, self.handler)
                    if last: break
        self.handler.end('document')

class BasicTextParser(Parser):
    '''
    A parser that adds its rules and filters in the constructor.
    '''
    def __init__(self, handler):
        Parser.__init__(self, handler)
        # Order matters: ListRule must see every block, ParagraphRule is the fallback
        self.addRule(ListRule())
        self.addRule(ListItemRule())
        self.addRule(TitleRule())
        self.addRule(HeadingRule())
        self.addRule(ParagraphRule())

        self.addFilter(r'\*(.+?)\*', 'emphasis')
        self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

handler = HTMLRenderer()
parser = BasicTextParser(handler)
parser.parse(sys.stdin)
