1.21 Python Basics - Common Python Modules (Part 2)

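I. xml

What is XML?

- XML stands for EXtensible Markup Language.
- XML is a markup language, much like HTML.
- XML was designed to transport data, not to display it.
- XML tags are not predefined; you must define your own tags.
- XML is designed to be self-descriptive.
- XML is a W3C Recommendation.

The main differences between XML and HTML

XML is not a replacement for HTML.

XML and HTML were designed for different purposes:

XML was designed to transport and store data, with a focus on what the data is.

HTML was designed to display data, with a focus on how the data looks.

In short, HTML is about displaying information, while XML is about carrying it.

XML uses a simple, self-describing syntax: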

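<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>George</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the meeting!</body>
</note>

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1, the Western European character set).

The next line describes the root element of the document (as if to say: "this document is a note"):

<note>

The next 4 lines describe the 4 child elements of the root (to, from, heading, and body):

<to>George</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the meeting!</body>

The last line defines the end of the root element:

</note>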

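XML documents form a tree structure

An XML document must contain a root element. This element is the parent of all other elements.

The elements in an XML document form a document tree. The tree starts at the root and branches down to its lowest level.

Every element can have child elements:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling describe the relationships between elements. A parent element has child elements. Child elements on the same level are called siblings (brothers or sisters).

Every element can have text content and attributes (just as in HTML).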
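As a rough illustration of the tree structure described above, the sketch below parses the note example with the standard xml.etree.ElementTree module and walks from the root to its children. The date attribute on the root is an illustrative addition so that attribute access can be shown; it is not part of the note example above.

```python
import xml.etree.ElementTree as ET

# The note document, with a date attribute added purely for illustration.
NOTE_XML = """<note date="08/08/2008">
<to>George</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the meeting!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is the parent of every other element.
print(root.tag, root.attrib)

# Iterating over an element yields its direct children, which are siblings.
child_tags = [child.tag for child in root]
print(child_tags)

# Each child carries its own text content.
for child in root:
    print(child.tag, "=>", child.text)
```

Iterating over an element only visits one level of the tree; deeper levels are reached by iterating over each child in turn.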

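XML syntax rules:

All XML elements must have a closing tag

In HTML, you will often see elements without a closing tag:

<p>This is a paragraph
<p>This is another paragraph

In XML, omitting the closing tag is illegal. Every element must have one:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Note: You may have noticed that the XML declaration has no closing tag. This is not an error. The declaration is not part of the document's content; it is not an XML element and needs no closing tag.

XML tags are case sensitive

XML elements are defined using XML tags.

XML tags are case sensitive. In XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case:

<Message>This is incorrect.</message>
<message>This is correct.</message>

Note: Opening and closing tags are often called start tags and end tags. Whichever term you prefer, the concept is the same.

XML elements must be properly nested

In HTML, you will often see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, properly nested means: because the <i> element is opened inside the <b> element, it must also be closed inside the <b> element.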

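XML documents must have a root element

An XML document must have one element that is the parent of all other elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

XML attribute values must be quoted

Like HTML, XML elements can have attributes (name/value pairs).

In XML, attribute values must be quoted. Study the two XML documents below. The first is incorrect, the second is correct:

<note date=08/08/2008>
<to>George</to>
<from>John</from>
</note>

<note date="08/08/2008">
<to>George</to>
<from>John</from>
</note>

The error in the first document is that the date attribute of the note element is not quoted.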
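The syntax rules above can be checked mechanically. The sketch below is a minimal example that feeds the incorrect and correct snippets from this section to xml.etree.ElementTree and records whether the parser accepts them. The helper name is_well_formed and the sample labels are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<p>This is a paragraph",
    "mismatched case": "<Message>text</message>",
    "bad nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<note date=08/08/2008><to>George</to></note>",
    "well-formed": '<note date="08/08/2008"><to>George</to></note>',
}

# Map each label to True (accepted) or False (rejected by the parser).
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'rejected'}")
```

Only the last sample is accepted; each of the others breaks exactly one of the rules listed above.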

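Entity references

In XML, some characters have a special meaning.

If you place a "<" character inside an XML element, an error occurs because the parser treats it as the start of a new element.

This produces an XML error:

<message>if salary < 1000 then</message>

To avoid the error, replace the "<" character with an entity reference:

<message>if salary &lt; 1000 then</message>

XML has 5 predefined entity references:

&lt;	<	less than
&gt;	>	greater than
&amp;	&	ampersand
&apos;	'	apostrophe
&quot;	"	quotation mark

Note: In XML, only the characters "<" and "&" are strictly illegal in element content. The greater-than sign is legal, but replacing it with an entity reference is a good habit.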
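In Python you rarely have to write entity references by hand. The sketch below shows two standard-library routes: xml.sax.saxutils.escape for raw strings, and ElementTree, which escapes element text automatically when serializing. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# ElementTree escapes element text on its own when serializing.
message = ET.Element("message")
message.text = raw
serialized = ET.tostring(message, encoding="unicode")
print(serialized)

# Parsing turns the entity reference back into the original character.
round_trip = ET.fromstring(serialized).text
print(round_trip == raw)
```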

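Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->

Whitespace is preserved in XML

HTML truncates (collapses) multiple consecutive whitespace characters into one:

HTML:    Hello           my name is David.

Output:  Hello my name is David.

In XML, whitespace in the document is not truncated.

XML stores new lines as LF

In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). The pair resembles the typewriter action of starting a new line. In Unix applications, a new line is stored as a single LF character, while classic Macintosh applications use CR. XML itself stores a new line as LF.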
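Both behaviours can be observed with xml.etree.ElementTree. The following is a small sketch; the element names greeting and text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build a sentence containing a run of eleven spaces.
sentence = "Hello" + " " * 11 + "my name is David."

# Runs of spaces inside element text survive parsing unchanged.
spaced = ET.fromstring(f"<greeting>{sentence}</greeting>")
print(repr(spaced.text))

# A Windows-style CR LF pair in the input is reported as a single LF.
windows_style = ET.fromstring(b"<text>line one\r\nline two</text>")
print(repr(windows_style.text))
```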

Working with XML files:

Create:

import xml.etree.ElementTree as ET

# Create the root node
namelist = ET.Element("NameList")

# Add a student node under the root, with its attribute
student = ET.SubElement(namelist, "student", attrib={"enrolled": "yes"})

# Add child nodes to the student node
name = ET.SubElement(student,'name')
name.text = "Alice"
age = ET.SubElement(student, "age")
age.text = ''
sex = ET.SubElement(student, "sex")
sex.text = 'girl'

# Add a second student node (tagged "student", like the first one)
student2 = ET.SubElement(namelist, "student", attrib={"enrolled": "no"})
name2 = ET.SubElement(student2,'name')
name2.text = 'John'
age2 = ET.SubElement(student2, "age")
age2.text = ''
sex2 = ET.SubElement(student2,'sex')
sex2.text = 'boy'

# Wrap the root node namelist in a document (tree) object
et = ET.ElementTree(namelist)

# Write the tree to a file
et.write("test3.xml", encoding="utf-8", xml_declaration=True)

# Print the generated XML
ET.dump(namelist)

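Access:

import xml.etree.ElementTree as ET

# Parse the XML file into a tree object.
# test.xml is assumed to contain country records with year and rank child
# nodes (its contents are not shown in this section).
tree = ET.parse("test.xml")
root = tree.getroot()
print(root.tag)

# Walk the whole file
for child in root:
    print(child.tag,child.attrib)
    for i in child:
        print("\t",i.tag,i.text)

# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag,node.text)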

Modify:

# Update every "year" node and add an attribute
for node in root.iter('year'):
    # Increase the year value by 1
    new_year = int(node.text) + 1
    node.text = str(new_year)
    # Set a new attribute
    node.set("check", "yes")

# Done modifying; write the result to a file
tree.write("xmltest.xml")

Delete:

# Find every "country" node directly under the root
for country in root.findall("country"):
    # Remove the country if its "rank" value is greater than 50
    if int(country.find("rank").text) > 50 :
        root.remove(country)

# Write the result to a new file
tree.write("test2.xml")
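Because the contents of test.xml are not shown, the access, modify, and delete steps can be hard to follow. The sketch below replays them on a small in-memory document. The sample data (country names, ranks, and years) is hypothetical and only stands in for test.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for test.xml.
COUNTRY_XML = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(COUNTRY_XML)

# Modify: add one to every year and mark the node as checked.
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)
    node.set("check", "yes")

# Delete: findall() returns a list, so removing while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [country.get("name") for country in root.findall("country")]
years = [year.text for year in root.iter("year")]
print(remaining)
print(years)
```

Looping over root.findall("country") rather than over root itself matters here: removing children while iterating directly over the parent can skip elements.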

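II. configparser

1. Basic reading of a configuration file

-read(filename) reads the contents of an ini file directly
-sections() returns all sections as a list
-options(section) returns all options of the given section
-items(section) returns all key/value pairs of the given section
-get(section,option) returns the value of option in section as a string
-getint(section,option) returns the value of option in section as an int; there are also the corresponding getboolean() and getfloat() methods.

2. Basic writing of a configuration file

-add_section(section) adds a new section
-set(section, option, value) sets option in section; write() must then be called to save the change to the configuration file.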
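The methods listed above can be tried without touching the disk. The sketch below is a minimal example that uses read_string() in place of read(filename) so it stays self-contained; the section names, option names, and values are all hypothetical.

```python
import configparser

# Hypothetical settings, written inline so the sketch needs no file.
INI_TEXT = """
[DEFAULT]
compression = yes

[server]
host = example.com
port = 8080
debug = false
"""

conf = configparser.ConfigParser()
conf.read_string(INI_TEXT)

print(conf.sections())          # DEFAULT is not listed as a section
print(conf.options("server"))   # includes options inherited from DEFAULT
print(conf.items("server"))

host = conf.get("server", "host")            # always a string
port = conf.getint("server", "port")         # converted to int
debug = conf.getboolean("server", "debug")   # converted to bool

# add_section() and set() change the parser in memory only;
# nothing is saved until write() is called with an open file.
conf.add_section("client")
conf.set("client", "timeout", "30")
timeout = conf.getint("client", "timeout")
print(host, port, debug, timeout)
```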

3. Basic examples

- Create

import configparser

# Create the config object
conf = configparser.ConfigParser()

# Add the 'DEFAULT' section to the config object and give it values
conf["DEFAULT"] = {'ServerAliveInterval': '',
                   'Compression': 'yes',
                   'CompressionLevel': ''}

# Add the 'bitbucket.org' section to the config object
conf['bitbucket.org'] = {}

# Add a value to the 'bitbucket.org' section, just like a dict key and value
conf['bitbucket.org']['User'] = 'hg'

# Add the 'topsecret.server.com' section to the config object
conf['topsecret.server.com'] = {}

# Bind the section to a new variable, topsecret, for convenience
topsecret = conf['topsecret.server.com']
topsecret['Host Port'] = ''   # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here

# Add the ForwardX11 option to 'DEFAULT'
conf['DEFAULT']['ForwardX11'] = 'yes'

# Call the config object's write method to write to the file object f
with open('conf.ini', 'w') as f:
    conf.write(f)

- Access

import configparser

# Create the config object
conf = configparser.ConfigParser()
conf.read('conf.ini')

# Print the list of sections
print(conf.sections())

# Print the DEFAULT values
print(conf.defaults())
for k,v in conf.defaults().items():
    print(k,v)

# Membership test
print('bitbucket.org' in conf)

# Read an option value (option names are case-insensitive)
section_name = conf.sections()[1]
print(conf[section_name]['host port'])

# Loop over a section
for k,v in conf[section_name].items():
    print(k,v)

# Print the section's option names, including those inherited from DEFAULT
print(conf.options(section_name))

# Return the keys and values as a list of pairs
print(conf.items(section_name))

# Get the value of a single key
print(conf.get(section_name,'compression'))

- Modify

# Check whether an option exists, then change a value
print(conf.has_option(section_name,'op'))
conf.set(section_name,'host port','3000')
with open('conf2.ini','w') as f:
    conf.write(f)

# Check whether a section exists
flag = conf.has_section('zhanghk.com')
print(flag)

# Add the section if it is missing
if not flag:
    conf.add_section('zhanghk.com')
conf.set('zhanghk.com', 'age', '')
with open('conf.ini','w') as f:
    conf.write(f)

- Delete

# Remove an option; the change stays in memory until conf.write() is called
conf.remove_option(section_name,'forwardx11')

III. hashlib

hashlib is Python's hashing module. It provides several hash algorithms (MD5, SHA-256, SHA-512, and others) that compute a fixed-length digest of the input bytes. Hashing is one-way, so strictly speaking it is not encryption.

Basic usage:

import hashlib

# Create an md5 object
hash = hashlib.md5()

# Feed in data with update(), then print the digest as a hexadecimal string
hash.update(b'China')
print('md5-1:',hash.hexdigest())

# A second update() appends to the data already fed in
hash.update(b'Beijing')
print('md5-2:',hash.hexdigest())

# Hashing the concatenated bytes in one go gives the same digest as md5-2
hash2 = hashlib.md5()
hash2.update(b'ChinaBeijing')
print('(md5-1)+(md5+2):',hash2.hexdigest())

#-----------sha256------------
hash = hashlib.sha256()
hash.update(b'China')
print('sha256:',hash.hexdigest())

#-----------sha512------------
hash = hashlib.sha512()
hash.update(b'China')
print('sha512:',hash.hexdigest())

Output

md5-1: ae54a5c026f31ada088992587d92cb3a
md5-2: 38b16715f2e07a1fa35ab3ce0eec8adf
(md5-1)+(md5+2): 38b16715f2e07a1fa35ab3ce0eec8adf
sha256: 10436829032f361a3de50048de41755140e581467bc1895e6c1a17f423e42d10
sha512: d6afe88ac526c5e817a9877009da962246da51846e688dfc5f40f45ef8c2fa97b8cbb3f2972e70fd1e9d5f612e1d788b1f20fa81e412bac00a1688c0d31fc059
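The md5-2 result above equals the digest of the concatenated input because update() simply appends data. The sketch below shows the same property with SHA-256, which is handy for hashing large inputs piece by piece. The helper name sha256_of_chunks is made up for this illustration.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hash an iterable of bytes objects incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Feeding the data in pieces gives the same digest as hashing it at once.
incremental = sha256_of_chunks([b"China", b"Beijing"])
one_shot = hashlib.sha256(b"ChinaBeijing").hexdigest()
print(incremental == one_shot)

# hashlib works on bytes, so text must be encoded first.
text_digest = hashlib.sha256("ChinaBeijing".encode("utf-8")).hexdigest()
print(text_digest == one_shot)
```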

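Note: besides hashlib, Python has a module named hmac for keyed hashing. A secret key is mixed with the message, so only someone who holds the key can reproduce the digest. For example:

import hmac

# digestmod is required since Python 3.8; older versions defaulted to MD5
h = hmac.new(b'China', digestmod='md5')
h.update(b'Beijing')
print(h.hexdigest())

Output

26466eae20d85ed7743ececc2ebec46c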
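A typical use of hmac is checking that a message was produced by someone who knows the key. The following is a minimal sketch, not a complete authentication scheme: it uses SHA-256 instead of MD5, and the key, the messages, and the helper names sign and verify are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"China"  # hypothetical shared key


def sign(message):
    """Return the hex HMAC-SHA256 signature of message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message, signature):
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(message), signature)


signature = sign(b"Beijing")
genuine = verify(b"Beijing", signature)
tampered = verify(b"Shanghai", signature)
print(genuine, tampered)
```

hmac.compare_digest is used instead of == so that the comparison time does not reveal how many leading characters matched.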

IV. logging

logging is Python's logging module.

Basics:

import logging

logging.warning('This is warning log!')
logging.critical('This is warning log!')

Output

WARNING:root:This is warning log!
CRITICAL:root:This is warning log!

Log levels:

Level	Numeric value
`CRITICAL`	50
`ERROR`	40
`WARNING`	30
`INFO`	20
`DEBUG`	10
`NOTSET`	0

Level	When it’s used
`DEBUG`	Detailed information, typically of interest only when diagnosing problems.
`INFO`	Confirmation that things are working as expected.
`WARNING`	An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
`ERROR`	Due to a more serious problem, the software has not been able to perform some function.
`CRITICAL`	A serious error, indicating that the program itself may be unable to continue running.

Writing to a file:

# Write log records to a file
logging.basicConfig(filename='test.log',level=logging.INFO)  # logging.INFO == 20

"""
    filename  Specifies that a FileHandler be created, using the specified
              filename, rather than a StreamHandler.
    filemode  Specifies the mode to open the file, if filename is specified
              (if filemode is unspecified, it defaults to 'a').
    format    Use the specified format string for the handler.
    datefmt   Use the specified date/time format.
    style     If a format string is specified, use this to specify the
              type of format string (possible values '%', '{', '$', for
              %-formatting, :meth:`str.format` and :class:`string.Template`
              - defaults to '%').
    level     Set the root logger level to the specified level.
    stream    Use the specified stream to initialize the StreamHandler. Note
              that this argument is incompatible with 'filename' - if both
              are present, a ValueError is raised.
    handlers  If specified, this should be an iterable of already created
              handlers, which will be added to the root logger. Any handler
              in the list which does not have a formatter assigned will be
              assigned the formatter created in this function.
"""

logging.warning('this is warnig!')
logging.info('this is info!')
logging.debug('this is debug!')

This creates a file named test.log with the following content:

WARNING:root:this is warnig!
INFO:root:this is info!

There is no debug entry because the log level is set to INFO. DEBUG messages rank below INFO, so they are not saved.
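The level filter can also be observed without creating a file. The sketch below is a minimal example that sends records to an in-memory stream instead of test.log; the logger name level_demo is hypothetical.

```python
import io
import logging

# Capture records in memory instead of writing to a file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("level_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the records away from the root logger
logger.addHandler(handler)

logger.warning("this is warning!")
logger.info("this is info!")
logger.debug("this is debug!")  # below INFO, so it is dropped

captured = stream.getvalue().splitlines()
debug_enabled = logger.isEnabledFor(logging.DEBUG)
print(captured)
print(debug_enabled)
```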

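Besides the message itself, log entries usually carry a timestamp. Next we add the time to each record:

logging.basicConfig(format='%(asctime)s %(message)s',filename='test.log',level=20,datefmt='%Y-%m-%d %H:%M:%S')

Output

2017-03-19 13:26:41  this is warnig!
2017-03-19 13:26:41  this is info!

Here the format parameter sets the layout of each record and datefmt sets the time format. Note that basicConfig() does nothing if the root logger already has handlers, so this call is meant to replace the earlier basicConfig() call, not to follow it in the same run. The available format fields are listed in the table below.

Log format fields: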

%(name)s            Name of the logger (logging channel)
%(levelno)s         Numeric logging level for the message (DEBUG, INFO,
                    WARNING, ERROR, CRITICAL)
%(levelname)s       Text logging level for the message ("DEBUG", "INFO",
                    "WARNING", "ERROR", "CRITICAL")
%(pathname)s        Full pathname of the source file where the logging
                    call was issued (if available)
%(filename)s        Filename portion of pathname
%(module)s          Module (name portion of filename)
%(lineno)d          Source line number where the logging call was issued
                    (if available)
%(funcName)s        Function name
%(created)f         Time when the LogRecord was created (time.time()
                    return value)
%(asctime)s         Textual time when the LogRecord was created
%(msecs)d           Millisecond portion of the creation time
%(relativeCreated)d Time in milliseconds when the LogRecord was created,
                    relative to the time the logging module was loaded
                    (typically at application startup time)
%(thread)d          Thread ID (if available)
%(threadName)s      Thread name (if available)
%(process)d         Process ID (if available)
%(message)s         The result of record.getMessage(), computed just as
                    the record is emitted
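To see a few of these fields in action, the sketch below combines %(levelname)s, %(name)s, %(funcName)s, and %(message)s in one Formatter and captures the result in memory. The logger name format_demo and the function check_disk are invented for this illustration.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("[%(levelname)s] %(name)s %(funcName)s(): %(message)s")
)

logger = logging.getLogger("format_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)


def check_disk():
    # %(funcName)s is filled in with the name of this calling function.
    logger.warning("disk space low")


check_disk()
line = stream.getvalue().strip()
print(line)
```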

Logging to the screen and to a file at the same time

import logging

# First create the logger object
logger = logging.getLogger('Log')
logger.setLevel(logging.DEBUG)

# Create the screen (stream) handler and set its level
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# Create the file handler and set its level
fh = logging.FileHandler('access.log',encoding='utf-8')
fh.setLevel(logging.INFO)

# Define a log format for each of the two handlers
ch_formatter = logging.Formatter('%(asctime)s %(filename)s %(lineno)d - [%(levelname)s] %(message)s')
fh_formatter = logging.Formatter('%(asctime)s - [%(levelname)s] %(message)s')

# Attach each format to its handler
ch.setFormatter(ch_formatter)
fh.setFormatter(fh_formatter)

# Add both handlers to the logger object
logger.addHandler(ch)
logger.addHandler(fh)

# Emit some sample records
logger.error('this is error!')
logger.warning('this is warning!')
logger.info('this is info!')
logger.debug('this is debug!')

Screen output

2017-03-19 13:39:49,758 logging_mod.py 29 - [ERROR] this is error!
2017-03-19 13:39:49,758 logging_mod.py 30 - [WARNING] this is warning!
2017-03-19 13:39:49,758 logging_mod.py 31 - [INFO] this is info!
2017-03-19 13:39:49,759 logging_mod.py 32 - [DEBUG] this is debug!

File content

2017-03-19 13:39:49,758 - [ERROR] this is error!
2017-03-19 13:39:49,758 - [WARNING] this is warning!
2017-03-19 13:39:49,758 - [INFO] this is info!

Automatic log file rotation

Rotating log files needs one more piece: the handlers submodule of logging. Log files can be rotated by file size or by time:

- RotatingFileHandler: rotates by file size
- TimedRotatingFileHandler: rotates by time

Example code:

import logging
import time
from logging import handlers

logger = logging.getLogger('TEST')
log_file = "timelog.log"

# Alternative: rotate by size, at most 10 bytes per file, keeping 3 backups
# fh = handlers.RotatingFileHandler(filename=log_file,maxBytes=10,backupCount=3,encoding='utf-8')

# Rotate every 5 seconds, keeping 5 backups
fh = handlers.TimedRotatingFileHandler(filename=log_file,when="S",interval=5,backupCount=5)
formatter = logging.Formatter('%(asctime)s - [%(levelname)s] %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

# Write one record every 2 seconds so that several rotations occur
flag = 0
while flag < 10:
    time.sleep(2)
    logger.warning("test1")
    flag += 1

Values accepted by the when parameter of the time-based handler:

# S - Seconds
# M - Minutes
# H - Hours
# D - Days
# midnight - roll over at midnight
# W{0-6} - roll over on a certain day; 0 - Monday
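The size-based handler, which is only shown commented out above, can be tried quickly in a temporary directory. The sketch below is one possible setup: the file name size.log, the logger name rotate_demo, and the values maxBytes=50 and backupCount=2 are arbitrary choices for illustration.

```python
import logging
import os
import tempfile
from logging import handlers

with tempfile.TemporaryDirectory() as log_dir:
    log_file = os.path.join(log_dir, "size.log")

    # Start a new file before a write would reach 50 bytes; keep 2 backups.
    fh = handlers.RotatingFileHandler(
        log_file, maxBytes=50, backupCount=2, encoding="utf-8"
    )
    fh.setFormatter(logging.Formatter("%(message)s"))

    logger = logging.getLogger("rotate_demo")  # hypothetical logger name
    logger.propagate = False
    logger.addHandler(fh)

    # Each record is 17 bytes including the newline, so the file rolls
    # over repeatedly while these ten records are written.
    for i in range(10):
        logger.warning("message number %d", i)

    # Close the handler before the temporary directory is removed.
    fh.close()
    logger.removeHandler(fh)
    produced = sorted(os.listdir(log_dir))

print(produced)
```

With backupCount=2 only the current file and the two most recent backups are kept; older backups are deleted during rollover.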