4月9日 python学习总结模块

1、XML模块　　

xml是实现不同语言或程序之间进行数据交换的协议，跟json差不多，但json使用起来更简单，不过，古时候，在json还没诞生的黑暗年代，大家只能选择用xml呀，至今很多传统公司如金融行业的很多系统的接口还主要是xml。

xml的格式，就是通过<>节点来区别数据结构的

xml协议在各个语言里的都是支持的，在python中可以用以下模块操作xml：

# print(root.iter('year')) #全文搜索

# print(root.find('country')) #在root的子节点找，只找一个

# print(root.findall('country')) #在root的子节点找，找所有

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")

root = tree.getroot()

print(root.tag)

#遍历xml文档

for child in root:

    print('========>',child.tag,child.attrib,child.attrib['name'])

    for i in child:

        print(i.tag,i.attrib,i.text)

#只遍历year 节点

for node in root.iter('year'):

    print(node.tag,node.text)

#---------------------------------------

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")

root = tree.getroot()

#修改

for node in root.iter('year'):

    new_year=int(node.text)+1

    node.text=str(new_year)

    node.set('updated','yes')

    node.set('version','1.0')

tree.write('test.xml')

#删除node

for country in root.findall('country'):

   rank = int(country.find('rank').text)

   if rank > 50:

     root.remove(country)

tree.write('output.xml')

#在country内添加（append）节点year2

import xml.etree.ElementTree as ET

tree = ET.parse("a.xml")

root=tree.getroot()

for country in root.findall('country'):

    for year in country.findall('year'):

        if int(year.text) > 2000:

            year2=ET.Element('year2')

            year2.text='新年'

            year2.attrib={'update':'yes'}

            country.append(year2) #往country节点下添加子节点

tree.write('a.xml.swap')

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")

name = ET.SubElement(new_xml,"name",attrib={"enrolled":"yes"})

age = ET.SubElement(name,"age",attrib={"checked":"no"})

sex = ET.SubElement(name,"sex")

sex.text = '33'

name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})

age = ET.SubElement(name2,"age")

age.text = '19'

et = ET.ElementTree(new_xml) #生成文档对象

et.write("test.xml", encoding="utf-8",xml_declaration=True)

ET.dump(new_xml) #打印生成的格式

2、re模块　　

正则就是用一些具有特殊含义的符号组合到一起（称为正则表达式）来描述字符或者字符串的方法。或者说：正则就是用来描述一类事物的规则。（在Python中）它内嵌在Python中，并通过 re 模块实现。正则表达式模式被编译成一系列的字节码，然后由用 C 编写的匹配引擎执行。

生活中处处都是正则：

比如我们描述：4条腿

　　你可能会想到的是四条腿的动物或者桌子，椅子等

继续描述：4条腿，活的

就只剩下四条腿的动物这一类了

应用：

# =================================匹配模式=================================

#一对一的匹配

# 'hello'.replace(old,new)

# 'hello'.find('pattern')

#正则匹配

import re

#\w与\W

print(re.findall('\w','hello egon 123')) #['h', 'e', 'l', 'l', 'o', 'e', 'g', 'o', 'n', '1', '2', '3']

print(re.findall('\W','hello egon 123')) #[' ', ' ']

#\s与\S

print(re.findall('\s','hello  egon  123')) #[' ', ' ', ' ', ' ']

print(re.findall('\S','hello  egon  123')) #['h', 'e', 'l', 'l', 'o', 'e', 'g', 'o', 'n', '1', '2', '3']

#\n \t都是空,都可以被\s匹配

print(re.findall('\s','hello \n egon \t 123')) #[' ', '\n', ' ', ' ', '\t', ' ']

#\n与\t

print(re.findall(r'\n','hello egon \n123')) #['\n']

print(re.findall(r'\t','hello egon\t123')) #['\n']

#\d与\D

print(re.findall('\d','hello egon 123')) #['1', '2', '3']

print(re.findall('\D','hello egon 123')) #['h', 'e', 'l', 'l', 'o', ' ', 'e', 'g', 'o', 'n', ' ']

#\A与\Z

print(re.findall('\Ahe','hello egon 123')) #['he'],\A==>^

print(re.findall('123\Z','hello egon 123')) #['he'],\Z==>$

#^与$

print(re.findall('^h','hello egon 123')) #['h']

print(re.findall('3$','hello egon 123')) #['3']

# 重复匹配：| . | * | ? | .* | .*? | + | {n,m} |

#.

print(re.findall('a.b','a1b')) #['a1b']

print(re.findall('a.b','a1b a*b a b aaab')) #['a1b', 'a*b', 'a b', 'aab']

print(re.findall('a.b','a\nb')) #[]

print(re.findall('a.b','a\nb',re.S)) #['a\nb']

print(re.findall('a.b','a\nb',re.DOTALL)) #['a\nb']同上一条意思一样

#*

print(re.findall('ab*','bbbbbbb')) #[]

print(re.findall('ab*','a')) #['a']

print(re.findall('ab*','abbbb')) #['abbbb']

#?

print(re.findall('ab?','a')) #['a']

print(re.findall('ab?','abbb')) #['ab']

#匹配所有包含小数在内的数字

print(re.findall('\d+\.?\d*',"asdfasdf123as1.13dfa12adsf1asdf3")) #['123', '1.13', '12', '1', '3']

#.*默认为贪婪匹配

print(re.findall('a.*b','a1b22222222b')) #['a1b22222222b']

#.*?为非贪婪匹配：推荐使用

print(re.findall('a.*?b','a1b22222222b')) #['a1b']

#+

print(re.findall('ab+','a')) #[]

print(re.findall('ab+','abbb')) #['abbb']

#{n,m}

print(re.findall('ab{2}','abbb')) #['abb']

print(re.findall('ab{2,4}','abbb')) #['abb']

print(re.findall('ab{1,}','abbb')) #'ab{1,}' ===> 'ab+'

print(re.findall('ab{0,}','abbb')) #'ab{0,}' ===> 'ab*'

#[]

print(re.findall('a[1*-]b','a1b a*b a-b')) #[]内的都为普通字符了，且如果-没有被转意的话，应该放到[]的开头或结尾

print(re.findall('a[^1*-]b','a1b a*b a-b a=b')) #[]内的^代表的意思是取反，所以结果为['a=b']

print(re.findall('a[0-9]b','a1b a*b a-b a=b')) #[]内的^代表的意思是取反，所以结果为['a=b']

print(re.findall('a[a-z]b','a1b a*b a-b a=b aeb')) #[]内的^代表的意思是取反，所以结果为['a=b']

print(re.findall('a[a-zA-Z]b','a1b a*b a-b a=b aeb aEb')) #[]内的^代表的意思是取反，所以结果为['a=b']

#\# print(re.findall('a\\c','a\c')) #对于正则来说a\\c确实可以匹配到a\c,但是在python解释器读取a\\c时，会发生转义，然后交给re去执行，所以抛出异常

print(re.findall(r'a\\c','a\c')) #r代表告诉解释器使用rawstring，即原生字符串，把我们正则内的所有符号都当普通字符处理，不要转义

print(re.findall('a\\\\c','a\c')) #同上面的意思一样，和上面的结果一样都是['a\\c']

#():分组

print(re.findall('ab+','ababab123')) #['ab', 'ab', 'ab']

print(re.findall('(ab)+123','ababab123')) #['ab']，匹配到末尾的ab123中的ab

print(re.findall('(?:ab)+123','ababab123')) #findall的结果不是匹配的全部内容，而是组内的内容,?:可以让结果为匹配的全部内容

print(re.findall('href="(.*?)"','<a href="http://www.baidu.com">点击</a>'))#['http://www.baidu.com']

print(re.findall('href="(?:.*?)"','<a href="http://www.baidu.com">点击</a>'))#['href="http://www.baidu.com"']

#|

print(re.findall('compan(?:y|ies)','Too many companies have gone bankrupt, and the next one is my company'))

# ===========================re模块提供的方法介绍===========================

import re

#1

print(re.findall('e','alex make love') )   #['e', 'e', 'e'],返回所有满足匹配条件的结果,放在列表里

#2

print(re.search('e','alex make love').group()) #e,只到找到第一个匹配然后返回一个包含匹配信息的对象,该对象可以通过调用group()方法得到匹配的字符串,如果字符串没有匹配，则返回None。

#3

print(re.match('e','alex make love'))    #None,同search,不过在字符串开始处进行匹配,完全可以用search+^代替match

#4

print(re.split('[ab]','abcd'))     #['', '', 'cd']，先按'a'分割得到''和'bcd',再对''和'bcd'分别按'b'分割

#5

print('===>',re.sub('a','A','alex make love')) #===> Alex mAke love，不指定n，默认替换所有

print('===>',re.sub('a','A','alex make love',1)) #===> Alex make love

print('===>',re.sub('a','A','alex make love',2)) #===> Alex mAke love

print('===>',re.sub('^(\w+)(.*?\s)(\w+)(.*?\s)(\w+)(.*?)$',r'\5\2\3\4\1','alex make love')) #===> love make alex

print('===>',re.subn('a','A','alex make love')) #===> ('Alex mAke love', 2),结果带有总共替换的个数

#6

obj=re.compile('\d{2}')

print(obj.search('abc123eeee').group()) #12

print(obj.findall('abc123eeee')) #['12'],重用了obj

import re

print(re.findall("<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>")) #['h1']

print(re.search("<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>").group()) #<h1>hello</h1>

print(re.search("<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>").groupdict()) #<h1>hello</h1>

print(re.search(r"<(\w+)>\w+</(\w+)>","<h1>hello</h1>").group())

print(re.search(r"<(\w+)>\w+</\1>","<h1>hello</h1>").group())

补充一

4月9日 python学习总结模块的更多相关文章

4月8日 python学习总结模块与包
一.包 #官网解释 Packages are a way of structuring Python's module namespace by using "dotted module n ...
4月10日 python学习总结模块和面向对象
1.hashlib 1.什么叫hash:hash是一种算法,该算法接受传入的内容,经过运算得到一串hash值 2.hash值的特点是:2.1 只要传入的内容一样,得到的hash值必然一样=====& ...
4月2日 python学习总结
昨天内容回顾: 1.迭代器可迭代对象: 只要内置有__iter__方法的都是可迭代的对象既有__iter__,又有__next__方法调用__iter__方法==>得到内置的迭代器对象调 ...
4月12日 python学习总结继承和派生
一.继承什么是继承: 继承是一种新建类的方式,在python中支持一个子类继承多个父类新建类称为子类或派生类父类可以称之为基类或者超类子类会遗传父类的属性 2. 为什么继承 ...
4月11日 python学习总结对象与类
1.类的定义 #类的定义 class 类名: 属性='xxx' def __init__(self): self.name='enon' self.age=18 def other_func: pas ...
6月4日 python学习总结初次接触jQuery
1. jQuery是什么?是一个轻量级的,兼容多浏览器的JS库(write less, do more) 1. 是一个工具,简单方便的实现一些DOM操作 2. 不用jQuery完全可以,但是不明智. ...
5月16日 python学习总结 DBUtils模块、orm 和 cookie、session、token
一.DBUtils模块介绍 The DBUtils suite is realized as a Python package containing two subsets of modules, ...
5月11日 python学习总结子查询、pymysql模块增删改查、防止sql注入问题
一.子查询子查询:把一个查询语句用括号括起来,当做另外一条查询语句的条件去用,称为子查询 select emp.name from emp inner join dep on emp.dep_id ...
4月19日 python学习总结套接字模块的使用
服务端: import socket phone=socket.socket(socket.AF_INET,socket.SOCK_STREAM) # 买电话 phone.bind(('127.0.0 ...

随机推荐

Java判断是否是质数
public static boolean isPrime(int num) { /* * 质数定义:只有1和它本身两个因数的自然数 * * 1. 小于等于1或者是大于2的偶数,直接返回false * ...
用Express 创建项目
1.Node.js Express 框架安装:npm install express --save在当前目录下创建一个node_modules 2.安装必要的中间件npm install body-p ...
PHP面试常考内容之面向对象（1）
PHP中面向对象常考的知识点有以下几点,我将会从以下几点进行详细介绍说明,帮助你更好的应对PHP面试常考的面向对象相关的知识点和考题. 整个面向对象文章的结构涉及的内容模块有: 一.面向对象与面向过程 ...
[LeetCode]3.无重复字符的最长子串（Java）
原题地址: longest-substring-without-repeating-characters/submissions 题目描述: 示例 1: 输入: s = "pwwkew&qu ...
EMNLP 2017 | Sparse Communication for Distributed Gradient Descent
通过将分布式随机梯度下降(SGD)中的稠密更新替换成稀疏更新可以显著提高训练速度.当大多数更新接近于0时,梯度更新会出现正偏差,因此我们将99%最小更新(绝对值)映射为零,然后使用该稀疏矩阵替换原来的 ...
Cesium 加载地形数据
1.注册Cesium Ion账号,注册地址:Sign In | Cesium ion 否则,加载数据会报错{code: "InvalidCredentials", message: ...
Thread、ThreadPool 和 Task
对 C# 开发者来说,不可不理解清楚 Thread.ThreadPool 和 Task 这三个概念.这也是面试频率很高的话题,在 StackOverflow 可以找到有很多不错的回答,我总结整理了一下 ...
C# 9.0元组（ValueTuple）详细解说
元组 (ValueTuple)类型是值类型:元组元素是公共字段,可以使用任意数量的元素定义元组.Tuple类型像一个口袋,在出门前可以把所需的任何东西一股脑地放在里面.您可以将钥匙.驾驶证.便笺簿和钢 ...
【C#基础概念】vs2019 代码段
打开记事本,输入下面代码,然后把文件后缀改为.snippet .然后通过vs2019 工具>代码段管理导入. <?xml version="1.0" encoding= ...
WPF中ComboBox控件的SelectedItem和SelectedValue的MVVM绑定
问题描述:左侧是一个ListView控件,用于显示User类的Name属性,右侧显示其SelectedItem的其他属性,包括Age, Address,和Category.其中Category用Com ...

4月9日 python学习总结 模块

4月9日 python学习总结 模块的更多相关文章

随机推荐

热门专题

4月9日 python学习总结模块

4月9日 python学习总结模块的更多相关文章