The BeautifulSoup library

<html>

    <body>

        <p class='title'></p>

    </body>

</html>

BeautifulSoup is a library for parsing, traversing, and maintaining a "tag tree".

Understanding tags

<p class='title'></p>

<!-- A tag is a pair of angle-bracket markers plus its attributes -->

Importing the BeautifulSoup library
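As a small sketch of how the tag above looks once parsed, the snippet below reads back its name and attributes. The variable names are illustrative only, and `html.parser` is used because it needs no extra install.

```python
from bs4 import BeautifulSoup

# Parse the single <p> tag from the example above.
soup = BeautifulSoup("<p class='title'></p>", "html.parser")
p = soup.p

tag_name = p.name    # the name written inside the angle brackets
tag_attrs = p.attrs  # the attributes, organized as a dict

print(tag_name)
print(tag_attrs)
```

Note that `class` can hold several values in HTML, so BeautifulSoup stores it as a list rather than a plain string.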

from bs4 import BeautifulSoup

import bs4

Constructing a BeautifulSoup object to parse HTML

from bs4 import BeautifulSoup

soup1=BeautifulSoup("<html>data</html>","html.parser")

soup2=BeautifulSoup(open("D:/demo.html"),"html.parser")

A BeautifulSoup object corresponds to the entire content of one HTML/XML document.

Four parsers
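The two construction routes (from a string and from an open file) can be sketched as follows. This is a minimal illustration, not the author's code: the temporary file stands in for a real path such as the `demo.html` above, and the UTF-8 encoding is an assumption about the file.

```python
import os
import tempfile

from bs4 import BeautifulSoup

markup = "<html><body><p class='title'>data</p></body></html>"

# Route 1: build the object directly from a string.
soup_from_string = BeautifulSoup(markup, "html.parser")

# Route 2: build it from an open file handle. A temporary file is
# created here only so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    # The with-block closes the handle once parsing is done.
    with open(path, encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")

print(soup_from_string.p.string)
print(soup_from_file.p.string)
```

Opening the file in a `with` block avoids leaving the handle open, which the one-line `open(...)` form does not guarantee.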

Parser	Usage	Requirement
bs4's HTML parser	BeautifulSoup(mk,'html.parser')	install the bs4 library (pip install beautifulsoup4)
lxml's HTML parser	BeautifulSoup(mk,'lxml')	pip install lxml
lxml's XML parser	BeautifulSoup(mk,'xml')	pip install lxml
html5lib's parser	BeautifulSoup(mk,'html5lib')	pip install html5lib
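Since three of the four parsers depend on optional packages, one possible pattern is to try them in order and fall back to the built-in one. The helper `make_soup` and its preference order are hypothetical, chosen only for illustration; `FeatureNotFound` is the exception bs4 raises when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Try each parser name in order; return (soup, parser_name_used)."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser's package is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


soup, used = make_soup("<p>hi</p>")
print(used)
print(soup.p.string)
```

Different parsers may repair malformed markup differently, so pinning one parser explicitly is usually safer when results must be reproducible.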

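Five basic elements

Element	Description
Tag	A tag, opened with <> and closed with </>
Name	The tag's name, accessed as <tag>.name
Attributes	The tag's attributes, organized as a dict, accessed as <tag>.attrs
NavigableString	The non-attribute string inside a tag, accessed as <tag>.string
Comment	The comment portion of a string inside a tag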
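The elements in the table can be seen together in a short sketch. The markup string below is made up for illustration; the class names come from `bs4.element`.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

markup = '<p class="title" id="intro">Hello</p><b><!--a note--></b>'
soup = BeautifulSoup(markup, "html.parser")

p = soup.p            # Tag
name = p.name         # Name
attrs = p.attrs       # Attributes (a dict)
text = p.string       # NavigableString: the text inside <p>
note = soup.b.string  # Comment: <b> contains only a comment

print(isinstance(p, Tag), name, attrs)
print(type(text).__name__, type(note).__name__)
```

`Comment` is a subclass of `NavigableString`, so an `isinstance(x, NavigableString)` check alone does not tell the two apart.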

Demo: extracting information from a page

from bs4 import BeautifulSoup

import requests

html=requests.get('http://python123.io/ws/demo.html').text

soup=BeautifulSoup(html,'html.parser')#parse the downloaded HTML text

tag=soup.a#the first <a> tag in the document

name=tag.name#'a', the tag's name

parentName=soup.a.parent.name#name of the parent tag

attr=tag.attrs#the tag's attributes, as a dict

attr['class']#look up one attribute by name

type(attr)#<class 'dict'>

tag.string#the text between the opening and closing tags

newsoup=BeautifulSoup('<b><!--This is a comment--></b><p>This is not a comment</p>','html.parser')

type(newsoup.b.string)#bs4.element.Comment

type(newsoup.p.string)#bs4.element.NavigableString
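Because `.string` returns a comment's text just like ordinary text, code that only wants visible text may need to check the type first. The helper `visible_string` below is hypothetical, a sketch of one way to do that check rather than part of the library.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment


def visible_string(tag):
    """Return tag.string as str, or None if it is missing or a comment."""
    s = tag.string
    if s is None or isinstance(s, Comment):
        return None
    return str(s)


newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>",
    "html.parser",
)

b_text = visible_string(newsoup.b)
p_text = visible_string(newsoup.p)
print(b_text)
print(p_text)
```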
