Python 3: using BeautifulSoup to grab tags with id='xiaodeng' whose href matches the regex 'elsie'

```python
# -*- coding:utf-8 -*-
# Python 3
# XiaoDeng
# http://tieba.baidu.com/p/2460150866
# Passing several named (keyword) arguments filters on several tag attributes at once.

from bs4 import BeautifulSoup
import urllib.request
import re

# If the source is a web page, it can be read like this:
# html_doc = "http://tieba.baidu.com/p/2460150866"
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# Find tags whose id is 'xiaodeng' AND whose href matches the regex 'elsie'
content = soup.find_all(href=re.compile("elsie"), id='xiaodeng')
# print(content)  # [<a class="sister" href="http://example.com/elsie" id="xiaodeng"><!-- Elsie --></a>]

for k in content:
    print(k.get('href'))  # print the tag's href attribute
    # http://example.com/elsie
```
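The comment at the top of the script says that several keyword arguments filter on several attributes at once. The sketch below makes that concrete: each filter on its own, then both together (a tag is kept only if it passes every filter), plus three equivalent spellings of the same filter: an `attrs` dict, a function filter, and a CSS selector. It reuses a trimmed copy of the sample HTML, which gives two `<a>` tags the same id; that is invalid in real HTML, but BeautifulSoup does not check id uniqueness. A compiled regex is tested with its `search()` method, so `re.compile("elsie")` matches anywhere in the href. The CSS line assumes Beautiful Soup 4.7 or later, where `select()` is backed by the soupsieve package.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the sample document; two <a> tags share id="xiaodeng".
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" class="sister" id="xiaodeng">Lacie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# One filter at a time.
by_id = soup.find_all(id="xiaodeng")               # matches both xiaodeng tags
by_href = soup.find_all(href=re.compile("elsie"))  # regex is tested with .search()

# Both keywords at once: a tag is kept only if it passes every filter (AND).
both = soup.find_all(href=re.compile("elsie"), id="xiaodeng")

# Equivalent ways to write the same AND filter.
via_attrs = soup.find_all(attrs={"id": "xiaodeng", "href": re.compile("elsie")})
via_func = soup.find_all(
    lambda tag: tag.get("id") == "xiaodeng" and "elsie" in tag.get("href", "")
)
via_css = soup.select('a[id="xiaodeng"][href*="elsie"]')  # CSS, via soupsieve

results = {
    "by_id": by_id,
    "by_href": by_href,
    "both": both,
    "via_attrs": via_attrs,
    "via_func": via_func,
    "via_css": via_css,
}
for name, tags in results.items():
    print(name, [tag["href"] for tag in tags])
```

Only `by_id` returns two tags (the Elsie link and the second Lacie link); every combined form returns just the Elsie link, which is what the script above prints.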

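The commented-out lines in the script read a live page instead of the inline string. `webpage.read()` returns bytes; BeautifulSoup accepts bytes and guesses the encoding itself, but decoding with the charset the server declares is more predictable. A minimal sketch, assuming the server sends a charset in its Content-Type header (UTF-8 is used otherwise); the `fetch_html` name, the 10-second timeout and the browser-like User-Agent are illustrative choices, not part of the original script.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its body as text.

    Hypothetical helper: uses the charset from the Content-Type header
    and falls back to UTF-8 when the server does not declare one.
    """
    # Assumption: some sites reject urllib's default User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()  # bytes, not str
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes do not fit the charset.
    return raw.decode(charset, errors="replace")
```

The result can stand in for the inline string, e.g. `soup = BeautifulSoup(fetch_html(html_doc), 'html.parser')` with `html_doc` set as in the commented-out lines.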