The sixth day of Crawler learning
Scraping a large amount of data from 我爱竞赛网
First, get the category link for each type of competition.

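import requests
from bs4 import BeautifulSoup

def get_type_url(url):
    # Fetch and parse the page that holds the category menu
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # Each category is a link inside the navigation menu
    types = soup.select("#mn_P1_menu li a")
    for type_link in types:
        print(type_link.get_text())
        get_num(type_link.get("href"))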
Then get the total number of pages behind each category link.

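def get_num(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    num = soup.select(".pg span")
    # Some categories have no pager (only one page), so handle both cases
    if num:
        # The third space-separated token of the pager text is the page count
        i = int(num[0].get_text().split(" ")[2])
        # range() excludes its upper bound, so add 1 to reach the last page
        for w in range(1, i + 1):
            print("Page " + str(w))
            urls = url + "index.php?page={}".format(w)
            get_message_url(urls)
    else:
        get_message_url(url)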
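The `split(" ")[2]` lookup depends on the exact spacing of the pager text, so it may break if the site changes its markup. Below is a minimal sketch of a more tolerant approach, assuming the pager span reads something like ` / 5 页`. The sample string, the `example.com` URL, and the helper names `parse_total_pages` and `build_page_urls` are hypothetical and not part of the original code.

```python
import re

def parse_total_pages(pager_text):
    """Return the first integer found in the pager text, or 1 if there is none."""
    match = re.search(r"\d+", pager_text)
    return int(match.group()) if match else 1

def build_page_urls(base_url, total_pages):
    """Build one URL per page, from page 1 through the last page inclusive."""
    return [base_url + "index.php?page={}".format(page)
            for page in range(1, total_pages + 1)]

# Assumed sample inputs; the real site may format its pager differently.
print(parse_total_pages(" / 5 页"))
print(build_page_urls("http://example.com/list/", 2))
```

Falling back to 1 when no number is found covers the single-page categories without a separate branch.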
Finally, get the information for every competition listed on each page.

def get_message_url(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # Three parallel selections: title links, view counts, post info
    titles = soup.select(".xld .xs2_tit a")
    views = soup.select("span.chakan")
    post_times = soup.select("div.list_info")
    for title, view, post_time in zip(titles, views, post_times):
        data = {
            "title": title.get_text(),
            "views": view.get_text().strip(),
            # Keep only the first space-separated token of the post info
            "post_time": post_time.get_text().strip().split(" ")[0],
            "link": title.get("href")
        }
        print(data)
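One caveat with pairing three selector results through `zip`: it stops at the shortest list, so if one selector misses an element the remaining fields can silently shift against each other. The following is a rough sketch of a length check before pairing, working on plain strings rather than parsed tags. The helper name `build_records` and the sample values are hypothetical.

```python
def build_records(titles, views, post_times):
    """Pair three parallel lists into dicts, refusing to continue if lengths differ."""
    if not (len(titles) == len(views) == len(post_times)):
        raise ValueError(
            "misaligned results: {} titles, {} views, {} post times".format(
                len(titles), len(views), len(post_times)))
    return [
        {
            "title": title,
            "views": view.strip(),
            # Keep only the first space-separated token, as in the scraper
            "post_time": post_time.strip().split(" ")[0],
        }
        for title, view, post_time in zip(titles, views, post_times)
    ]

# Made-up sample values, not scraped data.
print(build_records(["Sample contest"], [" 120 "], ["2019-01-01 12:00"]))
```

Raising early makes a selector mismatch visible instead of producing records whose fields belong to different competitions.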