The sixth day of Crawler learning
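Scraping a large volume of data from 我爱竞赛网 (a site that publishes competition announcements)
First, collect the category link for each type of competition: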

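```python
import requests
from bs4 import BeautifulSoup


def get_type_url(url):
    # Fetch the home page and collect the category links from the top menu
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    links = soup.select("#mn_P1_menu li a")
    for link in links:  # named `link` rather than `type` to avoid shadowing the built-in
        print(link.get_text())     # category name
        get_num(link.get("href"))  # pass the category URL to the next step
```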
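The menu hrefs are passed straight to get_num, which later appends "index.php?page=..." to them. If some of those hrefs turn out to be relative or repeated (an assumption; the post does not show the raw menu HTML), a small standard-library helper such as the hypothetical normalize_links below could resolve them against the page URL and drop duplicates before crawling. The example.com URLs are placeholders, not the real site.

```python
from urllib.parse import urljoin


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and drop empty or repeated ones.

    Hypothetical helper: assumes the category menu may contain relative
    or duplicate links. Order of first appearance is preserved.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip <a> tags that have no href
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    base = "https://example.com/"  # placeholder, not the real site
    raw = ["list/1/", "/list/1/", None, "https://example.com/list/2/"]
    for link in normalize_links(base, raw):
        print(link)
```

In get_type_url, the hrefs would be gathered into a list first and passed through a helper like this before calling get_num on each result.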
Next, get the total number of pages for each category link:

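```python
def get_num(url):
    # Work out how many pages this category has, then visit each one
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    num = soup.select(".pg span")
    # Some categories have only one page and no pager, so handle both cases
    if num:
        # Split the pager text on spaces and take the third field as the total page count
        i = int(num[0].get_text().split(" ")[2])
        for w in range(1, i + 1):  # i + 1 so that the last page is included
            print("Page " + str(w))
            urls = url + "index.php?page={}".format(w)
            get_message_url(urls)
    else:
        get_message_url(url)
```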
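The page-count logic can be checked without touching the network by pulling it out into pure functions. This sketch assumes, as get_num does, that the pager text splits on single spaces with the total page count in the third field; " / 12 页" is used as one plausible shape, but the exact text on the site is an assumption. The function names parse_total_pages and page_urls are hypothetical. page_urls builds one URL per page from 1 up to and including the total, which is why the loop bound must be total + 1: range(1, total) stops one page short.

```python
def parse_total_pages(pager_text):
    """Return the total page count from pager text such as ' / 12 页'.

    Mirrors get_num: split on single spaces and take the third field.
    The pager format is an assumption about the site's markup.
    """
    return int(pager_text.split(" ")[2])


def page_urls(category_url, total_pages):
    """Build one listing URL per page, from page 1 up to and including total_pages."""
    return [category_url + "index.php?page={}".format(page)
            for page in range(1, total_pages + 1)]


if __name__ == "__main__":
    total = parse_total_pages(" / 3 页")
    for u in page_urls("https://example.com/list/", total):  # placeholder URL
        print(u)
```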
Finally, extract the information for each competition on every page:

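```python
def get_message_url(url):
    # Extract the title, view count, post time and link of each competition on one page
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select(".xld .xs2_tit a")
    views = soup.select("span.chakan")
    post_times = soup.select("div.list_info")
    # The three lists are paired item by item; zip stops at the shortest one
    for title, view, post_time in zip(titles, views, post_times):
        data = {
            "title": title.get_text(),
            "views": view.get_text().strip(),
            # keep only the text before the first space
            "post_time": post_time.get_text().strip().split(" ")[0],
            "link": title.get("href")
        }
        print(data)
```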
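Because get_message_url pairs three separately selected lists with zip, a missing view count or post time on one entry would silently misalign or truncate the results. A dependency-free sketch of the same pairing step, working on plain strings already extracted from the page, can check the lengths first and then build the same four fields. The function name build_records and the sample values are hypothetical.

```python
def build_records(titles, views, post_times, links):
    """Pair per-item fields into dicts, refusing to guess when lengths differ.

    Each argument is a list of strings already pulled out of the page
    (title text, view-count text, post-time text, href), in page order.
    """
    lengths = {len(titles), len(views), len(post_times), len(links)}
    if len(lengths) != 1:
        raise ValueError("field lists have different lengths: "
                         f"{len(titles)}, {len(views)}, {len(post_times)}, {len(links)}")
    records = []
    for title, view, post_time, link in zip(titles, views, post_times, links):
        records.append({
            "title": title,
            "views": view.strip(),
            # Same rule as get_message_url: keep the text before the first space
            "post_time": post_time.strip().split(" ")[0],
            "link": link,
        })
    return records


if __name__ == "__main__":
    sample = build_records(
        ["Contest A"], [" 128 "], [" 2018-05-20 10:00 "], ["https://example.com/a"]
    )
    print(sample)
```

Raising an error instead of letting zip truncate makes a selector that stops matching on some entries show up immediately rather than as quietly shifted data.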