The sixth day of Crawler learning
爬取我爱竞赛网的大量数据
首先获取每一种比赛信息的分类链接

def get_type_url(url):
web_data = requests.get(web_url)
soup = BeautifulSoup(web_data.text, 'lxml')
types = soup.select("#mn_P1_menu li a")
for type in types:
print(type.get_text())
get_num(type.get("href"))
然后获取每一个分类连接中的总页数

def get_num(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
num = soup.select(".pg span")
# 部分页面没有分页只有一页,需要分类一下
if(num!=[]):
i = int(num[0].get_text().split(" ")[2])
for w in range(1, i):
print("第"+str(w)+"页")
urls = url + "index.php?page={}".format(str(w))
get_message_url(urls)
else:
get_message_url(url)
最后获取每一页中各个比赛的信息

def get_message_url(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
titles = soup.select(".xld .xs2_tit a")
views = soup.select("span.chakan")
post_times = soup.select("div.list_info")
for title, view, post_time in zip(titles, views, post_times):
data = {
"标题": title.get_text(),
"浏览量": view.get_text().strip(),
"发布时间": post_time.get_text().strip().split(" ")[0],
"链接": title.get("href")
}
print(data)
The sixth day of Crawler learning的更多相关文章
- The fifth day of Crawler learning
使用mongoDB 下载地址:https://www.mongodb.com/dr/fastdl.mongodb.org/win32/mongodb-win32-x86_64-2008plus-ssl ...
- The fourth day of Crawler learning
爬取58同城 from bs4 import BeautifulSoupimport requestsurl = "https://qd.58.com/diannao/35200617992 ...
- The third day of Crawler learning
连续爬取多页数据 分析每一页url的关联找出联系 例如虎扑 第一页:https://voice.hupu.com/nba/1 第二页:https://voice.hupu.com/nba/2 第三页: ...
- The second day of Crawler learning
用BeatuifulSoup和Requests爬取猫途鹰网 服务器与本地的交换机制 我们每次浏览网页都是再向网页所在的服务器发送一个Request,然后服务器接受到Request后返回Response ...
- The first day of Crawler learning
使用BeautifulSoup解析网页 Soup = BeautifulSoup(urlopen(html),'lxml') Soup为汤,html为食材,lxml为菜谱 from bs4 impor ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- 深度学习Deep learning
In the last chapter we learned that deep neural networks are often much harder to train than shallow ...
- [C2P2] Andrew Ng - Machine Learning
##Linear Regression with One Variable Linear regression predicts a real-valued output based on an in ...
- [C2P3] Andrew Ng - Machine Learning
##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...
随机推荐
- AspNetPager 样式
使用方法: 1.引入样式表. 将 想要使用的样式表加入到本页面<style type="text/css"></style>标记中,或者新建一个css文件如 ...
- 13-4 jquery操作标签(文本,属性,class,value)
一 文本操作 $().html() $().text() 文本赋值操作 $().html("") $().text("") 二 属性操作 $().attr(属性 ...
- 枚举类型的数据存入到map中
阅读更多 原文来自http://fokman.iteye.com/blog/1568905 public enum IdeasCMD { RESERVED(0), PING(1), PING_ACK( ...
- Data Flow-File Read-详细过程
- Python基础:22__slots__类属性
1:工厂函数 由于类型和类的统一,因而可以子类化Python数据类型.但是所有的Python 内建的转换函数现在都是工厂函数.当这些函数被调用时,你实际上是对相应的类型进行实例化.比如下面的函数都已经 ...
- Android ListView批量选择(全选、反选、全不选)
APP的开发中,会常遇到这样的需求:批量取消(删除)List中的数据.这就要求ListVIew支持批量选择.全选.单选等等功能,做一个比较强大的ListView批量选择功能是很有必要的,那如何做呢? ...
- 多版本python共存,安装三方库到指定python版本
多版本python安装过程略过不提提供完美解决python启动和各版本pip问题: python3下pip安装命令如下: py -3 -m pip install xxxxxx python2下pip ...
- 使用 Laravel-Excel 进行 CSV/EXCEL 文件读写
https://blog.csdn.net/yiluohan0307/article/details/80229978 http://www.ptbird.cn/laravel-excel-csv.h ...
- jq实现超级简单的隔行变色
文章地址:https://www.cnblogs.com/sandraryan/ <!DOCTYPE html> <html lang="en"> < ...
- 洛谷P4136 谁能赢呢? 题解 博弈论
题目链接:https://www.luogu.org/problem/P4136 找规律 首先这道题目我没有什么思路,所以一开始想到的是通过搜索来枚举 \(n\) 比较小的时候的情况. 所以我开搜索枚 ...