The sixth day of Crawler learning
爬取我爱竞赛网的大量数据
首先获取每一种比赛信息的分类链接

def get_type_url(url):
web_data = requests.get(web_url)
soup = BeautifulSoup(web_data.text, 'lxml')
types = soup.select("#mn_P1_menu li a")
for type in types:
print(type.get_text())
get_num(type.get("href"))
然后获取每一个分类连接中的总页数

def get_num(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
num = soup.select(".pg span")
# 部分页面没有分页只有一页,需要分类一下
if(num!=[]):
i = int(num[0].get_text().split(" ")[2])
for w in range(1, i):
print("第"+str(w)+"页")
urls = url + "index.php?page={}".format(str(w))
get_message_url(urls)
else:
get_message_url(url)
最后获取每一页中各个比赛的信息

def get_message_url(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
titles = soup.select(".xld .xs2_tit a")
views = soup.select("span.chakan")
post_times = soup.select("div.list_info")
for title, view, post_time in zip(titles, views, post_times):
data = {
"标题": title.get_text(),
"浏览量": view.get_text().strip(),
"发布时间": post_time.get_text().strip().split(" ")[0],
"链接": title.get("href")
}
print(data)
The sixth day of Crawler learning的更多相关文章
- The fifth day of Crawler learning
使用mongoDB 下载地址:https://www.mongodb.com/dr/fastdl.mongodb.org/win32/mongodb-win32-x86_64-2008plus-ssl ...
- The fourth day of Crawler learning
爬取58同城 from bs4 import BeautifulSoupimport requestsurl = "https://qd.58.com/diannao/35200617992 ...
- The third day of Crawler learning
连续爬取多页数据 分析每一页url的关联找出联系 例如虎扑 第一页:https://voice.hupu.com/nba/1 第二页:https://voice.hupu.com/nba/2 第三页: ...
- The second day of Crawler learning
用BeatuifulSoup和Requests爬取猫途鹰网 服务器与本地的交换机制 我们每次浏览网页都是再向网页所在的服务器发送一个Request,然后服务器接受到Request后返回Response ...
- The first day of Crawler learning
使用BeautifulSoup解析网页 Soup = BeautifulSoup(urlopen(html),'lxml') Soup为汤,html为食材,lxml为菜谱 from bs4 impor ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- 深度学习Deep learning
In the last chapter we learned that deep neural networks are often much harder to train than shallow ...
- [C2P2] Andrew Ng - Machine Learning
##Linear Regression with One Variable Linear regression predicts a real-valued output based on an in ...
- [C2P3] Andrew Ng - Machine Learning
##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...
随机推荐
- JAVA内存dump
# 注意点: # 项目运行的用户 # 使用的jdk版本下的jstack去查看 /opt/jdk1..0_191/bin/jmap -dump:format=b,file=/webser/www/`da ...
- C++ 求向量的交集、并集、差集
#include<iostream> #include<stdio.h> #include<list> #include<algorithm> //se ...
- 设计模式 - 工厂模式(factory pattern) 具体解释
版权声明:本文为博主原创文章,未经博主同意不得转载. https://blog.csdn.net/u012515223/article/details/27081511 工厂模式(factory pa ...
- bzoj 4386: [POI2015]Wycieczki
bzoj 4386: [POI2015]Wycieczki 这题什么素质,爆long long就算了,连int128都爆……最后还是用long double卡过的……而且可能是我本身自带大常数吧,T了 ...
- android学习——android 常见的错误 和 解决方法
1. Application does not specify an API level requirement! 解决方法:AndroidManifest.xml中 加入: <uses-sdk ...
- UTF-8与UTF-8 BOM
在我们通常使用的windows系统中,我发现了一个有趣的现象.我新建一个空的文本文档,点击文件-另存为-编码选择UTF-8,然后保存.此时这个文件明明是空的,却占了3字节大小.原因在于:此时保存的编码 ...
- js获取dom节点
var s= document.getElementById("test");del_ff(s); //清理空格var chils= s.childNodes; //得到s的全部子 ...
- torch.optim优化算法理解之optim.Adam()
torch.optim是一个实现了多种优化算法的包,大多数通用的方法都已支持,提供了丰富的接口调用,未来更多精炼的优化算法也将整合进来. 为了使用torch.optim,需先构造一个优化器对象Opti ...
- SuperSocket通过 SessionID 获取 Session
前面提到过,如果你获取了连接的 Session 实例,你就可以通过 "Send()" 方法向客户端发送数据.但是在某些情况下,你无法直接获取 Session 实例. SuperSo ...
- mysql多表连接和子查询
文章实例的数据表,来自上一篇博客<mysql简单查询>:http://blog.csdn.net/zuiwuyuan/article/details/39349611 MYSQL的多表连接 ...