The sixth day of Crawler learning
爬取我爱竞赛网的大量数据
首先获取每一种比赛信息的分类链接

def get_type_url(url):
web_data = requests.get(web_url)
soup = BeautifulSoup(web_data.text, 'lxml')
types = soup.select("#mn_P1_menu li a")
for type in types:
print(type.get_text())
get_num(type.get("href"))
然后获取每一个分类连接中的总页数

def get_num(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
num = soup.select(".pg span")
# 部分页面没有分页只有一页,需要分类一下
if(num!=[]):
i = int(num[0].get_text().split(" ")[2])
for w in range(1, i):
print("第"+str(w)+"页")
urls = url + "index.php?page={}".format(str(w))
get_message_url(urls)
else:
get_message_url(url)
最后获取每一页中各个比赛的信息

def get_message_url(url):
web_data = requests.get(url)
soup = BeautifulSoup(web_data.text, 'lxml')
titles = soup.select(".xld .xs2_tit a")
views = soup.select("span.chakan")
post_times = soup.select("div.list_info")
for title, view, post_time in zip(titles, views, post_times):
data = {
"标题": title.get_text(),
"浏览量": view.get_text().strip(),
"发布时间": post_time.get_text().strip().split(" ")[0],
"链接": title.get("href")
}
print(data)
The sixth day of Crawler learning的更多相关文章
- The fifth day of Crawler learning
使用mongoDB 下载地址:https://www.mongodb.com/dr/fastdl.mongodb.org/win32/mongodb-win32-x86_64-2008plus-ssl ...
- The fourth day of Crawler learning
爬取58同城 from bs4 import BeautifulSoupimport requestsurl = "https://qd.58.com/diannao/35200617992 ...
- The third day of Crawler learning
连续爬取多页数据 分析每一页url的关联找出联系 例如虎扑 第一页:https://voice.hupu.com/nba/1 第二页:https://voice.hupu.com/nba/2 第三页: ...
- The second day of Crawler learning
用BeatuifulSoup和Requests爬取猫途鹰网 服务器与本地的交换机制 我们每次浏览网页都是再向网页所在的服务器发送一个Request,然后服务器接受到Request后返回Response ...
- The first day of Crawler learning
使用BeautifulSoup解析网页 Soup = BeautifulSoup(urlopen(html),'lxml') Soup为汤,html为食材,lxml为菜谱 from bs4 impor ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- 深度学习Deep learning
In the last chapter we learned that deep neural networks are often much harder to train than shallow ...
- [C2P2] Andrew Ng - Machine Learning
##Linear Regression with One Variable Linear regression predicts a real-valued output based on an in ...
- [C2P3] Andrew Ng - Machine Learning
##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...
随机推荐
- 网络编程--多线程 , socketserver
内容补充 python2与python3的区别? """ python3对unicode字符的原生支持 Python2中使用ASCII码作为默认编码方式导致string有 ...
- Java实现接口用来弥补Java单继承的缺陷
package com.test3;/** * @author qingfeng * 功能:继承类 VS 实现接口 :两者之间的关系(实现接口用来弥补Java单继承的缺陷) */public clas ...
- Flask学习之七 单元测试
英文博客地址:http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-vii-unit-testing 中文翻译地址:http ...
- jQuery 链
通过 jQuery,可以把动作/方法链接在一起. Chaining 允许我们在一条语句中运行多个 jQuery 方法(在相同的元素上). jQuery 方法链接 直到现在,我们都是一次写一条 jQue ...
- day5-python之面向过程编程
一.面向过程编程 #1.首先强调:面向过程编程绝对不是用函数编程这么简单,面向过程是一种编程思路.思想,而编程思路是不依赖于具体的语言或语法的.言外之意是即使我们不依赖于函数,也可以基于面向过程的思想 ...
- @hdu - 6598@ Harmonious Army
目录 @description@ @solution@ @accepted code@ @details@ @description@ n 个士兵,每个士兵可以选择加入 A 组或 B 组. 有 m 个 ...
- hdu 4476 Cut the rope (2-pointer && simulation)
Problem - 4476 题意是,给出若干绳子,对同一根绳子只能切割一次,求出最多能获得多少长度相同的绳子. 代码中,s是最大切割长度,而当前切割长度为t/2. 代码如下: #include &l ...
- kwargs.pop是什么意思
pop()函数一般用来删除list列表的末尾元素,同样,kwargs.pop()用来删除关键字参数中的末尾元素,比如:kwargs = {'Michael': 95, 'Bob': 75, 'Trac ...
- [Err] 1062 - Duplicate entry '0' for key 'PRIMARY'
问题描述: sql语句执行的时候,插入语句无法正确执行 问题原因: 主键 重复 出现 0 解决方案: 将主键设置为自增 然而,设置自增后还是可能会出现下面的问题 #1062 – Duplicate e ...
- Python--day47--mysql索引种类