Use BeautifulSoup and Python to scrape a website

Libraries:

  • urllib (fetching the page)
  • BeautifulSoup (parsing HTML data)
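
The fetching step done with urllib can be sketched on its own, as below. This is a minimal sketch, not the script used in this post: fetch_html is a hypothetical helper name, and the 10-second timeout is an assumed value. It uses a with-block so the connection is closed automatically, and it returns None instead of raising when the request fails.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the raw bytes served at url, or None if the request fails.

    fetch_html is a hypothetical helper for illustration; it is not part of urllib.
    """
    try:
        # The with-block closes the connection even if read() raises.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as err:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print(f"Could not fetch {url}: {err}")
        return None
```

Calling fetch_html("https://bluelimelearning.github.io/my-fav-quotes/") should give the same bytes that the script reads into page_html, provided the page is reachable.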

Web scraping script

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"

# Download the page and read the raw HTML
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every quote container
page_soup = soup(page_html, "html.parser")
quotes = page_soup.findAll("div", {"class":"quotes"})

# Print each quote followed by its author
for quote in quotes:
    fav_quote = quote.findAll("p", {"class":"aquote"})
    aquote = fav_quote[0].text.strip()
    fav_authors = quote.findAll("p", {"class":"author"})
    author = fav_authors[0].text.strip()
    print(aquote)
    print(author)

Running the script

The complete output of the scrape is shown below, with each quote followed by its author.

I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein
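
The parsing step can also be tried offline, without fetching the page. The sketch below is an illustration under assumptions: SAMPLE_HTML is a hand-written fragment that only mirrors the class names used in the script (quotes, aquote, author), not the real page markup, and extract_quotes is a hypothetical helper. Unlike the script, it skips any block that is missing a quote or an author instead of failing on fav_quote[0].

```python
from bs4 import BeautifulSoup

# Hand-written sample that mirrors the class names used in the script.
# The real page markup may differ; this is an assumption for illustration.
SAMPLE_HTML = """
<div class="quotes">
  <p class="aquote"> The journey of a thousand miles begins with one step. </p>
  <p class="author"> Lao Tzu </p>
</div>
<div class="quotes">
  <p class="aquote">Imagination is more important than knowledge.</p>
</div>
"""


def extract_quotes(html):
    """Return a list of (quote, author) tuples found in html."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in page_soup.find_all("div", class_="quotes"):
        quote = block.find("p", class_="aquote")
        author = block.find("p", class_="author")
        # Skip incomplete blocks rather than raising an error.
        if quote is None or author is None:
            continue
        results.append((quote.get_text(strip=True), author.get_text(strip=True)))
    return results


for text, author in extract_quotes(SAMPLE_HTML):
    print(text)
    print(author)
```

With the sample above, only the first block is printed, because the second block has no author paragraph.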
