Use BeautifulSoup and Python to scrape a website

Libraries:

  • urllib (fetching the page)
  • BeautifulSoup (parsing HTML data)

Web scraping script

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"

# Download the raw HTML of the quotes page.
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every <div class="quotes"> block.
page_soup = soup(page_html, "html.parser")
quotes = page_soup.findAll("div", {"class": "quotes"})

# Print the quote text and its author for each block.
for quote in quotes:
    fav_quote = quote.findAll("p", {"class": "aquote"})
    aquote = fav_quote[0].text.strip()
    fav_authors = quote.findAll("p", {"class": "author"})
    author = fav_authors[0].text.strip()
    print(aquote)
    print(author)

Run the script.

The full output of the scrape follows.

I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein
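
The script indexes fav_quote[0] and fav_authors[0] directly, so it raises an IndexError if a quote block lacks either paragraph. As a minimal sketch of one way to guard against that, the version below separates parsing from fetching and skips incomplete blocks. The function names parse_quotes and scrape_quotes, the timeout value, and the SAMPLE_HTML markup are hypothetical and chosen for illustration. The markup only mirrors the class names the script looks for; it is not copied from the real page.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def parse_quotes(page_html):
    """Return a list of (quote, author) tuples found in page_html."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    results = []
    for block in page_soup.find_all("div", class_="quotes"):
        quote_tag = block.find("p", class_="aquote")
        author_tag = block.find("p", class_="author")
        # Skip blocks missing either paragraph instead of raising IndexError.
        if quote_tag is None or author_tag is None:
            continue
        results.append((quote_tag.get_text().strip(), author_tag.get_text().strip()))
    return results


def scrape_quotes(url, timeout=10):
    """Fetch url and parse it; the timeout value is an assumed default."""
    # The with-block closes the connection even if read() fails.
    with urlopen(url, timeout=timeout) as response:
        return parse_quotes(response.read())


# Hypothetical sample markup, assumed to resemble the structure the script expects.
SAMPLE_HTML = """
<div class="quotes">
  <p class="aquote"> The best revenge is massive success. </p>
  <p class="author">Frank Sinatra</p>
</div>
<div class="quotes">
  <p class="aquote">A quote with no author paragraph.</p>
</div>
"""

if __name__ == "__main__":
    # Parse the inline sample so the example runs without network access.
    for quote, author in parse_quotes(SAMPLE_HTML):
        print(quote)
        print(author)
```

Calling scrape_quotes(quotes_page) with the URL from the script would run the same parsing against the live page. Keeping parse_quotes separate makes it possible to check the parsing logic against saved HTML without a network connection.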
