Use BeautifulSoup and Python to scrap a website

Lib:

  • urllib
  • Parsing HTML Data

Web scraping script

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
quotes = page_soup.findAll("div", {"class":"quotes"}) for quote in quotes:
fav_quote = quote.findAll("p", {"class":"aquote"})
aquote = fav_quote[0].text.strip() fav_authors = quote.findAll("p",{"class":"author"})
author = fav_authors[0].text.strip() print(aquote)
print(author)

Run this script successfully

Following is the whole result of this scraping.

I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein

.

Web Scraping using Python Scrapy_BS4 - using BeautifulSoup and Python的更多相关文章

  1. Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(2)

    Scrapy Architecture Creating a Spider. Spiders are classes that you define that Scrapy uses to scrap ...

  2. Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(1)

    Create a new Scrapy project first. scrapy startproject projectName . Open this project in Visual Stu ...

  3. Web Scraping using Python Scrapy_BS4 - Software

    Install the following software before web scraping. Visual Studio Code Python and Pip pip install vi ...

  4. Web Scraping using Python Scrapy_BS4 - Introduction

    What is Web Scraping This is also referred to as web harvesting and web data extraction. This is the ...

  5. Web Scraping with Python读书笔记及思考

    Web Scraping with Python读书笔记 标签(空格分隔): web scraping ,python 做数据抓取一定一定要明确:抓取\解析数据不是目的,目的是对数据的利用 一般的数据 ...

  6. <Web Scraping with Python>:Chapter 1 & 2

    <Web Scraping with Python> Chapter 1 & 2: Your First Web Scraper & Advanced HTML Parsi ...

  7. 《Web Scraping With Python》Chapter 2的学习笔记

    You Don't Always Need a Hammer When Michelangelo was asked how he could sculpt a work of art as mast ...

  8. 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl

    阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl 1.函数调用它自身,这样就形成了一个循环,一环套一环: from urllib.request ...

  9. 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href

    阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href 1.查找以<a>开头的所有文本,然后判断href是否在<a> ...

随机推荐

  1. vulstack红队评估(二)

    一.环境搭建: 1.根据作者公开的靶机信息整理: 靶场统一登录密码:1qaz@WSX     2.网络环境配置: ①Win2008双网卡模拟内外网: 外网:192.168.1.80,桥接模式与物理机相 ...

  2. Quaternion:通过API对Quaternion(四元数)类中的方法属性初步学习总结(二)

    1.RotateTowards方法 RotateTowards(From.rotation,To.rotation,fspeed) 个人理解:使From的rotation以floatspeed为速度, ...

  3. VulnHub CengBox2靶机渗透

    ​本文首发于微信公众号:VulnHub CengBox2靶机渗透,未经授权,禁止转载. 难度评级:☆☆☆☆官网地址:https://download.vulnhub.com/cengbox/CengB ...

  4. mysql语句基本练习

    select ename,job from emp where job in ('MANAGER','ANALYET','SALESMAN') 1.查询出工作岗位为MANAGER.ANALYST.SA ...

  5. activiti学习笔记二

    上一篇文章大概讲了下什么是流程引擎,为什么我们要用流程引擎,他的基本原理是啥,以及怎么进行基本的使用,这篇文章我们再讲下其他的一些使用. 删除流程部署 package activiti02; impo ...

  6. 入门大数据---HiveCLI和Beeline命令行的基本使用

    一.Hive CLI 1.1 Help 使用 hive -H 或者 hive --help 命令可以查看所有命令的帮助,显示如下: usage: hive -d,--define <key=va ...

  7. LQR要点

    新的“A”变成着了这样:Ac = A - KB 基于对象:状态空间形式的系统 能量函数J:也称之为目标函数 Q:半正定矩阵,对角阵(允许对角元素出现0) R:正定矩阵,QR其实就是权重 下面这段话可能 ...

  8. 【floyd+矩阵乘法】POJ 3613 Cow Relays

    Description For their physical fitness program, N (2 ≤ N ≤ 1,000,000) cows have decided to run a rel ...

  9. JQuery 优缺点略谈

    1.jQuery实现脚本与页面的分离 ; 2.最少的代码做最多的事情; 3.性能; 在大型JavaScript框架中,jQuery对性能的理解最好.尽管不同版本拥有众多新功能,其最精简版本只有18KB ...

  10. 洛谷 P1692 【部落卫队】

    啊这道题其实暴力就行了,算是一道搜索入门题吧. 搜索变量就应该是当前到哪一位了,然后进行枚举,当前的一位加或者不加,然后知道搜完为止. 判断当前一位可不可以加的时候本来想用vector的,但是没调出来 ...