Exception handling

Handling errors raised while a program runs

Syntax

Catching any exception with the catch-all Exception class:

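try:
    print(a)  # a is never defined, so this line raises NameError
except Exception as e:
    print("There is a problem with your code:", e)

print("The program continues with the code below")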
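Catching Exception handles almost every error, but it also hides which error occurred. Below is a minimal sketch that lists a specific exception before the catch-all and adds a finally clause that always runs. `safe_divide` is a hypothetical example function and does not come from the original post.

```python
def safe_divide(a, b):
    """Divide a by b, reporting errors instead of crashing (hypothetical example)."""
    try:
        return a / b
    except ZeroDivisionError as e:
        # Specific exception types are listed first so they are matched first
        print("Cannot divide by zero:", e)
    except Exception as e:
        # Catch-all for anything else, e.g. TypeError from "a" / 1
        print("Something else went wrong:", type(e).__name__, e)
    finally:
        # Runs whether or not an exception occurred
        print("finally: done")
    return None


print(safe_divide(6, 3))    # prints "finally: done", then 2.0
print(safe_divide(1, 0))    # ZeroDivisionError branch, then None
print(safe_divide("a", 1))  # TypeError handled by the catch-all, then None
print("The program keeps running")
```

Python checks the except clauses from top to bottom, so a broad `except Exception` placed first would swallow the ZeroDivisionError before the specific handler could see it.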

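String built-in methods

Indexing
Slicing
Length (len)
Membership test (in / not in)
Stripping whitespace from both ends (strip)
Splitting a str (split)
Looping over a string
startswith/endswith
join()
index()
count()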
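The operations in the list above can be tried together on one string. The sample string below is an arbitrary example, and the comments show what each expression evaluates to:

```python
s = "  hello,world,python  "

text = s.strip()                   # remove whitespace at both ends -> "hello,world,python"
first = text[0]                    # indexing -> "h"
last = text[-1]                    # negative index counts from the end -> "n"
head = text[0:5]                   # slicing, end index excluded -> "hello"
length = len(text)                 # 18
has_world = "world" in text        # membership test -> True
parts = text.split(",")            # ["hello", "world", "python"]

chars = []
for ch in head:                    # looping over a string yields one character at a time
    chars.append(ch)               # -> ["h", "e", "l", "l", "o"]

starts = text.startswith("hello")  # True
ends = text.endswith("python")     # True
joined = "-".join(parts)           # "hello-world-python"
pos = text.index("world")          # 6; index() raises ValueError if not found
o_count = text.count("o")          # 3

print(text, first, last, head, length, has_world)
print(parts, chars, starts, ends, joined, pos, o_count)
```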

selenium

Selenium is an automated testing tool. It drives a real browser and can click through pages automatically to carry out tasks.

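Install the driver

Download the ChromeDriver version that matches your installed Chrome. The link below is a mirror directory for ChromeDriver 2.38:

http://npm.taobao.org/mirrors/chromedriver/2.38/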

Install the selenium library

pip3 install selenium
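After installing, you can run a quick check to confirm that selenium is importable and that the driver can be found. The sketch below assumes the driver executable is named `chromedriver` and sits on your PATH. The scripts below pass the full path instead, so a missing PATH entry is not fatal. `check_setup` is a hypothetical helper.

```python
import importlib.util
import shutil


def check_setup(driver_name="chromedriver"):
    """Report whether selenium is importable and the driver is on PATH (hypothetical helper)."""
    return {
        # find_spec returns None when the package is not installed
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # which() returns the full path, or None if not found; on Windows it also tries .exe
        "driver_path": shutil.which(driver_name),
    }


print(check_setup())
```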

First, a quick look at what Selenium can do. The script below uses the driver to open a browser automatically and go to Baidu:

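# coding=utf-8
from selenium import webdriver  # drives the browser
from selenium.webdriver import ActionChains  # used for cracking slider CAPTCHAs; lets you drag images
from selenium.webdriver.common.by import By  # locator strategy: By.ID, By.CSS_SELECTOR, ...
from selenium.webdriver.common.keys import Keys  # keyboard keys
from selenium.webdriver.support import expected_conditions as EC  # used together with WebDriverWait below
from selenium.webdriver.support.wait import WebDriverWait  # waits until certain page elements have loaded
from selenium.webdriver.chrome.service import Service  # Selenium 4 passes the driver path through a Service
import time

drive = webdriver.Chrome(service=Service(r"C:\Users\Administrator\Desktop\chromedriver.exe"))

try:
    # Implicit wait: when looking up an element, keep retrying for up to 10 seconds
    drive.implicitly_wait(10)
    # Open the Baidu home page
    drive.get("https://www.baidu.com/")
    time.sleep(1)
    # Find the search box and type a query ("驱动" means "driver")
    # find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4.3+ removed
    search_box = drive.find_element(By.ID, "kw")
    search_box.send_keys("驱动")
    # Find the "百度一下" (Baidu Search) button and click it
    baiduyixia_button = drive.find_element(By.ID, "su")
    baiduyixia_button.click()
    time.sleep(10)
finally:
    # quit() closes all windows and ends the chromedriver process
    drive.quit()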
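The script above relies on `implicitly_wait` plus fixed `time.sleep` pauses. The imported WebDriverWait and expected_conditions support explicit waits instead. For example, `WebDriverWait(drive, 10).until(EC.presence_of_element_located((By.ID, "kw")))` polls until the element appears or raises TimeoutException. The pure-Python sketch below only illustrates that polling idea without a browser. `wait_until` is a hypothetical helper, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper that mimics the idea behind WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage: a fake "element" that becomes available after about 0.3 seconds
start = time.monotonic()
found = wait_until(lambda: "kw" if time.monotonic() - start > 0.3 else None,
                   timeout=2, poll=0.1)
print(found)
```

An explicit wait returns as soon as the condition holds. A fixed `time.sleep(10)` always pauses for the full ten seconds, even when the page is already ready.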

Scraping JD.com product listings and saving the data

# coding=utf-8
from selenium import webdriver  # drives the browser
from selenium.webdriver import ActionChains  # used for cracking slider CAPTCHAs; lets you drag images
from selenium.webdriver.common.by import By  # locator strategy: By.ID, By.CSS_SELECTOR, ...
from selenium.webdriver.common.keys import Keys  # keyboard keys
from selenium.webdriver.support import expected_conditions as EC  # used together with WebDriverWait below
from selenium.webdriver.support.wait import WebDriverWait  # waits until certain page elements have loaded
from selenium.webdriver.chrome.service import Service  # Selenium 4 passes the driver path through a Service
import time

drive = webdriver.Chrome(service=Service(r"C:\Users\Administrator\Desktop\chromedriver.exe"))

try:
    # Wait up to 10 seconds for elements to appear
    drive.implicitly_wait(10)
    # Search on JD.com
    drive.get("https://www.jd.com/")
    search_box = drive.find_element(By.ID, "key")
    # Query: "全新国行iPhone8" (brand-new, mainland-China iPhone 8)
    search_box.send_keys("全新国行iPhone8")
    # Either click the search button or press Enter; here we simply press Enter
    search_box.send_keys(Keys.ENTER)
    # Find the parent element of the product list by its id
    goods_div = drive.find_element(By.ID, "J_goodsList")
    # Find each product's element by its class name
    goods_list = goods_div.find_elements(By.CLASS_NAME, "gl-item")
    print(type(goods_list))  # a list of WebElement objects
    # Loop over the products and extract each one's details
    for goods in goods_list:
        # Product price, via a CSS selector
        goods_price = goods.find_element(By.CSS_SELECTOR, '.p-price i').text
        # Product name
        goods_name = goods.find_element(By.CSS_SELECTOR, '.p-name em').text
        # Number of reviews
        goods_commit = goods.find_element(By.CSS_SELECTOR, '.p-commit a').text
        # Detail link, taken from the href of the reviews link
        goods_url = goods.find_element(By.CSS_SELECTOR, '.p-commit a').get_attribute('href')
        data = f'''
        Product name: {goods_name}
        Price: {goods_price}
        Reviews: {goods_commit}
        Detail link: {goods_url}
        '''
        print(data)
        # Append each product to a text file
        with open("jd_phone_info.txt", "a", encoding="utf8") as f:
            f.write(data)
    time.sleep(10)
finally:
    drive.quit()
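The script above appends free-form text blocks to a .txt file. If you plan to process the data later, a CSV file is usually easier to work with. This is a minimal sketch using the standard csv module. `save_goods` and the column names are hypothetical, not part of the original script.

```python
import csv
import os

# Hypothetical column names, one per value scraped above
FIELDS = ["name", "price", "commit", "url"]


def save_goods(rows, path):
    """Append product dicts to a CSV file, writing the header only for a new or empty file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" is what the csv module expects; it avoids blank lines on Windows
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)
```

Inside the for loop, you could call `save_goods([{"name": goods_name, "price": goods_price, "commit": goods_commit, "url": goods_url}], "jd_goods.csv")` in place of the `with open(...)` block.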

Logging in to a Baidu account automatically

# coding=utf-8
from selenium import webdriver  # drives the browser
from selenium.webdriver import ActionChains  # used for cracking slider CAPTCHAs; lets you drag images
from selenium.webdriver.common.by import By  # locator strategy: By.ID, By.CSS_SELECTOR, ...
from selenium.webdriver.common.keys import Keys  # keyboard keys
from selenium.webdriver.support import expected_conditions as EC  # used together with WebDriverWait below
from selenium.webdriver.support.wait import WebDriverWait  # waits until certain page elements have loaded
from selenium.webdriver.chrome.service import Service  # Selenium 4 passes the driver path through a Service
import time

drive = webdriver.Chrome(service=Service(r"C:\Users\Administrator\Desktop\chromedriver.exe"))

try:
    drive.implicitly_wait(10)
    drive.get("https://www.baidu.com/")
    # Click the "登录" (Log in) link on the home page
    login_button = drive.find_element(By.LINK_TEXT, "登录")
    login_button.click()
    # Switch the login dialog to username/password login
    login_tag = drive.find_element(By.ID, "TANGRAM__PSP_10__footerULoginBtn")
    login_tag.click()
    # Fill in the account (phone number or username); replace the placeholder with your own
    login_tag_user = drive.find_element(By.ID, "TANGRAM__PSP_10__userName")
    login_tag_user.send_keys("YOUR_PHONE_OR_USERNAME")
    # Fill in the password; replace the placeholder with your own
    login_tag_pass = drive.find_element(By.ID, "TANGRAM__PSP_10__password")
    login_tag_pass.send_keys("YOUR_PASSWORD")
    # Submit the login form
    login_commit = drive.find_element(By.ID, "TANGRAM__PSP_10__submit")
    login_commit.click()
    time.sleep(10)
finally:
    drive.quit()
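The login example types the account and password directly into the script, which makes them easy to leak when the file is shared. This is a minimal sketch that reads them from environment variables instead. The variable names `BAIDU_USER` and `BAIDU_PASS` and the helper `load_credentials` are hypothetical.

```python
import os


def load_credentials(user_var="BAIDU_USER", pass_var="BAIDU_PASS"):
    """Return (username, password) from environment variables, or raise if either is missing."""
    user = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    if not user or not password:
        raise RuntimeError(f"Please set the {user_var} and {pass_var} environment variables")
    return user, password
```

In the login script, you would call `user, password = load_credentials()` before the try block and pass `user` and `password` to the two `send_keys` calls.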
