The selenium module, PhantomJS, and headless Chrome


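What is selenium

  A module for browser automation.

What is browser automation

  You describe a sequence of actions in a script or in Python code, and selenium maps those actions onto a real browser, which then carries them out automatically.

How it relates to web scraping

  Simulating login

  Fetching dynamically loaded data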

# Demo program
from selenium import webdriver
from time import sleep

# Path to the browser driver; the r'' prefix makes it a raw string so backslashes are not treated as escape characters
driver = webdriver.Chrome(r'./chromedriver.exe')

# Open the Baidu home page with get
driver.get("http://www.baidu.com")

# Find the "设置" (Settings) link on the page and click it
driver.find_elements_by_link_text('设置')[0].click()
sleep(2)

# In the settings menu, click "搜索设置" (Search settings)
driver.find_elements_by_link_text('搜索设置')[0].click()
sleep(2)

# Select 50 results per page (the third option of the select with id "nr")
m = driver.find_element_by_id('nr')
sleep(2)
m.find_element_by_xpath('.//option[3]').click()
# Equivalent absolute XPath: m.find_element_by_xpath('//*[@id="nr"]/option[3]').click()
sleep(2)

# Click the save-settings button
driver.find_elements_by_class_name("prefpanelgo")[0].click()
sleep(2)

# Handle the alert that pops up: accept() confirms, dismiss() cancels
driver.switch_to.alert.accept()
sleep(2)

# Find the Baidu search box and type the query 美女
driver.find_element_by_id('kw').send_keys('美女')
sleep(2)

# Click the search button
driver.find_element_by_id('su').click()
sleep(2)

# On the results page, find the link "美女_百度图片" and open it
driver.find_elements_by_link_text('美女_百度图片')[0].click()
sleep(3)

# Close the browser
driver.quit()
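
The demo above pauses with fixed sleep() calls, which either waste time or turn out too short on a slow connection. A minimal sketch of the alternative, polling until a condition holds, is shown below. The helper name wait_until and its parameters are hypothetical and not part of selenium; selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns True as soon as the predicate succeeds, or False once
    `timeout` seconds have passed without success.
    (Hypothetical helper for illustration, not a selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause briefly before checking again
        time.sleep(poll)
```

With the demo's driver object, wait_until(lambda: driver.find_elements_by_id('kw')) would return as soon as the search box exists, because find_elements_by_id returns an empty list when nothing matches.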

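How selenium fetches dynamically loaded data

  Installation: pip install selenium (the examples here use the Selenium 3 API, such as find_element_by_id and the executable_path argument; Selenium 4 replaces them with find_element(By.ID, ...) and a Service object)

  Basic workflow

    1. from selenium import webdriver

    2. Instantiate a browser object backed by a browser driver: bro=webdriver.Chrome(executable_path='./chromedriver.exe')

    3. Download the browser driver from http://chromedriver.storage.googleapis.com/index.html and put it in the scraper project's folder

      3.1 Check which driver version matches your browser version: http://blog.csdn.net/huilan_same/article/details/51896672

    4. Send a request with get: bro.get(url='http://125.35.6.84:81/xk/')

    5. Read the source of the current page, then parse it with xpath or bs4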

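page_text = bro.page_source
soup = BeautifulSoup(page_text, 'lxml')
dl_list = soup.select('#gzlist > li > dl')
for dl in dl_list:
    # get_text() also works when the tag has several children, where .string would be None
    print(dl.get_text(strip=True))

    Writing the automation code

# Crude version: scrape data from the drug administration site
from bs4 import BeautifulSoup
# Import webdriver
from selenium import webdriver

# Instantiate the browser object; executable_path is the path to the driver
bro = webdriver.Chrome(executable_path='./chromedriver.exe')

# Send a request
bro.get(url='http://125.35.6.84:81/xk/')

# Get the source of the page currently shown in the browser
page_text = bro.page_source
soup = BeautifulSoup(page_text, 'lxml')
dl_list = soup.select('#gzlist > li > dl')
for dl in dl_list:
    print(dl.get_text(strip=True))

# Close the browser
bro.quit()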

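Detailed selenium usage

  1. Instantiate the browser (the argument is the browser driver): bro = webdriver.Chrome(executable_path='./chromedriver.exe')

  2. Send a request with get: bro.get('https://www.taobao.com/')

  3. Locate elements with the find family of methods: search_input = bro.find_element_by_id('q')

    find_element_by_id

    find_element_by_xpath

  4. Run JavaScript with execute_script, for example to scroll down and trigger more loading: bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')

  5. Click a button: btn.click()

  6. Quit the browser: bro.quit()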

from selenium import webdriver
from time import sleep

# Instantiate a browser object
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.taobao.com/')

# Locate the search box on the Taobao home page (the find family of methods locates elements)
search_input = bro.find_element_by_id('q')

# Type a product name into the located element
search_input.send_keys('华为')
sleep(2)

# Run JavaScript to scroll to the bottom, which triggers more content to load
bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
sleep(2)
bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
sleep(2)

# Locate the search button and click it
btn = bro.find_element_by_xpath('//*[@id="J_TSearchForm"]/div[1]/button')
btn.click()
sleep(2)

bro.quit()
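
The script above scrolls twice with a fixed pause. When the number of scrolls needed is not known in advance, one option is to keep scrolling until the page height stops growing. The sketch below assumes a hypothetical helper named scroll_to_bottom; it relies only on the driver's execute_script method, which hands back the value of a JavaScript return statement.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down until document.body.scrollHeight stops growing.

    `driver` is any object with an execute_script(script) method,
    such as a selenium browser object. Returns the number of scrolls
    performed. (Hypothetical helper for illustration.)
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        rounds += 1
        # Give the page time to load the next batch of content
        time.sleep(pause)
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            # Nothing new was loaded, so the bottom has been reached
            break
        last_height = new_height
    return rounds
```

The max_rounds cap is a design choice for pages that load content endlessly; without it the loop would never end on such pages.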

Running action chains: from selenium.webdriver import ActionChains

1. Instantiate an action chain object, passing the browser object to its constructor: action = ActionChains(bro)

2. Press and hold the mouse button on an element: action.click_and_hold(div_tag)

3. Move by an offset and run the queued actions with perform(): action.move_by_offset(17,0).perform()

4. Release the mouse button: action.release().perform()

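from selenium import webdriver
from selenium.webdriver import ActionChains
from time import sleep

# Instantiate the browser
bro = webdriver.Chrome(executable_path='./chromedriver.exe')

# Send the request
bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')

# If the target element is inside an iframe, switch into that frame before locating it
bro.switch_to.frame('iframeResult')
div_tag = bro.find_element_by_id('draggable')

# Using an action chain:
# instantiate an action chain object, passing the browser object to its constructor
action = ActionChains(bro)

# Press and hold the element (queued until perform() is called)
action.click_and_hold(div_tag)

for i in range(5):
    # Size of each move, in pixels
    action.move_by_offset(17,0).perform()
    sleep(0.5)

# Release the mouse button; without perform() the release is only queued and never runs
action.release().perform()

# Quit the browser
bro.quit()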
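
The drag above moves 5 times by 17 pixels, with both numbers hard-coded. A small sketch of how the same pattern could be driven by a total distance follows. The names split_offsets and drag_in_steps are hypothetical; drag_in_steps only assumes an object offering click_and_hold, move_by_offset, release, and perform, as ActionChains does. Whether perform() clears already executed actions differs between selenium versions, so treat this as an illustration and check it against the version in use.

```python
def split_offsets(total, step):
    """Split a non-negative distance into moves of at most `step` pixels."""
    if step <= 0:
        raise ValueError('step must be positive')
    if total < 0:
        raise ValueError('total must be non-negative')
    full, rest = divmod(total, step)
    # `full` moves of `step` pixels, plus one shorter move for any remainder
    return [step] * full + ([rest] if rest else [])


def drag_in_steps(action, element, total, step):
    """Hold `element`, move right by `total` pixels in small steps, then release.

    `action` is expected to behave like selenium's ActionChains.
    (Hypothetical helper for illustration.)
    """
    action.click_and_hold(element).perform()
    for dx in split_offsets(total, step):
        action.move_by_offset(dx, 0).perform()
    action.release().perform()
```

Called as drag_in_steps(ActionChains(bro), div_tag, 85, 17), this would issue the same five 17-pixel moves as the loop above.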

Simulating login to QQ Zone

1. Switch into the login iframe with bro.switch_to.frame('iframe_id'): bro.switch_to.frame('login_frame')

2. Type into an input element with send_keys: bro.find_element_by_id('u').send_keys('username')

3. Get the page data with page_source: page_text=bro.page_source

# Simulate login to QQ Zone
from selenium import webdriver
from lxml import etree
from time import sleep

# Instantiate the browser
bro = webdriver.Chrome(executable_path='./chromedriver.exe')

# Send the request
bro.get('https://qzone.qq.com/')

# The login form lives in an iframe: switch into it, then click the link for account/password login
bro.switch_to.frame('login_frame')
bro.find_element_by_id('switcher_plogin').click()

# Locate the input elements and fill them in with send_keys
bro.find_element_by_id('u').send_keys('username')
bro.find_element_by_id('p').send_keys('password')
bro.find_element_by_id('login_button').click()
sleep(2)

# Get the page data
page_text = bro.page_source

# Parse the data with etree
tree = etree.HTML(page_text)
data = tree.xpath('//*[@id="feed_478881649_311_0_1560636904_0_1"]/div[1]//text()')
print(data)
sleep(3)

# Quit the browser
bro.quit()

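Detecting selenium and avoiding detection

  Detection: many large sites now check for selenium. In a normally launched browser, window.navigator.webdriver is undefined, while in a browser driven by selenium it is true.

  Avoidance: before starting ChromeDriver, enable an experimental option on Chrome. The complete code is as follows: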

# 1. Enable Chrome's experimental option
from selenium import webdriver
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])

# 2. Pass the options object when instantiating the browser
bro = webdriver.Chrome(executable_path='./chromedriver.exe', options=option)

Headless Chrome: the code is as follows

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

# Pass the options object when instantiating the browser
bro = webdriver.Chrome(executable_path='./chromedriver.exe', options=chrome_options)

Taking a screenshot: bro.save_screenshot('1.png')

# Running the browser without a visible window
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

# Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

url = 'https://bj.meituan.com/'

# Pass the options object when instantiating the browser
bro = webdriver.Chrome(executable_path='./chromedriver.exe', options=chrome_options)

bro.get(url)
sleep(2)

# Take a screenshot
bro.save_screenshot('1.png')
print(bro.page_source)

# Quit the browser
bro.quit()

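PhantomJS: a browser without a visible window. It is no longer maintained, so headless Chrome is generally used instead.

1. Instantiate PhantomJS

from selenium import webdriver

# Path to PhantomJS
path = r'PhantomJS driver path'
browser = webdriver.PhantomJS(path)

2. All other operations are the same as with Chrome

3. Taking a screenshot: browser.save_screenshot(r'phantomjs\show.png')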

from selenium import webdriver
import time

# Path to PhantomJS
path = r'PhantomJS driver path'
browser = webdriver.PhantomJS(path)

# Open Baidu
url = 'http://www.baidu.com/'
browser.get(url)
time.sleep(3)
browser.save_screenshot(r'phantomjs\baidu.png')

# Find the search input box
my_input = browser.find_element_by_id('kw')

# Type text into the box
my_input.send_keys('美女')
time.sleep(3)

# Take a screenshot
browser.save_screenshot(r'phantomjs\meinv.png')

# Find the search button and click it
button = browser.find_elements_by_class_name('s_btn')[0]
button.click()
time.sleep(3)
browser.save_screenshot(r'phantomjs\show.png')
time.sleep(3)

browser.quit()

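Summary:

- selenium: a module for browser automation.
- Browser automation: you describe a sequence of actions in a script or in Python code, and selenium maps those actions onto the browser, which carries them out automatically.
- PhantomJS: a headless browser; since it is no longer maintained, use headless Chrome instead.
- How selenium relates to web scraping:
    - Convenient access to dynamically loaded data
    - Simulated login
- Drawbacks:
    - Scraping is slow
    - Environment setup is cumbersome
        - Set up selenium
            - pip install selenium
        - Set up the browser
            - Download and install the corresponding browser
- Coding workflow:
    - Import the package
        - from selenium import webdriver
    - Create a browser object; this requires the browser's driver program
        - bro=webdriver.Chrome(executable_path='./chromedriver.exe')
    - Send a request with the get method
        - bro.get('https://qzone.qq.com/')
    - Write the code for the other actions
    - Close the browser
        - bro.quit()
- Actions:
    - Locating elements: the find family of methods
        - bro.switch_to.frame('iframeResult')  # if the target element is inside an iframe, switch into the frame first, then locate it
    - Interacting with elements:
        - Click: click()
        - Type text: send_keys('xxx')
    - Running JavaScript:
        - execute_script('jsCode')
    - Action chains:
        - Import: from selenium.webdriver import ActionChains
        - Create an action chain object: action = ActionChains(bro)
        - Call the methods of the action chain object:
            - action.click_and_hold(ele)
            - move_by_offset(x,y)
            - perform(): run the queued actions immediately
            - release()
    - page_source: returns the full source of the page currently shown in the browser
- Headless setup: PhantomJS or headless Chrome
- Avoiding selenium detection
- Screenshot: save_screenshot('./1.png')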
