Web自动化之Headless Chrome编码实战

API 概览 && 编码Tips

文档地址

github Chrome DevTools Protocol 协议本身的仓库有问题可以在这里提issue
github debugger-protocol-viewer 协议API文档的仓库
API 文档地址 API展示的地方，这个经常用

常用API

Network 网络请求、Cookie、缓存、证书等相关内容
Page 页面的加载、资源内容、弹层、截图、打印等相关内容
DOM 文档DOM的获取、修改、删除、查询等相关内容
Runtime JavaScript代码的执行，这里面我们可以搞事情~~

编码Tips

我们这里不会直接调用Websocket相关的内容来调用chrome的调试命令，而是用chrome-remote-interface 这个封装的库来做，它是基于Promise风格的
每一个功能块成为一个单独的domain,像Network，Page，DOM等都是不同的domain
几乎每一个个头大的domain都有enable方法，需要先调用这个方法启用之后再使用
各个domain的接口方法参数都是第一个对象或者说一个Map，不用考虑参数的位置了
各个domain的接口返回值也是一个对象，取对应的key就行
参数值和返回值经常是meta信息，经常是各种对象的id信息，而不是具体的对象内容（这里可能需要切一下风格）

编码实例

首先做一个简单的封装，准备API的执行环境，具体可参考前一篇关于工具库的。

const chromeLauncher = require('chrome-launcher');

const chromeRemoteInterface = require('chrome-remote-interface');

const prepareAPI = (config = {}) => {

    const {host = 'localhost', port = 9222, autoSelectChrome = true, headless = true} = config;

    const wrapperEntry = chromeLauncher.launch({

        host,

        port,

        autoSelectChrome,

        additionalFlags: [

            '--disable-gpu',

            headless ? '--headless' : ''

        ]

    }).then(chromeInstance => {

        const remoteInterface = chromeRemoteInterface(config).then(chromeAPI => chromeAPI).catch(err => {

            throw err;

        });

        return Promise.all([chromeInstance, remoteInterface])

    }).catch(err => {

        throw err

    });

    return wrapperEntry

};

打开百度，获取页面性能数据，参考 Navigation Timing W3C规范

const wrapper = require('the-wrapper-module');

const performanceParser = (perforceTiming) => {

    let timingGather = {};

    perforceTiming = perforceTiming || {};

    timingGather.redirect = perforceTiming.redirectEnd - perforceTiming.redirectEnd-perforceTiming.redirectStart;

    timingGather.dns = perforceTiming.domainLookupEnd - perforceTiming.domainLookupStart;

    timingGather.tcp = perforceTiming.connectEnd - perforceTiming.connectStart;

    timingGather.request = perforceTiming.responseStart - perforceTiming.requestStart;

    timingGather.response = perforceTiming.responseEnd - perforceTiming.responseStart;

    timingGather.domReady = perforceTiming.domContentLoadedEventStart - perforceTiming.navigationStart;

    timingGather.load = perforceTiming.loadEventStart - perforceTiming.navigationStart;

    return timingGather;

};

const showPerformanceInfo = (performanceInfo) => {

    performanceInfo = performanceInfo || {};

    console.log(`页面重定向耗时:${performanceInfo.redirect}`);

    console.log(`DNS查找耗时:${performanceInfo.dns}`);

    console.log(`TCP连接耗时:${performanceInfo.tcp}`);

    console.log(`请求发送耗时:${performanceInfo.request}`);

    console.log(`响应接收耗时:${performanceInfo.response}`);

    console.log(`DOMReady耗时:${performanceInfo.domReady}`);

    console.log(`页面加载耗时:${performanceInfo.load}`);

};

wrapper.prepareAPI().then(([chromeInstance, remoteInterface]) => {

    const {Runtime,Page} = remoteInterface;

    Page.loadEventFired(() => {

        Runtime.evaluate({

            expression:'window.performance.timing.toJSON()',

            returnByValue:true  //不加这个参数，拿到的是一个对象的meta信息,还需要getProperties

        }).then((resultObj) => {

            let {result,exceptionDetails} = resultObj;

            if(!exceptionDetails){

                showPerformanceInfo(performanceParser(result.value))

            }else{

                throw exceptionDetails;

            }

        });

    });

    Page.enable().then(() => {

        Page.navigate({

            url:'http://www.baidu.com'

        })

    });

});

打开百度搜索`Web自动化 headless chrome`，并爬取首屏结果链接

const wrapper = require('the-wrapper-module');

//有this的地方写成箭头函数要注意，这里会有问题

const buttonClick = function () {

    this.click();

};

const setInputValue = () => {

    var input = document.getElementById('kw');

    input.value = 'Web自动化 headless chrome';

};

const parseSearchResult = () => {

    let resultList = [];

    const linkBlocks = document.querySelectorAll('div.result.c-container');

    for (let block of Array.from(linkBlocks)) {

        let targetObj = block.querySelector('h3');

        resultList.push({

            title: targetObj.textContent,

            link: targetObj.querySelector('a').getAttribute('href')

        });

    }

    return resultList;

};

wrapper.prepareAPI({

    // headless: false  //加上这行代码可以查看浏览器的变化

}).then(([chromeInstance, remoteInterface]) => {

    const {Runtime, DOM, Page, Network} = remoteInterface;

    let framePointer;

    Promise.all([Page.enable(), Network.enable(), DOM.enable(),Page.setAutoAttachToCreatedPages({autoAttach:true})]).then(() => {

        Page.domContentEventFired(() => {

            console.log('Page.domContentEventFired')

            Runtime.evaluate({

                expression:`window.location.href`,

                returnByValue:true

            }).then(result => {

                console.log(result)

            })

        });

        Page.frameNavigated(() => {

            console.log('Page.frameNavigated')

            Runtime.evaluate({

                expression:`window.location.href`,

                returnByValue:true

            }).then(result => {

                console.log(result)

            })

        })

        Page.loadEventFired(() => {

            console.log('Page.loadEventFired')

            Runtime.evaluate({

                expression:`window.location.href`,

                returnByValue:true

            }).then(result => {

                console.log(result)

            })

            DOM.getDocument().then(({root}) => {

                //百度首页表单

                DOM.querySelector({

                    nodeId: root.nodeId,

                    selector: '#form'

                }).then(({nodeId}) => {

                    Promise.all([

                        //找到 搜索框填入值

                        DOM.querySelector({

                            nodeId: nodeId,

                            selector: '#kw'

                        }).then((inputNode) => {

                            Runtime.evaluate({

                                // 两种写法

                                // expression:'document.getElementById("kw").value = "Web自动化 headless chrome"',

                                expression: `(${setInputValue})()`

                            });

                            //这段代码不起作用 日狗

                            // DOM.setNodeValue({

                            //     nodeId:inputNode.nodeId,

                            //     value:'Web自动化 headless chrome'

                            // });

                            //上面的代码需求要这么写

                            // DOM.setAttributeValue({

                            //     nodeId:inputNode.nodeId,

                            //     name:'value',

                            //     value:'headless chrome'

                            // });

                        })

                        //找到 提交按钮setInputValue

                        , DOM.querySelector({

                            nodeId,

                            selector: '#su'

                        })

                    ]).then(([inputNode, buttonNode]) => {

                        Runtime.evaluate({

                            expression: 'document.getElementById("kw").value',

                        }).then(({result}) => {

                            console.log(result)

                        });

                        return DOM.resolveNode({

                            nodeId: buttonNode.nodeId

                        }).then(({object}) => {

                            const {objectId} = object;

                            return Runtime.callFunctionOn({

                                objectId,

                                functionDeclaration: `${buttonClick}`

                            })

                        });

                    }).then(() => {

                        setTimeout(() => {

                            Runtime.evaluate({

                                expression: `(${parseSearchResult})()`,

                                returnByValue: true

                            }).then(({result}) => {

                                console.log(result.value)

                                //百度的URL有加密，需要再请求一次拿到真实URL

                            })

                        },3e3)

                    });

                })

            });

        });

        Page.navigate({

            url: 'http://www.baidu.com'

        }).then((frameObj) => {

            framePointer = frameObj

        });

    })

});

Web自动化之Headless Chrome编码实战的更多相关文章

Web自动化之Headless Chrome测试框架集成
使用Selenium操作headless chrome 推荐简介 WebDriver是一个W3C标准, 定义了一套检查和控制用户代理(比如浏览器)的远程控制接口,各大主流浏览器来实现这些接口以便调用 ...
Web自动化之Headless Chrome概览
Web自动化这里所说的Web自动化是所有跟页面相关的自动化,比如页面爬取,数据抓取,页面内容检测,页面功能测试,页面加载性能测试,页面回归测试等等,当前主要由如下几种解决方式: 文本数据获取这就是 ...
Web自动化之Headless Chrome开发工具库
命令行运行Headless Chrome Chrome 安装(需要带梯子) 下载地址几个版本的比较 Chromium 不是Chrome,但Chrome的内容基本来源于Chromium,这个是开源的版 ...
爬虫实战：爬虫之 web 自动化终极杀手 ( 上）
欢迎大家前往腾讯云技术社区,获取更多腾讯海量技术实践干货哦~ 作者:陈象导语: 最近写了好几个简单的爬虫,踩了好几个深坑,在这里总结一下,给大家在编写爬虫时候能给点思路.本次爬虫内容有:静态页面的爬 ...
Selenium Web 自动化 - 项目实战（一）
Selenium Web 自动化 - 测试框架(一) 2016-08-05 目录 1 框架结构雏形2 把Java项目转变成Maven项目3 加入TestNG配置文件4 Eclipse编码修改5 编写代 ...
docker+headless+robotframework+jenkins实现web自动化持续集成
在Docker环境使headless实现web自动化持续集成一.制作镜像原则:自动化测试基于基础制作镜像命令:docker run --privileged --name=$1 --net=ho ...
《Selenium+Pytest Web自动化实战》随到随学在线课程，零基础也能学！
课程介绍课程主题:<Selenium+Pytest Web自动化实战> 适合人群: 1.功能测试转型自动化测试 2.web自动化零基础的小白 3.对python 和 selenium 有 ...
Selenium Web 自动化 - 项目实战（三）
Selenium Web 自动化 - 项目实战(三) 2016-08-10 目录 1 关键字驱动概述2 框架更改总览3 框架更改详解 3.1 解析新增页面目录 3.2 解析新增测试用例目录 3. ...
Selenium Web 自动化 - 项目实战环境准备
Selenium Web 自动化 - 项目实战环境准备 2016-08-29 目录 1 部署TestNG 1.1 安装TestNG 1.2 添加TestNG类库2 部署Maven 2.1 mav ...

随机推荐

（入门篇 NettyNIO开发指南）第三章-Netty入门应用
作为Netty的第一个应用程序,我们依然以第2章的时间服务器为例进行开发,通过Netty版本的时间服务报的开发,让初学者尽快学到如何搭建Netty开发环境和!运行Netty应用程序. 如果你已经熟悉N ...
DOM4J介绍与代码示例(2)-XPath 详解
XPath 详解,总结 XPath简介 XPath是W3C的一个标准.它最主要的目的是为了在XML1.0或XML1.1文档节点树中定位节点所设计.目前有XPath1.0和 XPath2.0两个版本.其 ...
C# 字典 Dictionary
原文地址http://www.cnblogs.com/txw1958/archive/2012/11/07/csharp-dictionary.html 侵删
Submin1安装记录（CentOS5）
安装SVN和Apache wget http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco -O /tmp/RPM-GPG-KEY-WANdisco & ...
Android依赖管理与私服搭建
在Android开发中,一个项目需要依赖许多的库,我们自己写的,第三方的等等,这篇文件介绍的就是自己搭建私服,创建自己的仓库,进行对我们自己写的库依赖管理.本文是在 mac book pro 环境上搭 ...
OpenGL判断一个点是否可见
关于OpenGL中判断一个点是否可见,可以分成两种情况讨论:点在2D空间中和3D空间中的时候.并且"在2D空间中"可以看作"在3D空间中"的特殊情况. 温馨提示 ...
作为前端，我为什么选择 Angular 2？
转自:https://sanwen8.cn/p/2226GkX.html 没有选择是痛苦的,有太多的选择却更加痛苦.而后者正是目前前端领域的真实写照.新的框架层出不穷:它难吗?它写得快吗?可维护性怎样 ...
html、css和js注释的规范用法
成为专业的前端工程师!!! html注释:  css注释: //注释内容单行注释(不推荐使用,因为有的浏览器可能不兼容,没有效果)/*注释内容*/ 多行注释(推荐使 ...
ACL2016信息抽取与知识图谱相关论文掠影
实体关系推理与知识图谱补全 Unsupervised Person Slot Filling based on Graph Mining 作者:Dian Yu, Heng Ji 机构:Computer ...
DDD理论学习系列（6）-- 实体
DDD理论学习系列--案例及目录 1.引言实体对应的英语单词为Entity.提到实体,你可能立马就想到了代码中定义的实体类.在使用一些ORM框架时,比如Entity Framework,实体作为直接 ...

Web自动化之Headless Chrome编码实战

API 概览 && 编码Tips

文档地址

常用API

编码Tips

编码实例

打开百度，获取页面性能数据，参考 Navigation Timing W3C规范

打开百度 搜索Web自动化 headless chrome，并爬取首屏结果链接

Web自动化之Headless Chrome编码实战的更多相关文章

随机推荐

热门专题

打开百度搜索`Web自动化 headless chrome`，并爬取首屏结果链接