htmlunit最具有参考意义项目
### HtmlUnit What?
- 项目1 https://gitee.com/dgwcode/spiderTmallTradeInfo
- 项目2 https://gitee.com/dgwcode/SimulationFang
这两个项目,是最新Htmlunit包下的新项目,很多东西在国内博客,百度,技术网站都找不到例子,有时间可以看看。
Introduction
The dependencies page lists all the jars that you will need to have in your classpath.
The class com.gargoylesoftware.htmlunit.WebClient is the main starting point. This simulates a web browser and will be used to execute all of the tests.
Most unit testing will be done within a framework like JUnit so all the examples here will assume that we are using that.
In the first sample, we create the web client and have it load the homepage from the HtmlUnit website. We then verify that this page has the correct title. Note that getPage() can return different types of pages based on the content type of the returned data. In this case we are expecting a content type of text/html so we cast the result to an com.gargoylesoftware.htmlunit.html.HtmlPage.
@Test
public void homePage() throws Exception {
try (final WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText()); final String pageAsXml = page.asXml();
Assert.assertTrue(pageAsXml.contains("<body class=\"composite\">")); final String pageAsText = page.asText();
Assert.assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));
}
}
Submitting a form
Frequently we want to change values in a form and submit the form back to the server. The following example shows how you might do this.
@Test
public void submittingForm() throws Exception {
try (final WebClient webClient = new WebClient()) { // Get the first page
final HtmlPage page1 = webClient.getPage("http://some_url"); // Get the form that we are dealing with and within that form,
// find the submit button and the field that we want to change.
final HtmlForm form = page1.getFormByName("myform"); final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("userid"); // Change the value of the text field
textField.type("root"); // Now submit the form by clicking the button and get back the second page.
final HtmlPage page2 = button.click();
}
}
Imitating a specific browser
Often you will want to simulate a specific browser. This is done by passing a com.gargoylesoftware.htmlunit.BrowserVersion into the WebClient constructor. Constants have been provided for some common browsers but you can create your own specific version by instantiating a BrowserVersion.
@Test
public void homePage_Firefox() throws Exception {
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52)) {
final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}
}
Specifying this BrowserVersion will change the user agent header that is sent up to the server and will change the behavior of some of the JavaScript.
Finding a specific element
Once you have a reference to an HtmlPage, you can search for a specific HtmlElement by one of 'get' methods, or by using XPath or CSS selectors.
Traversing the DOM tree
Below is an example of finding a 'div' by an ID, and getting an anchor by name:
@Test
public void getElements() throws Exception {
try (final WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("http://some_url");
final HtmlDivision div = page.getHtmlElementById("some_div_id");
final HtmlAnchor anchor = page.getAnchorByName("anchor_name");
}
}
A simple way for finding elements might be to find all elements of a specific type.
@Test
public void getElements() throws Exception {
try (final WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("http://some_url");
NodeList inputs = page.getElementsByTagName("input");
final Iterator<E> nodesIterator = nodes.iterator();
// now iterate
}
}
There is rich set of methods usable to locate page elements e.g.
- HtmlPage.getAnchors(); HtmlPage.getAnchorByHref(String); HtmlPage.getAnchorByName(String); HtmlPage.getAnchorByText(String)
- HtmlPage.getElementById(String); HtmlPage.getElementsById(String); HtmlPage.getElementsByIdAndOrName(String);
- HtmlPage.getElementByName(String); HtmlPage.getElementsByName(String)
- HtmlPage.getFormByName(String); HtmlPage.getForms()
- HtmlPage.getFrameByName(String); HtmlPage.getFrames()
You can also start searching from the document element (HtmlPage.getDocumentElement()) and then traverse the dom tree
- HtmlElement.getElementsByAttribute(String, String, String)
- DomElement.getElementsByTagName(String); DomElement.getElementsByTagNameNS(String, String)
- DomElement.getChildElements(); DomElement.getChildElementCount()
- DomElement.getFirstElementChild(); DomElement.getLastElementChild()
- HtmlElement.getEnclosingElement(String); HtmlElement.getEnclosingForm()
- DomNode.getChildNodes(); DomNode.getChildren(); DomNode.getDescendants(); DomNode.getDomElementDescendants(); DomNode.getFirstChild(); DomNode.getHtmlElementDescendants() DomNode.getLastChild(); DomNode.getNextElementSibling(); DomNode.getNextSibling(); DomNode.getPreviousElementSibling(); getPreviousSibling()
XPath queries
XPath is the suggested way for more complex searches, a brief tutorial can be found in W3Schools
@Test
public void xpath() throws Exception {
try (final WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net"); //get list of all divs
final List<?> divs = page.getByXPath("//div"); //get div which has a 'name' attribute of 'John'
final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@name='John']").get(0);
}
}
CSS Selectors
You can also use CSS selectors
@Test
public void cssSelector() throws Exception {
try (final WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net"); //get list of all divs
final DomNodeList<DomNode> divs = page.querySelectorAll("div");
for (DomNode div : divs) {
....
} //get div which has the id 'breadcrumbs'
final DomNode div = page.querySelector("div#breadcrumbs");
}
}
Using a proxy server
The last WebClient constructor allows you to specify proxy server information in those cases where you need to connect through one.
@Test
public void homePage_proxy() throws Exception {
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52, "myproxyserver", myProxyPort)) { //set proxy username and password
final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
credentialsProvider.addCredentials("username", "password"); final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}
}
Specifying this BrowserVersion will change the user agent header that is sent up to the server and will change the behavior of some of the JavaScript.
htmlunit最具有参考意义项目的更多相关文章
- 项目开发中的一些注意事项以及技巧总结 基于Repository模式设计项目架构—你可以参考的项目架构设计 Asp.Net Core中使用RSA加密 EF Core中的多对多映射如何实现? asp.net core下的如何给网站做安全设置 获取服务端https证书 Js异常捕获
项目开发中的一些注意事项以及技巧总结 1.jquery采用ajax向后端请求时,MVC框架并不能返回View的数据,也就是一般我们使用View().PartialView()等,只能返回json以 ...
- Hbase运维参考(项目)
1 Hbase日常运维 1.1 监控Hbase运行状况 1.1.1 操作系统 1.1.1.1 IO 群集网络IO,磁盘IO,HDFS IO IO越大说明文件读写操作越多.当IO突然增加时,有可能:1. ...
- 基于Repository模式设计项目架构—你可以参考的项目架构设计
关于Repository模式,直接百度查就可以了,其来源是<企业应用架构模式>.我们新建一个Infrastructure文件夹,这里就是基础设施部分,EF Core的上下文类以及Repos ...
- 一个很有参考意义的unity博客
http://blog.csdn.net/lyh916/article/details/45133101
- [转]C,C++开源项目中的100个Bugs
[转]C,C++开源项目中的100个Bugs http://tonybai.com/2013/04/10/100-bugs-in-c-cpp-opensource-projects/ 俄罗斯OOO P ...
- 又见angular----步一步做一个angular4小项目
这两天看了看angular4的文档,发现他和angular1.X的差别真的是太大了,官方给出的那个管理英雄的Demo是一个非常好的入门项目,这里给出一个管理个人计划的小项目,从头至尾一步一步讲解如何去 ...
- C++框架_之Qt的开始部分_概述_安装_创建项目_快捷键等一系列注意细节
C++框架_之Qt的开始部分_概述_安装_创建项目_快捷键等一系列注意细节 1.Qt概述 1.1 什么是Qt Qt是一个跨平台的C++图形用户界面应用程序框架.它为应用程序开发者提供建立艺术级图形界面 ...
- 再遇angular(angular4项目实战指南)
这两天看了看angular4的文档,发现他和angular1.X的差别真的是太大了,官方给出的那个管理英雄的Demo是一个非常好的入门项目,这里给出一个管理个人计划的小项目,从头至尾一步一步讲解如何去 ...
- 开源项目商业模式分析(2) - 持续维护的重要性 - Selenium和WatiN
该系列第一篇发布后收到不少反馈,包括: 第一篇里说的MonicaHQ不一定盈利 没错,但是问题在于绝大多数开源项目商业数据并没有公开,从而无法判断其具体是否盈利.难得MonicaHQ是公开的,所以才用 ...
随机推荐
- FEC之异或运算应用
话说为啥FEC需要异或( ^/⊕ )操作呢? 异或:xor 异或运算规则: 0 xor 0 = 0 0 xor 1 = 1 1 xor 0 = 1 1 xor 1 = 0 异或运算特性: 1). a ...
- 图形化升级单机oracle 11.2.0.4 到 12.2.0.1
1. 讲补丁包上传到 Oracle server ,解压.安装 [oracle@11g tmp]$ unzip linuxx64_12201_database.zip 2. 检查当前版本 SQL> ...
- vmware station中 UDEV 无法获取共享存储磁盘的UUID,症状: scsi_id -g -u -d /dev/sdb 无返回结果。
1.确认在所有RAC节点上已经安装了必要的UDEV包 [root@11gnode1 ~]# rpm -qa|grep udevsystem-config-printer-udev-1.1.16-25. ...
- bzoj 1070 修车 —— 费用流
题目:https://www.lydsy.com/JudgeOnline/problem.php?id=1070 需要考虑前面修的车对后面等待的车造成的时间增加: 其实可以从每个人修车的顺序考虑,如果 ...
- Java中String和byte[]间的 转换
数据库的字段中使用了blob类型时,在entity中此字段可以对应为byte[] 类型,保存到数据库中时需要把传入的参数转为byte[]类型,读取的时候再通过将byte[]类型转换为String类型. ...
- [转]unity3d中创建双面材质
在其它三维软件中设置好的双面材质导入到unity3d中就失去了效果,不过我们可以通过自定义材质来在unity3d中实现双面材质的效果.步骤如下:1.在资源库中新建一新shader:代码如下: Shad ...
- 第 五 课 golang语言变量
1 变量三种声明: (第一种的var和类型都是多余: 第二种最简洁,但是第二种只能用在函数中,不能是全局变量的声明) 第一种: var v_name v_type(注意顺序) v_nam ...
- mysql中有多少种日志
Mysql的日志包括如下几种日志: 错误日志 普通查询日志 二进制日志 慢查询日志 Mysql版本 此文档测试mysql的版本为 mysql -V 错误日志 error log Mysql错误日志主要 ...
- 6.6 安装IDEA
非常感谢Kevin指导.让我简化了安装步骤.安装包可以直接到我的公司文件夹中sunny文件夹中获取. 首先准备好安装包: 然后打开终端: 解压,进入bin目录,执行idea.sh;或者,直接运行: b ...
- Statement, PreparedStatement和CallableStatement的区别
Statement用于执行不带参数的简单SQL语句,并返回它所生成的结果,每次执行SQL豫剧时,数据库都要编译该SQL语句. Satatement stmt = conn.getStatement() ...