Capturing a Website's CAPTCHA Image with Selenium WebDriver

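Crawler projects sometimes run into CAPTCHAs. On some sites the CAPTCHA image is generated dynamically, so the same URL can return a different image each time it is requested.

My first idea was to open the CAPTCHA image's URL, save the image locally with a GET request from Java code, have a CAPTCHA-solving service decode it, and type the result into the verification box automatically. The catch is that every request to that address produces a new image, so the downloaded image is not the one displayed in the browser, and this approach fails.

The workaround is to capture the image actually shown on the page with Selenium WebDriver, save it locally, pass it to the CAPTCHA-solving service, and fill in the result automatically. That takes care of the CAPTCHA.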

package com.entrym.main;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import com.entrym.crawler.util.verifyCode.Captcha;
import com.entrym.crawler.util.verifyCode.DamaUtil;

public class WebTest {

    private static final String CHROME_HOME = new File("config/chromedriver.exe").getAbsolutePath();
    private static final String CHROME_HOME_LINUX = new File("config/chromedriver").getAbsolutePath();

    public static void main(String[] args) throws IOException {
        // Pick the chromedriver binary that matches the operating system
        String osname = System.getProperty("os.name").toLowerCase();
        if (osname.contains("linux")) {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME_LINUX);
        } else {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME);
        }

        WebDriver driver = new ChromeDriver();
        driver.get("http://weixin.sogou.com/antispider/?from=%2fweixin%3Ftype%3d2%26query%3dz+%26ie%3dutf8%26s_from%3dinput%26_sug_%3dy%26_sug_type_%3d");
        WebElement ele = driver.findElement(By.id("seccodeImage"));

        // Get entire page screenshot
        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        BufferedImage fullImg = ImageIO.read(screenshot);

        // Get the location of element on the page
        Point point = ele.getLocation();

        // Get width and height of the element
        int eleWidth = ele.getSize().getWidth();
        int eleHeight = ele.getSize().getHeight();

        // Crop the entire page screenshot to get only element screenshot
        BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(),
                eleWidth, eleHeight);
        ImageIO.write(eleScreenshot, "png", screenshot);

        // Copy the element screenshot to disk
        File screenshotLocation = new File("D:/captcha/test.png");
        FileUtils.copyFile(screenshot, screenshotLocation);

        // Read the notice text to confirm this really is the anti-spider page
        WebElement classelement = driver.findElement(By.className("p2"));
        String errorText = classelement.getText();
        System.out.println("Page text: " + errorText);

        // The literal below is the site's own "you are visiting too frequently" notice
        if (errorText.contains("用户您好，您的访问过于频繁，为确认本次访问为正常用户行为")) {
            // Captcha and DamaUtil are the project's own helpers for the solving service.
            // The path set here must resolve to the file saved above.
            Captcha captcha = new Captcha();
            captcha.setFilePath("test.png");
            String code = DamaUtil.getCaptchaResult(captcha); // the decoded CAPTCHA text
            System.out.println("CAPTCHA returned by the solving service: " + code);

            // Type the decoded text into the CAPTCHA input box
            WebElement elementsumbit = driver.findElement(By.id("seccodeInput"));
            elementsumbit.sendKeys(code);

            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing it
                Thread.currentThread().interrupt();
            }

            // Submit the form that contains the input
            elementsumbit.submit();
            System.out.println("Submitted");
        }
    }
}

That is the complete program. The key part, reproduced below, came from Stack Overflow; I have to say, Google is still very powerful.


driver.get("http://www.google.com");
WebElement ele = driver.findElement(By.id("hplogo"));

// Get entire page screenshot
File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
BufferedImage fullImg = ImageIO.read(screenshot);

// Get the location of element on the page
Point point = ele.getLocation();

// Get width and height of the element
int eleWidth = ele.getSize().getWidth();
int eleHeight = ele.getSize().getHeight();

// Crop the entire page screenshot to get only element screenshot
BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(),
    eleWidth, eleHeight);
ImageIO.write(eleScreenshot, "png", screenshot);

// Copy the element screenshot to disk
File screenshotLocation = new File("C:\\images\\GoogleLogo_screenshot.png");
FileUtils.copyFile(screenshot, screenshotLocation);

That is the key capture code. The original Stack Overflow question is at http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin
Take a look if you want to dig into it further.
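
One thing the snippet takes for granted is that the screenshot's pixel grid matches the coordinates WebDriver reports. On a scaled (high-DPI) display the screenshot may be larger than the page measured in CSS pixels, and an element close to the edge can make getSubimage throw a RasterFormatException. Below is a minimal sketch of a crop helper that scales the rectangle and clamps it to the image bounds. The class name CaptchaCrop, the method cropElement, and the scale parameter are illustrative assumptions and do not come from the original post or from Selenium; you would have to determine the scale factor for your own machine.

```java
import java.awt.image.BufferedImage;

final class CaptchaCrop {

    /**
     * Crops an element's rectangle out of a full screenshot.
     * x, y, width and height are the values reported by WebDriver;
     * scale is an assumed device pixel ratio (1.0 on an unscaled display).
     */
    static BufferedImage cropElement(BufferedImage fullImg, int x, int y,
                                     int width, int height, double scale) {
        // Convert the reported rectangle into screenshot pixels
        int left = (int) Math.round(x * scale);
        int top = (int) Math.round(y * scale);
        int w = (int) Math.round(width * scale);
        int h = (int) Math.round(height * scale);

        // Clamp to the screenshot bounds so getSubimage cannot throw
        left = Math.max(0, Math.min(left, fullImg.getWidth() - 1));
        top = Math.max(0, Math.min(top, fullImg.getHeight() - 1));
        w = Math.max(1, Math.min(w, fullImg.getWidth() - left));
        h = Math.max(1, Math.min(h, fullImg.getHeight() - top));

        return fullImg.getSubimage(left, top, w, h);
    }

    public static void main(String[] args) {
        // Stand-in for a real screenshot: a blank 800x600 image
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        BufferedImage captcha = cropElement(page, 100, 50, 120, 40, 1.0);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight()); // prints 120x40
    }
}
```

In the program above, the call would take the place of the direct fullImg.getSubimage(...) line, passing point.getX(), point.getY(), eleWidth and eleHeight. Clamping only prevents the exception; if the element is scrolled out of the captured area, the crop will still not contain the CAPTCHA.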
