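Prerequisites for writing a spider: sending HTTP requests with Java
This article sends HTTP requests with the JDK's native API instead of Apache's HttpClient library. That third-party library changes too quickly, with sizable changes in almost every version, which makes it more trouble than help: code you wrote against one version often cannot be reused with the next.
How it works
Sending a GET request and sending a POST request in Java take nearly the same code; only a few settings differ. The steps are:
1. Create a URL (java.net.URL) and open a connection from it (java.net.URLConnection)
2. Set the request parameters
3. Send the request (this step differs between GET and POST)
4. Read the response as an input stream
5. Close the input stream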
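The five steps above can be sketched with JDK classes alone. This is a minimal sketch, not the article's own code: the class name FiveStepGet, the 5-second timeouts and the assumption that the response body is UTF-8 are all illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class FiveStepGet {
    static String get(String address) throws IOException {
        // 1. Build the URL and open a connection (nothing is sent yet)
        URL url = new URL(address);
        URLConnection connection = url.openConnection();
        // 2. Set request parameters: headers and (assumed) 5 s timeouts
        connection.setRequestProperty("Accept", "*/*");
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        // 3. Send the request; for a GET, connect() is enough
        connection.connect();
        // 4. Read the response body as an input stream (assumed UTF-8)
        StringBuilder body = new StringBuilder();
        // 5. try-with-resources closes the stream even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }
}
```

Calling FiveStepGet.get("http://localhost:8080/OneHttpServer/") against the test server used later in this article would return its page, one line per response line.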
Fetching the Baidu homepage
Here is the code:
package test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class HttpRequest {
    /**
     * Sends a GET request to the given URL.
     *
     * @param url
     *            the URL to send the request to
     * @param param
     *            request parameters, in the form name1=value1&name2=value2
     * @return the response from the remote resource represented by the URL
     */
    public static String sendGet(String url, String param) {
        StringBuilder result = new StringBuilder();
        BufferedReader in = null;
        try {
            // Append the query string only when there are parameters
            String urlNameString = (param == null || param.isEmpty()) ? url : url + "?" + param;
            URL realUrl = new URL(urlNameString);
            // Open a connection to the URL
            URLConnection connection = realUrl.openConnection();
            // Set common request headers
            connection.setRequestProperty("accept", "*/*");
            connection.setRequestProperty("connection", "Keep-Alive");
            connection.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
            // Establish the actual connection
            connection.connect();
            // Get all response header fields
            Map<String, List<String>> map = connection.getHeaderFields();
            // Print every response header field
            for (String key : map.keySet()) {
                System.out.println(key + "--->" + map.get(key));
            }
            // Read the response through a BufferedReader (decoded as UTF-8)
            in = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line).append('\n');
            }
        } catch (Exception e) {
            System.out.println("Exception while sending GET request: " + e);
            e.printStackTrace();
        }
        // Close the input stream in a finally block
        finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (IOException e2) {
                e2.printStackTrace();
            }
        }
        return result.toString();
    }

    /**
     * Sends a POST request to the given URL.
     *
     * @param url
     *            the URL to send the request to
     * @param param
     *            request parameters, in the form name1=value1&name2=value2
     * @return the response from the remote resource
     */
    public static String sendPost(String url, String param) {
        PrintWriter out = null;
        BufferedReader in = null;
        StringBuilder result = new StringBuilder();
        try {
            URL realUrl = new URL(url);
            // Open a connection to the URL
            URLConnection conn = realUrl.openConnection();
            // Set common request headers
            conn.setRequestProperty("accept", "*/*");
            conn.setRequestProperty("connection", "Keep-Alive");
            conn.setRequestProperty("user-agent",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
            // Required for POST: allow writing a request body (doInput is already true by default).
            // Writing to the output stream of an HTTP connection turns the default GET into a POST.
            conn.setDoOutput(true);
            conn.setDoInput(true);
            // Get the output stream of the URLConnection
            out = new PrintWriter(new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8));
            // Send the request parameters
            out.print(param);
            // Flush the output stream's buffer
            out.flush();
            // Read the response through a BufferedReader (decoded as UTF-8)
            in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line).append('\n');
            }
        } catch (Exception e) {
            System.out.println("Exception while sending POST request: " + e);
            e.printStackTrace();
        }
        // Close the output and input streams in a finally block
        finally {
            try {
                if (out != null) {
                    out.close();
                }
                if (in != null) {
                    in.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Send a GET request
        String s = HttpRequest.sendGet("https://www.baidu.com", "key=123&v=456");
        System.out.println(s);
        // Send a POST request
        String sr = HttpRequest.sendPost("https://www.baidu.com", "");
        System.out.println(sr);
    }
}
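The two methods print different output to the console: in this test, only the POST request returned the full HTML page. The GET request instead received a page telling the browser to redirect to the plain-http URL: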
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
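That refresh instruction lives inside the HTML, not in an HTTP 3xx redirect, so HttpURLConnection's automatic redirect handling never sees it; a crawler that wants to follow it has to look for the tag itself. A minimal regex sketch, assuming double-quoted attributes in exactly this order (a real crawler would use an HTML parser); MetaRefresh and refreshTarget are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRefresh {
    // Matches <meta http-equiv="refresh" content="N;url=..."> (simplified: fixed attribute order)
    private static final Pattern REFRESH = Pattern.compile(
            "<meta\\s+http-equiv\\s*=\\s*\"refresh\"\\s+content\\s*=\\s*\"\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the refresh target URL, or null when the page has no such tag
    static String refreshTarget(String html) {
        Matcher m = REFRESH.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Applied to the snippet above, refreshTarget returns http://www.baidu.com/.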
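Tips:
After httpUrlConnection.setDoOutput(true), you can write to the connection with conn.getOutputStream().write().
After httpUrlConnection.setDoInput(true), you can read from it with conn.getInputStream().read().
A GET request never needs conn.getOutputStream(), because its parameters are appended directly to the URL; that is why doOutput defaults to false. (With HttpURLConnection, calling getOutputStream() on a request that is still GET silently switches it to POST.)
A POST request (a file upload, for example) may need to send a lot of data to the server. That data goes in the HTTP body, so after the connection is set up the client has to write it to the server.
Since the server's response is always read through conn.getInputStream(), doInput defaults to true.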
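These defaults can be checked without sending anything, because openConnection() only creates the connection object. A small sketch; ConnectionDefaults is a hypothetical class name and the localhost URL is never actually contacted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class ConnectionDefaults {
    // openConnection() only builds the object; no request is sent here
    static HttpURLConnection open(String address) throws IOException {
        return (HttpURLConnection) new URL(address).openConnection();
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://localhost:8080/");
        System.out.println(conn.getDoInput());       // true
        System.out.println(conn.getDoOutput());      // false
        System.out.println(conn.getRequestMethod()); // GET
        conn.setDoOutput(true); // needed before getOutputStream() for a POST body
        System.out.println(conn.getDoOutput());      // true
    }
}
```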
Each method is covered in detail below.
Using the GET method:
package test;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpGetRequest {

    /**
     * Main
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        System.out.println(doGet());
    }

    /**
     * Sends a GET request to the local test server.
     * @return the response body
     * @throws Exception if the response code is 300 or above, or on I/O errors
     */
    public static String doGet() throws Exception {
        URL localURL = new URL("http://localhost:8080/OneHttpServer/");
        HttpURLConnection httpURLConnection = (HttpURLConnection) localURL.openConnection();
        httpURLConnection.setRequestProperty("Accept-Charset", "utf-8");
        httpURLConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        if (httpURLConnection.getResponseCode() >= 300) {
            throw new Exception("HTTP Request is not success, Response code is " + httpURLConnection.getResponseCode());
        }

        StringBuilder resultBuffer = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (InputStream inputStream = httpURLConnection.getInputStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            String tempLine;
            while ((tempLine = reader.readLine()) != null) {
                resultBuffer.append(tempLine);
            }
        }
        return resultBuffer.toString();
    }
}
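Decoding the response with one fixed charset only works when the server happens to use it; a crawler visiting arbitrary sites (some Chinese sites serve GBK, for example) is better off using the charset the server declares in its Content-Type header. A sketch; ResponseCharset is a hypothetical name, and falling back to UTF-8 when no usable charset is declared is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseCharset {
    // Parses e.g. "text/html; charset=GBK"; falls back to UTF-8 (assumed default)
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

It plugs into the reading code as new InputStreamReader(conn.getInputStream(), ResponseCharset.fromContentType(conn.getContentType())).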
Using the POST method:
package test;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpPostRequest {

    /**
     * Main
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        System.out.println(doPost());
    }

    /**
     * Sends a POST request with form data to the local test server.
     * @return the response body
     * @throws Exception if the response code is 300 or above, or on I/O errors
     */
    public static String doPost() throws Exception {
        String parameterData = "username=nickhuang&blog=http://www.cnblogs.com/nick-huang/";
        // Encode once, so the declared length is in bytes rather than chars
        byte[] body = parameterData.getBytes(StandardCharsets.UTF_8);

        URL localURL = new URL("http://localhost:8080/OneHttpServer/");
        HttpURLConnection httpURLConnection = (HttpURLConnection) localURL.openConnection();
        httpURLConnection.setDoOutput(true);
        httpURLConnection.setRequestMethod("POST");
        httpURLConnection.setRequestProperty("Accept-Charset", "utf-8");
        httpURLConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // The JDK treats Content-Length as a restricted header and by default ignores it in
        // setRequestProperty; fixed-length streaming mode makes the connection send it
        httpURLConnection.setFixedLengthStreamingMode(body.length);

        try (OutputStream outputStream = httpURLConnection.getOutputStream()) {
            outputStream.write(body);
        }

        if (httpURLConnection.getResponseCode() >= 300) {
            throw new Exception("HTTP Request is not success, Response code is " + httpURLConnection.getResponseCode());
        }

        StringBuilder resultBuffer = new StringBuilder();
        try (InputStream inputStream = httpURLConnection.getInputStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            String tempLine;
            while ((tempLine = reader.readLine()) != null) {
                resultBuffer.append(tempLine);
            }
        }
        return resultBuffer.toString();
    }
}
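Both examples throw an exception once the response code is 300 or above, so the server's error page is lost. For status codes of 400 and up, HttpURLConnection.getInputStream() throws an IOException, but the body is still available from getErrorStream(), which can return null when there is no body. A sketch, assuming UTF-8 bodies; ResponseReader is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

class ResponseReader {
    // Reads the body of any response: error bodies (status >= 400) come from getErrorStream()
    static String readBody(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        InputStream stream = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (stream == null) {
            return ""; // no body was sent
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }
}
```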
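Encapsulation & reuse
Wrapped in a class like this, each instance holds a requestor that a thread can use to carry out its crawling tasks: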
package test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Map;

public class HttpRequestor {

    private String charset = "utf-8";
    private Integer connectTimeout = null;
    private Integer socketTimeout = null;
    private String proxyHost = null;
    private Integer proxyPort = null;

    /**
     * Do GET request
     * @param url the URL to request
     * @return the response body
     * @throws Exception if the response code is 300 or above, or on I/O errors
     */
    public String doGet(String url) throws Exception {
        URL localURL = new URL(url);
        URLConnection connection = openConnection(localURL);
        // Apply the configured timeouts to this connection
        renderRequest(connection);
        HttpURLConnection httpURLConnection = (HttpURLConnection) connection;
        httpURLConnection.setRequestProperty("Accept-Charset", charset);
        httpURLConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        if (httpURLConnection.getResponseCode() >= 300) {
            throw new Exception("HTTP Request is not success, Response code is " + httpURLConnection.getResponseCode());
        }
        return readResponse(httpURLConnection);
    }

    /**
     * Do POST request
     * @param url the URL to request
     * @param parameterMap form parameters; null values are sent as empty strings
     * @return the response body
     * @throws Exception if the response code is 300 or above, or on I/O errors
     */
    public String doPost(String url, Map<String, String> parameterMap) throws Exception {
        /* Translate the parameter map to a URL-encoded parameter string */
        StringBuilder parameterBuffer = new StringBuilder();
        if (parameterMap != null) {
            for (Map.Entry<String, String> entry : parameterMap.entrySet()) {
                if (parameterBuffer.length() > 0) {
                    parameterBuffer.append("&");
                }
                String value = entry.getValue() != null ? entry.getValue() : "";
                parameterBuffer.append(URLEncoder.encode(entry.getKey(), charset))
                        .append("=")
                        .append(URLEncoder.encode(value, charset));
            }
        }
        System.out.println("POST parameter : " + parameterBuffer.toString());
        byte[] body = parameterBuffer.toString().getBytes(charset);

        URL localURL = new URL(url);
        URLConnection connection = openConnection(localURL);
        renderRequest(connection);
        HttpURLConnection httpURLConnection = (HttpURLConnection) connection;
        httpURLConnection.setDoOutput(true);
        httpURLConnection.setRequestMethod("POST");
        httpURLConnection.setRequestProperty("Accept-Charset", charset);
        httpURLConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Declare the body length in bytes; the connection sends the Content-Length header itself
        httpURLConnection.setFixedLengthStreamingMode(body.length);

        try (OutputStream outputStream = httpURLConnection.getOutputStream()) {
            outputStream.write(body);
        }

        if (httpURLConnection.getResponseCode() >= 300) {
            throw new Exception("HTTP Request is not success, Response code is " + httpURLConnection.getResponseCode());
        }
        return readResponse(httpURLConnection);
    }

    /** Reads the whole response body with the configured charset. */
    private String readResponse(HttpURLConnection httpURLConnection) throws IOException {
        StringBuilder resultBuffer = new StringBuilder();
        try (InputStream inputStream = httpURLConnection.getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, charset))) {
            String tempLine;
            while ((tempLine = reader.readLine()) != null) {
                resultBuffer.append(tempLine);
            }
        }
        return resultBuffer.toString();
    }

    private URLConnection openConnection(URL localURL) throws IOException {
        URLConnection connection;
        if (proxyHost != null && proxyPort != null) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            connection = localURL.openConnection(proxy);
        } else {
            connection = localURL.openConnection();
        }
        return connection;
    }

    /**
     * Apply the timeout settings to the connection
     * @param connection the connection to configure
     */
    private void renderRequest(URLConnection connection) {
        if (connectTimeout != null) {
            connection.setConnectTimeout(connectTimeout);
        }
        if (socketTimeout != null) {
            connection.setReadTimeout(socketTimeout);
        }
    }

    /*
     * Getter & Setter
     */
    public Integer getConnectTimeout() {
        return connectTimeout;
    }

    public void setConnectTimeout(Integer connectTimeout) {
        this.connectTimeout = connectTimeout;
    }

    public Integer getSocketTimeout() {
        return socketTimeout;
    }

    public void setSocketTimeout(Integer socketTimeout) {
        this.socketTimeout = socketTimeout;
    }

    public String getProxyHost() {
        return proxyHost;
    }

    public void setProxyHost(String proxyHost) {
        this.proxyHost = proxyHost;
    }

    public Integer getProxyPort() {
        return proxyPort;
    }

    public void setProxyPort(Integer proxyPort) {
        this.proxyPort = proxyPort;
    }

    public String getCharset() {
        return charset;
    }

    public void setCharset(String charset) {
        this.charset = charset;
    }
}
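Because each HttpRequestor instance carries its own settings, one simple way to let several crawler threads work at once is to give every task its own requestor. A sketch using a fixed thread pool; ParallelFetch and its Fetcher interface are hypothetical names, introduced so the pooling logic can be shown (and tested) independently of the network.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetch {
    // Hypothetical abstraction over "download one URL"
    interface Fetcher {
        String fetch(String url) throws Exception;
    }

    // Fetches every URL on a fixed-size pool; a failed download maps to null
    static Map<String, String> fetchAll(List<String> urls, Fetcher fetcher, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String url : urls) {
                futures.put(url, pool.submit(() -> fetcher.fetch(url)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    results.put(e.getKey(), null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the class above it could be called as ParallelFetch.fetchAll(urls, url -> new HttpRequestor().doGet(url), 4), so no requestor is shared between threads.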
Test code for HttpRequestor
Client code:
package test;

import java.util.HashMap;
import java.util.Map;

public class Call {

    public static void main(String[] args) throws Exception {
        /* Post Request */
        Map<String, String> dataMap = new HashMap<>();
        dataMap.put("username", "Nick Huang");
        dataMap.put("blog", "IT");
        System.out.println(new HttpRequestor().doPost("http://localhost:8080/OneHttpServer/", dataMap));

        /* Get Request */
        System.out.println(new HttpRequestor().doGet("http://localhost:8080/OneHttpServer/"));
    }
}
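The client sends "Nick Huang", which contains a space, and form values in general may contain & or =. In an application/x-www-form-urlencoded body each key and value should be percent-encoded, which java.net.URLEncoder does. A sketch; FormEncoder is a hypothetical helper name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

class FormEncoder {
    // Builds name1=value1&name2=value2 with every key and value URL-encoded
    static String encode(Map<String, String> params, String charset)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            String value = e.getValue() == null ? "" : e.getValue(); // null becomes empty
            sb.append(URLEncoder.encode(e.getKey(), charset))
              .append('=')
              .append(URLEncoder.encode(value, charset));
        }
        return sb.toString();
    }
}
```

Iterating username before blog, the client's data encodes to username=Nick+Huang&blog=IT; a HashMap does not guarantee that order, so use a LinkedHashMap when order matters.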
Server code:
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LoginServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public LoginServlet() {
        super();
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        this.doPost(request, response);
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // Decode form parameters as UTF-8; must be called before getParameter()
        request.setCharacterEncoding("UTF-8");
        String username = request.getParameter("username");
        String blog = request.getParameter("blog");
        System.out.println(username);
        System.out.println(blog);
        response.setContentType("text/plain; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        response.getWriter().write("It is ok!");
    }
}
web.xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
<display-name>OneHttpServer</display-name>
<welcome-file-list>
<welcome-file>LoginServlet</welcome-file>
</welcome-file-list>
<servlet>
<description></description>
<display-name>LoginServlet</display-name>
<servlet-name>LoginServlet</servlet-name>
<servlet-class>LoginServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>LoginServlet</servlet-name>
<url-pattern>/LoginServlet</url-pattern>
</servlet-mapping>
</web-app>
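To try the client without installing a servlet container, the JDK's built-in com.sun.net.httpserver.HttpServer can stand in for LoginServlet. This is only a sketch: StubLoginServer is a hypothetical name, it binds to 127.0.0.1, serves the /OneHttpServer/ path so the client URLs above keep working when it is started on port 8080, and does naive form parsing.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class StubLoginServer {
    // Mimics LoginServlet: prints the username/blog fields and answers "It is ok!"
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/OneHttpServer/", exchange -> {
            // POST fields come from the body, GET fields from the query string
            String form = "POST".equals(exchange.getRequestMethod())
                    ? readAll(exchange.getRequestBody())
                    : exchange.getRequestURI().getRawQuery();
            for (String pair : form == null ? new String[0] : form.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? URLDecoder.decode(kv[1], "UTF-8") : "";
                if (kv[0].equals("username") || kv[0].equals("blog")) {
                    System.out.println(value);
                }
            }
            byte[] reply = "It is ok!".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Start it with StubLoginServer.start(8080), run the client, and stop it with server.stop(0) when done.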