httpClient download file(爬虫)
package com.opensource.httpclient.bfs;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
public class DownLoadFile
{
public String getFileNameByUrl(String url, String contentType)
{
url = url.substring(7);
if (contentType.indexOf("html") != -1)
{
url = url.replaceAll("[\\?/:*|<>\"]", "_") + ".html";
return url;
}
else
{
return url.replaceAll("[\\?/:*|<>\"]", "_") + "." + contentType.substring(contentType.lastIndexOf("/") + 1);
}
}
public void saveToLocal(byte[] data, String filePath)
{
try
{
DataOutputStream out = new DataOutputStream(new FileOutputStream(new File(filePath)));
for (int i = 0; i < data.length; i++)
out.write(data[i]);
out.flush();
out.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
public String downloadFile(String url)
throws ClientProtocolException, IOException
{
String filePath = null;
HttpClient httpClient = new DefaultHttpClient();
HttpGet get = new HttpGet(url);
HttpResponse rsp = httpClient.execute(get);
if (rsp.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
{
System.err.println("Method failed: " + rsp.getStatusLine());
filePath = null;
}
Header[] header = rsp.getHeaders("Content-Type");
filePath = "D:\\" + getFileNameByUrl(url, header[0].getValue());
saveToLocal(rsp.toString().getBytes(), filePath);
return filePath;
}
public static void main(String[] args)
throws ClientProtocolException, IOException
{
DownLoadFile downLoadFile = new DownLoadFile();
String temp = downLoadFile.downloadFile("http://www.huawei.com/cn/");
System.out.println(temp);
}
}
httpClient download file(爬虫)的更多相关文章
- HttpClient的使用-爬虫学习1
HttpClient的使用-爬虫学习(一) Apache真是伟大,为我们提供了HttpClient.jar,这个HttpClient是客户端的http通信实现库,这个类库的作用是接受和发送http报文 ...
- Csharp:WebClient and WebRequest use http download file
//Csharp:WebClient and WebRequest use http download file //20140318 塗聚文收錄 string filePath = "20 ...
- 基于HttpClient实现网络爬虫~以百度新闻为例
转载请注明出处:http://blog.csdn.net/xiaojimanman/article/details/40891791 基于HttpClient4.5实现网络爬虫请訪问这里:http:/ ...
- [Powershell] FTP Download File
# Config $today = Get-Date -UFormat "%Y%m%d" $LogFilePath = "d:\ftpLog_$today.txt&quo ...
- Angular HttpClient upload file with FormData
从sof上找到一个example:https://stackoverflow.com/questions/46206643/asp-net-core-2-0-and-angular-4-3-file- ...
- FTP Download File By Some Order List
@Echo Off REM -- Define File Filter, i.e. files with extension .RBSet FindStrArgs=/E /C:".asp&q ...
- httpclient upload file
用httpclient upload上传文件时,代码如下: HttpPost httpPost = new HttpPost(uploadImg); httpPost.addHeader(" ...
- Download file using libcurl in C/C++
http://stackoverflow.com/questions/1636333/download-file-using-libcurl-in-c-c #include <stdio.h&g ...
- HttpClient的使用-爬虫学习(一)
Apache真是伟大,为我们提供了HttpClient.jar,这个HttpClient是客户端的http通信实现库,这个类库的作用是接受和发送http报文,引进这个类库,我们对于http的操作会变得 ...
随机推荐
- python运维开发(十五)----JavaScript
内容目录: HTML补充 javascript HTML补充 1.display标签 display的inline-block 属性会自动带3px的宽度 <span style="di ...
- twisted 使用
工欲善其事,必先利其器,我们先来进行 twisted 框架的安装,由于平时使用的都是 Windows 系统,那么下面我们就讲解下 Windows 下 twisted 框架的安装(1)下载 twiste ...
- Train Problem I
问题陈述: 杭州电子科技大学HANGZHOU DIANZI UNIVERSITY Online Judge Problem - 1022 问题解析: 栈(stack)的简单应用 代码详解: #incl ...
- ActionResult 之HttpGet HttpPost
GET一般用于获取和查询数据. 当浏览器发送HTTP GET 请求的时候,会找到使用HttpGet限定的对应Action. POST 一般用于更新数据. 当Action上面没有限定的时候,浏览器发送任 ...
- WeUI
从WeUI学习到的知识点 WeUI是微信Web服务开发的UI套件, 目前包含12个模块 (Button, Cell, Toast, Dialog, Progress, Msg, Article, ...
- css案例学习之float浮动
代码: <!DOCTYPE html PUBliC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3. ...
- web前端开发工程师
web前端开发工程师 百科名片 Web前端开发工程师是一个很新的职业,在国内乃至国际上真正开始受到重视的时间不超过5年.Web前端开发是从网页制作演变而来的,名称上有很明显的时代特征.在互联网的演化进 ...
- uva12096 The SetStack Computer By sixleaves
代码 typedef map<Set, vector<Set> Setcache; stack< ci ...
- UVA 400 Unix ls by sixleaves
题目其实很简单,答题意思就是从管道读取一组文件名,并且按照字典序排列,但是输入的时候按列先输出,再输出行.而且每一行最多60个字符.而每个文件名所占的宽度为最大文件名的长度加2,除了输出在最右边的文件 ...
- 深入理解JavaScript的闭包特性 如何给循环中的对象添加事件(转载)
原文参考:http://blog.csdn.net/gaoshanwudi/article/details/7355794 初学者经常碰到的,即获取HTML元素集合,循环给元素添加事件.在事件响应函数 ...