httpClient download file(爬虫)
package com.opensource.httpclient.bfs;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
public class DownLoadFile
{
public String getFileNameByUrl(String url, String contentType)
{
url = url.substring(7);
if (contentType.indexOf("html") != -1)
{
url = url.replaceAll("[\\?/:*|<>\"]", "_") + ".html";
return url;
}
else
{
return url.replaceAll("[\\?/:*|<>\"]", "_") + "." + contentType.substring(contentType.lastIndexOf("/") + 1);
}
}
public void saveToLocal(byte[] data, String filePath)
{
try
{
DataOutputStream out = new DataOutputStream(new FileOutputStream(new File(filePath)));
for (int i = 0; i < data.length; i++)
out.write(data[i]);
out.flush();
out.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
public String downloadFile(String url)
throws ClientProtocolException, IOException
{
String filePath = null;
HttpClient httpClient = new DefaultHttpClient();
HttpGet get = new HttpGet(url);
HttpResponse rsp = httpClient.execute(get);
if (rsp.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
{
System.err.println("Method failed: " + rsp.getStatusLine());
filePath = null;
}
Header[] header = rsp.getHeaders("Content-Type");
filePath = "D:\\" + getFileNameByUrl(url, header[0].getValue());
saveToLocal(rsp.toString().getBytes(), filePath);
return filePath;
}
public static void main(String[] args)
throws ClientProtocolException, IOException
{
DownLoadFile downLoadFile = new DownLoadFile();
String temp = downLoadFile.downloadFile("http://www.huawei.com/cn/");
System.out.println(temp);
}
}
httpClient download file(爬虫)的更多相关文章
- HttpClient的使用-爬虫学习1
HttpClient的使用-爬虫学习(一) Apache真是伟大,为我们提供了HttpClient.jar,这个HttpClient是客户端的http通信实现库,这个类库的作用是接受和发送http报文 ...
- Csharp:WebClient and WebRequest use http download file
//Csharp:WebClient and WebRequest use http download file //20140318 塗聚文收錄 string filePath = "20 ...
- 基于HttpClient实现网络爬虫~以百度新闻为例
转载请注明出处:http://blog.csdn.net/xiaojimanman/article/details/40891791 基于HttpClient4.5实现网络爬虫请訪问这里:http:/ ...
- [Powershell] FTP Download File
# Config $today = Get-Date -UFormat "%Y%m%d" $LogFilePath = "d:\ftpLog_$today.txt&quo ...
- Angular HttpClient upload file with FormData
从sof上找到一个example:https://stackoverflow.com/questions/46206643/asp-net-core-2-0-and-angular-4-3-file- ...
- FTP Download File By Some Order List
@Echo Off REM -- Define File Filter, i.e. files with extension .RBSet FindStrArgs=/E /C:".asp&q ...
- httpclient upload file
用httpclient upload上传文件时,代码如下: HttpPost httpPost = new HttpPost(uploadImg); httpPost.addHeader(" ...
- Download file using libcurl in C/C++
http://stackoverflow.com/questions/1636333/download-file-using-libcurl-in-c-c #include <stdio.h&g ...
- HttpClient的使用-爬虫学习(一)
Apache真是伟大,为我们提供了HttpClient.jar,这个HttpClient是客户端的http通信实现库,这个类库的作用是接受和发送http报文,引进这个类库,我们对于http的操作会变得 ...
随机推荐
- shopnc 导出Excel数据问题实例 && ajax 获取当前值并传递
任务:从商家中心导出数据,各个商品所属情况. 商品导出到Excel文件功能 /导出exel 功能make-in-lemon public function createExcelOp(){ $mode ...
- Laravel Packages 开发
Packages是向Laravel中添加功能最重要的途径.composer.json中require的都是包.关于包的详细说明请查看 API . 下面一起创建一个简单的Package : 1. 环境配 ...
- Light Bulb--zoj3203(三分法)
Light Bulb Time Limit: 1 Second Memory Limit: 32768 KB Compared to wildleopard's wealthiness, h ...
- Azure上Linux VM DDOS攻击预防: 慢速攻击
在上篇博客(http://www.cnblogs.com/cloudapps/p/4996046.html)中,介绍了如何使用Apache的模块mod_evasive进行反DDOS攻击的设置,在这种模 ...
- android---EditText黄色边框
http://liuzhichao.com/p/612.html 自定义android控件EditText边框背景 柳志超博客 » Program » Andriod » 自定义android控件Ed ...
- StrPCopy与StrPas功能正好相反,作用是与C语言字符串和Delphi的String相互转化
StrPCopy = Copies an AnsiString (long string) to a null-terminated string.function StrPCopy(Dest: PA ...
- UML_行为图
活动图是UML用于对系统的动态行为建模的另一种常用工具,它描述活动的顺序,展现从一个活动到另一个活动的控制流.活动图在本质上是一种流程图.活动图着重表现从一个活动到另一个活动的控制流,是内部处理驱动的 ...
- POJ 1811 Prime Test 素性测试 分解素因子
题意: 给你一个数n(n <= 2^54),判断n是不是素数,如果是输出Prime,否则输出n最小的素因子 解题思路: 自然数素性测试可以看看Matrix67的 素数与素性测试 素因子分解利用 ...
- Redis 实践笔记
本文来自:http://www.cnblogs.com/me-sa/archive/2012/03/13/redis-in-action.html 最近在项目中实践了一下Redis,过程中遇到并解决了 ...
- jquery如何获得页面元素的坐标值
http://www.cnblogs.com/pansly/archive/2011/05/25/2056222.html jquery如何获得页面元素的坐标值 yulutxt是输入经典语录的输入 ...