HttpClient download file (web crawler)
package com.opensource.httpclient.bfs;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class DownLoadFile
{
    /**
     * Builds a local file name from the URL and the Content-Type header value.
     */
    public String getFileNameByUrl(String url, String contentType)
    {
        // Strip the scheme ("http://" or "https://") rather than a fixed 7 characters.
        url = url.replaceFirst("^https?://", "");
        // Replace characters that are not allowed in Windows file names.
        String name = url.replaceAll("[\\?/:*|<>\"]", "_");
        // Drop parameters such as "; charset=utf-8" before deriving the extension.
        String mediaType = contentType.split(";")[0].trim();
        if (mediaType.indexOf("html") != -1)
        {
            return name + ".html";
        }
        else
        {
            return name + "." + mediaType.substring(mediaType.lastIndexOf("/") + 1);
        }
    }

    /**
     * Writes the byte array to the given path.
     */
    public void saveToLocal(byte[] data, String filePath)
    {
        // try-with-resources closes the stream even if the write fails.
        try (OutputStream out = new FileOutputStream(filePath))
        {
            out.write(data);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    /**
     * Downloads the resource at url and returns the local path, or null on a non-200 response.
     */
    public String downloadFile(String url)
        throws ClientProtocolException, IOException
    {
        // DefaultHttpClient is deprecated since HttpClient 4.3; it is kept to match the original.
        HttpClient httpClient = new DefaultHttpClient();
        try
        {
            HttpGet get = new HttpGet(url);
            HttpResponse rsp = httpClient.execute(get);
            if (rsp.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
            {
                System.err.println("Method failed: " + rsp.getStatusLine());
                return null;
            }
            Header contentType = rsp.getFirstHeader("Content-Type");
            // Fallback type when the header is missing (an assumption, not in the original).
            String type = (contentType != null) ? contentType.getValue() : "application/octet-stream";
            String filePath = "D:\\" + getFileNameByUrl(url, type);
            // Save the response body, not the string form of the response object.
            saveToLocal(EntityUtils.toByteArray(rsp.getEntity()), filePath);
            return filePath;
        }
        finally
        {
            // Release the underlying connections.
            httpClient.getConnectionManager().shutdown();
        }
    }

    public static void main(String[] args)
        throws ClientProtocolException, IOException
    {
        DownLoadFile downLoadFile = new DownLoadFile();
        String temp = downLoadFile.downloadFile("http://www.huawei.com/cn/");
        System.out.println(temp);
    }
}
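The file-name step can be tried on its own, without any network access or the HttpClient jar. The following is only a minimal sketch: the class name FileNameSketch and the method fileNameFor are hypothetical and not part of the original code. It assumes that any URL scheme should be stripped (not just http://) and that Content-Type parameters such as "; charset=utf-8" should be ignored when choosing the extension.

```java
import java.util.Locale;

public class FileNameSketch {

    // Hypothetical helper mirroring getFileNameByUrl; not part of the original class.
    static String fileNameFor(String url, String contentType) {
        // Drop the scheme ("http://", "https://", ...) instead of assuming 7 characters.
        String name = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "");
        // Replace characters that are not allowed in Windows file names.
        name = name.replaceAll("[\\\\?/:*|<>\"]", "_");
        // Keep only the media type, dropping parameters such as "; charset=utf-8".
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.contains("html")) {
            return name + ".html";
        }
        // Otherwise use the subtype after the slash as the extension.
        return name + "." + mediaType.substring(mediaType.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints "www.huawei.com_cn_.html"
        System.out.println(fileNameFor("http://www.huawei.com/cn/", "text/html; charset=utf-8"));
        // prints "example.com_data.json"
        System.out.println(fileNameFor("https://example.com/data", "application/json"));
    }
}
```

Keeping this logic in a pure function makes it easy to check edge cases (https URLs, missing charset, unusual subtypes) before any download is attempted.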