Image Scraper (Web + WinForms)

Originally published: 2009-11-21. Source: the author's Baidu blog [imported by a migration tool]

Read this first: http://hi.baidu.com/handboy/blog/item/bfef61000a67ea16738b6565.html

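string x = "Live for nothing,die for something";
Regex r = new Regex(@"^Live for no(?<g1>[a-z]{5}),die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups["g1"].Value); // Output: thing
}
// Groups can be indexed by name. Use the syntax (?<groupname>…) to give a group a name.
// A named group also receives a number, which is why \1 refers back to g1 here.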

string x = "Live for nothing nothing";
Regex r = new Regex(@"([a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "$1");
    Console.WriteLine("var x:" + x); // Output: Live for nothing
}
// Removes the repeated "nothing" from the original string. Outside the pattern, "$1" refers to the first group. The next example refers to the group by name instead:
string x = "Live for nothing nothing";
Regex r = new Regex(@"(?<g1>[a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "${g1}");
    Console.WriteLine("var x:" + x); // Output: Live for nothing
}

string x = "Live for nothing";
Regex r = new Regex(@"^Live for no(?:[a-z]{5})$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // Output: (empty)
}
// Prefixing a group with "?:" makes it a non-capturing group: the engine does not store what the group matched.
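
The same grouping constructs carry over to HTML. Below is a minimal sketch, not the pattern the scraper itself uses: the HTML fragment, the class name `ImgSrcDemo`, the helper `ExtractSrc`, and the group name `src` are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class ImgSrcDemo
{
    // Hypothetical helper: returns the src value of every <img> tag in the HTML.
    public static List<string> ExtractSrc(string html)
    {
        // (?:"|') groups the two quote styles without capturing them;
        // (?<src>...) captures the URL under the name "src".
        Regex r = new Regex("<img\\b[^>]*?\\bsrc=(?:\"|')(?<src>[^\"']+)(?:\"|')", RegexOptions.IgnoreCase);
        List<string> result = new List<string>();
        foreach (Match m in r.Matches(html))
        {
            result.Add(m.Groups["src"].Value);
        }
        return result;
    }

    static void Main()
    {
        // Made-up fragment, used only for illustration.
        string html = "<p><img src=\"http://example.com/a.jpg\" /><img alt=\"x\" src=\"http://example.com/b.png\" /></p>";
        foreach (string src in ExtractSrc(html))
        {
            Console.WriteLine(src);
        }
        // Prints:
        // http://example.com/a.jpg
        // http://example.com/b.png
    }
}
```

A regex like this is fine for pages with a known, stable layout; it is not a general HTML parser.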

========

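I had some free time recently, brushed up on regular expressions, and built this image scraper.
It works by analyzing what Sina blog posts have in common and then downloading their images to a local folder automatically. Everything relies on regular-expression matching, so if Sina changes its page format one day the scraper may stop working, though the patterns can be adjusted to fit. This is only a demonstration, O(∩_∩)O haha!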
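
To make the matching step concrete, here is a small sketch that finds post links in a list page. It assumes a made-up HTML fragment with placeholder post IDs, and it uses a pattern of the same shape as the one in the code-behind, with the dots escaped. `PostLinkDemo` and `FindPostLinks` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class PostLinkDemo
{
    // Returns the distinct blog-post URLs found in the HTML, in first-seen order.
    public static List<string> FindPostLinks(string html)
    {
        Regex re = new Regex(@"http://blog\.sina\.com\.cn/s/blog_\w+\.html", RegexOptions.IgnoreCase);
        List<string> links = new List<string>();
        HashSet<string> seen = new HashSet<string>();
        foreach (Match m in re.Matches(html))
        {
            // HashSet.Add returns false for a value that is already present.
            if (seen.Add(m.Value))
            {
                links.Add(m.Value);
            }
        }
        return links;
    }

    static void Main()
    {
        // Made-up list-page fragment; the post IDs are placeholders.
        string html = "<a href=\"http://blog.sina.com.cn/s/blog_abc123.html\">one</a>"
                    + "<a href=\"http://blog.sina.com.cn/s/blog_abc123.html\">one again</a>"
                    + "<a href=\"http://blog.sina.com.cn/s/blog_def456.html\">two</a>";
        foreach (string link in FindPostLinks(html))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // http://blog.sina.com.cn/s/blog_abc123.html
        // http://blog.sina.com.cn/s/blog_def456.html
    }
}
```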
WinForms download/preview: http://www.xmaspx.com/Services/FileAttachment.ashx?AttachmentID=51
First:
In the site root, create a folder named DownLoadImages.
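
The folder could also be created from code instead of by hand. A minimal sketch, assuming a console program and using the temp directory as a stand-in for the site root (in the web page itself the root would come from `Server.MapPath`); `EnsureFolderDemo` and `EnsureFolder` are hypothetical names:

```csharp
using System;
using System.IO;

class EnsureFolderDemo
{
    // Creates the folder under root if it is missing and returns its full path.
    public static string EnsureFolder(string root, string name)
    {
        string path = Path.Combine(root, name);
        // CreateDirectory does nothing when the directory already exists.
        Directory.CreateDirectory(path);
        return path;
    }

    static void Main()
    {
        // Assumption: the temp directory stands in for the site root.
        string folder = EnsureFolder(Path.GetTempPath(), "DownLoadImages");
        Console.WriteLine(Directory.Exists(folder));
    }
}
```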

Front end:
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="SinaImage.aspx.cs" Inherits="SinaImage" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
    <title>Untitled Page</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <asp:TextBox ID="TextBox1" runat="server" Width="495px">http://blog.sina.com.cn/s/articlelist_1270540911_0_1.html</asp:TextBox>
        <asp:Button ID="Button1" runat="server" OnClick="Button1_Click" Text="Button" OnClientClick="javascript:alert('Download starting. This may take a few minutes, please do not close the page')" /><br />
        <asp:TextBox ID="TextBox2" runat="server" Height="296px" TextMode="MultiLine" Width="498px"></asp:TextBox>
    </div>
    </form>
</body>
</html>

Code-behind:

using System;
using System.Web;
using System.Web.UI.WebControls;
using System.Net;
using System.IO;
using System.Text;
using System.Collections;
using System.Text.RegularExpressions;

public partial class SinaImage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
    }

    protected void Button1_Click(object sender, EventArgs e)
    {
        int num = 0; // number of images downloaded successfully
        TextBox2.Text = "";
        // Links to individual posts on the article-list page (dots escaped so they match literally).
        string p = @"http://blog\.sina\.com\.cn/s/blog_\w+\.html";
        // Any absolute http URL inside a post page.
        string p2 = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

        ArrayList arrUrl = GetUrl(this.TextBox1.Text, p);

        for (int i = 0; i < arrUrl.Count; i++)
        {
            string imgPage = arrUrl[i].ToString();
            ArrayList arrImgUrl = GetUrl(imgPage, p2);

            for (int j = 0; j < arrImgUrl.Count; j++)
            {
                string imgUrl = arrImgUrl[j].ToString();
                // Skip URLs containing "simg" or "sinaimg", and script files.
                if (!imgUrl.Contains("simg") && !imgUrl.Contains("sinaimg") && !imgUrl.Contains(".js"))
                {
                    // Keep only URLs that look like images.
                    if (imgUrl.Contains("photo") || imgUrl.Contains("image") || imgUrl.Contains("img"))
                    {
                        TextBox2.Text += imgUrl + "\n";
                        try
                        {
                            // Include the post index so files from different posts cannot overwrite each other.
                            DownLoadImage(imgUrl, i + "_" + j);
                            num++;
                        }
                        catch
                        {
                            // Best effort: skip any image that fails to download.
                        }
                    }
                }
            }
        }
        ClientScript.RegisterStartupScript(this.GetType(), "alert", "alert('Downloaded " + num.ToString() + " images. Open the DownLoadImages folder and sort through them in thumbnail view')", true);
    }

    protected void DownLoadImage(string fromUrl, string fileName)
    {
        // HH (24-hour clock) keeps morning and afternoon timestamps distinct.
        string savePath = Server.MapPath("DownLoadImages/") + DateTime.Now.ToString("yyyyMMddHHmmss") + fileName + ".jpg";
        using (WebClient myWebClient = new WebClient())
        {
            myWebClient.DownloadFile(fromUrl, savePath);
        }
    }

    protected ArrayList GetUrl(string web_url, string p)
    {
        string all_code = string.Empty;
        ArrayList arrUrl = new ArrayList();
        HttpWebRequest all_codeRequest = (HttpWebRequest)WebRequest.Create(web_url);
        // Dispose the response and the reader even if reading fails.
        using (WebResponse all_codeResponse = all_codeRequest.GetResponse())
        using (StreamReader the_Reader = new StreamReader(all_codeResponse.GetResponseStream(), Encoding.GetEncoding("GB2312")))
        {
            all_code = the_Reader.ReadToEnd();
        }

        Regex re = new Regex(p, RegexOptions.IgnoreCase);
        MatchCollection mc = re.Matches(all_code);

        for (int i = 0; i < mc.Count; i++)
        {
            string name = mc[i].ToString();
            // Filter: add each URL only once.
            if (!arrUrl.Contains(name))
            {
                arrUrl.Add(name);
            }
        }
        return arrUrl;
    }
}
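
DownLoadImage saves every file with a .jpg extension, whatever the URL points to. One possible refinement, sketched here under assumptions, is to take the extension from the URL path and fall back to .jpg when there is none. `FileNameDemo`, `BuildFileName`, the example URLs, and the suffix values are all hypothetical and not part of the original page.

```csharp
using System;
using System.Globalization;
using System.IO;

class FileNameDemo
{
    // Builds "<timestamp>_<suffix><ext>", taking ext from the URL path when it has one.
    public static string BuildFileName(string imageUrl, string suffix, DateTime now)
    {
        // AbsolutePath excludes the query string, so "?size=big" is ignored.
        string ext = Path.GetExtension(new Uri(imageUrl).AbsolutePath);
        if (string.IsNullOrEmpty(ext))
        {
            ext = ".jpg"; // assumed default when the URL has no extension
        }
        string stamp = now.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        return stamp + "_" + suffix + ext.ToLowerInvariant();
    }

    static void Main()
    {
        DateTime now = new DateTime(2009, 11, 21, 15, 4, 5);
        Console.WriteLine(BuildFileName("http://example.com/photo/a.PNG?size=big", "0_3", now));
        // Prints: 20091121150405_0_3.png
        Console.WriteLine(BuildFileName("http://example.com/image/12345", "0_4", now));
        // Prints: 20091121150405_0_4.jpg
    }
}
```

The extension in a URL is only a hint. A stricter version would check the Content-Type header of the response instead.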
