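Writing a crawler in C#, version V2.1

This is a small patch to version 2.0, which had almost no user interaction. This version adds a progress bar and text boxes. In addition, the error that most often occurs when requesting the extracted links comes from obtaining the webResponse: GetResponse() throws a WebException whenever the server replies with an error status such as 404.

To handle this case, the call is wrapped like this: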

try
{
    webResponse = (HttpWebResponse)webRequest.GetResponse();
}
catch (WebException ex)
{
    webResponse = (HttpWebResponse)ex.Response;
}

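This catches the error and keeps the response carried by the exception. We don't handle the error here; later, the StatusCode property is checked to decide whether the rest of the code should run. Note that ex.Response is null when no response was received at all (for example, on a timeout or a DNS failure), so the code tests for null before reading StatusCode.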
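For comparison, here is a minimal sketch of the same idea (check the status code instead of relying on the exception) using HttpClient from System.Net.Http. That is an assumption: the original program uses HttpWebRequest. HttpClient.GetAsync does not throw for 4xx/5xx responses, so the status can be tested directly; connection failures still throw HttpRequestException, and timeouts surface as TaskCanceledException. The names Downloader, TryDownloadAsync and StubHandler are hypothetical; StubHandler only exists so the helper can be exercised without a network.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class Downloader
{
    // Hypothetical helper: returns the body bytes on 200 OK, otherwise null.
    public static async Task<byte[]> TryDownloadAsync(HttpClient client, string url)
    {
        try
        {
            // GetAsync does not throw for 4xx/5xx; the status is on the response.
            using (HttpResponseMessage response = await client.GetAsync(url))
            {
                if (response.StatusCode != HttpStatusCode.OK)
                    return null; // same decision the article makes via StatusCode
                return await response.Content.ReadAsByteArrayAsync();
            }
        }
        catch (HttpRequestException)
        {
            return null; // no response at all, like ex.Response == null
        }
        catch (TaskCanceledException)
        {
            return null; // HttpClient reports timeouts this way
        }
    }
}

// Hypothetical test double: answers every request with a fixed status and body.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly byte[] _body;

    public StubHandler(HttpStatusCode status, byte[] body)
    {
        _status = status;
        _body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(_status)
        {
            Content = new ByteArrayContent(_body)
        };
        return Task.FromResult(response);
    }
}
```

In the crawler, savePicture could call Downloader.TryDownloadAsync(client, path) with one shared HttpClient and skip the image when the result is null. On .NET Framework this needs a reference to the System.Net.Http assembly.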

The second change: the fetched page used to be saved to a text file, but now the whole page is read into a single string:

WebRequest myRequest = WebRequest.Create("http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1466307565574_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=" + Uri.EscapeDataString(keyWord));
HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse();
if (myResponse.StatusCode == HttpStatusCode.OK)
{
    Stream strm = myResponse.GetResponseStream();
    StreamReader sr = new StreamReader(strm);
    string line = sr.ReadToEnd(); // the entire page as one string
    // ... (rest of the click handler)
}
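A minimal sketch of the same read-everything step, with the reader (and therefore the response stream) disposed once the page has been read. PageReader and ReadAll are hypothetical names, and decoding the page as UTF-8 is an assumption; a byte-order mark, if present, still takes precedence.

```csharp
using System.IO;
using System.Text;

public static class PageReader
{
    // Reads a whole response stream into one string, like sr.ReadToEnd() in the
    // article, then disposes the reader, which also closes the underlying stream.
    public static string ReadAll(Stream stream)
    {
        // UTF-8 is an assumed default; detectEncodingFromByteOrderMarks lets a BOM override it.
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside the click handler this becomes string line = PageReader.ReadAll(myResponse.GetResponseStream()); wrapping myResponse itself in a using block also releases the connection.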

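Finally, the DownloadPage method has been removed. As shown above, its work can be done directly in the button's click handler, so there is no need to do the same thing in two places.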

The front-end form:

The code-behind:

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace 百度图片爬虫V2._1
{
    public partial class Form1 : Form
    {
        // Matches savePicture's signature so it can be started asynchronously with BeginInvoke
        public delegate void AsynFunction(string s, int i);

        public Form1()
        {
            InitializeComponent();
        }

        // Extracts image URLs (.jpg/.gif/.png) from the page HTML.
        // counts receives how many entries at the front of the returned array are filled.
        private static string[] getLinks(string html, out int counts)
        {
            const string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
            Regex r = new Regex(pattern, RegexOptions.IgnoreCase); // build the regex
            MatchCollection m = r.Matches(html); // all matches in the page
            string[] links = new string[m.Count];
            int count = 0;
            for (int i = 0; i < m.Count; i++)
            {
                if (isValiable(m[i].ToString()))
                {
                    links[count] = m[i].ToString(); // keep the image URL
                    count++;
                }
            }
            counts = count;
            return links;
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string keyWord = this.textBox1.Text;
            WebRequest myRequest = WebRequest.Create("http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1466307565574_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=" + Uri.EscapeDataString(keyWord));
            using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
            {
                if (myResponse.StatusCode != HttpStatusCode.OK)
                    return;

                string line;
                using (StreamReader sr = new StreamReader(myResponse.GetResponseStream()))
                {
                    line = sr.ReadToEnd(); // the whole result page as one string
                }

                int counts;
                string[] str = getLinks(line, out counts);
                this.progressBar1.Value = 0;
                this.progressBar1.Maximum = counts;
                for (int i = 0; i < counts; i++)
                {
                    // Copy the element: the callback runs later and must not read the shared loop variable i
                    string url = str[i];
                    AsynFunction fun = new AsynFunction(savePicture);
                    // Delegate.BeginInvoke runs savePicture on a thread-pool thread
                    // (.NET Framework only; .NET Core throws PlatformNotSupportedException)
                    fun.BeginInvoke(url, i, ar =>
                    {
                        string message;
                        try
                        {
                            fun.EndInvoke(ar); // rethrows any exception raised inside savePicture
                            message = url + " download finished";
                        }
                        catch (Exception ex)
                        {
                            message = url + " download failed: " + ex.Message;
                        }
                        // Back on the UI thread: advance the bar by one completed download
                        this.BeginInvoke(new Action(() =>
                        {
                            if (this.progressBar1.Value < this.progressBar1.Maximum)
                                this.progressBar1.Value++;
                            this.textBox2.AppendText(message + Environment.NewLine);
                        }));
                    }, fun);
                }
            }
        }

        // Returns true when the URL looks like an image resource (.jpg/.gif/.png)
        private static bool isValiable(string url)
        {
            if (url.Contains(".jpg") || url.Contains(".gif") || url.Contains(".png"))
            {
                return true;
            }
            return false;
        }

        // Downloads one image and stores it in the database as JPEG bytes.
        // Runs on a thread-pool thread, so UI controls are only touched via BeginInvoke.
        public void savePicture(string path, int i)
        {
            if (string.IsNullOrEmpty(path))
                return;

            Uri url = new Uri(path);
            HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.Create(url);
            webRequest.Referer = "http://image.baidu.com";
            // webRequest.Timeout = ...; // connection timeout in ms; the value was lost from the
            //                           // original listing, so the default (100000 ms) applies
            webRequest.AllowAutoRedirect = true;
            webRequest.Headers.Set("Pragma", "no-cache");
            webRequest.UserAgent = "Mozilla-Firefox-Spider(Wenanry)";

            HttpWebResponse webResponse;
            try
            {
                webResponse = (HttpWebResponse)webRequest.GetResponse();
            }
            catch (WebException ex)
            {
                // Error status (e.g. 404): the response travels with the exception.
                // It is null when nothing was received (timeout, DNS failure).
                webResponse = (HttpWebResponse)ex.Response;
            }
            if (webResponse == null)
                return;

            using (webResponse)
            {
                // Store only successful responses whose URL looks like an image
                if (webResponse.StatusCode == HttpStatusCode.OK && isValiable(path))
                {
                    using (DataClasses1DataContext db = new DataClasses1DataContext()) // LINQ to SQL context
                    using (MemoryStream ms = new MemoryStream())
                    using (Bitmap myImage = new Bitmap(webResponse.GetResponseStream()))
                    {
                        myImage.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg); // re-encode as JPEG
                        var p = new pictureUrl { pictureUrl1 = ms.ToArray() };
                        db.pictureUrl.InsertOnSubmit(p);
                        db.SubmitChanges();
                    }
                    this.textBox2.BeginInvoke(new Action(() =>
                    {
                        this.textBox2.AppendText("[" + i + "] " + path + " saved" + Environment.NewLine);
                    }));
                }
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            this.Close(); // exit button
        }
    }
}
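getLinks returns an array as long as all regex matches, plus an out count of how many leading entries are valid images. Below is a minimal sketch of an alternative that returns a List<string>, so the count is simply links.Count. It also changes a few things, all assumptions rather than the author's design: https:// links are matched, the space is dropped from the path character class (otherwise two URLs separated by a space can be merged into one match), duplicate URLs are skipped, and the extension check is case-insensitive, unlike isValiable. LinkExtractor and GetImageLinks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // getLinks' pattern, widened to https and without the space in the path class.
    private static readonly Regex UrlPattern = new Regex(
        @"https?://([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Same idea as isValiable, but ignoring case.
    private static bool LooksLikeImage(string url) =>
        url.IndexOf(".jpg", StringComparison.OrdinalIgnoreCase) >= 0 ||
        url.IndexOf(".gif", StringComparison.OrdinalIgnoreCase) >= 0 ||
        url.IndexOf(".png", StringComparison.OrdinalIgnoreCase) >= 0;

    // Returns only image URLs, in page order, each at most once.
    public static List<string> GetImageLinks(string html)
    {
        var seen = new HashSet<string>();
        var links = new List<string>();
        foreach (Match m in UrlPattern.Matches(html))
        {
            // seen.Add returns false for a URL already collected
            if (LooksLikeImage(m.Value) && seen.Add(m.Value))
                links.Add(m.Value);
        }
        return links;
    }
}
```

In button1_Click, List<string> str = LinkExtractor.GetImageLinks(line); followed by progressBar1.Maximum = str.Count; would replace the getLinks call and its out parameter.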
