High-Concurrency Scraping of HTML Images with Golang

Preparation

1. Install Golang

2. Download the crawler packages

go get -v github.com/hunterhug/marmot/util
go get -v github.com/hunterhug/marmot/tool

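Program

This program can only fetch images that appear in the HTML as src="http...". The URL must include the http(s) scheme. Images referenced in other ways, such as through data-src or through URLs obfuscated inside JavaScript, cannot be fetched.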
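
To make that limitation concrete, here is a minimal sketch of the kind of matching that would behave this way. It is not marmot's actual code: the pattern imgSrcPattern and the helper extractImageURLs are hypothetical names invented for this illustration, and the real library may match differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// imgSrcPattern is a hypothetical pattern (not taken from marmot). It matches
// a src attribute whose value starts with http:// or https://. Requiring
// whitespace (or start of text) before "src" keeps data-src from matching.
var imgSrcPattern = regexp.MustCompile(`(?:^|\s)src=["'](https?://[^"']+)["']`)

// extractImageURLs returns every absolute http(s) URL found in a src attribute.
func extractImageURLs(html string) []string {
	var urls []string
	for _, m := range imgSrcPattern.FindAllStringSubmatch(html, -1) {
		urls = append(urls, m[1]) // m[1] is the captured URL
	}
	return urls
}

func main() {
	page := `
<img src="http://example.com/a.jpg">
<img src='https://example.com/b.png'>
<img data-src="http://example.com/lazy.jpg">
<img src="/relative/c.gif">
<script>var u = "http:" + "//example.com/js.jpg";</script>`

	// Only the first two images are found: the data-src image, the
	// relative path, and the URL assembled in JavaScript are all skipped.
	for _, u := range extractImageURLs(page) {
		fmt.Println(u)
	}
}
```

Note that a pattern like this does not check the tag name, so any element with an absolute src would be picked up as well.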

See: https://github.com/hunterhug/marmot/blob/master/example/practice/pictures/main.go

package main

import (
	"fmt"

	"github.com/hunterhug/marmot/tool"
	"github.com/hunterhug/marmot/util"
)

// MinerNum is the number of miners. They run at the same time to crawl data fast.
var MinerNum = 5

// ProxyAddress decides whether a proxy is used. Update it if you need one.
var ProxyAddress interface{}

func main() {
	// To go through a proxy, uncomment the next line.
	// ProxyAddress = "socks5://127.0.0.1:1080"

	fmt.Println(`Welcome: Input "url" and picture keep "dir"`)
	fmt.Println("---------------------------------------------")
	url := util.Input(`URL(Like: "http://publicdomainarchive.com")`, "http://publicdomainarchive.com")
	dir := util.Input(`DIR(Default: "./picture")`, "./picture")
	fmt.Printf("You will keep %s picture in dir %s\n", url, dir)
	fmt.Println("---------------------------------------------")

	// Start the download.
	err := tool.DownloadHTMLPictures(url, dir, MinerNum, ProxyAddress)
	if err != nil {
		fmt.Println("Error:" + err.Error())
	}
}

The comments in the code explain each step. Running it produces:

Welcome: Input "url" and picture keep "dir"
---------------------------------------------
URL(Like: "http://publicdomainarchive.com")
DIR(Default: "./picture")
You will keep http://publicdomainarchive.com picture in dir ./picture
---------------------------------------------
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_02_03_modern.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_google_dark.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-003-667x1000-192684_667x675.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_powered-by-wp-engine.png
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_divi.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-002-1000x667.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_public-domain-mark.png
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos008-1000x625.jpg
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_09_03_Weekly.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-054-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_10_03_instagram_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_02_03_vintage.jpg
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-001-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-070-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_twitter02_dark.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos001-1000x750-167066_1000x675.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-035-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_facebook_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-060-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-013-1000x667.jpg
---------------------------------------------
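
Two things can be read off this log. The prefixes SP0 to SP4 line up with MinerNum = 5, which suggests one label per concurrent worker. The file names look like the image URL with ":" replaced by "_04_" and "/" replaced by "_03_". The sketch below reproduces that shape with a plain worker pool. It is inferred from the log only and is not marmot's implementation: fileNameFor and crawl are hypothetical helpers, the download step is left as a comment, and the two sample URLs are reconstructed by reversing the substitution on the first two log lines.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// urlToName applies the two substitutions visible in the log.
// This mapping is an assumption inferred from the output above.
var urlToName = strings.NewReplacer(":", "_04_", "/", "_03_")

// fileNameFor flattens a URL into a single file name.
func fileNameFor(url string) string {
	return urlToName.Replace(url)
}

// crawl spreads urls over n workers and returns one log line per URL.
// The order of the returned lines depends on goroutine scheduling.
func crawl(urls []string, dir string, n int) []string {
	jobs := make(chan string)
	var (
		mu   sync.Mutex
		logs []string
		wg   sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for u := range jobs {
				// A real worker would download u and write the file here.
				line := fmt.Sprintf("SP%d: Keep in %s/%s", id, dir, fileNameFor(u))
				mu.Lock()
				logs = append(logs, line)
				mu.Unlock()
			}
		}(i)
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets every worker's range loop finish
	wg.Wait()
	return logs
}

func main() {
	urls := []string{
		"http://publicdomainarchive.com/wp-content/uploads/2014/02/modern.jpg",
		"http://publicdomainarchive.com/wp-content/uploads/2014/03/google_dark.png",
	}
	for _, line := range crawl(urls, "./picture", 5) {
		fmt.Println(line)
	}
}
```

Because the workers pull from a shared channel, which worker handles which URL varies between runs, just as the SP labels in the log above are interleaved.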
