Single-Threaded vs. Multi-Threaded Crawler: An Efficiency Comparison
Single-threaded crawler:
import re
import time

import requests

url_EB = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3A9063592011%2Ck%3Aprojector&bbn=9063592011&keywords=projector&pickerToList=brandtextbin&ie=UTF8&qid=1461902521'
headers_EB = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

# Adjacent string literals are joined, so the URL carries no embedded newline.
url_AML = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5&rh=i%3Amerchant-items'
           '&pickerToList=brandtextbin&ie=UTF8&qid=1461899728')
headers_AML = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

url_DL = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY&rh=i%3Amerchant-items&pickerToList=brandtextbin&ie=UTF8&qid=1461901862'
headers_DL = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

# Display names of the three seller storefronts being crawled.
name = {'a': 'ExclusiveBulbs',
        'b': 'Amazing Lamps',
        'c': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span',data.text)
# f = dict(map(lambda x,y:[x,y],store_name,listing_count))
#
# for k,v in f.items():
#     print(k,v)

def foo_one(url, headers, name):
    # Banner text: "开始爬去" means "start crawling", "爬去完毕" means "crawl finished".
    print('--------------------------开始爬去{0}at{1}---------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand name, listing count) tuple.
    store_name = re.findall('<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------爬去完毕at{}----------------------------'.format(time.ctime()))
    time.sleep(1)  # pause one second before the next request

if __name__ == '__main__':
    foo_one(url_EB, headers_EB, name['a'])
    foo_one(url_AML, headers_AML, name['b'])
    foo_one(url_DL, headers_DL, name['c'])
Output: started at 00:25:33, finished at 00:26:02; 29 seconds elapsed.
--------------------------开始爬去ExclusiveBulbsatSat Apr 30 00:25:33 2016---------------------------
('A.Shine', ' (97)')
('AmpacElectronics', ' (1,644)')
('AuraBeam', ' (33,084)')
('AWO', ' (1,206)')
('Battery1inc', ' (694)')
('Comoze Lamps', ' (6,172)')
('Compatible Lamp', ' (317)')
('Corgi Lamps', ' (2,124)')
('CTLAMP', ' (3,499)')
('Dell', ' (191)')
('Diamond Lamps', ' (966)')
('Dynamic', ' (4)')
('Eiki', ' (460)')
('ePharos', ' (2,592)')
('Epson', ' (1,456)')
('EREPLACEMENT', ' (115)')
('eReplacements', ' (814)')
('eWo's', ' (120)')
('eWorldlamp', ' (354)')
('FI Lamps', ' (5,707)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('For Epson', ' (3)')
('Generic', ' (9,769)')
('Good Lamp', ' (819)')
('HCDZ', ' (2,746)')
('Hitachi', ' (935)')
('IET Lamps', ' (2,144)')
('InFocus', ' (44)')
('JVC', ' (326)')
('KCL', ' (3,781)')
('Lampedia', ' (618)')
('Lutema', ' (1,956)')
('Mitsubishi', ' (1,006)')
('Mogobe', ' (1,335)')
('MyProjectorLamps', ' (473)')
('NEC', ' (446)')
('Nec Computers', ' (13)')
('Optoma', ' (956)')
('Osram Sylvania', ' (78)')
('Panasonic', ' (820)')
('Philips', ' (7,502)')
('Powerwarehouse', ' (9,971)')
('Projector Lamps World', ' (112)')
('Pureglare', ' (369)')
('Samsung', ' (1,078)')
('Sharp', ' (426)')
('Shopforbattery', ' (2,510)')
('SMART BOARD', ' (66)')
('Sony', ' (990)')
('TVLampsforless', ' (14)')
('Unknown', ' (722)')
--------------------------爬去完毕atSat Apr 30 00:25:57 2016----------------------------
--------------------------开始爬去Amazing LampsatSat Apr 30 00:25:58 2016---------------------------
('AWO', ' (1)')
('Comoze Lamps', ' (2)')
('DNGO', ' (8)')
('Electrified', ' (9)')
('ELECTRIFIED', ' (10)')
('Electrified Discounters', ' (5)')
('ELECTRIFIED LAMPS', ' (1,177)')
('ELECTRIFIED PRINTHEAD', ' (24)')
('ELECTRIFIED PRINTHEADS', ' (2)')
('FI Lamps', ' (2)')
('Generic', ' (34)')
('GloWatt', ' (1)')
('KCL', ' (1)')
('OEM', ' (1)')
('Powerwarehouse', ' (7)')
('SKU', ' (5)')
('Top Lamp', ' (1)')
('Unknown', ' (1)')
('USOM', ' (3)')
--------------------------爬去完毕atSat Apr 30 00:26:00 2016----------------------------
--------------------------开始爬去Dynamic LampsatSat Apr 30 00:26:01 2016---------------------------
('Battery1inc', ' (85)')
('BenQ', ' (237)')
('Buslink', ' (31)')
('Calumet', ' (2)')
('Comoze Lamps', ' (405)')
('CTLAMP', ' (615)')
('Dell', ' (82)')
('Divine Lighting', ' (36)')
('DNGO', ' (63)')
('Dynamic', ' (4)')
('Eiko', ' (140)')
('Electrified', ' (2)')
('ELECTRIFIED LAMPS', ' (24)')
('Electronix Xpress', ' (418)')
('ePharos', ' (502)')
('Epson', ' (631)')
('eReplacements', ' (119)')
('FI Lamps', ' (505)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('G-lamps', ' (43)')
('GE', ' (248)')
('GE Lighting', ' (152)')
('General Electric', ' (53)')
('Generic', ' (1,671)')
('Genie', ' (101)')
('GLAMPS', ' (2)')
('Impact', ' (7)')
('Industrial Lighting Solutions', ' (9)')
('KCL', ' (280)')
('Kodak', ' (1)')
('Lampedia', ' (63)')
('M-Wave', ' (830)')
('Mitsubishi', ' (406)')
('Mitsubishi DLP TV Bulbs', ' (29)')
('Mocpinc', ' (10)')
('MyProjectorLamps', ' (344)')
('Nec', ' (19)')
('Optoma', ' (161)')
('Osram', ' (1,295)')
('Panasonic', ' (245)')
('Philips', ' (988)')
('Powerwarehouse', ' (239)')
('Projector Lamps World', ' (45)')
('Pureglare', ' (107)')
('Samsung', ' (323)')
('ShopJimmy', ' (3)')
('Sony', ' (141)')
('Sylvania', ' (115)')
('Technical Precision', ' (10)')
('Unknown', ' (167)')
('Welch Allyn Compatible', ' (1)')
--------------------------爬去完毕atSat Apr 30 00:26:02 2016----------------------------
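The 29 seconds reported above is read off the printed timestamps, so it also counts the one-second time.sleep pauses that foo_one makes between requests. As a minimal sketch (not part of the original script), the elapsed time can instead be measured directly with time.perf_counter. The names timed, fake_crawl, and crawl_all are hypothetical, and fake_crawl only sleeps in place of the real requests.get call so that the example runs offline.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def fake_crawl(delay):
    # Stand-in for requests.get(): sleeps instead of touching the network.
    time.sleep(delay)
    return delay


def crawl_all(delays):
    # Sequential, like the single-threaded script: one call after another.
    return [fake_crawl(d) for d in delays]


if __name__ == '__main__':
    results, elapsed = timed(crawl_all, [0.1, 0.1, 0.1])
    print('crawled {} pages in {:.2f}s'.format(len(results), elapsed))
```

Wrapping only the crawl calls in timed keeps deliberate pauses and start-up cost out of the measurement, which makes the single-threaded and multi-threaded figures easier to compare.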
Multi-threaded crawler: started at 00:32:37, finished at 00:32:39; 2 seconds elapsed.
import re
import threading
import time
from time import ctime

import requests

url_EB = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3A9063592011%2Ck%3Aprojector&bbn=9063592011&keywords=projector&pickerToList=brandtextbin&ie=UTF8&qid=1461902521'
headers_EB = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

# Adjacent string literals are joined, so the URL carries no embedded newline.
url_AML = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5&rh=i%3Amerchant-items'
           '&pickerToList=brandtextbin&ie=UTF8&qid=1461899728')
headers_AML = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

url_DL = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY&rh=i%3Amerchant-items&pickerToList=brandtextbin&ie=UTF8&qid=1461901862'
headers_DL = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

# Display names of the three seller storefronts being crawled.
name = {'a': 'ExclusiveBulbs',
        'b': 'Amazing Lamps',
        'c': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span',data.text)
# f = dict(map(lambda x,y:[x,y],store_name,listing_count))
#
# for k,v in f.items():
#     print(k,v)

def foo_one(url, headers, name):
    # Banner text: "开始爬去" means "start crawling", "爬去完毕" means "crawl finished".
    print('--------------------------开始爬去{0}at{1}---------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand name, listing count) tuple.
    store_name = re.findall('<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    print('--------------------------爬去完毕{0}at{1}----------------------------'.format(name, time.ctime()))

# One thread per storefront, each running foo_one with its own URL.
threads = []
t1 = threading.Thread(target=foo_one, args=(url_EB, headers_EB, name['a']))
threads.append(t1)
t2 = threading.Thread(target=foo_one, args=(url_AML, headers_AML, name['b']))
threads.append(t2)
t3 = threading.Thread(target=foo_one, args=(url_DL, headers_DL, name['c']))
threads.append(t3)

if __name__ == '__main__':
    for t in threads:
        t.daemon = True  # daemon threads are discarded when the main thread exits
        t.start()
    # join() runs once, after the loop, so it waits only for the last thread (t3);
    # any other thread still running when t3 finishes is cut off at exit.
    t.join()
    print("all over %s" % ctime())
Output:
--------------------------开始爬去ExclusiveBulbsatSat Apr 30 00:32:37 2016---------------------------
--------------------------开始爬去Amazing LampsatSat Apr 30 00:32:37 2016---------------------------
--------------------------开始爬去Dynamic LampsatSat Apr 30 00:32:37 2016---------------------------
('A.Shine', ' (97)')
('AmpacElectronics', ' (1,645)')
('AuraBeam', ' (33,088)')
('AWO', ' (1,209)')
('Battery1inc', ' (694)')
('Comoze Lamps', ' (6,172)')
('Compatible Lamp', ' (317)')
('Corgi Lamps', ' (2,123)')
('CTLAMP', ' (3,501)')
('Dell', ' (191)')
('Diamond Lamps', ' (966)')
('Dynamic', ' (4)')
('Eiki', ' (457)')
('ePharos', ' (2,592)')
('Epson', ' (1,456)')
('EREPLACEMENT', ' (115)')
('eReplacements', ' (813)')
('eWo's', ' (120)')
('eWorldlamp', ' (354)')
('FI Lamps', ' (5,710)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('For Epson', ' (3)')
('Generic', ' (9,771)')
('Good Lamp', ' (819)')
('HCDZ', ' (2,748)')
('Hitachi', ' (935)')
('IET Lamps', ' (2,137)')
('InFocus', ' (44)')
('JVC', ' (326)')
('KCL', ' (3,783)')
('Lampedia', ' (618)')
('Lutema', ' (1,955)')
('Mitsubishi', ' (1,006)')
('Mogobe', ' (1,336)')
('MyProjectorLamps', ' (473)')
('NEC', ' (450)')
('Nec Computers', ' (13)')
('Optoma', ' (956)')
('Osram Sylvania', ' (78)')
('Panasonic', ' (820)')
('Philips', ' (7,502)')
('Powerwarehouse', ' (9,972)')
('Projector Lamps World', ' (112)')
('Pureglare', ' (369)')
('Samsung', ' (1,078)')
('Sharp', ' (426)')
('Shopforbattery', ' (2,511)')
('SMART BOARD', ' (66)')
('Sony', ' (990)')
('TVLampsforless', ' (14)')
('Unknown', ' (722)')
--------------------------爬去完毕ExclusiveBulbsatSat Apr 30 00:32:38 2016----------------------------
('Battery1inc', ' (85)')
('BenQ', ' (237)')
('Buslink', ' (31)')
('Calumet', ' (2)')
('Comoze Lamps', ' (405)')
('CTLAMP', ' (615)')
('Dell', ' (82)')
('Divine Lighting', ' (36)')
('DNGO', ' (63)')
('Dynamic', ' (4)')
('Eiko', ' (140)')
('Electrified', ' (2)')
('ELECTRIFIED LAMPS', ' (24)')
('Electronix Xpress', ' (418)')
('ePharos', ' (502)')
('Epson', ' (631)')
('eReplacements', ' (119)')
('FI Lamps', ' (505)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('G-lamps', ' (43)')
('GE', ' (248)')
('GE Lighting', ' (152)')
('General Electric', ' (53)')
('Generic', ' (1,671)')
('Genie', ' (101)')
('GLAMPS', ' (2)')
('Impact', ' (7)')
('Industrial Lighting Solutions', ' (9)')
('KCL', ' (280)')
('Kodak', ' (1)')
('Lampedia', ' (63)')
('M-Wave', ' (830)')
('Mitsubishi', ' (406)')
('Mitsubishi DLP TV Bulbs', ' (29)')
('Mocpinc', ' (10)')
('MyProjectorLamps', ' (344)')
('Nec', ' (19)')
('Optoma', ' (161)')
('Osram', ' (1,295)')
('Panasonic', ' (245)')
('Philips', ' (988)')
('Powerwarehouse', ' (239)')
('Projector Lamps World', ' (45)')
('Pureglare', ' (107)')
('Samsung', ' (323)')
('ShopJimmy', ' (3)')
('Sony', ' (141)')
('Sylvania', ' (115)')
('Technical Precision', ' (10)')
('Unknown', ' (167)')
('Welch Allyn Compatible', ' (1)')
--------------------------爬去完毕Dynamic LampsatSat Apr 30 00:32:39 2016----------------------------
all over Sat Apr 30 00:32:39 2016
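In the output above, the Amazing Lamps thread prints its start banner but never prints results or a finish banner. A likely cause is that the script appears to call t.join() only once, after the loop, which waits for the last thread alone; because the threads are daemons, anything still running at that point is discarded when the main thread exits. The following is a minimal sketch, not the author's code, that waits for every thread. The names fake_crawl and crawl_threaded are hypothetical, and fake_crawl sleeps in place of the real request so the example runs offline.

```python
import threading
import time


def fake_crawl(name, delay, finished):
    # Stand-in for requests.get(): sleeps instead of touching the network.
    time.sleep(delay)
    finished.append(name)  # list.append is safe to call from several threads in CPython


def crawl_threaded(jobs):
    """Run one thread per (name, delay) job and wait for all of them."""
    finished = []
    threads = [
        threading.Thread(target=fake_crawl, args=(name, delay, finished))
        for name, delay in jobs
    ]
    for t in threads:
        t.start()
    # Join every thread, not just the last one started.
    for t in threads:
        t.join()
    return finished


if __name__ == '__main__':
    jobs = [('ExclusiveBulbs', 0.3), ('Amazing Lamps', 0.2), ('Dynamic Lamps', 0.1)]
    start = time.perf_counter()
    names = crawl_threaded(jobs)
    print(names, 'in {:.2f}s'.format(time.perf_counter() - start))
```

With every thread joined, the total time is roughly that of the slowest job rather than the sum of all jobs, and no result is lost when the main thread reaches its final print.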