Single-threaded vs. multithreaded crawlers: an efficiency comparison
Single-threaded crawler:
import re
import time

import requests

USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36')

url_EB = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3A9063592011%2Ck%3Aprojector&bbn=9063592011&keywords=projector&pickerToList=brandtextbin&ie=UTF8&qid=1461902521'
headers_EB = {'User-Agent': USER_AGENT}

# Built from adjacent string literals so that no newline ends up inside the URL
url_AML = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5&rh=i%3Amerchant-items'
           '&pickerToList=brandtextbin&ie=UTF8&qid=1461899728')
headers_AML = {'User-Agent': USER_AGENT}

url_DL = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY&rh=i%3Amerchant-items&pickerToList=brandtextbin&ie=UTF8&qid=1461901862'
headers_DL = {'User-Agent': USER_AGENT}

name = {'a': 'ExclusiveBulbs',
        'b': 'Amazing Lamps',
        'c': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span',data.text)
# f = dict(map(lambda x,y:[x,y],store_name,listing_count))
#
# for k,v in f.items():
#     print(k,v)

def foo_one(url, headers, name):
    # Banner text: "start crawling {name} at {time}"
    print('--------------------------开始爬去{0}at{1}---------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand, listing count) pair
    store_name = re.findall(r'<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    # Banner text: "finished crawling at {time}"
    print('--------------------------爬去完毕at{}----------------------------'.format(time.ctime()))
    time.sleep(1)

if __name__ == '__main__':
    foo_one(url_EB, headers_EB, name['a'])
    foo_one(url_AML, headers_AML, name['b'])
    foo_one(url_DL, headers_DL, name['c'])
Output: started at 00:25:33, finished at 00:26:02, 29 seconds elapsed
--------------------------开始爬去ExclusiveBulbsatSat Apr 30 00:25:33 2016---------------------------
('A.Shine', ' (97)')
('AmpacElectronics', ' (1,644)')
('AuraBeam', ' (33,084)')
('AWO', ' (1,206)')
('Battery1inc', ' (694)')
('Comoze Lamps', ' (6,172)')
('Compatible Lamp', ' (317)')
('Corgi Lamps', ' (2,124)')
('CTLAMP', ' (3,499)')
('Dell', ' (191)')
('Diamond Lamps', ' (966)')
('Dynamic', ' (4)')
('Eiki', ' (460)')
('ePharos', ' (2,592)')
('Epson', ' (1,456)')
('EREPLACEMENT', ' (115)')
('eReplacements', ' (814)')
('eWo's', ' (120)')
('eWorldlamp', ' (354)')
('FI Lamps', ' (5,707)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('For Epson', ' (3)')
('Generic', ' (9,769)')
('Good Lamp', ' (819)')
('HCDZ', ' (2,746)')
('Hitachi', ' (935)')
('IET Lamps', ' (2,144)')
('InFocus', ' (44)')
('JVC', ' (326)')
('KCL', ' (3,781)')
('Lampedia', ' (618)')
('Lutema', ' (1,956)')
('Mitsubishi', ' (1,006)')
('Mogobe', ' (1,335)')
('MyProjectorLamps', ' (473)')
('NEC', ' (446)')
('Nec Computers', ' (13)')
('Optoma', ' (956)')
('Osram Sylvania', ' (78)')
('Panasonic', ' (820)')
('Philips', ' (7,502)')
('Powerwarehouse', ' (9,971)')
('Projector Lamps World', ' (112)')
('Pureglare', ' (369)')
('Samsung', ' (1,078)')
('Sharp', ' (426)')
('Shopforbattery', ' (2,510)')
('SMART BOARD', ' (66)')
('Sony', ' (990)')
('TVLampsforless', ' (14)')
('Unknown', ' (722)')
--------------------------爬去完毕atSat Apr 30 00:25:57 2016----------------------------
--------------------------开始爬去Amazing LampsatSat Apr 30 00:25:58 2016---------------------------
('AWO', ' (1)')
('Comoze Lamps', ' (2)')
('DNGO', ' (8)')
('Electrified', ' (9)')
('ELECTRIFIED', ' (10)')
('Electrified Discounters', ' (5)')
('ELECTRIFIED LAMPS', ' (1,177)')
('ELECTRIFIED PRINTHEAD', ' (24)')
('ELECTRIFIED PRINTHEADS', ' (2)')
('FI Lamps', ' (2)')
('Generic', ' (34)')
('GloWatt', ' (1)')
('KCL', ' (1)')
('OEM', ' (1)')
('Powerwarehouse', ' (7)')
('SKU', ' (5)')
('Top Lamp', ' (1)')
('Unknown', ' (1)')
('USOM', ' (3)')
--------------------------爬去完毕atSat Apr 30 00:26:00 2016----------------------------
--------------------------开始爬去Dynamic LampsatSat Apr 30 00:26:01 2016---------------------------
('Battery1inc', ' (85)')
('BenQ', ' (237)')
('Buslink', ' (31)')
('Calumet', ' (2)')
('Comoze Lamps', ' (405)')
('CTLAMP', ' (615)')
('Dell', ' (82)')
('Divine Lighting', ' (36)')
('DNGO', ' (63)')
('Dynamic', ' (4)')
('Eiko', ' (140)')
('Electrified', ' (2)')
('ELECTRIFIED LAMPS', ' (24)')
('Electronix Xpress', ' (418)')
('ePharos', ' (502)')
('Epson', ' (631)')
('eReplacements', ' (119)')
('FI Lamps', ' (505)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('G-lamps', ' (43)')
('GE', ' (248)')
('GE Lighting', ' (152)')
('General Electric', ' (53)')
('Generic', ' (1,671)')
('Genie', ' (101)')
('GLAMPS', ' (2)')
('Impact', ' (7)')
('Industrial Lighting Solutions', ' (9)')
('KCL', ' (280)')
('Kodak', ' (1)')
('Lampedia', ' (63)')
('M-Wave', ' (830)')
('Mitsubishi', ' (406)')
('Mitsubishi DLP TV Bulbs', ' (29)')
('Mocpinc', ' (10)')
('MyProjectorLamps', ' (344)')
('Nec', ' (19)')
('Optoma', ' (161)')
('Osram', ' (1,295)')
('Panasonic', ' (245)')
('Philips', ' (988)')
('Powerwarehouse', ' (239)')
('Projector Lamps World', ' (45)')
('Pureglare', ' (107)')
('Samsung', ' (323)')
('ShopJimmy', ' (3)')
('Sony', ' (141)')
('Sylvania', ' (115)')
('Technical Precision', ' (10)')
('Unknown', ' (167)')
('Welch Allyn Compatible', ' (1)')
--------------------------爬去完毕atSat Apr 30 00:26:02 2016----------------------------
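The 29-second figure can be checked against the banner timestamps in the output above. The following is a minimal sketch rather than part of the original crawler: `FMT`, `elapsed_seconds`, and `banners` are names introduced here for illustration, and the timestamps are copied from the banners by hand.

```python
from datetime import datetime

# Layout of the strings produced by time.ctime(), e.g. 'Sat Apr 30 00:25:33 2016'
FMT = '%a %b %d %H:%M:%S %Y'


def elapsed_seconds(start, end):
    """Whole seconds between two ctime-style timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (start banner, end banner) per store, copied from the single-threaded run
banners = {
    'ExclusiveBulbs': ('Sat Apr 30 00:25:33 2016', 'Sat Apr 30 00:25:57 2016'),
    'Amazing Lamps': ('Sat Apr 30 00:25:58 2016', 'Sat Apr 30 00:26:00 2016'),
    'Dynamic Lamps': ('Sat Apr 30 00:26:01 2016', 'Sat Apr 30 00:26:02 2016'),
}

per_store = {store: elapsed_seconds(s, e) for store, (s, e) in banners.items()}
total = elapsed_seconds('Sat Apr 30 00:25:33 2016', 'Sat Apr 30 00:26:02 2016')

print(per_store)
print(total)
```

The three crawls account for 24, 2, and 1 seconds; the remaining 2 seconds are the `time.sleep(1)` calls that fall between crawls. Most of the total comes from the first request alone.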
Multithreaded crawler: started at 00:32:37, finished at 00:32:39, 2 seconds elapsed
import re
import threading
import time

import requests

USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36')

url_EB = 'http://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A22XNR713HGDVG&rh=n%3A9063592011%2Ck%3Aprojector&bbn=9063592011&keywords=projector&pickerToList=brandtextbin&ie=UTF8&qid=1461902521'
headers_EB = {'User-Agent': USER_AGENT}

# Built from adjacent string literals so that no newline ends up inside the URL
url_AML = ('https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=A3UJI9WWE6PRP5&rh=i%3Amerchant-items'
           '&pickerToList=brandtextbin&ie=UTF8&qid=1461899728')
headers_AML = {'User-Agent': USER_AGENT}

url_DL = 'https://www.amazon.com/gp/search/other/ref=sr_sa_p_4?me=AS7ZU4MN0FPOY&rh=i%3Amerchant-items&pickerToList=brandtextbin&ie=UTF8&qid=1461901862'
headers_DL = {'User-Agent': USER_AGENT}

name = {'a': 'ExclusiveBulbs',
        'b': 'Amazing Lamps',
        'c': 'Dynamic Lamps'}

# listing_count = re.findall('<span class="narrowValue">(.*?)</span',data.text)
# f = dict(map(lambda x,y:[x,y],store_name,listing_count))
#
# for k,v in f.items():
#     print(k,v)

def foo_one(url, headers, name):
    # Banner text: "start crawling {name} at {time}"
    print('--------------------------开始爬去{0}at{1}---------------------------'.format(name, time.ctime()))
    response = requests.get(url, headers=headers)
    # Each match is a (brand, listing count) pair
    store_name = re.findall(r'<span class="refinementLink">(.*?)</span><span class="narrowValue">(.*?)</span', response.text)
    for i in store_name:
        print(i)
    # Banner text: "finished crawling {name} at {time}"
    print('--------------------------爬去完毕{0}at{1}----------------------------'.format(name, time.ctime()))

threads = []
t1 = threading.Thread(target=foo_one, args=(url_EB, headers_EB, name['a']))
threads.append(t1)
t2 = threading.Thread(target=foo_one, args=(url_AML, headers_AML, name['b']))
threads.append(t2)
t3 = threading.Thread(target=foo_one, args=(url_DL, headers_DL, name['c']))
threads.append(t3)

if __name__ == '__main__':
    for t in threads:
        t.daemon = True
        t.start()
    # Join every thread. Joining only the last one lets the program exit
    # while another daemon thread is still fetching, which drops its output.
    for t in threads:
        t.join()
    print("all over %s" % time.ctime())
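Output (in this recorded run the Amazing Lamps thread printed its start banner but no results, which is consistent with the main thread exiting while that daemon thread was still fetching):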
--------------------------开始爬去ExclusiveBulbsatSat Apr 30 00:32:37 2016---------------------------
--------------------------开始爬去Amazing LampsatSat Apr 30 00:32:37 2016---------------------------
--------------------------开始爬去Dynamic LampsatSat Apr 30 00:32:37 2016---------------------------
('A.Shine', ' (97)')
('AmpacElectronics', ' (1,645)')
('AuraBeam', ' (33,088)')
('AWO', ' (1,209)')
('Battery1inc', ' (694)')
('Comoze Lamps', ' (6,172)')
('Compatible Lamp', ' (317)')
('Corgi Lamps', ' (2,123)')
('CTLAMP', ' (3,501)')
('Dell', ' (191)')
('Diamond Lamps', ' (966)')
('Dynamic', ' (4)')
('Eiki', ' (457)')
('ePharos', ' (2,592)')
('Epson', ' (1,456)')
('EREPLACEMENT', ' (115)')
('eReplacements', ' (813)')
('eWo's', ' (120)')
('eWorldlamp', ' (354)')
('FI Lamps', ' (5,710)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('For Epson', ' (3)')
('Generic', ' (9,771)')
('Good Lamp', ' (819)')
('HCDZ', ' (2,748)')
('Hitachi', ' (935)')
('IET Lamps', ' (2,137)')
('InFocus', ' (44)')
('JVC', ' (326)')
('KCL', ' (3,783)')
('Lampedia', ' (618)')
('Lutema', ' (1,955)')
('Mitsubishi', ' (1,006)')
('Mogobe', ' (1,336)')
('MyProjectorLamps', ' (473)')
('NEC', ' (450)')
('Nec Computers', ' (13)')
('Optoma', ' (956)')
('Osram Sylvania', ' (78)')
('Panasonic', ' (820)')
('Philips', ' (7,502)')
('Powerwarehouse', ' (9,972)')
('Projector Lamps World', ' (112)')
('Pureglare', ' (369)')
('Samsung', ' (1,078)')
('Sharp', ' (426)')
('Shopforbattery', ' (2,511)')
('SMART BOARD', ' (66)')
('Sony', ' (990)')
('TVLampsforless', ' (14)')
('Unknown', ' (722)')
--------------------------爬去完毕ExclusiveBulbsatSat Apr 30 00:32:38 2016----------------------------
('Battery1inc', ' (85)')
('BenQ', ' (237)')
('Buslink', ' (31)')
('Calumet', ' (2)')
('Comoze Lamps', ' (405)')
('CTLAMP', ' (615)')
('Dell', ' (82)')
('Divine Lighting', ' (36)')
('DNGO', ' (63)')
('Dynamic', ' (4)')
('Eiko', ' (140)')
('Electrified', ' (2)')
('ELECTRIFIED LAMPS', ' (24)')
('Electronix Xpress', ' (418)')
('ePharos', ' (502)')
('Epson', ' (631)')
('eReplacements', ' (119)')
('FI Lamps', ' (505)')
('FL Projector Lamp For Mitsubishi', ' (1)')
('G-lamps', ' (43)')
('GE', ' (248)')
('GE Lighting', ' (152)')
('General Electric', ' (53)')
('Generic', ' (1,671)')
('Genie', ' (101)')
('GLAMPS', ' (2)')
('Impact', ' (7)')
('Industrial Lighting Solutions', ' (9)')
('KCL', ' (280)')
('Kodak', ' (1)')
('Lampedia', ' (63)')
('M-Wave', ' (830)')
('Mitsubishi', ' (406)')
('Mitsubishi DLP TV Bulbs', ' (29)')
('Mocpinc', ' (10)')
('MyProjectorLamps', ' (344)')
('Nec', ' (19)')
('Optoma', ' (161)')
('Osram', ' (1,295)')
('Panasonic', ' (245)')
('Philips', ' (988)')
('Powerwarehouse', ' (239)')
('Projector Lamps World', ' (45)')
('Pureglare', ' (107)')
('Samsung', ' (323)')
('ShopJimmy', ' (3)')
('Sony', ' (141)')
('Sylvania', ' (115)')
('Technical Precision', ' (10)')
('Unknown', ' (167)')
('Welch Allyn Compatible', ' (1)')
--------------------------爬去完毕Dynamic LampsatSat Apr 30 00:32:39 2016----------------------------
all over Sat Apr 30 00:32:39 2016
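The two runs above hit a live site several minutes apart, so network variance is mixed into the 29-second and 2-second figures. A more controlled comparison can be sketched with a simulated fetch. This is an illustration under stated assumptions, not the original crawler: `fake_fetch` is a hypothetical stand-in that sleeps for a fixed delay instead of calling `requests.get`, and `concurrent.futures.ThreadPoolExecutor` from the standard library replaces the hand-built thread list.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(name, delay=0.2):
    """Hypothetical stand-in for requests.get: sleep to mimic network latency."""
    time.sleep(delay)
    return name


def crawl_sequential(names):
    """Fetch one store after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_fetch(n) for n in names]
    return results, time.perf_counter() - start


def crawl_threaded(names):
    """Fetch all stores concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    # Leaving the with-block waits for every worker, so no thread is left unjoined
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(fake_fetch, names))
    return results, time.perf_counter() - start


stores = ['ExclusiveBulbs', 'Amazing Lamps', 'Dynamic Lamps']
seq_results, seq_time = crawl_sequential(stores)
thr_results, thr_time = crawl_threaded(stores)
print('sequential: %.2fs, threaded: %.2fs' % (seq_time, thr_time))
```

With an identical delay per request, the sequential time is roughly the sum of the delays and the threaded time is roughly the longest single delay. That is the gain to expect from threads on I/O-bound work such as waiting on HTTP responses.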