Python: Scrape 1688 Product Weights, Email the Results to a Specified Mailbox (QQ), and Schedule the Program to Run

# -*- coding: utf-8 -*-
# @Time : 2020/7/6 13:46
# @Author : Chunfang
# @Email : 3470959534@qq.com
# @File : test02.py
# @Software: PyCharm

import re
import time
import datetime
from time import sleep

from openpyxl import load_workbook
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

def down_data():
    start = datetime.datetime.now()
    # Workbook with one row per unique SKU: column 1 = SKU, column 2 = URL, column 3 = result
    filepath2 = 'SKU-URL-weight.xlsx'

    wb2 = load_workbook(filepath2)
    ws2 = wb2.worksheets[0]

    def down_data(url):  # fetch the page source for one link (shadows the outer function in this scope)
        # Launch Chrome with the local user profile so the scraper carries the browser's session data
        c_service = Service(r'D:\Python\Scripts\chromedriver.exe')
        profile_directory = r'--user-data-dir=C:\Users\Administrator\AppData\Local\Google\Chrome\User Data'
        option = webdriver.ChromeOptions()
        option.add_argument(profile_directory)
        # Pass the service to the driver (Selenium 4 signature) so the chromedriver
        # configured above is the one actually used; driver.quit() also stops it.
        driver = webdriver.Chrome(service=c_service, options=option)
        driver.implicitly_wait(3)
        driver.get(url)
        sleep(3)  # give the page time to render
        data = driver.page_source
        driver.quit()
        sleep(2)
        return data

    def station(data):
        # Each findall returns a list; an empty list means the marker is absent from the page.
        # "Busy" notice shown when the page did not load
        busy = re.findall('<div class="tips" style=".*?<p>.*?(亲.*?回来).*?</p>', data, re.S)
        # "Sorry" (404) notice
        error_404 = re.findall('h3 class="title">.*?<em>(抱歉.*?)</em>', data, re.S)
        # Product weight field
        pro_weight = re.findall('<span>.*?<b>.*?重量</b>.*?<em>(.*?)</em>', data, re.S)
        # Order button, present when the product can be ordered
        right = re.findall('title="点击此按钮.*?rel="nofollow"><span>(.*?订购)</span></a>', data, re.S)
        # Results go into the `stations` list of the enclosing loop, in this fixed order
        stations.append(busy)
        stations.append(error_404)
        stations.append(pro_weight)
        stations.append(right)

    for i in range(16635, ws2.max_row + 1):  # 16635 is the row to resume from; lower it to start earlier
        print('Row ' + str(i) + ' SKU: ' + str(ws2.cell(i, 1).value))
        stations = []
        data = down_data(ws2.cell(i, 2).value)
        station(data)
        while len(stations[0]) != 0:  # "busy" page: not loaded, fetch again
            stations = []
            data = down_data(ws2.cell(i, 2).value)
            station(data)
            print(stations)
        if len(stations[1]) == 0:  # not a 404 page
            if len(stations[2]) == 0:  # no weight found
                if len(stations[3]) == 0:  # no order button either
                    ws2.cell(i, 3).value = 'Product delisted'
                else:
                    ws2.cell(i, 3).value = 'In stock, no weight listed'
            else:
                weights = stations[2]
                # The original reads the second match; fall back to the first if there is only one
                ws2.cell(i, 3).value = weights[1] if len(weights) > 1 else weights[0]
        else:
            ws2.cell(i, 3).value = stations[1][0]  # the "sorry" (404) message
        print(stations)

        wb2.save(filepath2)  # save after every row so progress survives a crash

    end = datetime.datetime.now()
    print('Running time: %s Seconds' % (end - start).total_seconds())

    # Email the result to the recipients
    import smtplib
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart
    from email.mime.application import MIMEApplication

    # Server credentials
    fromaddr = '3470959534@qq.com'
    password = 'authorization code'  # QQ Mail SMTP authorization code, not the account password
    toaddrs = ['3470959534@qq.com', '1725714926@qq.com']

    # Message body
    message = MIMEText('Hello! Here are the sourcing results, please check.', 'plain', 'utf-8')

    # Attach the workbook
    excel_file = filepath2
    with open(excel_file, 'rb') as f:
        excel_apart = MIMEApplication(f.read())
    excel_apart.add_header('Content-Disposition', 'attachment', filename=excel_file)

    # Headers belong on the outer multipart message, not on the text part
    m = MIMEMultipart()
    m['Subject'] = 'Test email'
    m['From'] = fromaddr
    m['To'] = ', '.join(toaddrs)
    m.attach(message)
    m.attach(excel_apart)

    try:
        # QQ Mail expects an encrypted connection; 465 is its SMTP-over-SSL port
        server = smtplib.SMTP_SSL('smtp.qq.com', 465)
        server.login(fromaddr, password)
        server.sendmail(fromaddr, toaddrs, m.as_string())
        print('success')
        server.quit()
    except smtplib.SMTPException as e:
        print('error:', e)

down_data()
# Run the Dianxiaomi (店小秘) sourcing job at a set time
# while True:
#     time_now = time.strftime('%H:%M:%S', time.localtime())
#
#     if time_now == "20:00:10":
#         down_data()
#         # print('Hello')
#         subject = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ' scheduled send test'
#         print(subject)
#         time.sleep(2)
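The commented-out loop above polls the clock continuously and fires only when the time string matches one exact second, so it keeps a CPU core busy and can miss the run if that second slips by. A minimal alternative sketch is to compute how long to wait until the next 20:00:10 and sleep for that long. The helper names `seconds_until` and `run_daily` are hypothetical and not part of the original script.

```python
import datetime
import time


def seconds_until(hour, minute, second, now=None):
    """Return the seconds from `now` until the next hour:minute:second (local time)."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour=20, minute=0, second=10):
    """Call `job()` once a day at the given local time; loops until interrupted."""
    while True:
        time.sleep(seconds_until(hour, minute, second))
        job()
```

Assuming the rest of the script is unchanged, replacing the bare `down_data()` call with `run_daily(down_data)` would run the scrape and the email step once a day at 20:00:10.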
