Emailing Hive Detail Data with Python
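I. Requirements
The customer needs to receive specific campaign data every Monday: the data is exported to an Excel or CSV file and emailed to designated recipients. An initial analysis leads to the following conclusions:
- 1. The data the customer wants is not complicated and needs no special processing; it can be treated simply as the output of a SQL query.
- 2. Writing query results to a CSV file and sending email are both mature, widely applicable techniques.
- 3. The crond service on Linux supports scheduled tasks.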
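To make conclusion 2 concrete, here is a minimal standard-library sketch (Python 3) of turning query rows into a CSV attachment and building the mail around it. It is not the ppytools code used later in this article; `rows_to_csv_bytes` and `build_report_mail` are hypothetical names, and the message is only built, not sent.

```python
import csv
import io
from email.message import EmailMessage


def rows_to_csv_bytes(header, rows):
    """Serialize a header and result rows to UTF-8 CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    # The BOM ('utf-8-sig') helps Excel detect UTF-8, useful for Chinese headers.
    return buf.getvalue().encode('utf-8-sig')


def build_report_mail(sender, to, subject, body, attachments):
    """Build (but do not send) a mail; attachments is a list of (filename, bytes)."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(to)
    msg['Subject'] = subject
    msg.set_content(body)
    for filename, data in attachments:
        msg.add_attachment(data, maintype='text', subtype='csv', filename=filename)
    return msg
```

The resulting message could then be handed to `smtplib.SMTP.send_message`; the article delegates that step to `EmailClient` from ppytools.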
II. System environment
- Linux CentOS6.x
- Hadoop2.x
- Python2.7
III. Python dependencies
- PyHive
- ppytools2
- thrift
- thrift-sasl
- sasl
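PyHive exposes a DB-API 2.0 interface, so the query step can be sketched against any DB-API connection. In the sketch below `sqlite3` stands in for Hive so that it runs without a cluster; the table `ext_act` and the helper `fetch_with_header` are hypothetical, and the commented lines show how a PyHive connection would presumably be substituted.

```python
import sqlite3


def fetch_with_header(conn, sql):
    """Run a query on a DB-API 2.0 connection; return (column_names, rows)."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # description holds one 7-item tuple per column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


# Stand-in for Hive. With PyHive installed one would instead write:
#   from pyhive import hive
#   conn = hive.Connection(host='127.0.0.1', port=10000,
#                          username='hive', database='default')
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE ext_act (cn_state TEXT, cn_city TEXT)')
conn.executemany('INSERT INTO ext_act VALUES (?, ?)',
                 [('GD', 'GZ'), ('GD', 'SZ'), ('BJ', 'BJ')])
columns, rows = fetch_with_header(
    conn,
    'SELECT cn_state, COUNT(1) AS m FROM ext_act '
    'GROUP BY cn_state ORDER BY cn_state')
print(columns, rows)
# → ['cn_state', 'm'] [('BJ', 1), ('GD', 2)]
```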
IV. Workflow

V. Implementation
The project structure is as follows:

The key implementation steps are as follows:
1. Global configuration: settings.py
# -*- coding: utf-8 -*-
# __author__ = 'elkan1788@gmail.com'
from ppytools.cfgreader import ConfReader
import logging

# Log output format. basicConfig() in Python 2.7 has no 'encoding' option,
# so the original encoding='UTF-8' argument had no effect and is dropped.
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s [%(levelname)s] {%(name)-10s} - %(message)s')


class ProjectConfig(object):
    """Accessors for the sections of the merged configuration files."""

    def __init__(self, *conf_paths):
        self.cr = ConfReader(*conf_paths)

    def getHiveConf(self):
        return self.cr.getValues('HiveServer')

    def getEmailServer(self):
        return self.cr.getValues('EmailServer')

    def getJobInfo(self):
        return self.cr.getValues('JobInfo')

    def getCSVFolder(self):
        return self.cr.getValues('CSVFolder')['folder']

    def getCSVFile(self):
        return self.cr.getValues('CSVFile')

    def getCSVHead(self):
        return self.cr.getValues('CSVHead')

    def getHQLScript(self):
        return self.cr.getValues('HQLScript')

    def getEmailInfo(self):
        return self.cr.getValues('EmailInfo')
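ConfReader comes from ppytools and its internals are not shown in this article. As a rough idea of what merging several ini files into per-section dictionaries involves, here is a minimal standard-library sketch (Python 3). `read_sections` is a hypothetical name, not part of ppytools, and the real ConfReader may behave differently.

```python
import configparser


def read_sections(*paths):
    """Merge several ini files and return {section: {key: value}}."""
    # RawConfigParser performs no '%' interpolation, so a value such as
    # 'subject=%s report' comes back untouched and can be formatted later.
    parser = configparser.RawConfigParser()
    # read() silently skips paths that do not exist; later files override
    # earlier ones when the same section and key appear twice.
    parser.read(paths, encoding='utf-8')
    return {name: dict(parser.items(name)) for name in parser.sections()}
```

Disabling interpolation matters for a job file like the one in this article, because its mail subject contains a literal `%s` that the default `ConfigParser` would reject when the value is read.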
2. Core code: main.py
# -*- coding: utf-8 -*-
# __author__ = 'elkan1788@gmail.com'
from hiveemailjob.settings import ProjectConfig
from ppytools.csvhelper import write
from ppytools.emailclient import EmailClient
from ppytools.hiveclient import HiveClient
from ppytools.lang.timerhelper import timeMeter
import datetime
import logging
import os
import sys
import time

logger = logging.getLogger(__name__)


def build_rk(ts):
    """
    Build HBase row key value
    :param ts: date or datetime
    :return: row key (local epoch milliseconds in hex, without the '0x' prefix)
    """
    return hex(int(time.mktime(ts.timetuple())*1000))[2:]


def email_att(folder, name):
    """Build a timestamped CSV attachment path inside folder."""
    stamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
    # os.path.join avoids a doubled '/' when folder already ends with one.
    return os.path.join(folder, '{}_{}.csv'.format(name, stamp))
@timeMeter()
def run(args):
    """
    Email Job program execute entrance
    :param args:
        1. job file path
        2. start time, format: 2018-01-30 17:09:38 (optional)
        3. stop time (optional)
    :return: None
    """
    # Read system args
    args_len = len(args)
    if args_len not in (2, 4):
        logger.error('Invalid arguments. Please check!!!')
        logger.error('1: job file path.')
        logger.error('2: start time, format: 2018-01-30 17:09:38 (optional)')
        logger.error('3: stop time (optional)')
        sys.exit(1)
    elif args_len == 4:
        try:
            start_time = datetime.datetime.strptime(args[2], '%Y-%m-%d %H:%M:%S')
            stop_time = datetime.datetime.strptime(args[3], '%Y-%m-%d %H:%M:%S')
        except Exception as e:
            raise RuntimeError('Parse start or stop time failed!!!\n', e)
    else:
        # Default window: from yesterday to today
        stop_time = datetime.date.today()
        start_time = stop_time - datetime.timedelta(days=1)

    job_file = args[1]
    start_rk = build_rk(start_time)
    stop_rk = build_rk(stop_time)

    # System settings files (hard-coded)
    hive_conf = '/etc/pythoncfg/hive.ini'
    email_conf = '/etc/pythoncfg/email.ini'
    sets = ProjectConfig(hive_conf, email_conf, job_file)
    job_info = sets.getJobInfo()
    csv_folder = sets.getCSVFolder()
    logger.info('Now running %s Email Job...', job_info['title'])
    logger.info('Start time: %s', start_time)
    logger.info('Stop time: %s', stop_time)

    hc = HiveClient(**sets.getHiveConf())

    # Sort every section by key so that file1/head1/script1 line up.
    csv_file = sorted(sets.getCSVFile().items())
    file_list = []
    logger.info('File name list: ')
    for (k, v) in csv_file:
        logging.info('%s: %s', k, v)
        file_list.append(v)

    csv_head = sorted(sets.getCSVHead().items())
    head_list = []
    logger.info('CSV file head list: ')
    for (k, v) in csv_head:
        logging.info('%s: %s', k, v)
        head_list.append(v)

    hql_scripts = sorted(sets.getHQLScript().items())
    email_atts = []
    index = 0
    for (k, hql) in hql_scripts:
        logging.info('%s: %s', k, hql)
        # Put your own logic here.
        result, size = hc.execQuery(hql.format(start_rk, stop_rk))
        if size == 0:
            logging.info('The above HQL script did not find any data!!!')
        else:
            csv_path = email_att(csv_folder, file_list[index])
            email_atts.append(csv_path)
            write(csv_path, head_list[index].split(','), result)
        # Advance for every script, with or without data, so that file
        # names and heads stay aligned with their scripts.
        index += 1

    # Close the Hive Server connection.
    hc.closeConn()

    email_info = sets.getEmailInfo()
    email_sub = email_info['subject'] % start_time
    email_body = email_info['body']
    # Drop the empty entry left behind by a trailing ';'.
    email_to = [addr for addr in email_info['to'].split(';') if addr]
    email_cc = [addr for addr in email_info['cc'].split(';') if addr]
    if len(email_atts) == 0:
        # The prefix reads: "Sorry, no data was found this time."
        email_body = '抱歉当前未找到任何数据。\n\n' + email_body

    ec = EmailClient(**sets.getEmailServer())
    ec.send(email_to, email_cc, email_sub, email_body, email_atts, False)
    ec.quit()
    logger.info('Finished %s Email Job.', job_info['title'])
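When no times are passed, run() covers only the previous day, whereas the requirement is a weekly report. One way to bridge that gap is to compute the last full Monday-to-Monday window and pass it as the optional start and stop arguments. The sketch below assumes that interpretation of "weekly"; `previous_week_window` is a hypothetical helper, while `build_rk` repeats the row-key rule from main.py.

```python
import datetime
import time


def previous_week_window(today):
    """Return (start, stop) datetimes spanning the last full Monday-to-Monday week."""
    this_monday = today - datetime.timedelta(days=today.weekday())
    stop = datetime.datetime.combine(this_monday, datetime.time.min)
    start = stop - datetime.timedelta(days=7)
    return start, stop


def build_rk(ts):
    """Same rule as main.py: local epoch milliseconds as lowercase hex."""
    return hex(int(time.mktime(ts.timetuple()) * 1000))[2:]


start, stop = previous_week_window(datetime.date(2018, 2, 20))
fmt = '%Y-%m-%d %H:%M:%S'
print(start.strftime(fmt), stop.strftime(fmt))
# → 2018-02-12 00:00:00 2018-02-19 00:00:00

# Row keys depend on the local time zone, so no fixed value is shown here.
print(build_rk(start), build_rk(stop))
```

The two formatted strings match the format that run() parses for its second and third arguments. Because both row keys have the same number of hex digits for present-day timestamps, their string order follows time order.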
3. System configuration files: hive.ini and email.ini
# /etc/pythoncfg/hive.ini
[HiveServer]
host=127.0.0.1
port=10000
user=hive
db=default
# /etc/pythoncfg/email.ini
[EmailServer]
server=mail.163.com
port=25
user=elkan1788@gmail.com
passwd=xxxxxx
mode=TSL
Note: both files must be placed in the /etc/pythoncfg/ directory.
4. Email job configuration example: emailjob.ini
[JobInfo]
title=邮件报表任务测试
[CSVFolder]
folder=/opt/csv_files/
# CSVFile, CSVHead and HQLScript must have the same number of entries.
# Name the keys as prefix+number so that they sort into matching order.
[CSVFile]
file1=省份分组统计
file2=城市分组统计
[CSVHead]
head1=省份,累计
head2=省份,城市,累计
[HQLScript]
script1=select cn_state,count(1) m from ext_act_ja1
script2=select cn_state,cn_city,count(1) m from ext_act_ja2
[EmailInfo]
to=elkan1788@gmail.com;
cc=2292706174@qq.com;
# %s is replaced with the start date.
subject=%s区域抽奖统计[测试]
body=此邮件由系统自动发送,请勿回复,谢谢!
Note: CSVFile, CSVHead and HQLScript must contain the same number of entries, in the same order; naming the keys as prefix+number is recommended.
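Since main.py pairs the three sections purely by sorted key order, two pitfalls are worth guarding against: a plain string sort puts `file10` before `file2`, and sections of unequal length misalign silently. The following sketch shows one possible guard; `natural_key`, `ordered_values` and `pair_job_sections` are hypothetical helpers that are not part of the original program.

```python
import re


def natural_key(name):
    """Split 'file10' into ['file', 10, ''] so numeric suffixes sort as numbers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


def ordered_values(section):
    """Return a section's values ordered by the natural order of its keys."""
    return [section[key] for key in sorted(section, key=natural_key)]


def pair_job_sections(files, heads, scripts):
    """Zip the three sections into (file, head, hql) jobs; reject unequal lengths."""
    if not (len(files) == len(heads) == len(scripts)):
        raise ValueError(
            'CSVFile, CSVHead and HQLScript must have the same number of entries')
    return list(zip(ordered_values(files),
                    ordered_values(heads),
                    ordered_values(scripts)))


keys = ['file10', 'file2', 'file1']
print(sorted(keys))                   # → ['file1', 'file10', 'file2']
print(sorted(keys, key=natural_key))  # → ['file1', 'file2', 'file10']
```

With fewer than ten entries per section the plain sort used in main.py gives the same order, so this only becomes relevant for larger job files.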
5. Bin file: hive-emailjob.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# __author__ = 'elkan1788@gmail.com'
from hiveemailjob import main
import sys

if __name__ == '__main__':
    main.run(sys.argv)
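6. Result
In a terminal, run python -u bin/hive_email_job.py followed by the path to the job file (run() requires it as the first argument); the output looks like this: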
2018-02-20 16:28:21,561 [INFO] {__main__ } - Now running 邮件报表任务测试 Email Job...
2018-02-20 16:28:21,561 [INFO] {__main__ } - Start time: 2018-02-22
2018-02-20 16:28:21,562 [INFO] {__main__ } - Stop time: 2018-02-20
2018-02-20 16:28:21,691 [INFO] {pyhive.hive} - USE `default`
2018-02-20 16:28:21,731 [INFO] {ppytools.hive_client} - Hive server connect is ready. Transport open: True
2018-02-20 16:28:31,957 [INFO] {ppytools.email_client} - Email SMTP server connect ready.
2018-02-20 16:28:31,957 [INFO] {root } - File name list:
2018-02-20 16:28:31,957 [INFO] {root } - file1: 省份分组统计
2018-02-20 16:28:31,957 [INFO] {root } - file2: 城市分组统计
2018-02-20 16:28:31,957 [INFO] {root } - CSV file head list:
2018-02-20 16:28:31,957 [INFO] {root } - head1: 省份,累计
2018-02-20 16:28:31,957 [INFO] {root } - head2: 省份,城市,累计
2018-02-20 16:28:31,957 [INFO] {root } - script1: select cn_state,count(1) m from ext_act_ja2
2018-02-20 16:28:31,958 [INFO] {pyhive.hive} - select cn_state,count(1) m from ext_act_ja2
2018-02-20 16:29:04,258 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 31
2018-02-20 16:29:04,259 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 32.3012499809 seconds.
2018-02-20 16:29:04,261 [INFO] {ppytools.csv_helper} - Write a CSV file successful. --> /opt/csv_files/省份分组统计_20180223162904.csv
2018-02-20 16:29:04,262 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00222992897034 seconds.
2018-02-20 16:29:04,262 [INFO] {root } - script2: select cn_state,cn_city,count(1) m from ext_act_ja2
2018-02-20 16:29:04,262 [INFO] {pyhive.hive} - select cn_state,cn_city,count(1) m from ext_act_ja2
2018-02-20 16:29:23,462 [INFO] {ppytools.hive_client} - Hive client query completed. Records found: 367
2018-02-20 16:29:23,463 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.hive_client.execQuery> method cost 19.2005498409 seconds.
2018-02-20 16:29:23,465 [INFO] {ppytools.csv_helper} - Write a CSV file successful. --> /opt/csv_files/城市分组统计_20180223162923.csv
2018-02-20 16:29:23,465 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.csv_helper.write> method cost 0.00227284431458 seconds.
2018-02-20 16:29:23,669 [INFO] {ppytools.email_client} - Send email[2018-02-22区域抽奖统计[测试]] success. To users: elkan1788@163.com.
2018-02-20 16:29:23,669 [INFO] {ppytools.lang.timer_helper} - Execute <ppytools.email_client.send> method cost 0.204078912735 seconds.
2018-02-20 16:29:23,714 [INFO] {__main__ } - Finished 邮件报表任务测试 Email Job.
2018-02-20 16:29:23,715 [INFO] {ppytools.lang.timer_helper} - Execute <emailjob.main.run> method cost 62.1566159725 seconds.
OK, that completes a general-purpose program for delivering data files.
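Note: copyright of this article belongs to the author. It is posted on the author's behalf by demo大师; reproduction is not permitted without the author's authorization.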