Running SQL with spark-sql from a shell script

[Author]: kwu

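This post runs SQL by calling spark-sql from a shell script. spark-sql provides options similar to Hive's -e, -f, and -i, so a script written for the Hive CLI can be switched over with few changes.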
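As a rough illustration of the three options, the sketch below wraps each one in a small shell function. The function names (run_inline, run_file, run_with_init) and the SPARK_SQL variable are made up for this example; only the -e, -f, and -i options and the install path come from the post.

```shell
#!/bin/sh
# Path taken from the script in this post; override SPARK_SQL for other installs.
SPARK_SQL="${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}"

# -e: run a SQL string given on the command line.
run_inline() {
    "$SPARK_SQL" -e "$1"
}

# -f: run the statements stored in a file.
run_file() {
    "$SPARK_SQL" -f "$1"
}

# -i: run an initialization file first, then the inline SQL.
run_with_init() {
    "$SPARK_SQL" -i "$1" -e "$2"
}
```

For example, run_with_init /opt/bin/spark_opt/init.sql "show tables" would load the init file and then run the query, which is the combination the script below uses.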

1. The script called on a schedule

#!/bin/sh
# Rebuild st.stock_realtime_analysis with spark-sql, then export it to MySQL with sqoop.

# Yesterday's date as yyyymmdd (GNU date syntax)
yesterday=$(date --date='1 day ago' +%Y%m%d)

/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql \
  --master spark://10.130.2.20:7077 \
  --executor-memory 6g --total-executor-cores 45 \
  --conf spark.ui.port=4075 -e "\
insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01' )
  select t1.stockId as stockId,
         t1.url as url,
         t1.clickcnt as clickcnt,
         0,
         round((t1.clickcnt / (case when t2.clickcntyesday is null then 0 else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,
         '01' as type,
         t1.analysis_date as analysis_date,
         t1.analysis_time as analysis_time
    from (select stock_code stockId,
                 concat('http://stockdata.stock.hexun.com/', stock_code,'.shtml') url,
                 count(1) clickcnt,
                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1,10) analysis_date,
                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),12,8) analysis_time
            from dms.tracklog_5min
           where stock_type = 'STOCK'
             and day =
                 substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
           group by stock_code
           order by clickcnt desc limit 20) t1
    left join (select stock_code stockId, count(1) clickcntyesday
                 from dms.tracklog_5min a
                where stock_type = 'STOCK'
                  and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1)
                  and substr(datetime, 12, 5) < substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'), 12, 5)
                  and day = '${yesterday}'
                group by stock_code) t2
      on t1.stockId = t2.stockId;
  "

# The spark-sql command ends at the closing quote above (no trailing backslash),
# so sqoop runs as a separate command.
sqoop export --connect jdbc:mysql://10.130.2.245:3306/charts \
  --username guojinlian --password Abcd1234 \
  --table stock_realtime_analysis \
  --fields-terminated-by '\001' \
  --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
  --export-dir /dw/st/stock_realtime_analysis/dtype=01
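The script runs the sqoop export whether or not the spark-sql step succeeded. One possible guard, sketched here under the assumption that each step is wrapped in its own shell function, is to run the export only when the first step exits with status 0. The helper name run_then_export is invented for this example and is not part of the original script.

```shell
#!/bin/sh
# run_then_export BUILD_CMD EXPORT_CMD
# Runs BUILD_CMD; runs EXPORT_CMD only if BUILD_CMD exited with status 0.
run_then_export() {
    if "$1"; then
        "$2"
    else
        # In the else branch, $? still holds the exit status of BUILD_CMD.
        status=$?
        echo "build step failed (exit $status); export skipped" >&2
        return "$status"
    fi
}
```

With the spark-sql call in a function such as build_table and the sqoop call in export_table (both hypothetical names), the scheduled script would end with run_then_export build_table export_table, so a failed query no longer exports stale data.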

init.sql loads the UDFs:

add jar /opt/bin/UDF/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';
create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';
create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';
create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';
create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';
create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';
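Because every function in init.sql depends on the jar named in its add jar line, a scheduled job may want to confirm the jar is present before starting Spark. The following is only a sketch of such a pre-flight check; the function name check_init_jars is invented here, and it assumes lowercase "add jar" statements with paths that contain no spaces, as in the file above.

```shell
#!/bin/sh
# check_init_jars INIT_FILE
# Returns 0 if every jar named by an "add jar <path>;" line exists, 1 otherwise.
check_init_jars() {
    init_file=$1
    if [ ! -r "$init_file" ]; then
        echo "cannot read $init_file" >&2
        return 1
    fi
    missing=0
    # Print only the path that follows "add jar", without the trailing semicolon.
    for jar in $(sed -n 's/^[[:space:]]*add jar[[:space:]]\{1,\}\([^;[:space:]]*\).*/\1/p' "$init_file"); do
        if [ ! -f "$jar" ]; then
            echo "missing jar: $jar" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as check_init_jars /opt/bin/spark_opt/init.sql || exit 1 at the top of the script would stop the job early instead of failing inside spark-sql.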

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which would conflict with other jobs that are already running, so it is changed to 4075 here.

Setting the memory and CPU resources used by the job

--executor-memory 6g --total-executor-cores 45
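If several scheduled jobs share one cluster, it can be convenient to keep the port and resource settings in one place instead of hard-coding them in each command. The sketch below builds the option string from environment variables and falls back to the values used in this post. The variable names (SPARK_MASTER, EXECUTOR_MEMORY, TOTAL_CORES, UI_PORT) and the function name are assumptions for illustration, not Spark settings.

```shell
#!/bin/sh
# Print the spark-sql options for master, resources, and UI port.
# Each value can be overridden through a (hypothetical) environment variable.
spark_sql_opts() {
    printf '%s\n' "--master ${SPARK_MASTER:-spark://10.130.2.20:7077} --executor-memory ${EXECUTOR_MEMORY:-6g} --total-executor-cores ${TOTAL_CORES:-45} --conf spark.ui.port=${UI_PORT:-4075}"
}
```

The result is meant to be expanded unquoted so the shell splits it into separate arguments, for example /opt/modules/spark/bin/spark-sql $(spark_sql_opts) -e "select 1". A second job could then set UI_PORT=4076 without touching the script.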

The original statement was run with hive -e. After switching to Spark it became much faster: the run time dropped from about 15 min to 45 s.
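To reproduce this kind of comparison, the wall-clock time of each run can be measured from the shell. The helper below is a minimal sketch (the name time_cmd is made up for this example); the last lines simply work out the speedup implied by the two figures above.

```shell
#!/bin/sh
# time_cmd CMD [ARGS...]
# Runs the command, reports elapsed wall-clock seconds on stderr,
# and returns the command's own exit status.
time_cmd() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}

# Speedup implied by the figures in this post: 15 min = 900 s versus 45 s.
speedup=$((15 * 60 / 45))
echo "speedup: ${speedup}x"   # prints "speedup: 20x"
```

For instance, time_cmd hive -f query.sql and time_cmd /opt/modules/spark/bin/spark-sql -f query.sql (query.sql being a placeholder file name) would give the two numbers to compare.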
