Running SQL with Spark SQL from a shell script

[Author]: kwu

This post shows how to run SQL through spark-sql from a shell script. spark-sql provides -e, -f, and -i options similar to those of the hive command line.
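As a minimal sketch of the three options named above, the wrappers below show one invocation form each. The function names and the `SPARK_SQL` override are hypothetical; they exist only so that the calls can be dry-run against a stub instead of a real cluster.

```shell
#!/bin/sh
# SPARK_SQL is a hypothetical override (not part of the original script);
# it defaults to the spark-sql binary found on PATH.
SPARK_SQL="${SPARK_SQL:-spark-sql}"

# -e: run the SQL text passed on the command line, then exit.
run_inline() {
    "$SPARK_SQL" -e "$1"
}

# -f: run the statements stored in a file, then exit.
run_file() {
    "$SPARK_SQL" -f "$1"
}

# -i: run an initialization file first (for example UDF registration),
# then run the inline SQL.
run_with_init() {
    "$SPARK_SQL" -i "$1" -e "$2"
}
```

For example, `run_with_init /opt/bin/spark_opt/init.sql "select 1"` would register the UDFs from the init file before running the query.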

1. The scheduled script

#!/bin/sh

# yesterday's date as yyyymmdd (GNU date syntax), used in the partition filter below

yesterday=$(date --date='1 days ago' +%Y%m%d)

/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql --master spark://10.130.2.20:7077 --executor-memory 6g --total-executor-cores 45 --conf spark.ui.port=4075   -e "\

insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01' )

  select t1.stockId as stockId,

         t1.url as url,

         t1.clickcnt as clickcnt,

         0,

         round((t1.clickcnt / (case when t2.clickcntyesday is null then   0 else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,

         '01' as type,

         t1.analysis_date as analysis_date,

         t1.analysis_time as analysis_time

    from (select stock_code stockId,

                 concat('http://stockdata.stock.hexun.com/', stock_code,'.shtml') url,

                 count(1) clickcnt,

                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1,10) analysis_date,

                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),12,8) analysis_time

            from dms.tracklog_5min

           where stock_type = 'STOCK'

             and day =

                 substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)

           group by stock_code

           order by clickcnt desc limit 20) t1

    left join (select stock_code stockId, count(1) clickcntyesday

                 from dms.tracklog_5min a

                where stock_type = 'STOCK'

                  and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1)

                  and substr(datetime, 12, 5) <substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'), 12, 5)

                  and day = '${yesterday}'

                group by stock_code) t2

      on t1.stockId = t2.stockId;

  "

sqoop export  --connect jdbc:mysql://10.130.2.245:3306/charts   --username guojinlian  --password Abcd1234  --table stock_realtime_analysis  --fields-terminated-by '\001' --columns "stockid,url,clickcnt,splycnt,lpcnt,type" --export-dir /dw/st/stock_realtime_analysis/dtype=01;
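The script above starts the Sqoop export whether or not the spark-sql step succeeded. One possible hardening, sketched here under assumptions, is to run the export only when the load exits with status 0. `run_step` and `export_if_loaded` are hypothetical helper names, and the two commands are passed in as arguments so that the sketch stays independent of the cluster.

```shell
#!/bin/sh
# run_step (hypothetical helper): log a label, run the command given after it,
# and return that command's exit status.
run_step() {
    step_name=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') start: $step_name" >&2
    if "$@"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') done: $step_name" >&2
    else
        step_status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') failed: $step_name (exit $step_status)" >&2
        return "$step_status"
    fi
}

# export_if_loaded (hypothetical helper): $1 is the command that fills the
# table, $2 the command that exports it. The export is skipped when the
# load fails, and the load's exit status is returned.
export_if_loaded() {
    run_step "load" "$1" && run_step "export" "$2"
}
```

In the script above, the spark-sql call and the sqoop call could each be wrapped in a shell function (say `load_table` and `export_table`, both hypothetical) and run as `export_if_loaded load_table export_table`.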

init.sql loads the UDFs:

add jar /opt/bin/UDF/hive-udf.jar;

create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';

create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';

create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';

create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';

create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';

create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';

create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';

create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';

create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';

create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';

create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';

create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';

create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';

create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';

create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';

create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';

create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';
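Because every registration line follows the same pattern, the file could also be generated from a list of name and class pairs. The sketch below is one way to do that; `gen_udf_sql` and the `name=class` input format are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# gen_udf_sql (hypothetical helper): $1 is the jar path; stdin carries one
# "function_name=fully.qualified.ClassName" pair per line.
# Prints an "add jar" statement followed by one
# "create temporary function" statement per pair.
gen_udf_sql() {
    printf 'add jar %s;\n' "$1"
    while IFS='=' read -r fn_name fn_class; do
        # Skip blank lines and lines without a class name.
        [ -n "$fn_name" ] && [ -n "$fn_class" ] || continue
        printf "create temporary function %s as '%s';\n" "$fn_name" "$fn_class"
    done
}
```

Feeding it a file of pairs, for example `gen_udf_sql /opt/bin/UDF/hive-udf.jar < udfs.txt > init.sql` (where `udfs.txt` is a hypothetical list file), would produce statements in the same form as the listing above.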

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which would conflict with other jobs that are already running, so it is changed to 4075 here.

Setting the memory and CPU resources used by the job

--executor-memory 6g --total-executor-cores 45
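If several jobs share the script, the resource flags could be lifted into variables with the values above as defaults. This is a sketch only: the variable names (`SPARK_MASTER`, `EXECUTOR_MEMORY`, `TOTAL_EXECUTOR_CORES`, `SPARK_UI_PORT`) and the function `spark_sql_opts` are hypothetical.

```shell
#!/bin/sh
# spark_sql_opts (hypothetical helper): print the spark-sql resource flags,
# one word per line, taking each value from an environment variable and
# falling back to the values used in the script above.
spark_sql_opts() {
    printf '%s\n' \
        --master "${SPARK_MASTER:-spark://10.130.2.20:7077}" \
        --executor-memory "${EXECUTOR_MEMORY:-6g}" \
        --total-executor-cores "${TOTAL_EXECUTOR_CORES:-45}" \
        --conf "spark.ui.port=${SPARK_UI_PORT:-4075}"
}
```

Since none of the values contain spaces, the output can be spliced into the command line with an unquoted `$(spark_sql_opts)`, and a single job can override one value, for example by exporting `EXECUTOR_MEMORY=8g` before the call.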

The statement was originally run with hive -e; after switching to Spark it runs much faster.

The runtime dropped from 15 min to 45 s.
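For reference, the reported timings correspond to a 20-fold speedup: 15 min is 900 s, and 900 / 45 = 20. The same integer arithmetic in shell, with `speedup` as a hypothetical helper name:

```shell
#!/bin/sh
# speedup (hypothetical helper): $1 = old runtime in seconds,
# $2 = new runtime in seconds; prints the integer ratio old/new.
speedup() {
    echo $(( $1 / $2 ))
}

speedup $((15 * 60)) 45   # prints 20
```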
