Spark教程——(4)Spark-shell调用SQLContext(HiveContext)
启动Spark-shell:
[root@node1 ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1554951897984_0111).
SQL context available as sqlContext.
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@272485a6
scala> sqlContext
res1: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@11c95035
上下文已经包含 sc 和 sqlContext:
Spark context available as sc (master = yarn-client, app id = application_1554951897984_0111). SQL context available as sqlContext.
本地创建people07041119.json
{"name":"zhangsan","job number":"101","age":33,"gender":"male","deptno":1,"sal":18000}
{"name":"lisi","job number":"102","age":30,"gender":"male","deptno":2,"sal":20000}
{"name":"wangwu","job number":"103","age":35,"gender":"female","deptno":3,"sal":50000}
{"name":"zhaoliu","job number":"104","age":31,"gender":"male","deptno":1,"sal":28000}
{"name":"tianqi","job number":"105","age":36,"gender":"female","deptno":3,"sal":90000}
本地创建dept.json
{"name":"development","deptno":1}
{"name":"personnel","deptno":2}
{"name":"testing","deptno":3}
将本地文件上传到HDFS上:
bash-4.2$ hadoop dfs -put /home/**/data/people07041119.json /user/** bash-4.2$ hadoop dfs -put /home/**/data/dept.json /user/**
结果如下:

执行Scala脚本,加载文件:
scala> val people=sqlContext.jsonFile("/user/**/people07041119.json")
warning: there were deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [age: bigint, deptno: bigint, gender: string, job number: string, name: string, sal: bigint]
scala> val dept=sqlContext.jsonFile("/user/**/dept.json")
warning: there were deprecation warning(s); re-run with -deprecation for details
people: org.apache.spark.sql.DataFrame = [deptno: bigint, name: string]
执行Scala脚本,查看文件内容:
scala> people.show +---+------+------+----------+--------+-----+ |age|deptno|gender|job number| name| sal| +---+------+------+----------+--------+-----+ | | | male| |zhangsan|| | | | male| | lisi|| | | |female| | wangwu|| | | | male| | zhaoliu|| | | |female| | tianqi|| +---+------+------+----------+--------+-----+
显示前三条记录:
scala> people.show() +---+------+------+----------+--------+-----+ |age|deptno|gender|job number| name| sal| +---+------+------+----------+--------+-----+ | | | male| |zhangsan|| | | | male| | lisi|| | | |female| | wangwu|| +---+------+------+----------+--------+-----+ only showing top rows
查看列信息:
scala> people.columns res5: Array[String] = Array(age, deptno, gender, job number, name, sal)
添加过滤条件:
scala> people.filter("gender='male'").count
res6: Long =
参考:
https://blog.csdn.net/xiaolong_4_2/article/details/80886371
Spark教程——(4)Spark-shell调用SQLContext(HiveContext)的更多相关文章
- spark教程(二)-shell操作
spark 支持 shell 操作 shell 主要用于调试,所以简单介绍用法即可 支持多种语言的 shell 包括 scala shell.python shell.R shell.SQL shel ...
- spark教程(八)-SparkSession
spark 有三大引擎,spark core.sparkSQL.sparkStreaming, spark core 的关键抽象是 SparkContext.RDD: SparkSQL 的关键抽象是 ...
- spark教程(11)-sparkSQL 数据抽象
数据抽象 sparkSQL 的数据抽象是 DataFrame,df 相当于表格,它的每一行是一条信息,形成了一个 Row Row 它是 sparkSQL 的一个抽象,用于表示一行数据,从表现形式上看, ...
- spark教程(四)-SparkContext 和 RDD 算子
SparkContext SparkContext 是在 spark 库中定义的一个类,作为 spark 库的入口点: 它表示连接到 spark,在进行 spark 操作之前必须先创建一个 Spark ...
- Spark教程——(11)Spark程序local模式执行、cluster模式执行以及Oozie/Hue执行的设置方式
本地执行Spark SQL程序: package com.fc //import common.util.{phoenixConnectMode, timeUtil} import org.apach ...
- spark教程
某大神总结的spark教程, 地址 http://litaotao.github.io/introduction-to-spark?s=inner
- spark教程(七)-文件读取案例
sparkSession 读取 csv 1. 利用 sparkSession 作为 spark 切入点 2. 读取 单个 csv 和 多个 csv from pyspark.sql import Sp ...
- spark教程(一)-集群搭建
spark 简介 建议先阅读我的博客 大数据基础架构 spark 一个通用的计算引擎,专门为大规模数据处理而设计,与 mapreduce 类似,不同的是,mapreduce 把中间结果 写入 hdfs ...
- Spark教程——(10)Spark SQL读取Phoenix数据本地执行计算
添加配置文件 phoenixConnectMode.scala : package statistics.benefits import org.apache.hadoop.conf.Configur ...
- 一、spark入门之spark shell:wordcount
1.安装完spark,进入spark中bin目录: bin/spark-shell scala> val textFile = sc.textFile("/Users/admin/ ...
随机推荐
- No space left on device(总结)
..1 提示磁盘满了 df -hT 没有满 请问可能原因 可能是inode满了,原因是机器上的小文件太多了 使用df -hi 查看 ..2 提示没有磁盘空间已经满了 ..2.1 问题描述: 发现是日志 ...
- laravel npm安装yarn
npm install -g yarn yarn install --no-bin-links yarn add cross-env npm run dev Laravel 默认集成了一些 NPM 扩 ...
- Java Web 笔记(杂)
Java Web 概述 什么是Java Web 在Sun的Java Servlet 规范中,对Java Web 应用做了这样的定义: "Java Web" 应用由一组Servlet ...
- oracle用户密码忘记怎么修改
安装完数据库很久不用常常会忘记其密码,碰到这种情况不要动不动就重装数据库,按其下方法修改即可. 一:忘记sys,system用户的密码 1,在开始菜单点击‘运行’,输入‘cmd’,打开命令提示窗口,输 ...
- js中数组的循环与遍历forEach,map
对于前端的循环遍历我们知道有 针对js数组的forEach().map().filter().reduce()方法 针对js对象的for/in语句(for/in也能遍历数组,但不推荐) 针对jq数组/ ...
- UVA10723 电子人的基因
UVA10723 电子人的基因 题目比较难找附上链接:https://vjudge.net/problem/UVA-10723 题目描述: 给你两个字符串,你需要找出一个最短的字符串,使得两个给定字符 ...
- mysqld: Can't change dir to 'D:\TONG\mysql-5.7.19-winx64\data\' (Errcode: 2 - No such file or directory)
mysqld: Can't change dir to 'D:\TONG\mysql-5.7.19-winx64\data\' (Errcode: 2 - No such file or direct ...
- PLSQL报错: ORA-12170:TNS connect timeout occurred
本人的问题已解决,先在安装oracle的服务器上黑窗口输入tnsping,提示说no listener,这是监听服务没有打开. 打开服务后还是不行,最后原因是服务器的网络有防火墙的问题,关掉防火墙连接 ...
- Python - python里有类似Java的接口(interface)吗?
参考 https://stackoverflow.com/questions/2124190/how-do-i-implement-interfaces-in-python https://stack ...
- Python 爬取 热词并进行分类数据分析-[热词分类+目录生成]
日期:2020.02.04 博客期:143 星期二 [本博客的代码如若要使用,请在下方评论区留言,之后再用(就是跟我说一声)] 所有相关跳转: a.[简单准备] b.[云图制作+数据导入] c.[ ...