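Processing steps

Raw data -> JSON -> filter -> column pruning

Requirement 2: compute each metric per province and city

Raw data

The input is the text file pmt.json. Each line is one JSON string that contains the ip and other fields.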

{"sessionid":"4KT69Su8FavGfydclctzpUBQwYfRT0KW","advertisersid":19,"adorderid":188182,"adcreativeid":2123233,"adplatformproviderid":353466,"sdkversion":"Android 5.0","adplatformkey":"hUHtJfmzYgkKQmBgS1XnmIwT1lwWJZis","putinmodeltype":1,"requestmode":1,"adprice":4295.0,"adppprice":5153.0,"requestdate":"2018-10-06","ip":"106.82.41.165","appid":"XRX1000057","appname":"Face++","uuid":"PXUw6oNaBbOLgE4qHzy1eRR0AP6zl0LC","device":"BLACK BARRY","client":4,"osversion":"","density":"","pw":960,"ph":640,"lang":"","lat":"","provincename":"","cityname":"","ispid":46003,"ispname":"电信","networkmannerid":0,"networkmannername":"WIFI","iseffective":1,"isbilling":1,"adspacetype":3,"adspacetypename":"全屏","devicetype":1,"processnode":1,"apptype":0,"district":"district","paymode":1,"isbid":1,"bidprice":4884.0,"winprice":74754.0,"iswin":1,"cur":"rmb","rate":0.0,"cnywinprice":0.0,"imei":"","mac":"52:54:00:b4:e6:10","idfa":"JOQYVMIIPWAEKPHZRDZNCDLJIUZFSBLZ","openudid":"","androidid":"","rtbprovince":"","rtbcity":"","rtbdistrict":"","rtbstreet":"","storeurl":"","realip":"222.89.26.142","isqualityapp":0,"bidfloor":0.0,"aw":0,"ah":0,"imeimd5":"","macmd5":"","idfamd5":"","openudidmd5":"","androididmd5":"","imeisha1":"","macsha1":"","idfasha1":"","openudidsha1":"","androididsha1":"","uuidunknow":"","userid":"YRNo94gOpa3hCANOpFhUBUpQKWfkDblZ","reqdate":null,"reqhour":null,"iptype":1,"initbidprice":0.0,"adpayment":171671.0,"agentrate":0.0,"lomarkrate":0.0,"adxrate":0.0,"title":"非常经典的句子:没有改变不了的未来,只有不想改变的过去","keywords":"莲藕,甘蔗,辣椒,美文","tagid":"bCvpm912U8soUBaF6QxAIC0PXn4E0KD3","callbackdate":"2018-10-06","channelid":"123489","mediatype":1,"email":"2ki0i@hotmail.com","tel":"13404821298","age":"54","sex":"1"}
{"sessionid":"retXIU76Vpp8VZzc7uQvDtObLjRHLtRe","advertisersid":81,"adorderid":140687,"adcreativeid":2321312,"adplatformproviderid":1036820,"sdkversion":"IOS 11.2","adplatformkey":"U8oDbQfH66KCkAtU092evNM1OLvlIQcK","putinmodeltype":1,"requestmode":2,"adprice":7402.0,"adppprice":3552.0,"requestdate":"2018-10-06","ip":"123.234.117.194","appid":"XRX1000033","appname":"蝉大师","uuid":"xExEur14ellSeYq1wbsDzmw9aMTcW6BU","device":"IPHONE6","client":2,"osversion":"","density":"","pw":1334,"ph":750,"lang":"","lat":"","provincename":"","cityname":"","ispid":46000,"ispname":"移动","networkmannerid":3,"networkmannername":"2G","iseffective":1,"isbilling":1,"adspacetype":2,"adspacetypename":"插屏","devicetype":1,"processnode":3,"apptype":0,"district":"district","paymode":1,"isbid":1,"bidprice":9514.0,"winprice":48180.0,"iswin":1,"cur":"rmb","rate":0.0,"cnywinprice":0.0,"imei":"778207196118215","mac":"52:54:00:a0:6b:b1","idfa":"","openudid":"","androidid":"","rtbprovince":"","rtbcity":"","rtbdistrict":"","rtbstreet":"","storeurl":"","realip":"210.41.145.252","isqualityapp":0,"bidfloor":0.0,"aw":0,"ah":0,"imeimd5":"","macmd5":"","idfamd5":"","openudidmd5":"","androididmd5":"","imeisha1":"","macsha1":"","idfasha1":"","openudidsha1":"","androididsha1":"","uuidunknow":"","userid":"G2KTkDDjamgwbP5uFngqzZPplfesjRQ4","reqdate":null,"reqhour":null,"iptype":1,"initbidprice":0.0,"adpayment":69642.0,"agentrate":0.0,"lomarkrate":0.0,"adxrate":0.0,"title":"非常经典的句子:生活中的点点滴滴,都是因果关系","keywords":"美文","tagid":"lvcFc7R4YuaOzOPZ0W3QDBgClZVCIkWk","callbackdate":"2018-10-06","channelid":"123500","mediatype":2,"email":"lgd8554@hotmail.com","tel":"13704892122","age":"35","sex":"0"}
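Because every line of pmt.json is an independent JSON string, the file can be read line by line. The following is a minimal sketch of that step, not the author's code: the object name `ReadLines` and the helper `readNonBlankLines` are hypothetical, and skipping blank lines is an assumption about how the input should be cleaned. It also closes the file handle, which a bare `Source.fromFile(...).getLines().toList` does not do.

```scala
import scala.io.Source

object ReadLines {
  /** Reads a UTF-8 text file, drops blank lines, and always closes the file handle. */
  def readNonBlankLines(path: String): List[String] = {
    val source = Source.fromFile(path, "utf-8")
    try {
      // toList forces the lazy iterator before the source is closed in finally
      source.getLines().map(_.trim).filter(_.nonEmpty).toList
    } finally {
      source.close()
    }
  }
}
```

Each returned string can then be passed to the JSON parser on its own.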

Requirement description

Expected data

Output each metric grouped by province and city.
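To make the expected shape of the result concrete, here is a small sketch that groups in-memory records by (province, city) and computes two metrics per group. It rests on assumptions: the types `AdRecord` and `Metrics` are hypothetical, an "original request" is taken to be a record with `requestmode == 1` and `processnode >= 1`, and the consumption metric is simplified to the sum of `adpayment / 1000` without any further conditions.

```scala
object ProvinceCityMetrics {
  // Hypothetical record type holding only the columns the two metrics need
  final case class AdRecord(province: String, city: String,
                            requestMode: Int, processNode: Int, adPayment: Double)

  // Per-group result: number of original requests and summed adPayment / 1000
  final case class Metrics(requests: Int, consume: Double)

  def aggregate(records: Seq[AdRecord]): Map[(String, String), Metrics] =
    records.groupBy(r => (r.province, r.city)).map { case (key, rs) =>
      // Assumed rule for an original request: requestMode == 1 and processNode >= 1
      val requests = rs.count(r => r.requestMode == 1 && r.processNode >= 1)
      // Simplified consumption: every record in the group contributes adPayment / 1000
      val consume = rs.map(_.adPayment / 1000).sum
      key -> Metrics(requests, consume)
    }
}
```

The result is a map keyed by the (province, city) pair, so one entry is produced per distinct pair in the input.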

1. Import dependencies

    <dependencies>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.62</version>
        </dependency>
        <dependency>
            <groupId>commons-httpclient</groupId>
            <artifactId>commons-httpclient</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

2. Code

package com.bigdata.scala.homework

import scala.io.Source
import com.alibaba.fastjson.JSON
import org.apache.commons.httpclient.HttpClient
import org.apache.commons.httpclient.methods.GetMethod

/**
 * @description: Requirement 2, compute each metric per province and city
 * @author: HaoWu
 * @create: 2020-07-25
 */
object HomeWork {
  def main(args: Array[String]): Unit = {
    // 1. Read the file into a list of lines
    val source = Source.fromFile("C:\\Users\\HaoWu\\Desktop\\pmt.json", "utf-8").getLines().toList
    // 2. Parse each line as JSON and keep only the columns that are needed
    val data = source.map(jsonStr => {
      val json = JSON.parseObject(jsonStr)
      val ip = json.getString("ip")
      val provincename = json.getString("provincename")
      val cityname = json.getString("cityname")
      val adplatformproviderid = json.getLong("adplatformproviderid")
      val requestmode = json.getInteger("requestmode")
      val processnode = json.getInteger("processnode")
      val iseffective = json.getInteger("iseffective")
      val isbilling = json.getInteger("isbilling")
      val isbid = json.getInteger("isbid")
      val iswin = json.getInteger("iswin")
      val adorderid = json.getLong("adorderid")
      val adcreativeid = json.getLong("adcreativeid")
      val winprice = json.getDouble("winprice")
      val adpayment = json.getDouble("adpayment")
      (ip, provincename, cityname, adplatformproviderid, requestmode, processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment)
    })
      // 3. Keep only the records whose ip is neither null nor empty
      .filter({
        case (ip, provincename, cityname, adplatformproviderid, requestmode, processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment) => "" != ip && null != ip
      })
      // 4. Send an HTTP request to look up the province and city for each ip, then drop the ip column
      .map({
        case (ip, _, _, adplatformproviderid, requestmode, processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment) => {
          val client = new HttpClient()
          val url = s"https://restapi.amap.com/v3/ip?ip=${ip}&key=f75418e64363b8a96d3565108638c5f1"
          val method = new GetMethod(url)
          val code = client.executeMethod(method)
          var provincename = ""
          var cityname = ""
          if (code == 200) {
            // Parse the response body once and read both fields from it
            val response = JSON.parseObject(method.getResponseBodyAsString)
            provincename = response.getString("province")
            cityname = response.getString("city")
          }
          (provincename, cityname, adplatformproviderid, requestmode, processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment)
        }
      }) // e.g. List(([],[],804821,1,2,1,1,0,0,31547,804821,26153.0,37318.0), (陕西省,西安市,215884,3,3,1,1,1,0,167967,215884,14094.0,46195.0), (上海市,上海市,405441,1,3,1,1,0,1,52433,405441,22976.0,103778.0),
      // 5. Keep only the records whose province and city are both present.
      //    The lookup can also yield the literal string "[]" (visible in the sample output of step 4),
      //    so that value is treated as missing too.
      .filter({
        case (provincename, cityname, adplatformproviderid, requestmode,
        processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment) =>
          "" != provincename && null != provincename && "[]" != provincename &&
            "" != cityname && null != cityname && "[]" != cityname
      }) // e.g. List((安徽省,合肥市,213685,2,3,1,1,1,0,41256,213685,7601.0,11174.0), (陕西省,西安市,580944,3,2,1,1,1,1,29928,580944,69692.0,91727.0),...)
    // 6. Group by (province, city)
    val result = data.groupBy(x => (x._1, x._2)) // e.g. ((安徽省,合肥市),List((安徽省,合肥市,213685,2,3,1,1,1,0,41256,213685,7601.0,11174.0),(安徽省,合肥市,213685,2,3,1,1,1,0,41256,213685,7601.0,11174.0),..))
      .map({
        y => {
          // (province, city)
          val province_city = y._1
          // Number of original requests
          val requestCount = y._2.filter({
            case (provincename, cityname, adplatformproviderid, requestmode,
            processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment) => (requestmode == 1 && processnode >= 1)
          }).size
          // Ad consumption: sum of adpayment (tuple element _13) / 1000 over the qualifying records
          val advertConsume = y._2.filter({
            case (provincename, cityname, adplatformproviderid, requestmode,
            processnode, iseffective, isbilling, isbid, iswin, adorderid, adcreativeid, winprice, adpayment) =>
              (adplatformproviderid >= 100000 && iseffective == 1 && isbilling == 1 && iswin == 1 && adorderid > 200000 && adcreativeid > 2000000)
          }).map(x => {
            x._13 / 1000
          }).sum
          (province_city, requestCount, advertConsume)
        }
      })
    println(result) // e.g. List(((四川省,泸州市),1,0.0), ((宁夏回族自治区,吴忠市),0,0.0), ((上海市,上海市),54,0.0), ((吉林省,松原市),2,0.0)...)
  }
}
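The code above sends one HTTP request per record, even when the same ip appears many times. One possible refinement, sketched here under assumptions and not part of the original solution, is to cache lookup results so that each distinct ip is resolved at most once. The object `CachedIpLookup` and the function `memoize` are hypothetical names, and the actual lookup is passed in as a parameter, so the sketch makes no claim about the amap API itself. The cache is a plain mutable map and is not thread-safe.

```scala
import scala.collection.mutable

object CachedIpLookup {
  /**
   * Wraps a lookup function (ip => (province, city)) so that each distinct ip
   * is passed to the underlying function at most once.
   */
  def memoize(lookup: String => (String, String)): String => (String, String) = {
    val cache = mutable.Map.empty[String, (String, String)]
    // getOrElseUpdate evaluates lookup(ip) only when the ip is not cached yet
    ip => cache.getOrElseUpdate(ip, lookup(ip))
  }
}
```

In the pipeline, the function returned by `memoize` would be called in step 4 in place of the direct HTTP request, with the request logic moved into the wrapped `lookup` function.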
