http://192.168.2.51:4041

http://hadoop1:8088/proxy/application_1512362707596_0006/executors/

Executors

Summary

 
Executors | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write | Blacklisted
Active(3) | 54 | 1.4 GB / 1.2 GB | 700.1 MB | 2 | 50 | 0 | 22 | 72 | 6.5 min (2 s) | 0.0 B | 0.0 B | 0.0 B | 0
Dead(0) | 0 | 0.0 B / 0.0 B | 0.0 B | 0 | 0 | 0 | 0 | 0 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B | 0
Total(3) | 54 | 1.4 GB / 1.2 GB | 700.1 MB | 2 | 50 | 0 | 22 | 72 | 6.5 min (2 s) | 0.0 B | 0.0 B | 0.0 B | 0
 

Executors

Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write
driver | 192.168.2.51:52491 | Active | 2 | 5.7 KB / 384.1 MB | 0.0 B | 0 | 0 | 0 | 0 | 0 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B
2 | hadoop2:33018 | Active | 26 | 729.5 MB / 384.1 MB | 348.1 MB | 1 | 25 | 0 | 11 | 36 | 2.6 min (1 s) | 0.0 B | 0.0 B | 0.0 B
1 | hadoop1:53695 | Active | 26 | 700.1 MB / 384.1 MB | 352 MB | 1 | 25 | 0 | 11 | 36 | 3.9 min (0.9 s) | 0.0 B | 0.0 B | 0.0 B
from pyspark.sql import SparkSession

# Build a session on YARN with the MongoDB Spark connector input and output URIs.
my_spark = (
    SparkSession.builder
    .appName("myAppYarn-10g")
    .master("yarn")
    .config("spark.mongodb.input.uri", "mongodb://pyspark_admin:admin123@192.168.2.50/recommendation.article")
    .config("spark.mongodb.output.uri", "mongodb://pyspark_admin:admin123@192.168.2.50/recommendation.article")
    .getOrCreate()
)

# Load the whole collection and pull every row back to the driver.
db_rows = my_spark.read.format("com.mongodb.spark.sql.DefaultSource").load().collect()
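
The collect() call returns every row of the collection to the driver in one go, and the stack trace in the container log further down shows an executor running out of heap while serializing a task result. One possible alternative, sketched here and not a verified fix for this job, is to stream rows with DataFrame.toLocalIterator() and handle them in fixed-size batches. The helper names iter_batches, open_session and process_collection, the batch size of 1000, and the "2g" value for spark.executor.memory are all illustrative assumptions, not settings from the original run.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def open_session(mongo_uri, executor_memory="2g"):
    """Build a SparkSession on YARN; executor_memory is an assumed value."""
    # Imported here so iter_batches stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("myAppYarn-10g")
        .master("yarn")
        .config("spark.executor.memory", executor_memory)
        .config("spark.mongodb.input.uri", mongo_uri)
        .getOrCreate()
    )


def process_collection(spark, handle_batch, batch_size=1000):
    """Stream the collection to the driver one partition at a time."""
    df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
    # toLocalIterator() fetches partitions one by one instead of all at once.
    for batch in iter_batches(df.toLocalIterator(), batch_size):
        handle_batch(batch)
```

A call would look like process_collection(open_session(uri), handle_batch), where uri is the same MongoDB URI used above and handle_batch is whatever per-batch function the job needs. Each partition still travels to the driver as a task result, so this may lower peak memory without removing the pressure entirely.
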

Summary

 
Executors | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write | Blacklisted
Active(3) | 31 | 748.4 MB / 1.2 GB | 75.7 MB | 2 | 27 | 0 | 0 | 27 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B | 0
Dead(2) | 56 | 1.5 GB / 768.2 MB | 790.3 MB | 2 | 0 | 0 | 77 | 77 | 2.7 h (2 s) | 0.0 B | 0.0 B | 0.0 B | 0
Total(5) | 87 | 2.3 GB / 1.9 GB | 865.9 MB | 4 | 27 | 0 | 77 | 104 | 2.7 h (2 s) | 0.0 B | 0.0 B | 0.0 B | 0
 

Executors

Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write
driver | 192.168.2.51:52491 | Active | 2 | 5.7 KB / 384.1 MB | 0.0 B | 0 | 0 | 0 | 0 | 0 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B
4 | hadoop2:34394 | Active | 12 | 315.9 MB / 384.1 MB | 0.0 B | 1 | 11 | 0 | 0 | 11 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B
3 | hadoop1:39620 | Active | 17 | 432.5 MB / 384.1 MB | 75.7 MB | 1 | 16 | 0 | 0 | 16 | 0 ms (0 ms) | 0.0 B | 0.0 B | 0.0 B
2 | hadoop2:33018 | Dead | 27 | 758.7 MB / 384.1 MB | 390.4 MB | 1 | 0 | 0 | 38 | 38 | 1.3 h (1 s) | 0.0 B | 0.0 B | 0.0 B
1 | hadoop1:53695 | Dead | 29 | 775.9 MB / 384.1 MB | 399.9 MB | 1 | 0 | 0 | 39 | 39 | 1.4 h (0.9 s) | 0.0 B | 0.0 B | 0.0 B
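
Several executor rows above report more Storage Memory used than the 384.1 MB available, which looks odd at first sight. The arithmetic below is only a sketch that subtracts Disk Used from the used figure for each row; it rests on the assumptions that the UI sizes are 1024-based and that the used figure might include blocks already spilled to disk. The helper names to_mb and used_minus_disk are hypothetical, and the numbers are copied from the table.

```python
# Conversion factors to megabytes, assuming 1024-based units in the UI.
UNIT_TO_MB = {"B": 1 / (1024 * 1024), "KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def to_mb(text):
    """Convert a UI size string such as '75.7 MB' to megabytes."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MB[unit]


# (executor id, storage memory used, storage memory total, disk used),
# copied from the executor table above.
ROWS = [
    ("4", "315.9 MB", "384.1 MB", "0.0 B"),
    ("3", "432.5 MB", "384.1 MB", "75.7 MB"),
    ("2", "758.7 MB", "384.1 MB", "390.4 MB"),
    ("1", "775.9 MB", "384.1 MB", "399.9 MB"),
]


def used_minus_disk(rows):
    """Return {executor id: used storage minus disk used, in MB}."""
    return {ex: to_mb(used) - to_mb(disk) for ex, used, _total, disk in rows}


# Print the difference next to the reported capacity for each executor.
differences = used_minus_disk(ROWS)
for ex, _used, total, _disk in ROWS:
    print(f"executor {ex}: {differences[ex]:.1f} MB of {to_mb(total):.1f} MB")
```

For these four rows the difference stays below 384.1 MB in every case. That is an observation about these numbers, not a statement about how the Spark UI computes the column.
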
Logs for container_1512362707596_0006_02_000002 http://hadoop1:8042/node/containerlogs/container_1512362707596_0006_02_000002/root/stderr?start=-4096
 
 
 
 

Showing the last 4096 bytes of the log.

Manager: Dropping block taskresult_48 from memory
17/12/04 13:14:32 INFO storage.BlockManager: Writing block taskresult_48 to disk
17/12/04 13:14:32 INFO memory.MemoryStore: After dropping 1 blocks, free memory is 38.5 MB
17/12/04 13:14:32 INFO memory.MemoryStore: Block taskresult_73 stored as bytes in memory (estimated size 32.5 MB, free 6.1 MB)
17/12/04 13:14:32 INFO executor.Executor: Finished task 72.0 in stage 1.0 (TID 73). 34033291 bytes result sent via BlockManager)
17/12/04 13:14:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 74
17/12/04 13:14:32 INFO executor.Executor: Running task 73.0 in stage 1.0 (TID 74)
17/12/04 13:14:38 INFO memory.MemoryStore: 1 blocks selected for dropping (16.0 MB bytes)
17/12/04 13:14:38 INFO storage.BlockManager: Dropping block taskresult_50 from memory
17/12/04 13:14:38 INFO storage.BlockManager: Writing block taskresult_50 to disk
17/12/04 13:14:38 INFO memory.MemoryStore: After dropping 1 blocks, free memory is 22.1 MB
17/12/04 13:14:38 INFO memory.MemoryStore: Block taskresult_74 stored as bytes in memory (estimated size 14.4 MB, free 7.7 MB)
17/12/04 13:14:38 INFO executor.Executor: Finished task 73.0 in stage 1.0 (TID 74). 15083225 bytes result sent via BlockManager)
17/12/04 13:14:38 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 75
17/12/04 13:14:38 INFO executor.Executor: Running task 74.0 in stage 1.0 (TID 75)
17/12/04 13:14:46 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.2 KB, free 7.7 MB)
17/12/04 13:14:46 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 433.0 B, free 7.7 MB)
17/12/04 13:14:48 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/04 13:14:48 ERROR executor.Executor: Exception in task 74.0 in stage 1.0 (TID 75)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:239)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:403)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/12/04 13:14:48 INFO connection.MongoClientCache: Closing MongoClient: [192.168.2.50:27017]
17/12/04 13:14:48 INFO driver.connection: Closed connection [connectionId{localValue:4, serverValue:42}] to 192.168.2.50:27017 because the pool has been closed.
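
The "bytes result sent via BlockManager" figures in the log can be checked against the "estimated size" that MemoryStore reports for the matching taskresult blocks. The sketch below extracts the result size per task ID with a regular expression and converts it to megabytes of 1024 * 1024 bytes. The pattern RESULT_RE and the helper result_sizes_mb are hypothetical names, and the two sample lines are copied from the log above.

```python
import re

# Matches lines such as "Finished task 72.0 in stage 1.0 (TID 73). 34033291 bytes result ..."
RESULT_RE = re.compile(
    r"Finished task (\S+) in stage (\S+) \(TID (\d+)\)\. (\d+) bytes result"
)

LOG_LINES = [
    "17/12/04 13:14:32 INFO executor.Executor: Finished task 72.0 in stage 1.0 "
    "(TID 73). 34033291 bytes result sent via BlockManager)",
    "17/12/04 13:14:38 INFO executor.Executor: Finished task 73.0 in stage 1.0 "
    "(TID 74). 15083225 bytes result sent via BlockManager)",
]


def result_sizes_mb(lines):
    """Return {TID: task result size in MB} for every finished-task line."""
    sizes = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            tid = int(match.group(3))
            sizes[tid] = int(match.group(4)) / (1024 * 1024)
    return sizes


for tid, size in result_sizes_mb(LOG_LINES).items():
    print(f"TID {tid}: {size:.1f} MB")
```

Rounded to one decimal, TID 73 comes to 32.5 MB and TID 74 to 14.4 MB, the same figures the log gives for taskresult_73 and taskresult_74. With free memory reported at 6.1 MB to 7.7 MB after each store, every new result of this size forces an older block out to disk.
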
 
 
 
