http://192.168.2.51:4041

http://hadoop1:8088/proxy/application_1512362707596_0006/executors/

Executors

Summary

 
  RDD Blocks Storage Memory Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks Task Time (GC Time) Input Shuffle Read Shuffle Write Blacklisted
Active(3) 54 1.4 GB / 1.2 GB 700.1 MB 2 50 0 22 72 6.5 min (2 s) 0.0 B 0.0 B 0.0 B 0
Dead(0) 0 0.0 B / 0.0 B 0.0 B 0 0 0 0 0 0 ms (0 ms) 0.0 B 0.0 B 0.0 B 0
Total(3) 54 1.4 GB / 1.2 GB 700.1 MB 2 50 0 22 72 6.5 min (2 s) 0.0 B 0.0 B 0.0 B 0
 

Executors

Show 
20
40
60
100
All
 entries
Search:
Executor ID Address Status RDD Blocks Storage Memory Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks Task Time (GC Time) Input Shuffle Read Shuffle Write Logs Thread Dump
driver 192.168.2.51:52491 Active 2 5.7 KB / 384.1 MB 0.0 B 0 0 0 0 0 0 ms (0 ms) 0.0 B 0.0 B 0.0 B   Thread Dump
2 hadoop2:33018 Active 26 729.5 MB / 384.1 MB 348.1 MB 1 25 0 11 36 2.6 min (1 s) 0.0 B 0.0 B 0.0 B Thread Dump
1 hadoop1:53695 Active 26 700.1 MB / 384.1 MB 352 MB 1 25 0 11 36 3.9 min (0.9 s) 0.0 B 0.0 B 0.0 B Thread Dump
from pyspark.sql import SparkSession

my_spark = SparkSession \
.builder \
.appName("myAppYarn-10g") \
.master('yarn') \
.config("spark.mongodb.input.uri", "mongodb://pyspark_admin:admin123@192.168.2.50/recommendation.article") \
.config("spark.mongodb.output.uri", "mongodb://pyspark_admin:admin123@192.168.2.50/recommendation.article") \
.getOrCreate() db_rows = my_spark.read.format("com.mongodb.spark.sql.DefaultSource").load().collect()

Summary

 
  RDD Blocks Storage Memory Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks Task Time (GC Time) Input Shuffle Read Shuffle Write Blacklisted
Active(3) 31 748.4 MB / 1.2 GB 75.7 MB 2 27 0 0 27 0 ms (0 ms) 0.0 B 0.0 B 0.0 B 0
Dead(2) 56 1.5 GB / 768.2 MB 790.3 MB 2 0 0 77 77 2.7 h (2 s) 0.0 B 0.0 B 0.0 B 0
Total(5) 87 2.3 GB / 1.9 GB 865.9 MB 4 27 0 77 104 2.7 h (2 s) 0.0 B 0.0 B 0.0 B 0
 

Executors

Show 
20
40
60
100
All
 entries
Search:
Executor ID Address Status RDD Blocks Storage Memory Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks Task Time (GC Time) Input Shuffle Read Shuffle Write Logs Thread Dump
driver 192.168.2.51:52491 Active 2 5.7 KB / 384.1 MB 0.0 B 0 0 0 0 0 0 ms (0 ms) 0.0 B 0.0 B 0.0 B   Thread Dump
4 hadoop2:34394 Active 12 315.9 MB / 384.1 MB 0.0 B 1 11 0 0 11 0 ms (0 ms) 0.0 B 0.0 B 0.0 B Thread Dump
3 hadoop1:39620 Active 17 432.5 MB / 384.1 MB 75.7 MB 1 16 0 0 16 0 ms (0 ms) 0.0 B 0.0 B 0.0 B Thread Dump
2 hadoop2:33018 Dead 27 758.7 MB / 384.1 MB 390.4 MB 1 0 0 38 38 1.3 h (1 s) 0.0 B 0.0 B 0.0 B Thread Dump
1 hadoop1:53695 Dead 29 775.9 MB / 384.1 MB 399.9 MB 1 0 0 39 39 1.4 h (0.9 s) 0.0 B 0.0 B 0.0 B Thread Dump
Showing 1 to 5 of 5 entries
 
 
Logs for container_1512362707596_0006_02_000002 http://hadoop1:8042/node/containerlogs/container_1512362707596_0006_02_000002/root/stderr?start=-4096
 
 
 
 

Logs for container_1512362707596_0006_02_000002

 

ResourceManager

NodeManager

Tools

Showing 4096 bytes. Click here for full log

Manager: Dropping block taskresult_48 from memory
17/12/04 13:14:32 INFO storage.BlockManager: Writing block taskresult_48 to disk
17/12/04 13:14:32 INFO memory.MemoryStore: After dropping 1 blocks, free memory is 38.5 MB
17/12/04 13:14:32 INFO memory.MemoryStore: Block taskresult_73 stored as bytes in memory (estimated size 32.5 MB, free 6.1 MB)
17/12/04 13:14:32 INFO executor.Executor: Finished task 72.0 in stage 1.0 (TID 73). 34033291 bytes result sent via BlockManager)
17/12/04 13:14:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 74
17/12/04 13:14:32 INFO executor.Executor: Running task 73.0 in stage 1.0 (TID 74)
17/12/04 13:14:38 INFO memory.MemoryStore: 1 blocks selected for dropping (16.0 MB bytes)
17/12/04 13:14:38 INFO storage.BlockManager: Dropping block taskresult_50 from memory
17/12/04 13:14:38 INFO storage.BlockManager: Writing block taskresult_50 to disk
17/12/04 13:14:38 INFO memory.MemoryStore: After dropping 1 blocks, free memory is 22.1 MB
17/12/04 13:14:38 INFO memory.MemoryStore: Block taskresult_74 stored as bytes in memory (estimated size 14.4 MB, free 7.7 MB)
17/12/04 13:14:38 INFO executor.Executor: Finished task 73.0 in stage 1.0 (TID 74). 15083225 bytes result sent via BlockManager)
17/12/04 13:14:38 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 75
17/12/04 13:14:38 INFO executor.Executor: Running task 74.0 in stage 1.0 (TID 75)
17/12/04 13:14:46 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.2 KB, free 7.7 MB)
17/12/04 13:14:46 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 433.0 B, free 7.7 MB)
17/12/04 13:14:48 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/04 13:14:48 ERROR executor.Executor: Exception in task 74.0 in stage 1.0 (TID 75)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:239)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:403)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/12/04 13:14:48 INFO connection.MongoClientCache: Closing MongoClient: [192.168.2.50:27017]
17/12/04 13:14:48 INFO driver.connection: Closed connection [connectionId{localValue:4, serverValue:42}] to 192.168.2.50:27017 because the pool has been closed.
 
 
 

spark 33G表的更多相关文章

  1. 基于spark实现表的join操作

    1. 自连接 假设存在如下文件: [root@bluejoe0 ~]# cat categories.csv 1,生活用品,0 2,数码用品,1 3,手机,2 4,华为Mate7,3 每一行的格式为: ...

  2. 利用spark将表中数据拆分

    i# coding:utf-8from pyspark.sql import SparkSession import os if __name__ == '__main__': os.environ[ ...

  3. spark使用Hive表操作

    spark Hive表操作 之前很长一段时间是通过hiveServer操作Hive表的,一旦hiveServer宕掉就无法进行操作. 比如说一个修改表分区的操作 一.使用HiveServer的方式 v ...

  4. Databricks 第6篇:Spark SQL 维护数据库和表

    Spark SQL 表的命名方式是db_name.table_name,只有数据库名称和数据表名称.如果没有指定db_name而直接引用table_name,实际上是引用default 数据库下的表. ...

  5. Spark SQL概念学习系列之如何使用 Spark SQL(六)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc) // 在这里引入 sqlContext 下所有的方法就可以直接用 sql 方法进行查询 ...

  6. spark基础知识介绍2

    dataframe以RDD为基础的分布式数据集,与RDD的区别是,带有Schema元数据,即DF所表示的二维表数据集的每一列带有名称和类型,好处:精简代码:提升执行效率:减少数据读取; 如果不配置sp ...

  7. 新手福利:Apache Spark入门攻略

    [编者按]时至今日,Spark已成为大数据领域最火的一个开源项目,具备高性能.易于使用等特性.然而作为一个年轻的开源项目,其使用上存在的挑战亦不可为不大,这里为大家分享SciSpike软件架构师Ash ...

  8. Spark入门之DataFrame/DataSet

    目录 Part I. Gentle Overview of Big Data and Spark Overview 1.基本架构 2.基本概念 3.例子(可跳过) Spark工具箱 1.Dataset ...

  9. 6.3 使用Spark SQL读写数据库

    Spark SQL可以支持Parquet.JSON.Hive等数据源,并且可以通过JDBC连接外部数据源 一.通过JDBC连接数据库 1.准备工作 ubuntu安装mysql教程 在Linux中启动M ...

随机推荐

  1. JAVA如何解压缩ZIP文档

    代码片段: package org.yu.units; import java.io.Closeable; import java.io.File; import java.io.FileInputS ...

  2. leetcode 206 头插法

    头插法,定义temp,Node temp指向每次头结点,Node每次指向要进行头插的节点. 最后返回temp /** * Definition for singly-linked list. * st ...

  3. angular父子scope之间的访问

    1.子可以访问父的scope,也可以更新相同的scope,但父scope不会被刷新 2.父要访问子scope的方法

  4. ZOJ 3811 / 2014 牡丹江赛区网络赛 C. Untrusted Patrol bfs/dfs/并查集

    Untrusted Patrol Time Limit: 3 Seconds                                     Memory Limit: 65536 KB    ...

  5. shell的case脚本的简单入门

    shell的case脚本的简单入门 示例1: #/bin/bash a=$ case "$a" in ") echo 'hell 2';; ") echo 'h ...

  6. SharedPreferences 存储数组+双击退出

    public static void saveApkEnalbleArray(Context context,boolean[] booleanArray) { SharedPreferences p ...

  7. T1046 旅行家的预算 codevs

    http://codevs.cn/problem/1046/ 题目描述 Description 一个旅行家想驾驶汽车以最少的费用从一个城市到另一个城市(假设出发时油箱是空的).给定两个城市之间的距离D ...

  8. Easy sssp(spfa)(负环)

    vijos    1053    Easy sssp 方法:用spfa判断是否存在负环 描述 输入数据给出一个有N(2 <= N <= 1,000)个节点,M(M <= 100,00 ...

  9. ffmpeg 时间戳

    转http://blog.csdn.net/yfh1985sdq/article/details/5721953 AVpacket里的时间戳pts和dts.单位好像是us. 问 : 时间戳pts和dt ...

  10. bzoj2555(lct维护sam)

    题意: (1):在当前字符串的后面插入一个字符串 (2):询问字符串s在当前字符串中出现了几次?(作为连续子串) 字符串长度<=6e5,询问总长度<=3e6 分析: 考虑建个sam,然后把 ...