hadoop InputSplit

package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.classification.InterfaceStability.Evolving;
import org.apache.hadoop.mapred.SplitLocationInfo;

/**
 * <code>InputSplit</code> represents the data to be processed by an
 * individual {@link Mapper}.
 * (Note: an InputSplit is the chunk of input data that one Mapper processes.)
 *
 * <p>Typically, it presents a byte-oriented view on the input and it is the
 * responsibility of the {@link RecordReader} of the job to process this and
 * present a record-oriented view.
 *
 * @see InputFormat
 * @see RecordReader
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class InputSplit {

  /**
   * Get the size of the split, so that the input splits can be sorted by size.
   * (Note: the length of the split is expressed in bytes.)
   *
   * @return the number of bytes in the split
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract long getLength() throws IOException, InterruptedException;

  /**
   * Get the list of nodes by name where the data for the split would be local.
   * The locations do not need to be serialized.
   * (Note: this returns the nodes on which the split's data resides.)
   *
   * @return a new array of the node names.
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract String[] getLocations()
      throws IOException, InterruptedException;

  /**
   * Gets info about which nodes the input split is stored on and how it is
   * stored at each location.
   * (Note: this returns the nodes holding the split and how the data is
   * stored on each node, for example in memory.)
   *
   * @return list of <code>SplitLocationInfo</code>s describing how the split
   *    data is stored at each location. A null value indicates that all the
   *    locations have the data stored on disk.
   * @throws IOException
   */
  @Evolving
  public SplitLocationInfo[] getLocationInfo() throws IOException {
    return null;
  }
}
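
To make the contract above concrete, here is a minimal sketch of a split implementation and of sorting splits by size. It is not Hadoop's own code: so that it compiles without the Hadoop jars, it uses a local stand-in class, `InputSplitLike`, that only mirrors the two abstract methods shown above. `SplitSketch`, `InputSplitLike`, `RangeSplit`, `sortBySizeDescending`, and the host names are all hypothetical names chosen for illustration. In a real job the subclass would extend `org.apache.hadoop.mapreduce.InputSplit` and would normally also be serializable (for example by implementing `Writable`). The largest-first ordering is an assumption about how a scheduler might use `getLength()`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitSketch {

    /**
     * Local stand-in (assumption) mirroring the two abstract methods of
     * Hadoop's InputSplit, so the sketch runs without Hadoop on the classpath.
     */
    public abstract static class InputSplitLike {
        public abstract long getLength() throws IOException, InterruptedException;
        public abstract String[] getLocations() throws IOException, InterruptedException;
    }

    /** Hypothetical split covering the byte range [start, start + length). */
    public static final class RangeSplit extends InputSplitLike {
        private final long start;
        private final long length;
        private final String[] hosts;

        public RangeSplit(long start, long length, String... hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts.clone();
        }

        public long getStart() {
            return start;
        }

        /** Size of the split in bytes. */
        @Override
        public long getLength() {
            return length;
        }

        /** Returns a new array each time, as the Javadoc above requires. */
        @Override
        public String[] getLocations() {
            return hosts.clone();
        }
    }

    /** Returns a copy of the list ordered by split size, largest first. */
    public static List<RangeSplit> sortBySizeDescending(List<RangeSplit> splits) {
        List<RangeSplit> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(RangeSplit::getLength).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<RangeSplit> splits = Arrays.asList(
                new RangeSplit(0L, 100L, "node1"),
                new RangeSplit(100L, 300L, "node2", "node3"));
        for (RangeSplit split : sortBySizeDescending(splits)) {
            System.out.println(split.getStart() + " +" + split.getLength()
                    + " on " + Arrays.toString(split.getLocations()));
        }
    }
}
```

`RangeSplit` narrows the `throws` clauses of the overridden methods, which Java allows, so callers that hold a `RangeSplit` directly do not have to handle `IOException` or `InterruptedException`.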
