[Hadoop Code Notes] Hadoop Job Submission: Child Launches the Reduce Task
I. Overview
The previous post described how the TaskTracker launches a separate Java process to execute a Map task. Continuing from there: during its execution, the TaskRunner thread builds a command of the form java -D** Child <address> <port> <taskid> and launches a separate Java process. In Child's main function, the Task to execute is obtained from the TaskTracker via the TaskUmbilicalProtocol, and the Task's run method is invoked. For a ReduceTask, the run method constructs the Reducer and Reducer.Context via Java reflection, then calls the constructed Reducer's run method to carry out the reduce operation. Unlike a map task, a reduce task must first copy the map outputs from the tasktrackers where the maps ran to the tasktracker where the reducer runs.
A reduce task takes as input its particular partition of the output of many map tasks across the cluster. Map tasks may finish at different times, so the reduce task starts copying a map's output as soon as that map completes; this is the copy phase of the reduce task, carried out by a number of MapOutputCopier threads that fetch all the map outputs. When copying finishes, the reduce task enters the sort phase, in which LocalFSMerger and InMemFSMergeThread merge the map outputs while maintaining their sort order (a merge sort over several already-sorted files). In the reduce phase, the reduce function is invoked for each key of the sorted output, and the output of this phase is written directly to the filesystem, typically HDFS. (When HDFS is used, since the tasktracker node is also a DataNode, the first block replica is written to the local disk: data locality.)
When a map task completes, it notifies its parent tasktracker of the status update, and the tasktracker passes this on to the jobtracker; this happens through the heartbeat mechanism. The jobtracker therefore knows the mapping between map outputs and tasktrackers. A GetMapEventsThread in the reducer periodically asks the jobtracker for map output locations (through its getMapCompletionEvents call).
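To make the sort-phase merging concrete, here is a minimal, self-contained sketch (illustrative only, not Hadoop code) of the k-way merge idea that Merger.merge applies to the sorted map-output segments: repeatedly take the smallest head element across all the sorted runs via a priority queue.

import java.util.*;

public class KWayMergeSketch {
    // Merge several individually sorted runs into one sorted list, the same
    // way the sort phase merges already-sorted map-output segments.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {value, runIndex, offsetInRun}.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();           // smallest head among all runs
            out.add(e[0]);
            List<Integer> run = runs.get(e[1]);
            if (e[2] + 1 < run.size()) {     // advance within that run
                heap.add(new int[]{run.get(e[2] + 1), e[1], e[2] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 8), Arrays.asList(5, 7));
        System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 7, 8, 9]
    }
}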
II. Flow Description
1. In ReduceTask, a ReduceCopier object is constructed and its fetchOutputs method is called.
2. ReduceCopier's fetchOutputs method constructs several independent threads that cooperate while each carries out its own job:
2.1 The GetMapEventsThread queries the TaskTracker over RPC; for each completed event it obtains the server address of the map task (i.e., the location of the MapTask's output), builds a URL, and adds it to mapLocations for the copier threads to pick up.
2.2 Several MapOutputCopier threads are constructed and started; over HTTP they copy map outputs from the remote servers to the local node, keeping an output in memory if it fits there and otherwise saving it to a local file.
2.3 The LocalFSMerger merges the map outputs on disk.
2.4 The InMemFSMergeThread merges the map outputs held in memory.
3. A raw key/value iterator is built over the copied map outputs and serves as the reduce input.
4. The runNewReducer method constructs a Reducer instance and its running context from the configured Reducer class, and calls the reducer's run method, which in turn invokes the user-defined reduce operation.
5. In the Reducer's run method, a key and that key's value collection (of type Iterable<VALUEIN>) are taken from the context and passed to the reducer's reduce method for processing.
6. The Reducer's reduce method is where users define their data processing; it is the only method a user has to supply.

III. Code Details
1. Child's main method. Every task runs in its own separate process, and this method is the entry point of those processes. Reduce tasks, like map tasks, are launched through this main function, so it is not described again here; see the previous post on Child launching the map task.
2. ReduceTask's run method. It is invoked in the Child subprocess and executes the user-defined reduce operation. The opening logic is similar to MapTask's: it reports execution progress to the tasktracker via the TaskUmbilicalProtocol, starting a thread to report progress to the TaskTracker, and it runs different methods depending on what the task is asked to do, such as jobCleanup, jobSetup, and taskCleanup. For background on that part, see the detailed explanation of JobTracker's heartbeat method in the post on the TaskTracker fetching Tasks. Unlike a map task, before the reduce executes, the map outputs must be copied from the tasktrackers where the maps ran to the tasktracker where the reducer runs.
@SuppressWarnings("unchecked")
    public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
            throws IOException, InterruptedException, ClassNotFoundException {
        job.setBoolean("mapred.skip.on", isSkipping());
        if (isMapOrReduce()) {
            copyPhase = getProgress().addPhase("copy");
            sortPhase  = getProgress().addPhase("sort");
            reducePhase = getProgress().addPhase("reduce");
        }
        // start thread that will handle communication with parent
        TaskReporter reporter = new TaskReporter(getProgress(), umbilical);
        reporter.startCommunicationThread();
        boolean useNewApi = job.getUseNewReducer();
        initialize(job, getJobID(), reporter, useNewApi);
        // check if it is a cleanupJobTask
        if (jobCleanup) {
            runJobCleanupTask(umbilical, reporter);
            return;
        }
        if (jobSetup) {
            runJobSetupTask(umbilical, reporter);
            return;
        }
        if (taskCleanup) {
            runTaskCleanupTask(umbilical, reporter);
            return;
        }
        // Initialize the codec
        codec = initCodec();
        boolean isLocal = "local".equals(job.get("mapred.job.tracker", "local"));
        //If this is not a local-mode run (i.e., the configuration is for a distributed cluster), start a ReduceCopier to fetch the map outputs, which are the reduce input.
        if (!isLocal) {
            reduceCopier = new ReduceCopier(umbilical, job, reporter);
            if (!reduceCopier.fetchOutputs()) {
                if(reduceCopier.mergeThrowable instanceof FSError) {
                    LOG.error("Task: " + getTaskID() + " - FSError: " +
                            StringUtils.stringifyException(reduceCopier.mergeThrowable));
                    umbilical.fsError(getTaskID(),
                            reduceCopier.mergeThrowable.getMessage());
                }
                throw new IOException("Task: " + getTaskID() +
                        " - The reduce copier failed", reduceCopier.mergeThrowable);
            }
        }
        copyPhase.complete();
        //Once copying completes, enter the sort phase.
        setPhase(TaskStatus.Phase.SORT);
        statusUpdate(umbilical);
        final FileSystem rfs = FileSystem.getLocal(job).getRaw();
        RawKeyValueIterator rIter = isLocal
                ? Merger.merge(job, rfs, job.getMapOutputKeyClass(),
                        job.getMapOutputValueClass(), codec, getMapFiles(rfs, true),
                        !conf.getKeepFailedTaskFiles(), job.getInt("io.sort.factor", 100),
                        new Path(getTaskID().toString()), job.getOutputKeyComparator(),
                        reporter, spilledRecordsCounter, null)
                        : reduceCopier.createKVIterator(job, rfs, reporter);
        // free up the data structures
        mapOutputFilesOnDisk.clear();
        sortPhase.complete();                         // sort is complete
        setPhase(TaskStatus.Phase.REDUCE);
        statusUpdate(umbilical);
        Class keyClass = job.getMapOutputKeyClass();
        Class valueClass = job.getMapOutputValueClass();
        RawComparator comparator = job.getOutputValueGroupingComparator();
        if (useNewApi) {
            runNewReducer(job, umbilical, reporter, rIter, comparator,
                    keyClass, valueClass);
        } else {
            runOldReducer(job, umbilical, reporter, rIter, comparator,
                    keyClass, valueClass);
        }
        done(umbilical, reporter);
    }
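A side note on the useNewApi branch above: job.getUseNewReducer() decides between runNewReducer and runOldReducer. It is normally set implicitly when a driver uses the new org.apache.hadoop.mapreduce.Job API; the sketch below flips it by hand only to make the branch visible. The property name "mapred.reducer.new-api" is what 0.20-era JobConf uses under the hood, so treat it as an assumption for other versions.

import org.apache.hadoop.mapred.JobConf;

public class NewApiSwitchSketch {
    public static void main(String[] args) {
        JobConf job = new JobConf();
        // Property name per 0.20-era JobConf (assumption; usually set
        // implicitly by the new-API Job class rather than by hand).
        job.setBoolean("mapred.reducer.new-api", true);
        System.out.println(job.getUseNewReducer()); // true -> runNewReducer(...)
    }
}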
3. The fetchOutputs method of the ReduceCopier class. It is responsible for copying the map outputs over to the reduce-side process. As the code shows, it starts a LocalFSMerger, an InMemFSMergeThread, a GetMapEventsThread, and several MapOutputCopier threads: independent threads that cooperate while each carries out its own job.
public boolean fetchOutputs() throws IOException {
      int totalFailures = 0;
      int            numInFlight = 0, numCopied = 0;
      DecimalFormat  mbpsFormat = new DecimalFormat("0.00");
      final Progress copyPhase =
        reduceTask.getProgress().phase();
      LocalFSMerger localFSMergerThread = null;
      InMemFSMergeThread inMemFSMergeThread = null;
      GetMapEventsThread getMapEventsThread = null;
      for (int i = 0; i < numMaps; i++) {
        copyPhase.addPhase();       // add sub-phase per file
      }
      //1) Construct numCopiers MapOutputCopier copy threads (5 by default) as configured; these are the threads that actually carry out the copying.
      copiers = new ArrayList<MapOutputCopier>(numCopiers);
      // start all the copying threads
      for (int i=0; i < numCopiers; i++) {
        MapOutputCopier copier = new MapOutputCopier(conf, reporter);
        copiers.add(copier);
        copier.start();
      }
      // 2) Start the on-disk merge thread (see its run method below)
      localFSMergerThread = new LocalFSMerger((LocalFileSystem)localFileSys);
      // 3) Start the in-memory merge thread (see its run method below)
      inMemFSMergeThread = new InMemFSMergeThread();
      localFSMergerThread.start();
      inMemFSMergeThread.start();
      // 4) Start the map-completion-events thread
      getMapEventsThread = new GetMapEventsThread();
      getMapEventsThread.start();
      // start the clock for bandwidth measurement
      long startTime = System.currentTimeMillis();
      long currentTime = startTime;
      long lastProgressTime = startTime;
      long lastOutputTime = 0;
        // loop until we get all required outputs
      //5) Keep looping while copiedMapOutputs.size() is smaller than the number of maps, i.e. while some outputs are still missing; along the way, periodically log how many map outputs have been copied and how many remain.
        while (copiedMapOutputs.size() < numMaps && mergeThrowable == null) {
          currentTime = System.currentTimeMillis();
          boolean logNow = false;
          if (currentTime - lastOutputTime > MIN_LOG_TIME) {
            lastOutputTime = currentTime;
            logNow = true;
          }
          if (logNow) {
            LOG.info(reduceTask.getTaskID() + " Need another "
                   + (numMaps - copiedMapOutputs.size()) + " map output(s) "
                   + "where " + numInFlight + " is already in progress");
          }
          // Put the hash entries for the failed fetches.
          Iterator<MapOutputLocation> locItr = retryFetches.iterator();
          while (locItr.hasNext()) {
            MapOutputLocation loc = locItr.next();
            List<MapOutputLocation> locList =
              mapLocations.get(loc.getHost());
            // Check if the list exists. Map output location mapping is cleared
            // once the jobtracker restarts and is rebuilt from scratch.
            // Note that map-output-location mapping will be recreated and hence
            // we continue with the hope that we might find some locations
            // from the rebuild map.
            if (locList != null) {
              // Add to the beginning of the list so that this map is
              //tried again before the others and we can hasten the
              //re-execution of this map should there be a problem
              locList.add(0, loc);
            }
          }
          if (retryFetches.size() > 0) {
            LOG.info(reduceTask.getTaskID() + ": " +
                  "Got " + retryFetches.size() +
                  " map-outputs from previous failures");
          }
          // clear the "failed" fetches hashmap
          retryFetches.clear();
          // now walk through the cache and schedule what we can
          int numScheduled = 0;
          int numDups = 0;
          synchronized (scheduledCopies) {
            // Randomize the map output locations to prevent
            // all reduce-tasks swamping the same tasktracker
            List<String> hostList = new ArrayList<String>();
            hostList.addAll(mapLocations.keySet()); 
            Collections.shuffle(hostList, this.random);
            Iterator<String> hostsItr = hostList.iterator();
            while (hostsItr.hasNext()) {
              String host = hostsItr.next();
              List<MapOutputLocation> knownOutputsByLoc =
                mapLocations.get(host);
              // Check if the list exists. Map output location mapping is
              // cleared once the jobtracker restarts and is rebuilt from
              // scratch.
              // Note that map-output-location mapping will be recreated and
              // hence we continue with the hope that we might find some
              // locations from the rebuild map and add then for fetching.
              if (knownOutputsByLoc == null || knownOutputsByLoc.size() == 0) {
                continue;
              }
              //Identify duplicate hosts here
              if (uniqueHosts.contains(host)) {
                 numDups += knownOutputsByLoc.size();
                 continue;
              }
              Long penaltyEnd = penaltyBox.get(host);
              boolean penalized = false;
              if (penaltyEnd != null) {
                if (currentTime < penaltyEnd.longValue()) {
                  penalized = true;
                } else {
                  penaltyBox.remove(host);
                }
              }
              if (penalized)
                continue;
              synchronized (knownOutputsByLoc) {
                locItr = knownOutputsByLoc.iterator();
                while (locItr.hasNext()) {
                  MapOutputLocation loc = locItr.next();
                  // Do not schedule fetches from OBSOLETE maps
                  if (obsoleteMapIds.contains(loc.getTaskAttemptId())) {
                    locItr.remove();
                    continue;
                  }
                  uniqueHosts.add(host);
                  scheduledCopies.add(loc);
                  locItr.remove();  // remove from knownOutputs
                  numInFlight++; numScheduled++;
                  break; //we have a map from this host
                }
              }
            }
            scheduledCopies.notifyAll();
          }
          if (numScheduled > 0 || logNow) {
            LOG.info(reduceTask.getTaskID() + " Scheduled " + numScheduled +
                   " outputs (" + penaltyBox.size() +
                   " slow hosts and" + numDups + " dup hosts)");
          }
          if (penaltyBox.size() > 0 && logNow) {
            LOG.info("Penalized(slow) Hosts: ");
            for (String host : penaltyBox.keySet()) {
              LOG.info(host + " Will be considered after: " +
                  ((penaltyBox.get(host) - currentTime)/1000) + " seconds.");
            }
          }
          // if we have no copies in flight and we can't schedule anything
          // new, just wait for a bit
          try {
            if (numInFlight == 0 && numScheduled == 0) {
              // we should indicate progress as we don't want TT to think
              // we're stuck and kill us
              reporter.progress();
              Thread.sleep(5000);
            }
          } catch (InterruptedException e) { } // IGNORE
          while (numInFlight > 0 && mergeThrowable == null) {
            LOG.debug(reduceTask.getTaskID() + " numInFlight = " +
                      numInFlight);
            //the call to getCopyResult will either
            //1) return immediately with a null or a valid CopyResult object,
            //                 or
            //2) if the numInFlight is above maxInFlight, return with a
            //   CopyResult object after getting a notification from a
            //   fetcher thread,
            //So, when getCopyResult returns null, we can be sure that
            //we aren't busy enough and we should go and get more mapcompletion
            //events from the tasktracker
            CopyResult cr = getCopyResult(numInFlight);
            if (cr == null) {
              break;
            }
            if (cr.getSuccess()) {  // a successful copy
              numCopied++;
              lastProgressTime = System.currentTimeMillis();
              reduceShuffleBytes.increment(cr.getSize());
              long secsSinceStart =
                (System.currentTimeMillis()-startTime)/1000+1;
              float mbs = ((float)reduceShuffleBytes.getCounter())/(1024*1024);
              float transferRate = mbs/secsSinceStart;
              copyPhase.startNextPhase();
              copyPhase.setStatus("copy (" + numCopied + " of " + numMaps
                                  + " at " +
                                  mbpsFormat.format(transferRate) +  " MB/s)");
              // Note successful fetch for this mapId to invalidate
              // (possibly) old fetch-failures
              fetchFailedMaps.remove(cr.getLocation().getTaskId());
            } else if (cr.isObsolete()) {
              //ignore
              LOG.info(reduceTask.getTaskID() +
                       " Ignoring obsolete copy result for Map Task: " +
                       cr.getLocation().getTaskAttemptId() + " from host: " +
                       cr.getHost());
            } else {
              retryFetches.add(cr.getLocation());
              // note the failed-fetch
              TaskAttemptID mapTaskId = cr.getLocation().getTaskAttemptId();
              TaskID mapId = cr.getLocation().getTaskId();
              totalFailures++;
              Integer noFailedFetches =
                mapTaskToFailedFetchesMap.get(mapTaskId);
              noFailedFetches =
                (noFailedFetches == null) ? 1 : (noFailedFetches + 1);
              mapTaskToFailedFetchesMap.put(mapTaskId, noFailedFetches);
              LOG.info("Task " + getTaskID() + ": Failed fetch #" +
                       noFailedFetches + " from " + mapTaskId);
              // did the fetch fail too many times?
              // using a hybrid technique for notifying the jobtracker.
              //   a. the first notification is sent after max-retries
              //   b. subsequent notifications are sent after 2 retries.
              if ((noFailedFetches >= maxFetchRetriesPerMap)
                  && ((noFailedFetches - maxFetchRetriesPerMap) % 2) == 0) {
                synchronized (ReduceTask.this) {
                  taskStatus.addFetchFailedMap(mapTaskId);
                  LOG.info("Failed to fetch map-output from " + mapTaskId +
                           " even after MAX_FETCH_RETRIES_PER_MAP retries... "
                           + " reporting to the JobTracker");
                }
              }
              // note unique failed-fetch maps
              if (noFailedFetches == maxFetchRetriesPerMap) {
                fetchFailedMaps.add(mapId);
                // did we have too many unique failed-fetch maps?
                // and did we fail on too many fetch attempts?
                // and did we progress enough
                //     or did we wait for too long without any progress?
                // check if the reducer is healthy
                boolean reducerHealthy =
                    (((float)totalFailures / (totalFailures + numCopied))
                     < MAX_ALLOWED_FAILED_FETCH_ATTEMPT_PERCENT);
                // check if the reducer has progressed enough
                boolean reducerProgressedEnough =
                    (((float)numCopied / numMaps)
                     >= MIN_REQUIRED_PROGRESS_PERCENT);
                // check if the reducer is stalled for a long time
                // duration for which the reducer is stalled
                int stallDuration =
                    (int)(System.currentTimeMillis() - lastProgressTime);
                // duration for which the reducer ran with progress
                int shuffleProgressDuration =
                    (int)(lastProgressTime - startTime);
                // min time the reducer should run without getting killed
                int minShuffleRunDuration =
                    (shuffleProgressDuration > maxMapRuntime)
                    ? shuffleProgressDuration
                    : maxMapRuntime;
                boolean reducerStalled =
                    (((float)stallDuration / minShuffleRunDuration)
                     >= MAX_ALLOWED_STALL_TIME_PERCENT);
                // kill if not healthy and has insufficient progress
                if ((fetchFailedMaps.size() >= maxFailedUniqueFetches ||
                     fetchFailedMaps.size() == (numMaps - copiedMapOutputs.size()))
                    && !reducerHealthy
                    && (!reducerProgressedEnough || reducerStalled)) {
                  LOG.fatal("Shuffle failed with too many fetch failures " +
                            "and insufficient progress!" +
                            "Killing task " + getTaskID() + ".");
                  umbilical.shuffleError(getTaskID(),
                                         "Exceeded MAX_FAILED_UNIQUE_FETCHES;"
                                         + " bailing-out.");
                }
              }
              // back off exponentially until num_retries <= max_retries
              // back off by max_backoff/2 on subsequent failed attempts
              currentTime = System.currentTimeMillis();
              int currentBackOff = noFailedFetches <= maxFetchRetriesPerMap
                                   ? BACKOFF_INIT
                                     * (1 << (noFailedFetches - 1))
                                   : (this.maxBackoff * 1000 / 2);
              penaltyBox.put(cr.getHost(), currentTime + currentBackOff);
              LOG.warn(reduceTask.getTaskID() + " adding host " +
                       cr.getHost() + " to penalty box, next contact in " +
                       (currentBackOff/1000) + " seconds");
            }
            uniqueHosts.remove(cr.getHost());
            numInFlight--;
          }
        }
        // all done, inform the copiers to exit
        exitGetMapEvents= true;
        try {
          getMapEventsThread.join();
          LOG.info("getMapsEventsThread joined.");
        } catch (Throwable t) {
          LOG.info("getMapsEventsThread threw an exception: " +
              StringUtils.stringifyException(t));
        }
        synchronized (copiers) {
          synchronized (scheduledCopies) {
            for (MapOutputCopier copier : copiers) {
              copier.interrupt();
            }
            copiers.clear();
          }
        }
        // copiers are done, exit and notify the waiting merge threads
        synchronized (mapOutputFilesOnDisk) {
          exitLocalFSMerge = true;
          mapOutputFilesOnDisk.notify();
        }
        ramManager.close();
        //Do a merge of in-memory files (if there are any)
        if (mergeThrowable == null) {
          try {
            // Wait for the on-disk merge to complete
            localFSMergerThread.join();
            LOG.info("Interleaved on-disk merge complete: " +
                     mapOutputFilesOnDisk.size() + " files left.");
            //wait for an ongoing merge (if it is in flight) to complete
            inMemFSMergeThread.join();
            LOG.info("In-memory merge complete: " +
                     mapOutputsFilesInMemory.size() + " files left.");
            } catch (Throwable t) {
            LOG.warn(reduceTask.getTaskID() +
                     " Final merge of the inmemory files threw an exception: " +
                     StringUtils.stringifyException(t));
            // check if the last merge generated an error
            if (mergeThrowable == null) {
              mergeThrowable = t;
            }
            return false;
          }
        }
        return mergeThrowable == null && copiedMapOutputs.size() == numMaps;
    }
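A note on the knobs visible in fetchOutputs: the number of MapOutputCopier threads (numCopiers) and the merge fan-in (ioSortFactor) come from job configuration. Below is a hedged sketch of setting them in a driver; the property names are the classic 0.20-era ones (mapred.reduce.parallel.copies, default 5, and io.sort.factor), so treat names and defaults as assumptions for other versions.

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuningSketch {
    public static void main(String[] args) {
        JobConf job = new JobConf();
        // Number of MapOutputCopier fetch threads per reduce (default 5).
        job.setInt("mapred.reduce.parallel.copies", 10);
        // Fan-in used by the mergers (how many streams merge into one).
        job.setInt("io.sort.factor", 25);
    }
}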
4. The run method of the MapOutputCopier thread. It takes entries from scheduledCopies (a List<MapOutputLocation>) and calls copyOutput on each to perform the copy: over HTTP, the map output is copied from the remote server to the local node, kept in memory if it fits there, and otherwise saved to a local file.
public void run() {
    while (true) {
        try {
            MapOutputLocation loc = null;
            long size = -1;
            synchronized (scheduledCopies) {
                // Block until the scheduling loop in fetchOutputs hands
                // over a map output location.
                while (scheduledCopies.isEmpty()) {
                    scheduledCopies.wait();
                }
                loc = scheduledCopies.remove(0);
            }
            start(loc);
            size = copyOutput(loc);
            // (abridged: error handling and CopyResult reporting omitted)
        } catch (IOException e) {
            // (abridged: the failed fetch is recorded and retried later)
            LOG.warn(reduceTask.getTaskID() + " copy failed: " + e);
        } catch (InterruptedException e) {
            break; // ALL DONE: the copiers are interrupted once every map
                   // output has been fetched
        }
    }
    if (decompressor != null) {
        CodecPool.returnDecompressor(decompressor);
    }
}
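The scheduledCopies hand-off above is a classic wait/notify producer-consumer queue: the scheduling loop in fetchOutputs appends locations and calls notifyAll, while each copier blocks in wait() until work appears. A stripped-down, self-contained sketch of the same pattern (illustrative, not Hadoop code):

import java.util.LinkedList;
import java.util.List;

public class HandoffSketch {
    private final List<String> scheduled = new LinkedList<>();

    // Producer side: what fetchOutputs does with scheduledCopies.
    void schedule(String loc) {
        synchronized (scheduled) {
            scheduled.add(loc);
            scheduled.notifyAll();
        }
    }

    // Consumer side: what each MapOutputCopier.run does.
    String take() throws InterruptedException {
        synchronized (scheduled) {
            while (scheduled.isEmpty()) {
                scheduled.wait();
            }
            return scheduled.remove(0);
        }
    }

    public static void main(String[] args) throws Exception {
        HandoffSketch q = new HandoffSketch();
        Thread copier = new Thread(() -> {
            try {
                System.out.println("fetched " + q.take());
            } catch (InterruptedException ignored) { }
        });
        copier.start();
        q.schedule("http://tt01:50060/mapOutput?..."); // hypothetical location
        copier.join();
    }
}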
5. The copyOutput method of the MapOutputCopier thread. The map output is copied from the tasktracker where the map ran to the tasktracker where the reducer task runs.
private long copyOutput(MapOutputLocation loc)
        throws IOException, InterruptedException {
    // Check the copy bookkeeping: skip outputs already copied or obsolete.
    if (copiedMapOutputs.contains(loc.getTaskId()) ||
            obsoleteMapIds.contains(loc.getTaskAttemptId())) {
        return CopyResult.OBSOLETE;
    }
    TaskAttemptID reduceId = reduceTask.getTaskID();
    // A temporary file to copy the map output into.
    Path filename = new Path("/" + TaskTracker.getIntermediateOutputDir(
            reduceId.getJobID().toString(),
            reduceId.toString())
            + "/map_" +
            loc.getTaskId().getId() + ".out");
    Path tmpMapOutput = new Path(filename + "-" + id);
    // Copy the map output.
    MapOutput mapOutput = getMapOutput(loc, tmpMapOutput,
            reduceId.getTaskID().getId());
    if (mapOutput == null) {
        throw new IOException("Failed to fetch map-output for " +
                loc.getTaskAttemptId() + " from " +
                loc.getHost());
    }
    // The size of the map-output
    long bytes = mapOutput.compressedSize;
    synchronized (ReduceTask.this) {
        if (copiedMapOutputs.contains(loc.getTaskId())) {
            mapOutput.discard();
            return CopyResult.OBSOLETE;
        }
        // Special case: discard empty map outputs right away.
        if (bytes == 0) {
            try {
                mapOutput.discard();
            } catch (IOException ioe) {
                LOG.info("Couldn't discard output of " + loc.getTaskId());
            }
            // Note that we successfully copied the map-output
            noteCopiedMapOutput(loc.getTaskId());
            return bytes;
        }
        // Process the map output: if it was kept in memory, add it to
        // mapOutputsFilesInMemory (a Collections.synchronizedList over a
        // LinkedList<MapOutput>); if it went to a temporary file, rename
        // that temporary file to the final output file.
        if (mapOutput.inMemory) {
            // Save it in the synchronized list of map-outputs
            mapOutputsFilesInMemory.add(mapOutput);
        } else {
            tmpMapOutput = mapOutput.file;
            filename = new Path(tmpMapOutput.getParent(), filename.getName());
            if (!localFileSys.rename(tmpMapOutput, filename)) {
                localFileSys.delete(tmpMapOutput, true);
                bytes = -1;
                throw new IOException("Failed to rename map output " +
                        tmpMapOutput + " to " + filename);
            }
            synchronized (mapOutputFilesOnDisk) {
                addToMapOutputFilesOnDisk(localFileSys.getFileStatus(filename));
            }
        }
        // Note that we successfully copied the map-output
        noteCopiedMapOutput(loc.getTaskId());
    }
    return bytes;
}
6. The getMapOutput method of ReduceCopier.MapOutputCopier, the method that actually performs the copy, pulling the map output from the remote server to the local node over HTTP.
private MapOutput getMapOutput(MapOutputLocation mapOutputLoc,
                               Path filename, int reduce)
        throws IOException, InterruptedException {
    // Open a connection to the remote server.
    URLConnection connection =
            mapOutputLoc.getOutputLocation().openConnection();
    InputStream input = getInputStream(connection, STALLED_COPY_TIMEOUT,
            DEFAULT_READ_TIMEOUT);
    // Read the map id from the HTTP response headers.
    TaskAttemptID mapId = null;
    mapId = TaskAttemptID.forName(connection.getHeaderField(FROM_MAP_TASK));
    TaskAttemptID expectedMapId = mapOutputLoc.getTaskAttemptId();
    if (!mapId.equals(expectedMapId)) {
        LOG.warn("data from wrong map:" + mapId +
                " arrived to reduce task " + reduce +
                ", where as expected map output should be from " + expectedMapId);
        return null;
    }
    long decompressedLength =
            Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));
    long compressedLength =
            Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));
    if (compressedLength < 0 || decompressedLength < 0) {
        LOG.warn(getName() + " invalid lengths in map output header: id: " +
                mapId + " compressed len: " + compressedLength +
                ", decompressed len: " + decompressedLength);
        return null;
    }
    int forReduce =
            (int)Integer.parseInt(connection.getHeaderField(FOR_REDUCE_TASK));
    if (forReduce != reduce) {
        LOG.warn("data for the wrong reduce: " + forReduce +
                " with compressed len: " + compressedLength +
                ", decompressed len: " + decompressedLength +
                " arrived to reduce task " + reduce);
        return null;
    }
    LOG.info("header: " + mapId + ", compressed len: " + compressedLength +
            ", decompressed len: " + decompressedLength);
    // Check whether the map output fits in memory, to decide between an
    // in-memory and an on-disk shuffle. A different MapOutput constructor
    // is used in each case, and the resulting inMemory flag differs.
    boolean shuffleInMemory = ramManager.canFitInMemory(decompressedLength);
    // Shuffle
    MapOutput mapOutput = null;
    if (shuffleInMemory) {
        LOG.info("Shuffling " + decompressedLength + " bytes (" +
                compressedLength + " raw bytes) " +
                "into RAM from " + mapOutputLoc.getTaskAttemptId());
        mapOutput = shuffleInMemory(mapOutputLoc, connection, input,
                (int)decompressedLength,
                (int)compressedLength);
    } else {
        LOG.info("Shuffling " + decompressedLength + " bytes (" +
                compressedLength + " raw bytes) " +
                "into Local-FS from " + mapOutputLoc.getTaskAttemptId());
        mapOutput = shuffleToDisk(mapOutputLoc, input, filename,
                compressedLength);
    }
    return mapOutput;
}
7. The shuffleInMemory method of ReduceTask.ReduceCopier.MapOutputCopier. Called by the previous method when the map output can be held in memory.
private MapOutput shuffleInMemory(MapOutputLocation mapOutputLoc,
                                  URLConnection connection,
                                  InputStream input,
                                  int mapOutputLength,
                                  int compressedLength)
        throws IOException, InterruptedException {
    // A checksumming input stream for reading the MapReduce intermediate
    // file format (IFile).
    IFileInputStream checksumIn =
            new IFileInputStream(input, compressedLength);
    input = checksumIn;
    // If the output is compressed, wrap the stream with a decompressing
    // stream built from the codec.
    if (codec != null) {
        decompressor.reset();
        input = codec.createInputStream(input, decompressor);
    }
    // Copy the map output into an in-memory buffer.
    byte[] shuffleData = new byte[mapOutputLength];
    MapOutput mapOutput =
            new MapOutput(mapOutputLoc.getTaskId(),
                    mapOutputLoc.getTaskAttemptId(), shuffleData, compressedLength);
    int bytesRead = 0;
    try {
        int n = input.read(shuffleData, 0, shuffleData.length);
        while (n > 0) {
            bytesRead += n;
            shuffleClientMetrics.inputBytes(n);
            // indicate we're making progress
            reporter.progress();
            n = input.read(shuffleData, bytesRead,
                    (shuffleData.length - bytesRead));
        }
        LOG.info("Read " + bytesRead + " bytes from map-output for " +
                mapOutputLoc.getTaskAttemptId());
        input.close();
    } catch (IOException ioe) {
        LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(),
                ioe);
        // Inform the ram-manager
        ramManager.closeInMemoryFile(mapOutputLength);
        ramManager.unreserve(mapOutputLength);
        // Discard the map-output
        try {
            mapOutput.discard();
        } catch (IOException ignored) {
            LOG.info("Failed to discard map-output from " +
                    mapOutputLoc.getTaskAttemptId(), ignored);
        }
        mapOutput = null;
        // Close the streams
        IOUtils.cleanup(LOG, input);
        // Re-throw
        throw ioe;
    }
    // Close the in-memory file
    ramManager.closeInMemoryFile(mapOutputLength);
    // Sanity check
    if (bytesRead != mapOutputLength) {
        // Inform the ram-manager
        ramManager.unreserve(mapOutputLength);
        // Discard the map-output
        try {
            mapOutput.discard();
        } catch (IOException ignored) {
            // IGNORED because we are cleaning up
            LOG.info("Failed to discard map-output from " +
                    mapOutputLoc.getTaskAttemptId(), ignored);
        }
        mapOutput = null;
        throw new IOException("Incomplete map output received for " +
                mapOutputLoc.getTaskAttemptId() + " from " +
                mapOutputLoc.getOutputLocation() + " (" +
                bytesRead + " instead of " +
                mapOutputLength + ")");
    }
    // TODO: Remove this after a 'fix' for HADOOP-3647
    if (mapOutputLength > 0) {
        DataInputBuffer dib = new DataInputBuffer();
        dib.reset(shuffleData, 0, shuffleData.length);
        LOG.info("Rec #1 from " + mapOutputLoc.getTaskAttemptId() + " -> (" +
                WritableUtils.readVInt(dib) + ", " +
                WritableUtils.readVInt(dib) + ") from " +
                mapOutputLoc.getHost());
    }
    return mapOutput;
}
8. The shuffleToDisk method of ReduceTask.ReduceCopier.MapOutputCopier copies the map output to local disk. It is called when the map output cannot be held in memory.
private MapOutput shuffleToDisk(MapOutputLocation mapOutputLoc,
                                InputStream input,
                                Path filename,
                                long mapOutputLength)
        throws IOException {
    // Find out a suitable location for the output on local-filesystem
    Path localFilename =
            lDirAlloc.getLocalPathForWrite(filename.toUri().getPath(),
                    mapOutputLength, conf);
    MapOutput mapOutput =
            new MapOutput(mapOutputLoc.getTaskId(), mapOutputLoc.getTaskAttemptId(),
                    conf, localFileSys.makeQualified(localFilename),
                    mapOutputLength);
    // Copy data to local-disk
    OutputStream output = null;
    long bytesRead = 0;
    try {
        output = rfs.create(localFilename);
        byte[] buf = new byte[64 * 1024];
        int n = input.read(buf, 0, buf.length);
        while (n > 0) {
            bytesRead += n;
            shuffleClientMetrics.inputBytes(n);
            output.write(buf, 0, n);
            // indicate we're making progress
            reporter.progress();
            n = input.read(buf, 0, buf.length);
        }
        LOG.info("Read " + bytesRead + " bytes from map-output for " +
                mapOutputLoc.getTaskAttemptId());
        output.close();
        input.close();
    } catch (IOException ioe) {
        LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(),
                ioe);
        // Discard the map-output
        try {
            mapOutput.discard();
        } catch (IOException ignored) {
            LOG.info("Failed to discard map-output from " +
                    mapOutputLoc.getTaskAttemptId(), ignored);
        }
        mapOutput = null;
        // Close the streams
        IOUtils.cleanup(LOG, input, output);
        // Re-throw
        throw ioe;
    }
    // Sanity check
    if (bytesRead != mapOutputLength) {
        try {
            mapOutput.discard();
        } catch (Exception ioe) {
            // IGNORED because we are cleaning up
            LOG.info("Failed to discard map-output from " +
                    mapOutputLoc.getTaskAttemptId(), ioe);
        } catch (Throwable t) {
            String msg = getTaskID() + " : Failed in shuffle to disk :"
                    + StringUtils.stringifyException(t);
            reportFatalError(getTaskID(), t, msg);
        }
        mapOutput = null;
        throw new IOException("Incomplete map output received for " +
                mapOutputLoc.getTaskAttemptId() + " from " +
                mapOutputLoc.getOutputLocation() + " (" +
                bytesRead + " instead of " +
                mapOutputLength + ")");
    }
    return mapOutput;
}
9. The run method of the LocalFSMerger thread. It merges the local copies of the map outputs.
public void run() {
        try {
            LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
            while (!exitLocalFSMerge) {
                // mapOutputFilesOnDisk (a TreeSet<FileStatus> ordered by
                // mapOutputFileComparator) holds the local map output files.
                List<Path> mapFiles = new ArrayList<Path>();
                long approxOutputSize = 0;
                int bytesPerSum =
                        reduceTask.getConf().getInt("io.bytes.per.checksum", 512);
                LOG.info(reduceTask.getTaskID() + "We have  " +
                        mapOutputFilesOnDisk.size() + " map outputs on disk. " +
                        "Triggering merge of " + ioSortFactor + " files");
                // 1. Prepare the list of files to be merged. This list is prepared
                // using a list of map output files on disk. Currently we merge
                // io.sort.factor files into 1.
                synchronized (mapOutputFilesOnDisk) {
                    for (int i = 0; i < ioSortFactor; ++i) {
                        FileStatus filestatus = mapOutputFilesOnDisk.first();
                        mapOutputFilesOnDisk.remove(filestatus);
                        mapFiles.add(filestatus.getPath());
                        approxOutputSize += filestatus.getLen();
                    }
                }
                // add the checksum length
                approxOutputSize += ChecksumFileSystem
                        .getChecksumLength(approxOutputSize,
                                bytesPerSum);
                // 2. Merge the files collected in mapFiles.
                Path outputPath =
                        lDirAlloc.getLocalPathForWrite(mapFiles.get(0).toString(),
                                approxOutputSize, conf)
                                .suffix(".merged");
                Writer writer =
                        new Writer(conf,rfs, outputPath,
                                conf.getMapOutputKeyClass(),
                                conf.getMapOutputValueClass(),
                                codec, null);
                RawKeyValueIterator iter  = null;
                Path tmpDir = new Path(reduceTask.getTaskID().toString());
                try {
                    iter = Merger.merge(conf, rfs,
                            conf.getMapOutputKeyClass(),
                            conf.getMapOutputValueClass(),
                            codec, mapFiles.toArray(new Path[mapFiles.size()]),
                            true, ioSortFactor, tmpDir,
                            conf.getOutputKeyComparator(), reporter,
                            spilledRecordsCounter, null);
                    Merger.writeFile(iter, writer, reporter, conf);
                    writer.close();
                } catch (Exception e) {
                    localFileSys.delete(outputPath, true);
                    throw new IOException (StringUtils.stringifyException(e));
                }
                synchronized (mapOutputFilesOnDisk) {
                    addToMapOutputFilesOnDisk(localFileSys.getFileStatus(outputPath));
                }
                LOG.info(reduceTask.getTaskID() +
                        " Finished merging " + mapFiles.size() +
                        " map output files on disk of total-size " +
                        approxOutputSize + "." +
                        " Local output file is " + outputPath + " of size " +
                        localFileSys.getFileStatus(outputPath).getLen());
            }
        } catch (Throwable t) {
            LOG.warn(reduceTask.getTaskID()
                    + " Merging of the local FS files threw an exception: "
                    + StringUtils.stringifyException(t));
            if (mergeThrowable == null) {
                mergeThrowable = t;
            }
        }
    }
}
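To see how this interleaved on-disk merge keeps the file count bounded, here is a small illustrative calculation (plain Java, not Hadoop code). In stock Hadoop of this era the merge is triggered once roughly 2 * io.sort.factor - 1 files have accumulated, and each pass folds io.sort.factor files into one; treat that trigger threshold as an assumption for your version.

public class MergePassSketch {
    public static void main(String[] args) {
        int files = 40, ioSortFactor = 10;
        // Each pass replaces ioSortFactor on-disk files with 1 merged file.
        while (files >= 2 * ioSortFactor - 1) {
            files = files - ioSortFactor + 1;
            System.out.println("map output files on disk: " + files);
        }
        // 40 -> 31 -> 22 -> 13: few enough streams for the final merge.
    }
}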
10. The run method of the InMemFSMergeThread thread.
    public void run() {
        LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
        try {
          boolean exit = false;
          do {
            exit = ramManager.waitForDataToMerge();
            if (!exit) {
              doInMemMerge();
            }
          } while (!exit);
        } catch (Throwable t) {
          LOG.warn(reduceTask.getTaskID() +
                   " Merge of the inmemory files threw an exception: "
                   + StringUtils.stringifyException(t));
          ReduceCopier.this.mergeThrowable = t;
        }
      }
11. The doInMemMerge method of the InMemFSMergeThread thread.
private void doInMemMerge() throws IOException{
        if (mapOutputsFilesInMemory.size() == 0) {
          return;
        }
        TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;
        List<Segment<K, V>> inMemorySegments = new ArrayList<Segment<K,V>>();
        long mergeOutputSize = createInMemorySegments(inMemorySegments, 0);
        int noInMemorySegments = inMemorySegments.size();
        Path outputPath = mapOutputFile.getInputFileForWrite(mapId,
                          reduceTask.getTaskID(), mergeOutputSize);
        Writer writer =
          new Writer(conf, rfs, outputPath,
                     conf.getMapOutputKeyClass(),
                     conf.getMapOutputValueClass(),
                     codec, null);
        RawKeyValueIterator rIter = null;
        try {
          LOG.info("Initiating in-memory merge with " + noInMemorySegments +
                   " segments...");
          rIter = Merger.merge(conf, rfs,
                               (Class<K>)conf.getMapOutputKeyClass(),
                               (Class<V>)conf.getMapOutputValueClass(),
                               inMemorySegments, inMemorySegments.size(),
                               new Path(reduceTask.getTaskID().toString()),
                               conf.getOutputKeyComparator(), reporter,
                               spilledRecordsCounter, null);
          if (combinerRunner == null) {
            Merger.writeFile(rIter, writer, reporter, conf);
          } else {
            combineCollector.setWriter(writer);
            combinerRunner.combine(rIter, combineCollector);
          }
          writer.close();
          LOG.info(reduceTask.getTaskID() +
              " Merge of the " + noInMemorySegments +
              " files in-memory complete." +
              " Local file is " + outputPath + " of size " +
              localFileSys.getFileStatus(outputPath).getLen());
        } catch (Exception e) {
          //make sure that we delete the ondisk file that we created
          //earlier when we invoked cloneFileAttributes
          localFileSys.delete(outputPath, true);
          throw (IOException)new IOException
                  ("Intermediate merge failed").initCause(e);
        }
        // Note the output of the merge
        FileStatus status = localFileSys.getFileStatus(outputPath);
        synchronized (mapOutputFilesOnDisk) {
          addToMapOutputFilesOnDisk(status);
        }
      }
    }
12. The run method of the ReduceCopier.GetMapEventsThread thread. It queries the TaskTracker over RPC; for each completed event it obtains the server address where the map task ran (i.e., the address of the MapTask output), builds a URL, and adds it to mapLocations for the copier threads to pick up.
public void run() {
        LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
        do {
          try {
            int numNewMaps = getMapCompletionEvents();
            if (numNewMaps > 0) {
              LOG.info(reduceTask.getTaskID() + ": " +
                  "Got " + numNewMaps + " new map-outputs");
            }
            Thread.sleep(SLEEP_TIME);
          }
          catch (InterruptedException e) {
            LOG.warn(reduceTask.getTaskID() +
                " GetMapEventsThread returning after an " +
                " interrupted exception");
            return;
          }
          catch (Throwable t) {
            LOG.warn(reduceTask.getTaskID() +
                " GetMapEventsThread Ignoring exception : " +
                StringUtils.stringifyException(t));
          }
        } while (!exitGetMapEvents);
        LOG.info("GetMapEventsThread exiting");
      }
13. The getMapCompletionEvents method of ReduceCopier.GetMapEventsThread. It asks the TaskTracker over RPC and, for each completed event, obtains the map task's server address, builds a URL, and adds it to mapLocations.
    private int getMapCompletionEvents() throws IOException {
        int numNewMaps = 0;
        //RPC call to the tasktracker's getMapCompletionEvents method; it returns a MapTaskCompletionEventsUpdate, from which the TaskCompletionEvents are read.
        MapTaskCompletionEventsUpdate update =
                umbilical.getMapCompletionEvents(reduceTask.getJobID(),
                        fromEventId.get(),
                        MAX_EVENTS_TO_FETCH,
                        reduceTask.getTaskID());
        TaskCompletionEvent events[] = update.getMapTaskCompletionEvents();
        // Check if the reset is required.
        // Since there is no ordering of the task completion events at the
        // reducer, the only option to sync with the new jobtracker is to reset
        // the events index
        if (update.shouldReset()) {
            fromEventId.set(0);
            obsoleteMapIds.clear(); // clear the obsolete map
            mapLocations.clear(); // clear the map locations mapping
        }
        // Update the last seen event ID
        fromEventId.set(fromEventId.get() + events.length);
        // Process the TaskCompletionEvents:
        // 1. Save the SUCCEEDED maps in knownOutputs to fetch the outputs.
        // 2. Save the OBSOLETE/FAILED/KILLED maps in obsoleteOutputs to stop
        //    fetching from those maps.
        // 3. Remove TIPFAILED maps from neededOutputs since we don't need their
        //    outputs at all.
        //For each completed event, get the address of the server where the map task ran, build a URL, and add it to mapLocations for the copier threads to pick up.
        for (TaskCompletionEvent event : events) {
            switch (event.getTaskStatus()) {
            case SUCCEEDED:
            {
                URI u = URI.create(event.getTaskTrackerHttp());
                String host = u.getHost();
                TaskAttemptID taskId = event.getTaskAttemptId();
                int duration = event.getTaskRunTime();
                if (duration > maxMapRuntime) {
                    maxMapRuntime = duration;
                    // adjust max-fetch-retries based on max-map-run-time
                    maxFetchRetriesPerMap = Math.max(MIN_FETCH_RETRIES_PER_MAP,
                            getClosestPowerOf2((maxMapRuntime / BACKOFF_INIT) + 1));
                }
                URL mapOutputLocation = new URL(event.getTaskTrackerHttp() +
                        "/mapOutput?job=" + taskId.getJobID() +
                        "&map=" + taskId +
                        "&reduce=" + getPartition());
                List<MapOutputLocation> loc = mapLocations.get(host);
                if (loc == null) {
                    loc = Collections.synchronizedList
                            (new LinkedList<MapOutputLocation>());
                    mapLocations.put(host, loc);
                }
                loc.add(new MapOutputLocation(taskId, host, mapOutputLocation));
                numNewMaps ++;
            }
            break;
            case FAILED:
            case KILLED:
            case OBSOLETE:
            {
                obsoleteMapIds.add(event.getTaskAttemptId());
                LOG.info("Ignoring obsolete output of " + event.getTaskStatus() +
                        " map-task: '" + event.getTaskAttemptId() + "'");
            }
            break;
            case TIPFAILED:
            {
                copiedMapOutputs.add(event.getTaskAttemptId().getTaskID());
                LOG.info("Ignoring output of failed map TIP: '" +
                        event.getTaskAttemptId() + "'");
            }
            break;
            }
        }
        return numNewMaps;
    }
}
}
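Filling in the URL template from the SUCCEEDED branch with hypothetical values (the host, port, and IDs below are made up; 50060 is the classic default tasktracker HTTP port) shows what the copier threads end up fetching:

public class MapOutputUrlSketch {
    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        String taskTrackerHttp = "http://tt17:50060";
        String jobId = "job_200901010000_0001";
        String mapAttempt = "attempt_200901010000_0001_m_000003_0";
        int partition = 2;
        // Same template as the SUCCEEDED branch above.
        String url = taskTrackerHttp + "/mapOutput?job=" + jobId +
                "&map=" + mapAttempt + "&reduce=" + partition;
        System.out.println(url);
    }
}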
14. The createKVIterator method of ReduceTask.ReduceCopier. It creates a RawKeyValueIterator over the copied map outputs to serve as the reduce input.
private RawKeyValueIterator createKVIterator(
        JobConf job, FileSystem fs, Reporter reporter) throws IOException {
    // merge config params
    Class<K> keyClass = (Class<K>)job.getMapOutputKeyClass();
    Class<V> valueClass = (Class<V>)job.getMapOutputValueClass();
    boolean keepInputs = job.getKeepFailedTaskFiles();
    final Path tmpDir = new Path(getTaskID().toString());
    final RawComparator<K> comparator =
            (RawComparator<K>)job.getOutputKeyComparator();
    // segments required to vacate memory
    List<Segment<K,V>> memDiskSegments = new ArrayList<Segment<K,V>>();
    long inMemToDiskBytes = 0;
    if (mapOutputsFilesInMemory.size() > 0) {
        TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;
        inMemToDiskBytes = createInMemorySegments(memDiskSegments,
                maxInMemReduce);
        final int numMemDiskSegments = memDiskSegments.size();
        if (numMemDiskSegments > 0 &&
                ioSortFactor > mapOutputFilesOnDisk.size()) {
            // must spill to disk, but can't retain in-mem for intermediate merge
            final Path outputPath = mapOutputFile.getInputFileForWrite(mapId,
                    reduceTask.getTaskID(), inMemToDiskBytes);
            final RawKeyValueIterator rIter = Merger.merge(job, fs,
                    keyClass, valueClass, memDiskSegments, numMemDiskSegments,
                    tmpDir, comparator, reporter, spilledRecordsCounter, null);
            final Writer writer = new Writer(job, fs, outputPath,
                    keyClass, valueClass, codec, null);
            try {
                Merger.writeFile(rIter, writer, reporter, job);
                addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
            } catch (Exception e) {
                if (null != outputPath) {
                    fs.delete(outputPath, true);
                }
                throw new IOException("Final merge failed", e);
            } finally {
                if (null != writer) {
                    writer.close();
                }
            }
            LOG.info("Merged " + numMemDiskSegments + " segments, " +
                    inMemToDiskBytes + " bytes to disk to satisfy " +
                    "reduce memory limit");
            inMemToDiskBytes = 0;
            memDiskSegments.clear();
        } else if (inMemToDiskBytes != 0) {
            LOG.info("Keeping " + numMemDiskSegments + " segments, " +
                    inMemToDiskBytes + " bytes in memory for " +
                    "intermediate, on-disk merge");
        }
    }
    // segments on disk
    List<Segment<K,V>> diskSegments = new ArrayList<Segment<K,V>>();
    long onDiskBytes = inMemToDiskBytes;
    Path[] onDisk = getMapFiles(fs, false);
    for (Path file : onDisk) {
        onDiskBytes += fs.getFileStatus(file).getLen();
        diskSegments.add(new Segment<K, V>(job, fs, file, codec, keepInputs));
    }
    LOG.info("Merging " + onDisk.length + " files, " +
            onDiskBytes + " bytes from disk");
    Collections.sort(diskSegments, new Comparator<Segment<K,V>>() {
        public int compare(Segment<K, V> o1, Segment<K, V> o2) {
            if (o1.getLength() == o2.getLength()) {
                return 0;
            }
            return o1.getLength() < o2.getLength() ? -1 : 1;
        }
    });
    // build final list of segments from merged backed by disk + in-mem
    List<Segment<K,V>> finalSegments = new ArrayList<Segment<K,V>>();
    long inMemBytes = createInMemorySegments(finalSegments, 0);
    LOG.info("Merging " + finalSegments.size() + " segments, " +
            inMemBytes + " bytes from memory into reduce");
    if (0 != onDiskBytes) {
        final int numInMemSegments = memDiskSegments.size();
        diskSegments.addAll(0, memDiskSegments);
        memDiskSegments.clear();
        RawKeyValueIterator diskMerge = Merger.merge(
                job, fs, keyClass, valueClass, diskSegments,
                ioSortFactor, numInMemSegments, tmpDir, comparator,
                reporter, false, spilledRecordsCounter, null);
        diskSegments.clear();
        if (0 == finalSegments.size()) {
            return diskMerge;
        }
        finalSegments.add(new Segment<K,V>(
                new RawKVIteratorReader(diskMerge, onDiskBytes), true));
    }
    return Merger.merge(job, fs, keyClass, valueClass,
            finalSegments, finalSegments.size(), tmpDir,
            comparator, reporter, spilledRecordsCounter, null);
}
15. ReduceTask's runNewReducer method. It constructs the reducer and its running context from the configuration, then invokes the reducer's run method, which drives the user's reduce calls.
@SuppressWarnings("unchecked")
    private <INKEY,INVALUE,OUTKEY,OUTVALUE>
    void runNewReducer(JobConf job,
            final TaskUmbilicalProtocol umbilical,
            final TaskReporter reporter,
            RawKeyValueIterator rIter,
            RawComparator<INKEY> comparator,
            Class<INKEY> keyClass,
            Class<INVALUE> valueClass
            ) throws IOException,InterruptedException,
            ClassNotFoundException {
        //1. Construct the TaskContext
        org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
                new org.apache.hadoop.mapreduce.TaskAttemptContext(job, getTaskID());
        //2. Construct a Reducer instance from the configured Reducer class
        org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
                (org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
                ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
        //3. Construct the RecordWriter
        org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> output =
                (org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE>)
                outputFormat.getRecordWriter(taskContext);
        job.setBoolean("mapred.skip.on", isSkipping());
        //4. Construct the Context in which the Reducer runs
        org.apache.hadoop.mapreduce.Reducer.Context
        reducerContext = createReduceContext(reducer, job, getTaskID(),
                rIter, reduceInputValueCounter,
                output, committer,
                reporter, comparator, keyClass,
                valueClass);
        reducer.run(reducerContext);
        output.close(reducerContext);
    }
16. The run method of the abstract Reducer class. For each key, it takes the key and that key's value collection (of type Iterable<VALUEIN>) from the context and calls the reducer's reduce method to process them.
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
  }
17. The Reducer class's reduce method, which users typically override to implement the reduce logic; the default implementation shown here simply passes each value through (the identity reduce).
@SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }
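For contrast with the identity default above, here is the canonical user override: a word-count style summing reducer using the same new-API Reducer shown throughout this post (the class and variable names are just the usual example, not anything from this walkthrough).

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();     // add up all counts emitted for this word
        }
        result.set(sum);
        context.write(key, result); // one (word, total) pair per key
    }
}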
End.
To keep reposted content consistent, traceable, and promptly corrected, please credit reposts to: http://www.cnblogs.com/douba/p/hadoop_mapreduce_tasktracker_child_reduce.html. Thanks!