Server Preparation

The typical minimal Hadoop cluster uses 3 servers: one as the Master (NameNode) and two as Slaves (DataNodes). The operating system is Ubuntu 18.04 Server; the installation itself is omitted here. The disks use LVM with XFS; to avoid wasting space, apart from 1 GB for /boot, everything is allocated to /.

Server plan

192.168.1.148 vm148 -- master: NameNode, ResourceManager
192.168.1.149 vm149 -- slave: DataNode, NodeManager
192.168.1.150 vm150 -- slave: DataNode, NodeManager

Note: this is the first pitfall. The hostname must not contain an underscore (_); otherwise the DataNode fails to create its socket and cannot start.

Post-install updates

sudo apt update
sudo apt upgrade

Add a regular user

This is the unprivileged user that will run Hadoop; I habitually use tomcat as the username. Use adduser here rather than useradd, because the latter, when run without options, sometimes does not create the home directory.

sudo adduser tomcat
# follow the prompts

If these are virtual machines, this is a good point to save the current state as a template.

Set hostname and hosts

# view current hostname
sudo hostnamectl status
# set hostname
sudo hostnamectl set-hostname vm148
# add entries to hosts
sudo vi /etc/hosts
# add the following lines
192.168.1.148 vm148
192.168.1.149 vm149
192.168.1.150 vm150

Set the hostnames to vm148, vm149 and vm150 respectively. After a reboot, log in to check that the change took effect, and ping the hosts from one another to verify name resolution.

Set up passwordless SSH between the tomcat users on all servers

# generate id_rsa and id_rsa.pub
ssh-keygen
cd .ssh/
# create authorized_keys
mv id_rsa.pub authorized_keys
# note: the permissions must be 600
chmod 600 authorized_keys
# rename this server's own private key to id_rsa_mine
mv id_rsa id_rsa_mine
# edit config
vi config
# add the following content
Host vm149
IdentityFile ~/.ssh/id_rsa_mine
User tomcat
Host vm150
IdentityFile ~/.ssh/id_rsa_mine
User tomcat
Host vm148
IdentityFile ~/.ssh/id_rsa_mine
User tomcat
# on the master, also add the following entry, used when starting the SecondaryNameNode
Host 0.0.0.0
IdentityFile ~/.ssh/id_rsa_mine
User tomcat

Merge the contents of authorized_keys across the servers, so that in the end every server has an identical authorized_keys file.
After this is done, try ssh tomcat@<hostname> from each machine to every other machine to confirm that login works; this also avoids the prompt about accepting a new host key when the services are started later.
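
One way to do the merge is a small loop run from the master (a sketch; it assumes password logins to the other hosts are still possible at this point):

# run on vm148 as tomcat: collect every host's authorized_keys, merge them, push the merged file back out
for h in vm148 vm149 vm150; do ssh tomcat@$h 'cat ~/.ssh/authorized_keys'; done | sort -u > /tmp/ak.merged
for h in vm149 vm150; do scp /tmp/ak.merged tomcat@$h:~/.ssh/authorized_keys; done
cp /tmp/ak.merged ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys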

Firewall: ufw

For a first-time setup it is better to disable it, so that no service fails to start because of the firewall. Once the services are fully configured, ufw can be turned back on with rules for the actual ports in use.

sudo ufw disable

Configure the JDK

Extract the JDK into /opt/jdk and create a latest symlink; the resulting layout looks like this:

$ ll /opt/jdk/
total 0
drwxr-xr-x 7 root root 245 Oct 6 13:58 jdk1.8.0_192/
lrwxrwxrwx 1 root root 12 Jan 18 05:49 latest -> jdk1.8.0_192/
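
The commands to get there look roughly like this (a sketch; the tarball name is an assumption, adjust it to the actual download):

sudo mkdir -p /opt/jdk
sudo tar -xzf jdk-8u192-linux-x64.tar.gz -C /opt/jdk
cd /opt/jdk && sudo ln -s jdk1.8.0_192 latest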

jps needs to be symlinked into /usr/bin:

cd /usr/bin
sudo ln -s /opt/jdk/latest/bin/jps jps

Configure Hadoop

Extract Hadoop into /opt/hadoop and create a latest symlink; the resulting directory layout looks like this:

$ ll /opt/hadoop/
total 0
drwxr-xr-x 9 root root 149 Nov 13 15:15 hadoop-2.9.2/
lrwxrwxrwx 1 root root 12 Jan 18 10:26 latest -> hadoop-2.9.2/
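
As with the JDK, roughly (a sketch; adjust the archive name to the actual download):

sudo mkdir -p /opt/hadoop
sudo tar -xzf hadoop-2.9.2.tar.gz -C /opt/hadoop
cd /opt/hadoop && sudo ln -s hadoop-2.9.2 latest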

Edit the configuration file etc/hadoop/hadoop-env.sh

Two variables need to be changed:

# The java implementation to use.
export JAVA_HOME=/opt/jdk/latest

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/home/tomcat/run/hadoop/logs

Edit the configuration file etc/hadoop/yarn-env.sh

Two variables need to be changed:

# some Java parameters
export JAVA_HOME=/opt/jdk/latest

# default log directory & file
export YARN_LOG_DIR=/home/tomcat/run/yarn/logs

Edit the configuration file etc/hadoop/slaves

Set its contents to the hostnames of the two slaves:

vm149
vm150

Edit the configuration file etc/hadoop/core-site.xml

Add the following. For the full list of options, refer to share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/tomcat/run/hadoop</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://vm148:9000</value>
</property>
</configuration>

Edit the configuration file etc/hadoop/hdfs-site.xml

Add the following:

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>

Edit the configuration file etc/hadoop/mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template)

Add the following:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Edit the configuration file etc/hadoop/yarn-site.xml

Add the following. For the full list of options, refer to share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

<configuration>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>vm148</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Copy the configured Hadoop, keeping the same directory layout, to the other two servers.
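
One way to do this is rsync over the SSH setup from earlier (a sketch; it assumes /opt/hadoop already exists on the slaves and is writable by the tomcat user, otherwise stage the copy in /tmp and move it with sudo):

# run on vm148; the ~/.ssh/config entries already set User tomcat for vm149/vm150
for h in vm149 vm150; do rsync -a /opt/hadoop/ $h:/opt/hadoop/; done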

Start Hadoop

Before the first start, the NameNode needs to be formatted; run this on the master:

/opt/hadoop/latest/bin/hdfs namenode -format

Then start the HDFS services:

/opt/hadoop/latest/sbin/start-dfs.sh

Then start the YARN services:

/opt/hadoop/latest/sbin/start-yarn.sh

After each step, use the jps command to check that the services started correctly. On the master, a normal start shows the following processes:

tomcat@vm148:/opt$ jps
3173 SecondaryNameNode
3495 ResourceManager
4583 Jps
2906 NameNode

On a slave server:

tomcat@vm149:~/run$ jps
3074 NodeManager
2691 DataNode
3591 Jps


Web Access

Once the services are up, the web UI can be reached at http://vm148:50070/

Service Ports

On the master:

21, FTP for ?

8030, YARN resourcemanager scheduler
8031, YARN resourcemanager tracker
8032, YARN resourcemanager
8033, YARN resourcemanager admin
8088, YARN resourcemanager webapp
8090, YARN resourcemanager webapp https
9000, HDFS (fs.defaultFS)
50070, NameNode web UI
50090, SecondaryNameNode web UI

On the slaves (DataNode):

50075, DataNode web UI
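
Once everything is confirmed working, ufw can be re-enabled with rules limited to the ports above (a sketch; adjust the source subnet to your network, and keep SSH open before enabling):

sudo ufw allow 22/tcp
sudo ufw allow from 192.168.1.0/24 to any port 9000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 50070 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 8088 proto tcp
# ...repeat for the remaining ports listed above, then
sudo ufw enable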

Running the WordCount Example

The example code comes from: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

First compile the Java source, generate the classes, and build the jar. Since JAVA_HOME is already configured inside Hadoop and PATH is not needed in this environment, only a classpath entry for tools.jar is required:

export HADOOP_CLASSPATH=/opt/jdk/latest/lib/tools.jar
/opt/hadoop/latest/bin/hadoop com.sun.tools.javac.Main WordCount.java
/opt/jdk/latest/bin/jar cf wc.jar WordCount*.class

Then upload the two input files to HDFS.

/opt/hadoop/latest/bin/hadoop fs -put file01 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -ls /workspace/input
/opt/hadoop/latest/bin/hadoop fs -put file02 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file01
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file02
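
If the target directory does not exist in HDFS yet, it may need to be created first (a sketch):

/opt/hadoop/latest/bin/hadoop fs -mkdir -p /workspace/input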

I ran into a pitfall here at first: I had put the files under /tmp and used /tmp as the input directory. During the run, YARN stores its staging information under /tmp/hadoop-yarn/staging, and the job then threw an exception. The lesson: do not put job files under /tmp.

Run the job:

/opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output

The last path is the output path; it must not exist before the job runs, otherwise the job fails with an error.
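
When re-running the job, remove the old output directory first (a sketch):

/opt/hadoop/latest/bin/hadoop fs -rm -r /workspace/output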

The final result:

tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output
19/01/30 08:24:55 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.1.148:8032
19/01/30 08:24:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/30 08:24:56 INFO input.FileInputFormat: Total input files to process : 2
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: number of splits:2
19/01/30 08:24:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0004
19/01/30 08:24:56 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0004
19/01/30 08:24:56 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0004/
19/01/30 08:24:56 INFO mapreduce.Job: Running job: job_1547812325179_0004
19/01/30 08:25:03 INFO mapreduce.Job: Job job_1547812325179_0004 running in uber mode : false
19/01/30 08:25:03 INFO mapreduce.Job: map 0% reduce 0%
19/01/30 08:25:10 INFO mapreduce.Job: map 100% reduce 0%
19/01/30 08:25:18 INFO mapreduce.Job: map 100% reduce 100%
19/01/30 08:25:18 INFO mapreduce.Job: Job job_1547812325179_0004 completed successfully
19/01/30 08:25:18 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=97
FILE: Number of bytes written=594622
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=266
HDFS: Number of bytes written=38
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=10309
Total time spent by all reduces in occupied slots (ms)=3850
Total time spent by all map tasks (ms)=10309
Total time spent by all reduce tasks (ms)=3850
Total vcore-milliseconds taken by all map tasks=10309
Total vcore-milliseconds taken by all reduce tasks=3850
Total megabyte-milliseconds taken by all map tasks=10556416
Total megabyte-milliseconds taken by all reduce tasks=3942400
Map-Reduce Framework
Map input records=2
Map output records=10
Map output bytes=96
Map output materialized bytes=103
Input split bytes=210
Combine input records=10
Combine output records=8
Reduce input groups=5
Reduce shuffle bytes=103
Reduce input records=8
Reduce output records=5
Spilled Records=16
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=379
CPU time spent (ms)=2090
Physical memory (bytes) snapshot=778280960
Virtual memory (bytes) snapshot=5914849280
Total committed heap usage (bytes)=507510784
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=56
File Output Format Counters
Bytes Written=38
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -ls /workspace/output
Found 2 items
-rw-r--r-- 2 tomcat supergroup 0 2019-01-30 08:25 /workspace/output/_SUCCESS
-rw-r--r-- 2 tomcat supergroup 38 2019-01-30 08:25 /workspace/output/part-r-00000
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -cat /workspace/output/part-r-00000
Day 2
Good 2
Hadoop 2
Hello 2
World 2

A Simple MapReduce Example

The input format looks like this: each line is one log record containing a user, an IP and a timestamp. The task is to count how many times each (user + IP) pair appears.

1571	76	738	legnd	166.111.8.133	870876781
1572 121 697 kuoc 202.116.65.16 870909489
1573 121 697 kuoc 202.116.65.16 870910644
1574 121 739 maerick 870926284

Code: pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.rockbb</groupId>
    <artifactId>hdtask</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>HD Task</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.8.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.3</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-resources-plugin</artifactId>
                    <configuration>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>

Code: DataBean.java

package com.rockbb.hdtask;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class DataBean implements Writable {
    private String nameIp;
    private long count;

    public DataBean() {
    }

    public DataBean(String nameIp, long count) {
        this.nameIp = nameIp;
        this.count = count;
    }

    public String getNameIp() {
        return nameIp;
    }

    public void setNameIp(String nameIp) {
        this.nameIp = nameIp;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }

    /**
     * Important: this will be used for the final output.
     */
    @Override
    public String toString() {
        return this.nameIp + "\t" + this.count;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(nameIp);
        dataOutput.writeLong(count);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.nameIp = dataInput.readUTF();
        this.count = dataInput.readLong();
    }
}

Code: IpCount.java

package com.rockbb.hdtask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class IpCount {

    public static class IpMapper extends Mapper<LongWritable, Text, Text, DataBean> {
        @Override
        public void map(LongWritable keyIn, Text valueIn, Context context) throws IOException, InterruptedException {
            String line = valueIn.toString();
            String[] fields = line.split("\t");
            String keyOut = fields[3] + '-' + fields[4];
            long valueOut = 1;
            DataBean bean = new DataBean(keyOut, valueOut);
            context.write(new Text(keyOut), bean);
        }
    }

    public static class IpReducer extends Reducer<Text, DataBean, Text, DataBean> {
        @Override
        public void reduce(Text keyIn, Iterable<DataBean> valuesIn, Context context) throws IOException, InterruptedException {
            long total = 0;
            for (DataBean bean : valuesIn) {
                total += bean.getCount();
            }
            DataBean bean = new DataBean("", total);
            context.write(keyIn, bean);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(IpCount.class);

        job.setMapperClass(IpMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        job.setReducerClass(IpReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
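
To produce hdtask.jar, the project can be packaged with Maven and the artifact copied to the master (a sketch; the artifact name follows the pom above, and renaming it is only for a shorter command line):

mvn clean package
scp target/hdtask-1.0-SNAPSHOT.jar tomcat@vm148:~/hdtask.jar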

Run command:

/opt/hadoop/latest/bin/hadoop jar hdtask.jar com.rockbb.hdtask.IpCount /workspace/input/ /workspace/output3

The data file is about 2.3 GB. Since the default block size is 128 MB, the submitted job produced 19 map tasks (2.3 GB / 128 MB, rounded up, gives 19 splits) and one reduce task. The job's console output:

19/01/31 10:08:01 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.31.148:8032
19/01/31 10:08:02 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/31 10:08:02 INFO input.FileInputFormat: Total input files to process : 1
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: number of splits:19
19/01/31 10:08:02 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0008
19/01/31 10:08:03 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0008
19/01/31 10:08:03 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0008/
19/01/31 10:08:03 INFO mapreduce.Job: Running job: job_1547812325179_0008
19/01/31 10:08:13 INFO mapreduce.Job: Job job_1547812325179_0008 running in uber mode : false
19/01/31 10:08:13 INFO mapreduce.Job: map 0% reduce 0%
19/01/31 10:08:41 INFO mapreduce.Job: map 11% reduce 0%
19/01/31 10:08:45 INFO mapreduce.Job: map 21% reduce 0%
19/01/31 10:08:47 INFO mapreduce.Job: map 23% reduce 0%
19/01/31 10:08:51 INFO mapreduce.Job: map 28% reduce 0%
19/01/31 10:08:53 INFO mapreduce.Job: map 30% reduce 0%
19/01/31 10:08:57 INFO mapreduce.Job: map 31% reduce 0%
19/01/31 10:08:59 INFO mapreduce.Job: map 38% reduce 0%
19/01/31 10:09:09 INFO mapreduce.Job: map 39% reduce 0%
19/01/31 10:09:10 INFO mapreduce.Job: map 40% reduce 0%
19/01/31 10:09:11 INFO mapreduce.Job: map 44% reduce 0%
19/01/31 10:09:14 INFO mapreduce.Job: map 46% reduce 0%
19/01/31 10:09:16 INFO mapreduce.Job: map 48% reduce 0%
19/01/31 10:09:17 INFO mapreduce.Job: map 49% reduce 0%
19/01/31 10:09:22 INFO mapreduce.Job: map 55% reduce 0%
19/01/31 10:09:24 INFO mapreduce.Job: map 56% reduce 0%
19/01/31 10:09:28 INFO mapreduce.Job: map 61% reduce 0%
19/01/31 10:09:40 INFO mapreduce.Job: map 64% reduce 0%
19/01/31 10:09:42 INFO mapreduce.Job: map 64% reduce 7%
19/01/31 10:09:46 INFO mapreduce.Job: map 66% reduce 7%
19/01/31 10:09:48 INFO mapreduce.Job: map 68% reduce 9%
19/01/31 10:09:52 INFO mapreduce.Job: map 71% reduce 9%
19/01/31 10:09:54 INFO mapreduce.Job: map 71% reduce 12%
19/01/31 10:09:58 INFO mapreduce.Job: map 73% reduce 12%
19/01/31 10:09:59 INFO mapreduce.Job: map 74% reduce 12%
19/01/31 10:10:01 INFO mapreduce.Job: map 75% reduce 12%
19/01/31 10:10:04 INFO mapreduce.Job: map 80% reduce 12%
19/01/31 10:10:06 INFO mapreduce.Job: map 81% reduce 12%
19/01/31 10:10:10 INFO mapreduce.Job: map 85% reduce 12%
19/01/31 10:10:12 INFO mapreduce.Job: map 86% reduce 12%
19/01/31 10:10:13 INFO mapreduce.Job: map 87% reduce 12%
19/01/31 10:10:15 INFO mapreduce.Job: map 88% reduce 12%
19/01/31 10:10:18 INFO mapreduce.Job: map 88% reduce 16%
19/01/31 10:10:22 INFO mapreduce.Job: map 90% reduce 16%
19/01/31 10:10:23 INFO mapreduce.Job: map 91% reduce 16%
19/01/31 10:10:24 INFO mapreduce.Job: map 91% reduce 18%
19/01/31 10:10:25 INFO mapreduce.Job: map 92% reduce 18%
19/01/31 10:10:29 INFO mapreduce.Job: map 93% reduce 18%
19/01/31 10:10:31 INFO mapreduce.Job: map 93% reduce 21%
19/01/31 10:10:32 INFO mapreduce.Job: map 94% reduce 21%
19/01/31 10:10:34 INFO mapreduce.Job: map 96% reduce 21%
19/01/31 10:10:35 INFO mapreduce.Job: map 97% reduce 21%
19/01/31 10:10:37 INFO mapreduce.Job: map 98% reduce 23%
19/01/31 10:10:38 INFO mapreduce.Job: map 99% reduce 23%
19/01/31 10:10:41 INFO mapreduce.Job: map 100% reduce 23%
19/01/31 10:10:43 INFO mapreduce.Job: map 100% reduce 30%
19/01/31 10:10:49 INFO mapreduce.Job: map 100% reduce 33%
19/01/31 10:11:25 INFO mapreduce.Job: map 100% reduce 67%
19/01/31 10:11:31 INFO mapreduce.Job: map 100% reduce 70%
19/01/31 10:11:37 INFO mapreduce.Job: map 100% reduce 74%
19/01/31 10:11:43 INFO mapreduce.Job: map 100% reduce 78%
19/01/31 10:11:49 INFO mapreduce.Job: map 100% reduce 83%
19/01/31 10:11:55 INFO mapreduce.Job: map 100% reduce 86%
19/01/31 10:12:01 INFO mapreduce.Job: map 100% reduce 89%
19/01/31 10:12:07 INFO mapreduce.Job: map 100% reduce 93%
19/01/31 10:12:13 INFO mapreduce.Job: map 100% reduce 97%
19/01/31 10:12:18 INFO mapreduce.Job: map 100% reduce 100%
19/01/31 10:12:19 INFO mapreduce.Job: Job job_1547812325179_0008 completed successfully
19/01/31 10:12:19 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=6635434217
FILE: Number of bytes written=9269615741
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2551940756
HDFS: Number of bytes written=134288980
HDFS: Number of read operations=60
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=3
Launched map tasks=22
Launched reduce tasks=1
Data-local map tasks=22
Total time spent by all maps in occupied slots (ms)=1737403
Total time spent by all reduces in occupied slots (ms)=178563
Total time spent by all map tasks (ms)=1737403
Total time spent by all reduce tasks (ms)=178563
Total vcore-milliseconds taken by all map tasks=1737403
Total vcore-milliseconds taken by all reduce tasks=178563
Total megabyte-milliseconds taken by all map tasks=1779100672
Total megabyte-milliseconds taken by all reduce tasks=182848512
Map-Reduce Framework
Map input records=49458230
Map output records=49458230
Map output bytes=2531297616
Map output materialized bytes=2630214190
Input split bytes=2052
Combine input records=0
Combine output records=0
Reduce input groups=5453085
Reduce shuffle bytes=2630214190
Reduce input records=49458230
Reduce output records=5453085
Spilled Records=174185483
Shuffled Maps =19
Failed Shuffles=0
Merged Map outputs=19
GC time elapsed (ms)=9585
CPU time spent (ms)=389790
Physical memory (bytes) snapshot=5763260416
Virtual memory (bytes) snapshot=39333715968
Total committed heap usage (bytes)=4077912064
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2551938704
File Output Format Counters
Bytes Written=134288980

  
