前提:

①已经搭建好zk

②已经安装好JDK

正文开始:

首先从官网下载hadoop 2.7.3 (虽然官网3.0都出了。但是目前还没经过完全的测试。。待测试后。。。)

一、hadoop-env.sh(环境变量相关)

export JAVA_HOME=/app/jdk/jdk1.8.0_92
export HOME=/app/hadoop
export HADOOP_HOME=$HOME
export HADOOP_COMMON_HOME=$HOME
export HADOOP_MAPRED_HOME=$HOME
export HADOOP_HDFS_HOME=$HOME
export YARN_HOME=$HOME
export CLASSPATH=.:$HADOOP_HOME/lib:$SQOOP_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/app/hadoop/tmp
export YARN_PID_DIR=/app/hadoop/tmp
export HADOOP_LOG_DIR="/log/hadoop"
export YARN_LOG_DIR=/log/yarn

#export HADOOP_HEAPSIZE=4096
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
export HADOOP_NAMENODE_OPTS="-Xmx80G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_DATANODE_OPTS="-Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

二、core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bdp-core</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>2880</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/app/hadoop/etc/hadoop/rack.sh</value>
</property>
</configuration>

三、hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>bdp-core</value>
</property>
<property>
<name>dfs.ha.namenodes.bdp-core</name>
<value>nn1,nn2</value>
</property>
<!--NameNode1 的地址-->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:8020</value>
</property>
<!--NameNode2 的地址-->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:8020</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/nn/local</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:50070</value>
</property>
<!--journal node的地址-->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CNSZ17PL1523:8485;CNSZ17PL1524:8485;CNSZ17PL1525:8485;CNSZ17PL1526:8485;CNSZ17PL1527:8485;CNSZ17PL1528:8485;CNSZ17PL1529:8485/bdp-core</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.prepare-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.accept-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.finalize-segment.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.select-input-streams.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.get-journal-state.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.new-epoch.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>60000</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.bdp-core</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
<description>Number of replication for each chunk.</description>
</property>
<!--需要根据实际配置进行修改-->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/HDATA/12/dfs/local,/HDATA/11/dfs/local,/HDATA/10/dfs/local,/HDATA/9/dfs/local,/HDATA/8/dfs/local,/HDATA/7/dfs/local,/HDATA/6/dfs/local,/HDATA/5/dfs/local,/HDATA/4/dfs/local,/HDATA/3/dfs/local,/HDATA/2/dfs/local,/HDATA/1/dfs/local</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/app/hadoop/etc/hadoop/exclude.list</value>
<description> List of nodes to decommission </description>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>0.75</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>1000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
<value>10000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/app/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>300</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>40</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

四、yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/yarn/local</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/yarn/log</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://bdp-core/var/log/hadoop-yarn/apps</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<!-- resourcemanager config -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>CNSZ17PL1523</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>CNSZ17PL1524</value>
</property>
<!-- fair scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/app/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/app/hadoop/etc/hadoop/yarn.exclude</value>
<final>true</final>
</property>
<!-- ZKRMStateStore config -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>cnsz23pl0073:2181,cnsz23pl0069:2181,cnsz23pl0070:2181,cnsz23pl0071:2181,cnsz23pl0072:2181</value>
</property>
-->
<!-- applications manager interface -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>CNSZ17PL1523:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>CNSZ17PL1524:23140</value>
</property>
<!-- scheduler interface -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>CNSZ17PL1523:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>CNSZ17PL1524:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>CNSZ17PL1523:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>CNSZ17PL1524:23141</value>
</property>
<!-- RM resource-tracker interface -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>CNSZ17PL1523:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>CNSZ17PL1524:23125</value>
</property>
<!-- RM web application interface -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>CNSZ17PL1523:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>CNSZ17PL1524:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>CNSZ17PL1523:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>CNSZ17PL1524:23189</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://CNSZ17PL1525:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///HDATA/11/yarn/local,file:///HDATA/10/yarn/local,file:///HDATA/9/yarn/local,file:///HDATA/8/yarn/local,file:///HDATA/7/yarn/local,file:///HDATA/6/yarn/local,file:///HDATA/5/yarn/local,file:///HDATA/4/yarn/local,file:///HDATA/3/yarn/local,file:///HDATA/2/yarn/local,file:///HDATA/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///HDATA/11/yarn/logs,file:///HDATA/10/yarn/logs,file:///HDATA/9/yarn/logs,file:///HDATA/8/yarn/logs,file:///HDATA/7/yarn/logs,file:///HDATA/6/yarn/logs,file:///HDATA/5/yarn/logs,file:///HDATA/4/yarn/logs,file:///HDATA/3/yarn/logs,file:///HDATA/2/yarn/logs,file:///HDATA/1/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
<!-- tuning -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>120000</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
<!-- tuning yarn container -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>1209600</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
</configuration>

五、mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>CNSZ17PL1525:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>CNSZ17PL1525:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>

<!-- tuning mapreduce -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>5120</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx4096m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>13312</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx10649m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.jobhistory.max-age-ms</name>
<value>1296000000</value>
<source>mapred-default.xml</source>
</property>
<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>200000</value>
<source>mapred-default.xml</source>
</property>
</configuration>

5个配置文件配完就ok了。其中的参数意思会有专门帖子讲,现在分发到每台机器上,执行脚本新建运行的时候的data  log等需要的目录,   dir.sh  (每台机器都执行。改成上述配置文件的hadoop2.7.3安装包也都每台机器都分发。每台机器的环境变量都需要增加。)

环境变量类似:

export ZOOKEEPER_HOME=/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/app/jdk
export JRE_HOME=/app/jdk/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME

(dir,sh 脚本)

#!/bin/bash
groupadd hadoop
groupadd hdfs
useradd -g hdfs -G hadoop hdfs

echo "sf#123456" | passwd --stdin hdfs

for i in 1 2 3 4 5 6 7 8 9 10 11
do
datadir=/HDATA/$i/dfs/local
mrdir=/HDATA/$i/mapred/local
yarndir=/HDATA/$i/yarn/local
yarnlog=/HDATA/$i/yarn/logs
mkdir -p $datadir
mkdir -p $mrdir
mkdir -p $yarndir
mkdir -p $yarnlog
echo "$datadir $mrdir $yarndir $yarnlog make over and chown hdfs:hadoop"
chown hdfs:hadoop -R $datadir $mrdir

chown yarn:yarn -R  $yarndir $yarnlog

done

#log
mkdir -p /data/dfs/nn/local
chown hdfs:hadoop /data/dfs/nn/local
mkdir -p /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs
chown hdfs:hadoop /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs

最后将hadoop的应用目录赋给hdfs:hadoop

然后启动过程:

1. 启动 ZooKeeper 集群
在集群中安装 ZooKeeper 的主机上启动 ZooKeeper 服务。在本教程中也就是在 slave51、slave52、slave53 的主机上启动相应进程。分别登陆到三台机子上执行:
zookeeper的启动在每台zookeeper节点执行这句命令

zkServer.sh start

2. 格式化 ZooKeeper 集群
在任意的 namenode 上都可以执行,笔者还是选择了 master1 主机执行格式化命令(namenode1上执行)

hdfs zkfc -formatZK

3. 启动 JournalNode 集群
分别在 slave1、slave2、slave3 上执行以下命令(所有的journal节点)

hadoop-daemon.sh start journalnode

4. 格式化集群的 NameNode

在 master1 的主机上执行以下命令,以格式化 namenode:(namenode1节点执行)

hdfs namenode -format

5. 启动刚格式化的 NameNode
刚在 master1 上格式化了 namenode ,故就在 master1上执行(namenode1节点执行)

hadoop-daemon.sh start namenode

6. 同步 NameNode1 元数据到 NameNode2 上
复制你 NameNode 上的元数据目录到另一个 NameNode,也就是此处的 master5 复制元数据到 master52 上。在 master52 上执行以下命令:(namenode2节点执行)

hdfs namenode -bootstrapStandby

7. 启动 NameNode2
master2 主机拷贝了元数据之后,就接着启动 namenode 进程了,执行(namenode2节点执行)

hadoop-daemon.sh start namenode

8. 启动集群中所有的DataNode(所有datanode节点执行)

hadoop-daemon.sh start datanode

9. 启动 ZKFC

在 master1 和 master2 的主机上分别执行如下命令:((namenode1节点执行)&&(namenode2节点执行))

hadoop-daemon.sh start zkfc

10. 开启历史日志服务

在 master1和 master2 的主机上执行((namenode1节点执行)&&(namenode2节点执行))

mr-jobhistory-daemon.sh start historyserver

11. 在 RM1 启动 YARN

在 master1的主机上执行以下命令:((namenode1节点执行))

yarn-daemon.sh start resourcemanager

12. 在 RM2 单独启动 YARN

虽然上一步启动了 YARN ,但是在 master2 上是没有相应的 ResourceManager 进程,故需要在 master2 主机上单独启动:(namenode2节点执行)

yarn-daemon.sh start resourcemanager

13.启动所有datanode 的 nodemanager(所有datanode节点)

yarn-daemon.sh start nodemanager

Hadoop生产环境配置文件的更多相关文章

  1. Hadoop生产环境搭建(含HA、Federation)

    Hadoop生产环境搭建 1. 将安装包hadoop-2.x.x.tar.gz存放到某一目录下,并解压. 2. 修改解压后的目录中的文件夹etc/hadoop下的配置文件(若文件不存在,自己创建.) ...

  2. SpringBoot yml 配置 多配置文件,开发环境,生产环境配置文件分开

    原文地址:https://www.cnblogs.com/baoyi/p/SpringBoot_YML.html 1. 在 spring boot 中,有两种配置文件,一种是application.p ...

  3. 转 通过 spring 容器内建的 profile 功能实现开发环境、测试环境、生产环境配置自动切换

                                      软件开发的一般流程为工程师开发 -> 测试 -> 上线,因此就涉及到三个不同的环境,开发环境.测试环境以及生产环境,通常 ...

  4. webpack(7)-生产环境

    development(开发环境) 和 production(生产环境) 这两个环境下的构建目标存在着巨大差异.在开发环境中,我们需要:强大的 source map 和一个有着 live reload ...

  5. Spring.profile配合Jenkins发布War包,实现开发、测试和生产环境的按需切换

    前两篇不错 Spring.profile实现开发.测试和生产环境的配置和切换 - Strugglion - 博客园https://www.cnblogs.com/strugglion/p/709102 ...

  6. 一种简单的生产环境部署Node.js程序方法

    最近在部署Node.js程序时,写了段简单的脚本,发觉还挺简单的,忍不住想与大家分享. 配置文件 首先,本地测试环境和生产环境的数据库连接这些配置信息是不一样的,需要将其分开为两个文件存储 到conf ...

  7. spring boot区分生产环境和开发环境

    回顾一下spring boot使用基础,做个笔记. 通过配置文件,设置项目的开发环境和生成环境. 项目目录结构: application-dev.yml是开发环境配置文件,application-pr ...

  8. Spring.profile实现开发、测试和生产环境的配置和切换

    软件开发过程一般涉及“开发 -> 测试 -> 部署上线”多个阶段,每个阶段的环境的配置参数会有不同,如数据源,文件路径等.为避免每次切换环境时都要进行参数配置等繁琐的操作,可以通过spri ...

  9. 使用Asset Pipeline管理rails生产环境静态资源实现步骤

    1.    修改项目中指向静态资源文件的链接 a)     访问静态资源文件 <%= stylesheet_link_tag "application", media: &q ...

随机推荐

  1. 使用Java泛型返回动态类型

    返回一个指定类型的集合,并且clazz必须继承IGeoLog对象或者是其本身 <T extends IGeoLog> List<T> getLogListSql(Class&l ...

  2. docker 搭建简易仓库registry

    下载仓库镜像: docker pull  registry:2 运行仓库库镜像: docker run -d  -p 5000:5000  -v /usr/local/registry:/var/li ...

  3. 洛谷P1434滑雪题解及记忆化搜索的基本步骤

    题目 滑雪是一道dp及记忆化搜索的经典题目. 所谓记忆化搜索便是在搜索的过程中边记录边搜索的一个算法. 当下次搜到这里时,便直接使用. 而且记忆化搜索一定要满足无后效性,为什么呢,因为如果不满足无后效 ...

  4. 【BZOJ3814】【清华集训2014】简单回路 状压DP

    题目描述 给你一个\(n\times m\)的网格图和\(k\)个障碍,有\(q\)个询问,每次问你有多少个不同的不经过任何一个障碍点且经过\((x,y)\)与\((x+1,y)\)之间的简单回路 \ ...

  5. 【转】无法获得锁 /var/lib/dpkg/lock - open (11: 资源暂时不可用) ubuntu 安装vim 及遇到的错误处理

    今天,处理完问题,闲来无事,打算在虚拟机中的Ubuntu中练习shell脚本编写. 无奈,虚拟机系统所装的只有vi,这个编辑软件对于我们来说还是比较不习惯的,所以打算安装vim.好了,闲言少叙. 安装 ...

  6. IHttpHandler处理请求api

    使用IHttpHandler处理请求,实现webapi功能. 研究asp.net管道处理事件后,可用此法实现webapi功能. 测试环境 VS2017 WIN10 IIS10 集成模式 关键接口类两个 ...

  7. nodejs的某些api~(六)HTTPS

    node的HTTPS模块接口与HTTP其实差不多,就是多了一个认证证书,私钥的配置等等,API都相似的. 在客户端服务器通信的方法中,只有HTTPS是最安全的,它的原理是客户端和服务器发送自己的公钥, ...

  8. Codeforces Round #493 (Div. 2)

    C - Convert to Ones 给你一个01串 x是反转任意子串的代价 y是将子串全部取相反的代价 问全部变成1的最小代价 两种可能 一种把1全部放到一边 然后把剩下的0变成1  要么把所有的 ...

  9. docker file 示例

    报错 Cannot connect to the Docker daemon. Is the docker daemon running on this host? 这个错误只要输入docker -d ...

  10. java 子类强转父类 父类强转子类

    Java 继承 继承就是子类继承父类的特征和行为,使得子类对象(实例)具有父类的实例域和方法,或子类从父类继承方法,使得子类具有父类相同的行为. Java 子类强转父类 父类引用指向子类对象: jav ...