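Hadoop Production Environment Configuration Files
Prerequisites:
① ZooKeeper (zk) is already set up
② The JDK is already installed
Main text:
First, download Hadoop 2.7.3 from the official site. (Version 3.0 has already been released there, but it has not been fully tested yet, so it is left until testing is done.)
I. hadoop-env.sh (environment variables)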
export JAVA_HOME=/app/jdk/jdk1.8.0_92
# Set HADOOP_HOME directly; overriding the shell's HOME variable would break
# anything that relies on the daemon user's real home directory.
export HADOOP_HOME=/app/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export CLASSPATH=.:$HADOOP_HOME/lib:$SQOOP_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/app/hadoop/tmp
export YARN_PID_DIR=/app/hadoop/tmp
export HADOOP_LOG_DIR="/log/hadoop"
export YARN_LOG_DIR=/log/yarn
#export HADOOP_HEAPSIZE=4096
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}
# NameNode JVM: 80 GB heap with the CMS collector.
export HADOOP_NAMENODE_OPTS="-Xmx80G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_DATANODE_OPTS="-Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""
###
# Advanced Users Only!
###
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
II. core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bdp-core</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>2880</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/app/hadoop/etc/hadoop/rack.sh</value>
</property>
</configuration>
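The core-site.xml above points net.topology.script.file.name at /app/hadoop/etc/hadoop/rack.sh, but the script itself is not shown in this post. Hadoop calls the topology script with one or more host names or IP addresses as arguments and reads one rack path per argument from standard output. Below is a minimal sketch of such a script. The rack names and the assignment of hosts to racks are invented for illustration and are not the real topology of this cluster.

```shell
#!/bin/bash
# Hypothetical rack.sh sketch: the rack names and host assignments below are
# assumptions for illustration only.

# Print the rack path for a single host name or IP address.
rack_of() {
  case "$1" in
    CNSZ17PL1523|CNSZ17PL1524|CNSZ17PL1525) echo "/rack1" ;;
    CNSZ17PL1526|CNSZ17PL1527|CNSZ17PL1528|CNSZ17PL1529) echo "/rack2" ;;
    *) echo "/default-rack" ;;   # fallback for hosts not listed above
  esac
}

# Hadoop may pass several hosts per invocation; answer each one in order.
for host in "$@"; do
  rack_of "$host"
done
```

The script must be executable by the user that runs the NameNode and ResourceManager.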
III. hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>bdp-core</value>
</property>
<property>
<name>dfs.ha.namenodes.bdp-core</name>
<value>nn1,nn2</value>
</property>
<!-- Address of NameNode1 -->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:8020</value>
</property>
<!-- Address of NameNode2 -->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:8020</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/nn/local</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:50070</value>
</property>
<!-- JournalNode addresses -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CNSZ17PL1523:8485;CNSZ17PL1524:8485;CNSZ17PL1525:8485;CNSZ17PL1526:8485;CNSZ17PL1527:8485;CNSZ17PL1528:8485;CNSZ17PL1529:8485/bdp-core</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.prepare-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.accept-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.finalize-segment.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.select-input-streams.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.get-journal-state.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.new-epoch.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.bdp-core</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
<description>Enable HDFS access control lists (ACLs).</description>
</property>
<!-- Adjust the following to match your actual environment -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/HDATA/12/dfs/local,/HDATA/11/dfs/local,/HDATA/10/dfs/local,/HDATA/9/dfs/local,/HDATA/8/dfs/local,/HDATA/7/dfs/local,/HDATA/6/dfs/local,/HDATA/5/dfs/local,/HDATA/4/dfs/local,/HDATA/3/dfs/local,/HDATA/2/dfs/local,/HDATA/1/dfs/local</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/app/hadoop/etc/hadoop/exclude.list</value>
<description> List of nodes to decommission </description>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>0.75</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>1000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
<value>10000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/app/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>300</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>40</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
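The dfs.namenode.shared.edits.dir value above lists seven JournalNodes. An edit is committed once a majority of the JournalNodes has acknowledged it, so N JournalNodes tolerate (N - 1) / 2 failures. The sketch below only illustrates that arithmetic; the helper names are hypothetical and not part of Hadoop.

```shell
#!/bin/bash
# Count the JournalNodes in a qjournal:// URI and report how many of them may
# fail while a majority is still reachable. Helper names are illustrative.

journalnode_count() {
  hosts="${1#qjournal://}"   # strip the scheme
  hosts="${hosts%%/*}"       # strip the trailing /<journal id>
  old_ifs="$IFS"
  IFS=';'
  set -- $hosts              # split on ';' into one word per host:port entry
  IFS="$old_ifs"
  echo "$#"
}

tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

uri='qjournal://CNSZ17PL1523:8485;CNSZ17PL1524:8485;CNSZ17PL1525:8485;CNSZ17PL1526:8485;CNSZ17PL1527:8485;CNSZ17PL1528:8485;CNSZ17PL1529:8485/bdp-core'
n="$(journalnode_count "$uri")"
echo "JournalNodes: $n, tolerated failures: $(tolerated_failures "$n")"
```

An odd number of JournalNodes is the usual choice, because adding one more node to an odd count does not raise the number of failures that can be tolerated.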
IV. yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs are set in the Node Manager section below. -->
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://bdp-core/var/log/hadoop-yarn/apps</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<!-- resourcemanager config -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>CNSZ17PL1523</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>CNSZ17PL1524</value>
</property>
<!-- fair scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/app/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/app/hadoop/etc/hadoop/yarn.exclude</value>
<final>true</final>
</property>
<!-- ZKRMStateStore config -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>cnsz23pl0073:2181,cnsz23pl0069:2181,cnsz23pl0070:2181,cnsz23pl0071:2181,cnsz23pl0072:2181</value>
</property>
-->
<!-- applications manager interface -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>CNSZ17PL1523:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>CNSZ17PL1524:23140</value>
</property>
<!-- scheduler interface -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>CNSZ17PL1523:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>CNSZ17PL1524:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>CNSZ17PL1523:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>CNSZ17PL1524:23141</value>
</property>
<!-- RM resource-tracker interface -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>CNSZ17PL1523:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>CNSZ17PL1524:23125</value>
</property>
<!-- RM web application interface -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>CNSZ17PL1523:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>CNSZ17PL1524:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>CNSZ17PL1523:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>CNSZ17PL1524:23189</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://CNSZ17PL1525:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///HDATA/11/yarn/local,file:///HDATA/10/yarn/local,file:///HDATA/9/yarn/local,file:///HDATA/8/yarn/local,file:///HDATA/7/yarn/local,file:///HDATA/6/yarn/local,file:///HDATA/5/yarn/local,file:///HDATA/4/yarn/local,file:///HDATA/3/yarn/local,file:///HDATA/2/yarn/local,file:///HDATA/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///HDATA/11/yarn/logs,file:///HDATA/10/yarn/logs,file:///HDATA/9/yarn/logs,file:///HDATA/8/yarn/logs,file:///HDATA/7/yarn/logs,file:///HDATA/6/yarn/logs,file:///HDATA/5/yarn/logs,file:///HDATA/4/yarn/logs,file:///HDATA/3/yarn/logs,file:///HDATA/2/yarn/logs,file:///HDATA/1/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
<!-- tuning -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>120000</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
<!-- tuning yarn container -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>1209600</value>
</property>
</configuration>
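The container tuning values above (minimum 4096 MB, maximum 16384 MB, increment 1024 MB) determine what a memory request actually receives. As far as I understand the Fair Scheduler, a request is raised to at least the minimum, rounded up to a multiple of the increment, and capped at the maximum. The function below is a sketch of that arithmetic under this assumption, not the scheduler's own code; normalize_mb is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of memory request normalization. Arguments, all in MB:
#   $1 requested memory, $2 minimum, $3 maximum, $4 increment

normalize_mb() {
  mb="$1"
  # Raise the request to the minimum allocation.
  if [ "$mb" -lt "$2" ]; then mb="$2"; fi
  # Round up to the next multiple of the increment.
  mb=$(( (mb + $4 - 1) / $4 * $4 ))
  # Cap at the maximum allocation.
  if [ "$mb" -gt "$3" ]; then mb="$3"; fi
  echo "$mb"
}

for ask in 1000 5000 13312 20000; do
  echo "request ${ask} MB -> container $(normalize_mb "$ask" 4096 16384 1024) MB"
done
```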
V. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>CNSZ17PL1525:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>CNSZ17PL1525:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<!-- tuning mapreduce -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>5120</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx4096m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>13312</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx10649m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.jobhistory.max-age-ms</name>
<value>1296000000</value>
<source>mapred-default.xml</source>
</property>
<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>200000</value>
<source>mapred-default.xml</source>
</property>
</configuration>
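In the tuning section above, each container size is paired with a JVM heap of about 80% of it (4096 of 5120 MB for map tasks, 10649 of 13312 MB for reduce tasks), which leaves the remainder for non-heap memory. The 0.8 ratio is inferred from those values rather than stated anywhere in the configuration. The sketch below reproduces the numbers and estimates how many containers of each size fit into yarn.nodemanager.resource.memory-mb (120000); both helper names are hypothetical.

```shell
#!/bin/bash
# Heap size as 80% of the container size (integer MB, rounded down).
heap_mb() {
  echo $(( $1 * 8 / 10 ))
}

# How many containers of a given size fit into the NodeManager memory.
# $1 = NodeManager memory in MB, $2 = container size in MB
containers_per_node() {
  echo $(( $1 / $2 ))
}

echo "map heap:    -Xmx$(heap_mb 5120)m"
echo "reduce heap: -Xmx$(heap_mb 13312)m"
echo "map containers per node:    $(containers_per_node 120000 5120)"
echo "reduce containers per node: $(containers_per_node 120000 13312)"
```

This counts memory only; vcores and the mix of map and reduce tasks running at the same time will lower the real numbers.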
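Once these five configuration files are done, the configuration is complete. The meaning of the individual parameters will be covered in a separate post. Now distribute the Hadoop 2.7.3 package, modified with the configuration above, to every machine, and run the dir.sh script on every machine to create the data, log and other directories needed at runtime. The environment variables also need to be added on every machine.
The environment variables look like this: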
export ZOOKEEPER_HOME=/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/app/jdk
export JRE_HOME=/app/jdk/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
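(dir.sh script)
#!/bin/bash
# Create the groups and service users. Run as root on every machine.
groupadd hadoop
groupadd hdfs
useradd -g hdfs -G hadoop hdfs
# Replace this sample password before using the script.
echo "sf#123456" | passwd --stdin hdfs
# The chown below needs a yarn user and group. Creating them here is an
# assumption; skip these two lines if they already exist on your hosts.
groupadd yarn
useradd -g yarn -G hadoop yarn
# Per-disk directories. hdfs-site.xml also lists /HDATA/12/dfs/local, so
# extend this list if a twelfth data disk is present.
for i in 1 2 3 4 5 6 7 8 9 10 11
do
datadir=/HDATA/$i/dfs/local
mrdir=/HDATA/$i/mapred/local
yarndir=/HDATA/$i/yarn/local
yarnlog=/HDATA/$i/yarn/logs
mkdir -p "$datadir" "$mrdir" "$yarndir" "$yarnlog"
chown -R hdfs:hadoop "$datadir" "$mrdir"
chown -R yarn:yarn "$yarndir" "$yarnlog"
echo "$datadir $mrdir created, owner hdfs:hadoop; $yarndir $yarnlog created, owner yarn:yarn"
done
# NameNode metadata, log, PID and domain socket directories
mkdir -p /data/dfs/nn/local
chown hdfs:hadoop /data/dfs/nn/local
mkdir -p /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs
chown hdfs:hadoop /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs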
Finally, give ownership of the Hadoop application directory to hdfs:hadoop.
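Before starting any daemon it can help to confirm that the directories exist and have the expected owner, since a wrong owner usually only shows up later as a startup failure. The following is a minimal sketch of such a check, assuming GNU stat (the -c option); check_owner is a hypothetical helper and not part of Hadoop.

```shell
#!/bin/bash
# Report whether a directory exists and is owned by the expected user:group.
# Assumes GNU coreutils stat, which supports the -c format option.

check_owner() {
  dir="$1"
  expected="$2"
  if [ ! -e "$dir" ]; then
    echo "MISSING $dir"
    return 1
  fi
  actual="$(stat -c '%U:%G' "$dir")"
  if [ "$actual" = "$expected" ]; then
    echo "OK $dir"
  else
    echo "WRONG $dir is $actual, expected $expected"
    return 1
  fi
}
```

For example, after running dir.sh, `check_owner /HDATA/1/dfs/local hdfs:hadoop` and `check_owner /app/hadoop hdfs:hadoop` should both print a line starting with OK.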
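Then the startup procedure:
1. Start the ZooKeeper cluster
Start the ZooKeeper service on every host where ZooKeeper is installed. In this cluster those are the hosts listed in ha.zookeeper.quorum (CNSZ17PL1523 through CNSZ17PL1529).
Log in to each ZooKeeper node and run: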
zkServer.sh start
2. Format the ZooKeeper cluster (initialize the HA state in ZooKeeper)
This can be run on either NameNode; here it is run on NameNode1:
hdfs zkfc -formatZK
3. Start the JournalNode cluster
Run the following on every JournalNode host:
hadoop-daemon.sh start journalnode
4. Format the cluster's NameNode
Run the following on NameNode1 to format the NameNode:
hdfs namenode -format
5. Start the newly formatted NameNode
The NameNode was just formatted on NameNode1, so start it there:
hadoop-daemon.sh start namenode
6. Sync the NameNode1 metadata to NameNode2
Copy the metadata directory from the formatted NameNode to the other NameNode, that is, from NameNode1 to NameNode2. Run the following on NameNode2:
hdfs namenode -bootstrapStandby
7. Start NameNode2
After NameNode2 has copied the metadata, start its NameNode process. Run on NameNode2:
hadoop-daemon.sh start namenode
8. Start all DataNodes in the cluster (run on every DataNode host)
hadoop-daemon.sh start datanode
9. Start ZKFC
Run the following on both NameNode1 and NameNode2:
hadoop-daemon.sh start zkfc
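10. Start the job history service
Run the following on the JobHistory host. The mapred-site.xml above sets mapreduce.jobhistory.address to CNSZ17PL1525, so the service belongs on that host rather than on the two NameNode hosts: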
mr-jobhistory-daemon.sh start historyserver
11. Start YARN on RM1
Run the following on NameNode1, which also hosts rm1:
yarn-daemon.sh start resourcemanager
12. Start YARN on RM2 separately
The previous step starts a ResourceManager only on the first host, so no ResourceManager process is running on NameNode2 (rm2) yet. Start it there separately:
yarn-daemon.sh start resourcemanager
13. Start the NodeManager on every DataNode host
yarn-daemon.sh start nodemanager
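After the last step it is worth checking that each HA pair has exactly one active member. The sketch below uses `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` with the IDs from the configuration above (nn1, nn2, rm1, rm2). The exactly_one_active helper is hypothetical, and the checks are skipped on a machine where the hdfs or yarn command is not on the PATH.

```shell
#!/bin/bash
# Succeeds only when one of the two reported states is "active" and the
# other is "standby".
exactly_one_active() {
  { [ "$1" = "active" ] && [ "$2" = "standby" ]; } ||
  { [ "$1" = "standby" ] && [ "$2" = "active" ]; }
}

# HDFS NameNode pair
if command -v hdfs >/dev/null 2>&1; then
  nn1="$(hdfs haadmin -getServiceState nn1)"
  nn2="$(hdfs haadmin -getServiceState nn2)"
  if exactly_one_active "$nn1" "$nn2"; then
    echo "HDFS HA OK: nn1=$nn1 nn2=$nn2"
  else
    echo "HDFS HA problem: nn1=$nn1 nn2=$nn2"
  fi
fi

# YARN ResourceManager pair
if command -v yarn >/dev/null 2>&1; then
  rm1="$(yarn rmadmin -getServiceState rm1)"
  rm2="$(yarn rmadmin -getServiceState rm2)"
  if exactly_one_active "$rm1" "$rm2"; then
    echo "YARN HA OK: rm1=$rm1 rm2=$rm2"
  else
    echo "YARN HA problem: rm1=$rm1 rm2=$rm2"
  fi
fi
```

Running `jps` on each host is a quick complementary check that the expected daemon processes are present.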