Setting Up a Personal Hadoop Development Environment on Fedora 18

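1.    Background

This article describes a "personal hadoop" configuration, similar in spirit to "personal condor". The basic goal is to keep configuration files and log files in a single location, with a symlink pointing at the binaries produced by a development build. That way, data and configuration are preserved whenever the built binaries are updated.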

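2.    Use cases

1.  Test in a local sandbox without altering software already installed on the system

2.  Keep configuration files and log files in a single location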

3.    References

Web pages:

http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment

http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://wiki.apache.org/hadoop/

http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm

Books:

"Hadoop: The Definitive Guide"

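4.    Disclaimers

1.  This article currently describes a non-native development setup that relies on maven dependencies. For details on native packaging, see: https://fedoraproject.org/wiki/Features/Hadoop

2.  The steps for a single-node setup are listed below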

5.    Prerequisites

1.      Configure passwordless ssh

yum install openssh openssh-clients openssh-server

# generate a public/private key, if you don't already have one

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/*

# testing ssh:

ps -ef | grep sshd     # verify sshd is running

ssh localhost          # accept the host key when prompted

sudo passwd root       # Make sure the root has a password
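The checks above can also be scripted. The following is a minimal sketch, not part of the original setup: the function names `check_authorized_keys` and `check_passwordless_ssh` are hypothetical, and `stat -c` assumes GNU coreutils as shipped on Fedora. `BatchMode=yes` makes ssh fail instead of prompting, so the second helper only succeeds after the host key has been accepted once by hand.

```shell
# Hypothetical helper: report whether authorized_keys under the given
# directory exists and has mode 600 (owner read/write only).
check_authorized_keys() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local keys="${ssh_dir}/authorized_keys"
    local mode
    [ -f "$keys" ] || { echo "missing: $keys"; return 1; }
    mode=$(stat -c '%a' "$keys")   # GNU stat prints the octal mode
    if [ "$mode" = "600" ]; then
        echo "ok: $keys"
    else
        echo "bad mode $mode: $keys"
        return 1
    fi
}

# Hypothetical helper: succeeds only if ssh to localhost needs no password.
# BatchMode=yes disables all prompts, so a password request becomes a failure.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}
```

Running `check_authorized_keys && check_passwordless_ssh` after the manual `ssh localhost` gives a quick yes/no answer before moving on.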

2.        Install the other dependencies

yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++ 

3.        Install Java and the development environment

yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*

Add the following to your .bashrc:

 export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
 export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"

Note: the settings above apply to OpenJDK 7 on F18. You can test whether the environment is set up correctly with the following command.

mvn install -Dmaven.test.failure.ignore=true
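Before starting the build, it can help to confirm that the tools installed above are actually on the PATH. This is a small sketch under assumptions: `require_cmds` is a hypothetical helper name, and the command list (`git`, `cmake`, `protoc`, `java`, `mvn`) is a subset chosen for illustration from the packages listed earlier.

```shell
# Hypothetical helper: print each named command that is missing from PATH;
# returns non-zero if at least one is missing.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Report any build tool that is not installed yet.
require_cmds git cmake protoc java mvn || echo "install the packages above first"
```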

6.     Building "personal-hadoop"

1.        Download and build hadoop

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests

2.        Create the sandbox environment

This setup assumes the home directory is /home/tstclair


cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home
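The same layout can be wrapped in a function so it is safe to run again after a rebuild. This is a sketch, not the original author's script: `make_sandbox` is a hypothetical name, and it uses `ln -sfn` (GNU ln) rather than `ln -sf`, because without `-n` a second run follows the existing `home` symlink and creates a new link inside the old target directory.

```shell
# Hypothetical helper: create the sandbox layout under $1 and point
# $1/home at the build output directory $2.
make_sandbox() {
    local base="$1" dist="$2"
    mkdir -p "$base/conf" "$base/data" "$base/name" "$base/logs/yarn"
    # -n replaces an existing "home" symlink instead of following it.
    ln -sfn "$dist" "$base/home"
}

# Example call; the second argument is a placeholder for your own checkout:
# make_sandbox "$HOME/personal-hadoop" "<your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha"
```

Because the data, name, conf and logs directories live outside the symlink target, re-pointing `home` at a newer build leaves them untouched.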

3.        Override your environment variables

Append the following to the .bashrc in your home directory

# Hadoop env override:

export HADOOP_BASE_DIR=${HOME}/personal-hadoop

export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs

export HADOOP_PID_DIR=${HADOOP_BASE_DIR}

export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf

export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home

export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}

export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}

# Yarn env override:

export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}

export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn

#classpath override to search hadoop loc

export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share

#Finally update your PATH

export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}
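A quick way to catch a typo in these exports is to check that each variable is set and names a real directory. The sketch below assumes a hypothetical helper `check_env_dirs`; it relies on `printenv`, so it only sees variables that were exported.

```shell
# Hypothetical helper: for each named environment variable, check that it
# is exported and names an existing directory (symlinks are followed).
check_env_dirs() {
    local rc=0 name value
    for name in "$@"; do
        if ! value=$(printenv "$name"); then
            echo "unset: $name"; rc=1
        elif [ ! -d "$value" ]; then
            echo "not a directory: $name=$value"; rc=1
        fi
    done
    return "$rc"
}

# Typical use after sourcing ~/.bashrc:
# check_env_dirs HADOOP_BASE_DIR HADOOP_LOG_DIR HADOOP_CONF_DIR HADOOP_COMMON_HOME YARN_LOG_DIR
```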

4.        Verify the steps above

source ~/.bashrc
which hadoop    # verify it should be ${HOME}/personal-hadoop/home/bin  
hadoop -help    # verify classpath is correct.

5.        Create the initial single-location configuration files

Copy the default configuration files

cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf

Update your hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Override tstclair with your home directory -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost/</value>

</property>

<property>

<name>dfs.name.dir</name>

<value>file:///home/tstclair/personal-hadoop/name</value>

</property>

<property>

<name>dfs.http.address</name>

<value>0.0.0.0:50070</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>file:///home/tstclair/personal-hadoop/data</value>

</property>

<property>

<name>dfs.datanode.address</name>

<value>0.0.0.0:50010</value>

</property>

<property>

<name>dfs.datanode.http.address</name>

<value>0.0.0.0:50075</value>

</property>

<property>

<name>dfs.datanode.ipc.address</name>

<value>0.0.0.0:50020</value>

</property>

</configuration>
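The file above hard-codes /home/tstclair. If you paste it in unchanged, one way to substitute your own home directory is a `sed` rewrite. This is a sketch only: `retarget_home` is a hypothetical helper, `sed -i` assumes GNU sed, and the replacement path is assumed not to contain `|` or `&`.

```shell
# Hypothetical helper: rewrite the example home directory in a copied
# config file ($1) to the directory given as $2 (default: $HOME).
retarget_home() {
    local file="$1" home="${2:-$HOME}"
    # "|" as the sed delimiter avoids escaping the slashes in the paths.
    sed -i "s|/home/tstclair|${home}|g" "$file"
}

# Example: retarget_home "${HADOOP_CONF_DIR}/hdfs-site.xml"
```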

Update mapred-site.xml:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Update or append these vars -->

<configuration>

<property>

<name>mapreduce.cluster.temp.dir</name>

<value>

</value>

<description>No description</description>

<final>true</final>

</property>

<property>

<name>mapreduce.cluster.local.dir</name>

<value>

</value>

<description>No description</description>

<final>true</final>

</property>

</configuration>

Finally, update yarn-site.xml:

<?xml version="1.0"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>localhost:8031</value>

<description>host is the hostname of the resource manager and

port is the port on which the NodeManagers contact the Resource Manager.

</description>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>localhost:8030</value>

<description>host is the hostname of the resourcemanager and port is the port

on which the Applications in the cluster talk to the Resource Manager.

</description>

</property>

<property>

<name>yarn.resourcemanager.scheduler.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>

<description>In case you do not want to use the default scheduler</description>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>localhost:8032</value>

<description>the host is the hostname of the ResourceManager and the port is the port on

which the clients can talk to the Resource Manager. </description>

</property>

<property>

<name>yarn.nodemanager.local-dirs</name>

<value>

</value>

<description>the local directories used by the nodemanager</description>

</property>

<property>

<name>yarn.nodemanager.address</name>

<value>localhost:8034</value>

<description>the nodemanagers bind to this port</description>

</property>

<property>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>10240</value>

<description>the amount of memory on the NodeManager in MB</description>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce.shuffle</value>

<description>shuffle service that needs to be set for Map Reduce to run </description>

</property>

</configuration>
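The three files above bind a fixed set of ports. Before starting any daemon, it may be worth checking that nothing else on the machine already holds them. The sketch below is an illustration, not part of the original setup: `port_in_use` is a hypothetical helper that relies on bash's `/dev/tcp` redirection, so bash must be installed.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on
# host $1, port $2. Runs the probe in bash because /dev/tcp is a bash feature.
port_in_use() {
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report which of the ports named in the configs above are already taken.
for port in 50070 50010 50075 50020 8030 8031 8032 8034; do
    if port_in_use localhost "$port"; then
        echo "port $port is already in use"
    fi
done
```

The same helper can be reused after the daemons are started, where a port that is in use is the expected result.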

7.    Start the single-node Hadoop cluster

Format the namenode

hadoop namenode -format
#verify output is correct.

Start hdfs:

start-dfs.sh

Open http://localhost:50070 in a browser and check that one node has come up

Next, start yarn

start-yarn.sh

Check the log files to verify that everything started correctly
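Reading the logs can be narrowed down to the lines that matter. This is a rough sketch under assumptions: `scan_logs` is a hypothetical helper, it assumes the daemon logs end in `.log` and mark problems with the log levels ERROR or FATAL, and `--include` requires GNU grep.

```shell
# Hypothetical helper: print log lines under directory $1 that contain
# ERROR or FATAL; returns non-zero if any such line is found.
scan_logs() {
    local dir="${1:-$HADOOP_LOG_DIR}"
    if grep -rnE 'ERROR|FATAL' --include='*.log' "$dir"; then
        return 1
    fi
    echo "no ERROR/FATAL lines under $dir"
}

# Example: scan_logs "${HADOOP_LOG_DIR}"
```

A clean scan is not proof that the daemons are healthy, but any hit is a good place to start looking.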

Finally, run a MapReduce job to check that Hadoop is working

cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out

Source: http://timothysc.github.io/blog/2013/04/22/personalhadoop/
