Setting Up a Personal Hadoop Development Environment on Fedora 18
1. Background
This article describes a "personal hadoop" setup, analogous to "personal condor": the basic goal is a single source for configuration files and log files, with a symlink pointing at the binaries produced by your development builds. That way, data and configuration items are preserved whenever the built binaries are updated.
2. Use Cases
1. Testing in a local sandbox without changing software already installed on the system
2. A single source for configuration files and log files
3. References
Web pages:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://wiki.apache.org/hadoop/
http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm
Books:
Hadoop: The Definitive Guide
4. Disclaimers
1. These development steps currently pull in dependencies through Maven rather than native packages; for details on the native packaging work, see: https://fedoraproject.org/wiki/Features/Hadoop
2. The steps for setting up a single-node environment are listed below.
5. Prerequisites
1. Configure passwordless ssh
yum install openssh openssh-clients openssh-server
# generate a public/private key, if you don't already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/*
# testing ssh:
ps -ef | grep sshd # verify sshd is running
ssh localhost # accept the host key when prompted
sudo passwd root # Make sure the root has a password
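If the ps check above shows no sshd process, note that Fedora 18 manages services with systemd; a minimal sketch (run as root; sshd.service is the standard Fedora service name):
systemctl enable sshd.service   # start sshd at boot
systemctl start sshd.service    # start it now
systemctl status sshd.service   # should report "active (running)"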
2. Install the other build dependencies
yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++
3. Install Java and the development environment
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*
Then update your .bashrc:
export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"
Note: the settings above are for OpenJDK 7 on Fedora 18. You can test that the environment is configured correctly with:
mvn install -Dmaven.test.failure.ignore=true
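Before kicking off the full build, a quick sanity check of the toolchain can save time; assuming the packages above installed cleanly:
java -version      # should report an OpenJDK 1.7.0 runtime
mvn -version       # should report Maven 3.x running on the same JDK
echo $MAVEN_OPTS   # should print the options exported above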
6. Setting up "personal-hadoop"
1. Download and build Hadoop
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests
2. Create the sandbox environment
This configuration assumes a home directory of /home/tstclair.
cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home
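At this point the sandbox should look roughly like the listing below; the symlink target depends on where you cloned hadoop-common:
ls -l ~/personal-hadoop
# expected, roughly:
#   conf/  data/  logs/  name/
#   home -> <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha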
3. Override your environment variables
Append the following to the .bashrc file in your home directory:
# Hadoop env override:
export HADOOP_BASE_DIR=${HOME}/personal-hadoop
export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs
export HADOOP_PID_DIR=${HADOOP_BASE_DIR}
export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf
export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home
export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}
# Yarn env override:
export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}
export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn
#classpath override to search hadoop loc
export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share
#Finally update your PATH
export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}
4. Verify the steps above
source ~/.bashrc
which hadoop # should resolve to ${HOME}/personal-hadoop/home/bin/hadoop
hadoop -help # verify the classpath is correct
5. Create the initial single-source configuration files
Copy the default configuration files:
cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf
Then update your hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Replace tstclair with your home directory -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost/</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/tstclair/personal-hadoop/name</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/tstclair/personal-hadoop/data</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
</configuration>
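A malformed config file tends to fail in unhelpful ways at daemon startup, so it can be worth confirming each file you edit is well-formed XML. A sketch using xmllint (from the libxml2 package, which may need installing); repeat for each of the three files below:
xmllint --noout ${HADOOP_CONF_DIR}/hdfs-site.xml && echo "hdfs-site.xml OK"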
Update mapred-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Update or append these vars -->
<configuration>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>
</value>
<description>No description</description>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>
</value>
<description>No description</description>
<final>true</final>
</property>
</configuration>
Finally, update yarn-site.xml:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>
</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>localhost:8034</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
<description>the amount of memory on the NodeManager, in MB</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
</configuration>
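With all three files in place, it can help to confirm that none of the ports configured above are already taken before starting the daemons. A sketch, assuming the classic net-tools netstat is installed:
# Ports from hdfs-site.xml and yarn-site.xml above; all should be free
for p in 50070 50010 50075 50020 8030 8031 8032 8034; do
    netstat -tln | grep -q ":$p " && echo "port $p already in use"
done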
7. Start your single-node Hadoop cluster
Format the namenode:
hadoop namenode -format # verify the output is correct
Then start hdfs:
start-dfs.sh
Open http://localhost:50070 in a browser and check that one live node is reported.
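If you prefer the command line to the web UI, you can also query the namenode directly; hadoop dfsadmin -report is the classic command and, although deprecated in Hadoop 2.x, still works:
hadoop dfsadmin -report   # should list one live datanode
jps                       # should show NameNode, DataNode, SecondaryNameNode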
Next, start yarn:
start-yarn.sh
Verify that it started cleanly by checking the log files, e.g. as below.
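The exact log file names include your user and host, so the globs below are an assumption based on the standard yarn-<user>-<daemon>-<host>.log naming convention:
# Look for startup errors in the ResourceManager / NodeManager logs
tail -n 50 ${YARN_LOG_DIR}/yarn-*-resourcemanager-*.log
tail -n 50 ${YARN_LOG_DIR}/yarn-*-nodemanager-*.log
jps   # should now also show ResourceManager and NodeManager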
Finally, check that Hadoop runs correctly by executing a MapReduce job:
cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
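If the job completes, randomwriter leaves its output in the out directory in HDFS. A quick check, and a cleanup step, since re-running against an existing output directory fails:
hadoop fs -ls out      # one part file per map task
hadoop fs -rm -r out   # remove the output before re-running the job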
Source: http://timothysc.github.io/blog/2013/04/22/personalhadoop/