Many Hadoop-related projects are open source and widely used by companies. This article walks through installing several of them.

Install JDK

Install Hadoop

Install HBase

Install Hive

Install Spark

Install Impala

Install Sqoop

Install Alluxio

Install JDK

Step 1: download the package from the official site, choosing the appropriate version.

Step 2: unzip the package and copy to destination folder

tar zxf jdk-8u111-linux-x64.tar.gz

cp -R jdk1.8.0_111 /usr/share

Step 3: setting PATH and JAVA_HOME

vi ~/.bashrc

export JAVA_HOME=/usr/share/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source ~/.bashrc
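
Editing ~/.bashrc by hand works, but the same file is edited again in several later sections. As a rough sketch of a scripted alternative, the helper below appends a line only when an identical line is not already present. The name append_once and the RC_FILE variable are assumptions made for this example, not part of any tool; RC_FILE defaults to a scratch file so that running the sketch does not touch the real ~/.bashrc.

```shell
#!/bin/sh
# append_once LINE FILE: hypothetical helper. Appends LINE to FILE only if an
# identical line is not already there, so re-running the setup is harmless.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F fixed string (no regex)
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# RC_FILE is an assumption; point it at ~/.bashrc on a real machine.
RC_FILE=${RC_FILE:-${TMPDIR:-/tmp}/bashrc.example}

# The three exports from Step 3. Single quotes keep $JAVA_HOME and $PATH
# unexpanded so they are resolved when the rc file is sourced.
append_once 'export JAVA_HOME=/usr/share/jdk1.8.0_111' "$RC_FILE"
append_once 'export PATH=$JAVA_HOME/bin:$PATH' "$RC_FILE"
append_once 'export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar' "$RC_FILE"
```

Running it with RC_FILE=~/.bashrc and then sourcing the file has the same effect as the manual edit.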

Step 4: open a new shell, or reboot the server, if other sessions need to pick up the changes ('source' already applies them to the current shell)

Step 5: check java version

java -version

javac -version

Install Hadoop

Follow the steps below to install Hadoop on a single node (pseudo-distributed mode, with HDFS and YARN running on localhost).

Step 1: download package from apache site

Step 2: unzip the package and copy to destination folder

tar zxf hadoop-2.7.3.tar.gz

mkdir -p /usr/share/hadoop

cp -R hadoop-2.7.3/* /usr/share/hadoop

Step 3: create 'hadoop' folder under 'home'

mkdir /home/hadoop

Step 4: set PATH and HADOOP_HOME

vi ~/.bashrc

export HADOOP_HOME=/usr/share/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME

source ~/.bashrc

Step 5: check hadoop version

hadoop version

Step 6: config hadoop hdfs, core site, yarn and map-reduce

cd $HADOOP_HOME/etc/hadoop

vi hadoop-env.sh

export JAVA_HOME=/usr/share/jdk1.8.0_111

vi core-site.xml

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

vi hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>

vi yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

cp mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
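
The property blocks in this step are fragments: in each of these files they belong inside the single top-level <configuration> element. As an illustration, the sketch below writes a complete core-site.xml with a here-document. CONF_DIR is an assumed variable for this example and defaults to a scratch directory, so nothing under $HADOOP_HOME is overwritten by accident.

```shell
#!/bin/sh
# CONF_DIR is an assumption for this sketch; on a real install it would be
# $HADOOP_HOME/etc/hadoop. A scratch directory is the default here.
CONF_DIR=${CONF_DIR:-${TMPDIR:-/tmp}/hadoop-conf-example}
mkdir -p "$CONF_DIR"

# The quoted 'EOF' keeps the shell from expanding anything inside the document.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```

The other three files follow the same shape, with their own property blocks between the <configuration> tags.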

Step 7: initialize hadoop namenode

hdfs namenode -format
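
Formatting wipes the NameNode metadata, so it is worth guarding against running it a second time. A formatted NameNode directory contains a current/VERSION file, and the sketch below checks for that file before formatting. The helper name namenode_formatted is hypothetical, and NN_DIR is assumed to match the dfs.name.dir path configured in hdfs-site.xml.

```shell
#!/bin/sh
# namenode_formatted DIR: hypothetical helper. Succeeds if DIR already looks
# like a formatted NameNode directory (it holds current/VERSION).
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Assumed to match dfs.name.dir in hdfs-site.xml (without the file:// prefix).
NN_DIR=${NN_DIR:-/home/hadoop/hadoopinfra/hdfs/namenode}

if namenode_formatted "$NN_DIR"; then
    echo "NameNode at $NN_DIR is already formatted, skipping"
elif command -v hdfs >/dev/null 2>&1; then
    hdfs namenode -format
else
    echo "hdfs not found on PATH, nothing done"
fi
```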

Step 8: start hadoop

start-dfs.sh

start-yarn.sh

Step 9: check hadoop site to see if it works

http://localhost:50070/

http://localhost:8088/
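
Besides the web pages, the JDK's jps tool lists running Java processes, which gives a quick command-line check. In a single-node setup like this one, start-dfs.sh and start-yarn.sh normally bring up NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reads jps output and prints any of those that are missing; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# missing_daemons: hypothetical helper. Reads `jps` output on stdin
# (lines of "<pid> <name>") and prints each expected daemon that is absent.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches the whole line, so "NameNode" does not match
        # "SecondaryNameNode"
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
else
    echo "jps not found on PATH"
fi
```

No output from missing_daemons means all five daemons were found.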

Install HBase

Follow the steps below to install HBase in standalone mode.

Step 1: check if Hadoop installed

hadoop version

Step 2: download version 1.2.4 of hbase from apache site

Step 3: unzip package and copy to destination folder

tar zxf hbase-1.2.4-bin.tar.gz

mkdir -p /usr/share/hbase

cp -R hbase-1.2.4/* /usr/share/hbase

Step 4: configure hbase env

cd /usr/share/hbase/conf

vi hbase-env.sh

export JAVA_HOME=/usr/share/jdk1.8.0_111

Step 5: modify hbase-site.xml

vi hbase-site.xml

<configuration>
<!-- Set the path where HBase should store its files. -->
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>
<!-- Set the path where HBase should store the files of its built-in ZooKeeper. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>

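Step 6: start hbase and check the hbase root directory (with the file: rootdir configured above, HBase writes to the local file system rather than HDFS)

cd /usr/share/hbase/bin

./start-hbase.sh

ls /home/hadoop/HBase/HFiles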

Step 7: check hbase via web interface

http://localhost:16010

Install Hive

Step 1: download version 1.2.1 of hive from apache site

Step 2: unzip the package and copy to destination folder

tar zxf apache-hive-1.2.1-bin.tar.gz

mkdir -p /usr/share/hive

cp -R apache-hive-1.2.1-bin/* /usr/share/hive

Step 3: set HIVE_HOME

vi ~/.bashrc

export HIVE_HOME=/usr/share/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/share/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/share/hive/lib/*:.

source ~/.bashrc

Step 4: configure env for hive

cd $HIVE_HOME/conf

cp hive-env.sh.template hive-env.sh

vi hive-env.sh

export HADOOP_HOME=/usr/share/hadoop

Step 5: download version 10.12.1.1 of Apache Derby from apache site

Step 6: unzip derby package and copy to destination folder

tar zxf db-derby-10.12.1.1-bin.tar.gz

mkdir -p /usr/share/derby

cp -R db-derby-10.12.1.1-bin/* /usr/share/derby

Step 7: setup DERBY_HOME

vi ~/.bashrc

export DERBY_HOME=/usr/share/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar

source ~/.bashrc
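
Each *_HOME variable has to point at the directory the package was actually copied to, so a quick check after sourcing ~/.bashrc can catch a mismatched path early. The following is a minimal sketch; check_home_dirs is a hypothetical helper that reports every named variable that is unset or does not point at an existing directory.

```shell
#!/bin/sh
# check_home_dirs NAME...: hypothetical helper. For each variable name given,
# prints a message if the variable is unset/empty or is not a directory.
# Returns non-zero if any check failed.
check_home_dirs() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name
        # (plain POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            status=1
        fi
    done
    return $status
}

# The names below are the variables this article has exported so far.
check_home_dirs JAVA_HOME HADOOP_HOME HIVE_HOME DERBY_HOME \
    || echo "fix the paths above in ~/.bashrc"
```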

Step 8: create a directory to store metastore

mkdir $DERBY_HOME/data

Step 9: configure the metastore of hive

cd $HIVE_HOME/conf

cp hive-default.xml.template hive-site.xml

vi hive-site.xml

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore </description>
</property>

Step 10: create a file named jpox.properties and add the following content into it

touch jpox.properties

vi jpox.properties

javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
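
Note that the listing sets org.jpox.autoCreateSchema twice, with different values. With the standard java.util.Properties loader the later line wins, so it helps to know which keys repeat. The sketch below prints every key that occurs more than once in a properties file; duplicate_keys is a hypothetical helper, and it assumes the simple "key = value" form used in this step.

```shell
#!/bin/sh
# duplicate_keys FILE: hypothetical helper. Prints each key that appears more
# than once in a properties file. Assumes plain "key = value" lines; comment
# lines (# or !) and lines without "=" are skipped.
duplicate_keys() {
    awk -F= '
        /^[ \t]*[#!]/ { next }   # skip comment lines
        !/=/          { next }   # skip lines with no key/value pair
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim spaces around the key
            if (++seen[key] == 2) print key   # report once, on second sighting
        }
    ' "$1"
}

# Example run against the file created in this step, if it exists.
if [ -f jpox.properties ]; then
    duplicate_keys jpox.properties
fi
```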

Step 11: enter into hive shell and execute command 'show tables'

cd $HIVE_HOME/bin

hive

hive> show tables;

Install Spark

Step 1: download version 2.12.0 of scala from scala site

Step 2: unzip the package and copy to destination folder

tar zxf scala-2.12.0.tgz

mkdir -p /usr/share/scala

cp -R scala-2.12.0/* /usr/share/scala

Step 3: set PATH for scala

vi ~/.bashrc

export PATH=$PATH:/usr/share/scala/bin

source ~/.bashrc

Step 4: check scala version

scala -version

Step 5: download version 2.0.2 of spark from apache site

Step 6: unzip the package and copy to destination folder

tar zxf spark-2.0.2-bin-hadoop2.7.tgz

mkdir -p /usr/share/spark

cp -R spark-2.0.2-bin-hadoop2.7/* /usr/share/spark

Step 7: setup PATH

vi ~/.bashrc

export PATH=$PATH:/usr/share/spark/bin

source ~/.bashrc

Step 8: enter into spark-shell to see if spark is installed successfully

spark-shell

Install Impala

Step 1: download version 2.7.0 of impala from impala site

Step 2: unzip the package and copy to destination folder

tar zxf apache-impala-incubating-2.7.0.tar.gz

mkdir -p /usr/share/impala

cp -R apache-impala-incubating-2.7.0/* /usr/share/impala

Step 3: set PATH and IMPALA_HOME

vi ~/.bashrc

export IMPALA_HOME=/usr/share/impala
export PATH=$PATH:/usr/share/impala

source ~/.bashrc

Step 4: to be continued...

Install Sqoop

Preconditions: should have Hadoop (HDFS and Map-Reduce) installed

Step 1: download version 1.4.6 of sqoop from apache site

Step 2: unzip the package and copy to destination folder

tar zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

mkdir -p /usr/share/sqoop

cp -R sqoop-1.4.6.bin__hadoop-2.0.4-alpha/* /usr/share/sqoop

Step 3: set SQOOP_HOME and PATH

vi ~/.bashrc

export SQOOP_HOME=/usr/share/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

source ~/.bashrc

Step 4: configure sqoop

cd $SQOOP_HOME/conf

mv sqoop-env-template.sh sqoop-env.sh

vi sqoop-env.sh

export HADOOP_COMMON_HOME=/usr/share/hadoop
export HADOOP_MAPRED_HOME=/usr/share/hadoop

Step 5: download version 5.1.40 of mysql-connector-java from the MySQL site

Step 6: unzip the package and move related jar file into destination folder

tar -zxf mysql-connector-java-5.1.40.tar.gz

cd mysql-connector-java-5.1.40

mv mysql-connector-java-5.1.40-bin.jar /usr/share/sqoop/lib

Step 7: verify if sqoop is installed successfully

cd $SQOOP_HOME/bin

sqoop-version

Install Alluxio

Step 1: download version 1.3.0 of alluxio from the Alluxio site

Step 2: unzip the package and move it to destination folder

tar zxf alluxio-1.3.0-hadoop2.7-bin.tar.gz

mkdir -p /usr/share/alluxio

cp -R alluxio-1.3.0-hadoop2.7-bin/* /usr/share/alluxio

Step 3: create alluxio-env

cd /usr/share/alluxio

bin/alluxio bootstrapConf localhost local

vi conf/alluxio-env.sh

export ALLUXIO_UNDERFS_ADDRESS=/tmp

Step 4: format alluxio file system and start alluxio

cd /usr/share/alluxio

bin/alluxio format

bin/alluxio-start.sh local

Step 5: verify if alluxio is running by visiting http://localhost:19999

Step 6: run predefined tests

cd /usr/share/alluxio

bin/alluxio runTests
