This guide walks through the main steps to install CDH 5.15 on Linux (CentOS 6.10). The installation method is manual, using the Cloudera Manager tarball and parcels. The newest version is CDH 6.0.1 as I write this guide, but it does not support CentOS 6.10, so I had to install 5.15.

Software component list:

  • CDH, the parcel: CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel
  • CM (Cloudera Manager): cloudera-manager-el6-cm5.15.1_x86_64.tar.gz
  • MySQL 5.7.73 and JDBC driver: mysql-connector-java-8.0.11.jar
  • JVM: jdk-8u181-linux-x64.rpm

Here is how to get them:

JVM:
wget http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.rpm?AuthParam=1539265359_30b3a4b9e17f3ed3b0962980168c2721

CDH:
wget http://archive.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.15.1_x86_64.tar.gz
wget http://archive.cloudera.com/cdh5/parcels/latest/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel
wget http://archive.cloudera.com/cdh5/parcels/latest/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel.sha1
wget http://archive.cloudera.com/cdh5/parcels/5.15.1.4/manifest.json

MySQL Yum Repo:
wget https://repo.mysql.com//mysql57-community-release-el6-11.noarch.rpm
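
Since the parcel is large, it may be worth checking that it downloaded intact before going further. The sketch below compares the SHA-1 of the parcel with the hash stored in the .sha1 file. It assumes that file holds the hex digest as its first whitespace-separated field, and verify_parcel is a hypothetical helper name, not part of any Cloudera tool.

```shell
#!/bin/sh
# verify_parcel PARCEL_FILE SHA_FILE
# Returns 0 when the SHA-1 of PARCEL_FILE equals the first field of SHA_FILE.
# Assumption: SHA_FILE stores the hex digest as its first whitespace-separated field.
verify_parcel() {
    parcel="$1"
    sha_file="$2"
    expected=$(awk 'NR==1 {print $1}' "$sha_file")
    actual=$(sha1sum "$parcel" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example, using the file names from the download list:
# verify_parcel CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel \
#               CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel.sha1 && echo "parcel OK"
```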

Hardware:

  • The more powerful, the better.

Machine/OS Setup:

  • Make sure the time is synchronized across machines (configure NTP if needed).
  • Network: set whatever hostnames you like. I have 4 servers: ns01, ns02, ns03, ns04. Add them to the /etc/hosts file as below. Make sure you have high bandwidth across the nodes; mine is 210 MB/s.

    192.168.0.79 ns04
    192.168.0.77 ns02
    192.168.0.232 ns01
    192.168.0.114 ns03

  • Tune swap (each node). Edit /etc/sysctl.conf and add the line below, then run sysctl -p to apply it:
    vim /etc/sysctl.conf
    vm.swappiness = 10

  • Disable SELinux (Security-Enhanced Linux): set SELINUX=disabled in /etc/selinux/config.
  • Turn off the firewall (each node):
    service iptables stop
    chkconfig iptables off
  • SSH without a password:
    1. Run ssh-keygen -t rsa on each node. Press Enter when prompted.
    2. Run cp .ssh/id_rsa.pub .ssh/pub_key.ns0$n on each node to save the public key to a separate file. $n is the number of the machine.
    3. Copy those files to one of the machines, such as ns01.
    4. Run cat pub_key.ns0* >> authorized_keys on ns01.
    5. Broadcast the file to all other machines.
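
Steps 4 and 5 of the passwordless SSH setup can be scripted roughly as follows. This is only a sketch: merge_pub_keys and broadcast_keys are hypothetical helper names, and it assumes the pub_key.ns0* files were collected into ~/.ssh on ns01 and that the same account is used on every node.

```shell
#!/bin/sh
# merge_pub_keys KEY_DIR AUTH_FILE
# Appends every KEY_DIR/pub_key.ns0* file to AUTH_FILE, drops duplicate
# lines, and restricts the file to its owner.
merge_pub_keys() {
    key_dir="$1"
    auth_file="$2"
    touch "$auth_file"
    cat "$key_dir"/pub_key.ns0* >> "$auth_file"
    sort -u "$auth_file" -o "$auth_file"
    chmod 600 "$auth_file"
}

# broadcast_keys AUTH_FILE HOST...
# Copies AUTH_FILE to .ssh/authorized_keys in the home directory on each HOST.
# scp will still ask for a password on this first copy.
broadcast_keys() {
    auth_file="$1"
    shift
    for host in "$@"; do
        scp "$auth_file" "$host:.ssh/authorized_keys"
    done
}

# Example, run on ns01:
# merge_pub_keys "$HOME/.ssh" "$HOME/.ssh/authorized_keys"
# broadcast_keys "$HOME/.ssh/authorized_keys" ns02 ns03 ns04
```

Removing duplicates keeps the file clean if the merge is run more than once.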

JVM setup (each node)

Do this on each machine. Use the rpm distribution to install (rpm -ivh jdk*.rpm) and set JAVA_HOME to /usr/java/latest. When I used the jdk*.tar.gz distribution instead, I got an error like 'deploy client configuration for spark' during setup.
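
One way to set JAVA_HOME for every login shell is a small profile snippet. The sketch below is an assumption about how to do it, not the method the guide necessarily used; write_java_profile is a hypothetical helper and /etc/profile.d/java.sh is an assumed file name.

```shell
#!/bin/sh
# write_java_profile FILE
# Writes a profile snippet that exports JAVA_HOME (the path used in this
# guide) and puts its bin directory at the front of PATH.
write_java_profile() {
    cat > "$1" <<'EOF'
export JAVA_HOME=/usr/java/latest
export PATH="$JAVA_HOME/bin:$PATH"
EOF
}

# Example (as root, on each node; the target file name is an assumption):
# write_java_profile /etc/profile.d/java.sh
```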

MySQL Installation/Setup (one node, ns01)

  • Install and initialize MySQL:
    https://opensourcedbms.com/dbms/installing-mysql-5-7-on-centosredhatfedora/
  • SQL to create the databases/users for Hadoop:
    create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database activity DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    GRANT ALL PRIVILEGES ON hive.* to 'hive'@'%' identified by 'hive' with grant option ;
    flush privileges ;
    GRANT ALL PRIVILEGES ON oozie.* to 'oozie'@'%' identified by 'oozie' with grant option ;
    flush privileges ;
    GRANT ALL PRIVILEGES ON hue.* to 'hue'@'%' identified by 'hue' with grant option ;
    flush privileges ;
    GRANT ALL PRIVILEGES ON activity.* to 'activity'@'%' identified by 'activity' with grant option ;
    flush privileges ;
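
The four create/grant pairs follow the same pattern, so they can also be generated. The sketch below prints the same statements for a list of service names; emit_service_sql is a hypothetical helper, and as in the SQL above the user name and password both equal the database name.

```shell
#!/bin/sh
# emit_service_sql NAME...
# Prints, for each NAME, the CREATE DATABASE and GRANT statements used in
# this guide, followed by a single FLUSH PRIVILEGES.
emit_service_sql() {
    for name in "$@"; do
        printf 'create database %s DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$name"
        printf "GRANT ALL PRIVILEGES ON %s.* to '%s'@'%%' identified by '%s' with grant option;\n" \
            "$name" "$name" "$name"
    done
    printf 'flush privileges;\n'
}

# Example (on ns01; prompts for the MySQL root password):
# emit_service_sql hive oozie hue activity | mysql -uroot -p
```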

CDH Install

  • Untar CM to the target folder /opt (ns01 node).
    tar -xzf cloudera-manager-el6-cm5.15.1_x86_64.tar.gz -C /opt
  • Copy the MySQL JDBC driver to avoid 'java.lang.ClassNotFoundException: com.mysql.jdbc.Driver' (ns01 node).
    cp mysql-connector-java-8.0.11.jar /opt/cm-5.15.1/share/cmf/lib/
    cp mysql-connector-java-8.0.11.jar /opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/hive/lib/
    (Also copy the MySQL driver to the proper location for any other component that needs a MySQL connection to create and initialize its database and tables.)
  • Create the Cloudera Manager DB (ns01 node)
    (Run in MySQL)
    GRANT ALL PRIVILEGES ON scm.* to 'scm'@'%' identified by 'scm' with grant option ;
    (Run in Shell)
    /opt/cm-5.15.1/share/cmf/schema/scm_prepare_database.sh mysql -hlocalhost -uroot -p*** --scm-host ns01 scm scm scm
  • Copy the Hadoop parcel to the parcel-repo folder and rename (mv) the *.sha1 file to *.sha (ns01 node).
  • Update the Cloudera Manager server host in the agent configuration so the agents can connect to it (ns01 node).
    vim /opt/cm-5.15.1/etc/cloudera-scm-agent/config.ini
  • Copy Cloudera Manager to the other machines as well. You can tar the folder /opt/cm-5.15.1 and scp it to the other machines.
  • Add the cloudera-scm user (each node).
    useradd --system --home /opt/cm-5.15.1/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
  • Reboot all machines before starting CM. I wrote a simple script, rebootAll.sh, to do it.
  • Run the Service and Configuration Manager (SCM) on the server node and the agent nodes. You can also run the agent on the server node, so you will have one more node to install Hadoop on.
    /opt/cm-5.15.1/etc/init.d/cloudera-scm-server start  (run on only 1 node)
    /opt/cm-5.15.1/etc/init.d/cloudera-scm-agent start  (run on all nodes)
  • Launch CM at http://ns01:7180 from your browser.
  • Wait several seconds if needed. If you can see the login page (credentials: admin/admin), there should be no big problem. I will paste some of the UI here.
    [Screenshot: hosts list; you can see I have 4 nodes.]

    [Screenshot: the repository you have configured.]

    [Screenshot: assigning the roles for each node.]
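
The parcel staging and agent configuration steps in the list above can be sketched in shell. This is a rough sketch under assumptions: stage_parcel and set_server_host are hypothetical helper names, the download directory is whatever you used for wget, and /opt/cloudera/parcel-repo is an assumed repository path, since the guide does not spell it out. The sed command edits the server_host key of the agent's config.ini.

```shell
#!/bin/sh
# stage_parcel SRC_DIR REPO_DIR
# Copies the parcel (and manifest.json, if present) into REPO_DIR and
# copies each *.parcel.sha1 file under the name *.parcel.sha.
stage_parcel() {
    src="$1"
    repo="$2"
    mkdir -p "$repo"
    cp "$src"/*.parcel "$repo"/
    if [ -f "$src/manifest.json" ]; then
        cp "$src/manifest.json" "$repo"/
    fi
    for f in "$src"/*.parcel.sha1; do
        cp "$f" "$repo/$(basename "$f" .sha1).sha"
    done
}

# set_server_host CONFIG_INI HOST
# Points the agent at HOST by rewriting the server_host line in CONFIG_INI.
set_server_host() {
    sed -i "s/^server_host=.*/server_host=$2/" "$1"
}

# Example on ns01 (both directories are assumptions):
# stage_parcel /root/downloads /opt/cloudera/parcel-repo
# set_server_host /opt/cm-5.15.1/etc/cloudera-scm-agent/config.ini ns01
```

Running set_server_host before copying /opt/cm-5.15.1 to the other machines means every agent inherits the same setting.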
