The Hadoop native shared libraries let Hadoop use a variety of compression codecs to improve data I/O performance. These codecs in turn depend on a number of Linux native shared library files, and the binary packages published by the community do not support snappy by default, even though it is the codec most commonly used in production. The community also does not publish 64-bit binaries, while production servers are generally x86-64, so you need to build your own distribution package. Depending on your company's situation, you may also need to build binary/rpm packages from a fork carrying local modifications.

This article walks through compiling Hadoop from source with support for multiple native shared libraries; the resulting build supports all native libraries by default.

File compression brings two main benefits: it saves storage space, and it speeds up network transfer and disk I/O. These effects become much more pronounced at large data scales, so the use of compression in a Hadoop environment deserves careful consideration. Different Hadoop analysis engines have different performance characteristics with different compression codecs and file formats;
see: Hadoop columnar storage engines Parquet/ORC with snappy compression

Components

  • The native hadoop library includes various components:

    • Compression Codecs (bzip2, lz4, snappy, zlib)

    • Native IO utilities for HDFS Short-Circuit Local Reads and Centralized Cache Management in HDFS

    • CRC32 checksum implementation

Requirements:

* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Jansson C JSON parsing library ( if compiling libwebhdfs )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
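Before starting, it helps to confirm that the required tools are actually on the PATH. The helper below is a minimal sketch of such a check; `check_tool` and the tool list are my own additions, not part of Hadoop's BUILDING.txt:

```shell
#!/bin/sh
# Sketch: report whether each build prerequisite is on the PATH.
# check_tool is a hypothetical helper, not part of any Hadoop tooling.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Typical invocation before a build:
for tool in mvn cmake protoc gcc; do
  check_tool "$tool"
done
```

Run this once before kicking off the long Maven build; a missing `protoc` or `cmake` otherwise only surfaces partway through.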

Installing required

  • Installing required packages for a clean install of CentOS 6.x Desktop:

    Requirements: gcc, g++, autoconf, automake, libtool, JDK 1.7 (JAVA_HOME set), Maven 3

  • Native libraries

yum -y install gcc gcc-c++ lzo-devel zlib zlib-devel autoconf automake libtool cmake

yum -y install svn ncurses-devel

yum install openssl-devel

yum install ant -y

yum install cmake -y
  • Maven && JDK 1.7
# tail -6 ~/.bash_profile
export JAVA_HOME=/opt/jdk
export MAVEN_HOME=/opt/maven
export PATH=.:$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# source ~/.bash_profile
# mvn -v
Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T19:57:37+08:00)
Maven home: /opt/maven
Java version: 1.7.0_45, vendor: Oracle Corporation
Java home: /opt/jdk1.7.0_45/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"
  • ProtocolBuffer 2.5.0 (required) (see the separate article)
$ tar -zxvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0/
$ ./configure
$ make
$ make install
$ protoc --version
libprotoc 2.5.0
  • Optional packages

Snappy compression

yum -y install snappy snappy-devel

Bzip2

yum install bzip2 bzip2-devel bzip2-libs -y

Linux FUSE

yum install fuse fuse-devel fuse-libs

snappy link

$ ls -lh /usr/lib64/ | grep snappy
lrwxrwxrwx 1 root root 18 Jul 3 21:45 libsnappy.so -> libsnappy.so.1.1.4
lrwxrwxrwx. 1 root root 18 Feb 12 03:08 libsnappy.so.1 -> libsnappy.so.1.1.4
-rwxr-xr-x 1 root root 22K Nov 23 2013 libsnappy.so.1.1.4

$ ln -sf /usr/lib64/libsnappy.so.1.1.4 /usr/local/lib
$ ln -sf /usr/lib64/libsnappy.so /usr/local/lib
$ ln -sf /usr/lib64/libsnappy.so.1 /usr/local/lib

$ ls -lh /usr/local/lib | grep snappy
lrwxrwxrwx 1 root root 23 Jul 9 18:47 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx 1 root root 25 Jul 9 18:47 libsnappy.so.1 -> /usr/lib64/libsnappy.so.1
lrwxrwxrwx 1 root root 29 Jul 9 18:48 libsnappy.so.1.1.4 -> /usr/lib64/libsnappy.so.1.1.4
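The three ln commands above can be wrapped in a small helper; since `ln -sf` replaces any existing link, re-running the script is safe. The `link_lib` function is illustrative and not from the original post; the snappy paths match the layout shown above but may differ on your system:

```shell
#!/bin/sh
# Sketch: idempotently link a shared library from src_dir into dst_dir.
# link_lib is a hypothetical helper.
link_lib() {
  src_dir="$1"; dst_dir="$2"; name="$3"
  ln -sf "$src_dir/$name" "$dst_dir/$name"   # -f: replace any existing link
}

# Example use, matching the listing above:
# link_lib /usr/lib64 /usr/local/lib libsnappy.so
# link_lib /usr/lib64 /usr/local/lib libsnappy.so.1
# link_lib /usr/lib64 /usr/local/lib libsnappy.so.1.1.4
```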

Building Hadoop with Snappy support

$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz

$ tar -zxf hadoop-2.7.2-src.tar.gz

Building Hadoop

mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy
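In this command, `-Pdist,native` activates the distribution and native-code profiles, `-Dtar` produces a tarball, `-Dsnappy.lib` points the build at the snappy library location, and `-Dbundle.snappy` copies the library into the distribution's `lib/native`. After the build, it is worth checking that libsnappy was actually bundled. The check below is my own suggestion (`list_snappy_in_tar` is a hypothetical helper; the example path comes from the build output):

```shell
#!/bin/sh
# Sketch: verify that a built dist tarball bundles libsnappy.
# list_snappy_in_tar is a hypothetical helper.
list_snappy_in_tar() {
  tar -tzf "$1" | grep libsnappy || echo "no snappy bundled"
}

# Example (path taken from the build output):
# list_snappy_in_tar /opt/hadoop-2.7.2-src/hadoop-dist/target/hadoop-2.7.2.tar.gz
```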

Build output

main:
[exec] $ tar cf hadoop-2.7.2.tar hadoop-2.7.2
[exec] $ gzip -f hadoop-2.7.2.tar
[exec]
[exec] Hadoop dist tar available at: /opt/hadoop-2.7.2-src/hadoop-dist/target/hadoop-2.7.2.tar.gz
[exec]
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /opt/hadoop-2.7.2-src/hadoop-dist/target/hadoop-dist-2.7.2-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 4.046 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 3.147 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 6.948 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.382 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 2.768 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 4.699 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 5.243 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 7.649 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 4.101 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:50 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 9.580 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 18.977 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.037 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:24 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 24.689 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 8.890 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 6.085 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.684 s]
[INFO] hadoop-yarn ........................................ SUCCESS [ 0.209 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:27 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 48.704 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [ 0.150 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 18.128 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 28.898 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 5.899 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 13.665 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 31.929 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 8.038 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 12.032 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [ 4.674 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.056 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 3.816 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 2.328 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [ 0.029 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 8.030 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 4.697 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.101 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 38.677 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 25.883 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 6.625 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 15.826 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 9.873 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 8.766 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 2.425 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 9.849 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [ 3.369 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 6.783 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 12.602 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 2.728 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 9.200 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 6.542 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 5.351 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 3.234 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 4.533 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 9.475 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 6.788 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [09:42 min]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 41.264 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 14.762 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.065 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 6.579 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 16.842 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.026 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 45.423 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 29:44 min
[INFO] Finished at: 2016-07-10T00:28:58+08:00
[INFO] Final Memory: 148M/667M
[INFO] ------------------------------------------------------------------------

Hadoop Native checknative

# hadoop checknative -a

16/07/10 00:37:01 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/07/10 00:37:01 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/hadoop-2.7.2/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /opt/hadoop-2.7.2/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so

Deploy Hadoop 2.7.2

See the Hadoop deployment and installation guide.

[root@bigdata-server-1 cloud]# hadoop checknative -a
16/07/10 18:39:48 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/07/10 18:39:48 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/cloud/hadoop-2.7.2/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /opt/cloud/hadoop-2.7.2/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
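When deploying across many hosts, it can help to fail fast if `hadoop checknative -a` reports any `false` entries. A minimal filter, sketched below; `assert_native_ok` is my own hypothetical helper, fed the checknative output on stdin:

```shell
#!/bin/sh
# Sketch: scan `hadoop checknative -a` output on stdin and fail if any
# library line reports "false". assert_native_ok is a hypothetical helper.
assert_native_ok() {
  if grep ': *false' >/dev/null; then
    echo "some native libraries are missing" >&2
    return 1
  fi
  return 0
}

# Typical use on a deployed node:
# hadoop checknative -a 2>&1 | assert_native_ok
```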
