[SequenceFile_2] Basic Operations on SequenceFile

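0. Overview

　　This post tests reading and writing sequence files, sorting them, merging them, the available compression types, and converting a log file into a sequence file.

　　It supplements the SequenceFile basic-operations part of the earlier post on Hadoop sequence files.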

1. Testing read/write and compression

package hadoop.sequencefile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.junit.Test;

/**
 * Tests for sequence files
 */
public class TestSeqFile {

    /**
     * Test writing a sequence file
     */
    @Test
    public void testWriteSeq() throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
//        Path path = new Path("E:/test/none.seq");
//        Path path = new Path("E:/test/record.seq");
        Path path = new Path("E:/test/block.seq");
        // No compression
//        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class, SequenceFile.CompressionType.NONE);
        // Record compression: each value is compressed on its own
//        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class, SequenceFile.CompressionType.RECORD);
        // Block compression: records are buffered and compressed in blocks, keys as well as values
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class, SequenceFile.CompressionType.BLOCK);
        for (int i = 1; i <= 1000; i++) {
            IntWritable key = new IntWritable(i);
            Text value = new Text("helloworld" + i);
            writer.append(key, value);
        }
        writer.close();
    }

    /**
     * Test reading a sequence file
     */
    @Test
    public void testReadSeq() throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("E:/test/block.seq");
        // try-with-resources closes the reader when the loop finishes
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
            // Two reusable Writable objects that next() fills in
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                long position = reader.getPosition();
                System.out.println("key: " + key.get() + ", val: " + value + ", pos: " + position);
            }
        }
    }
}
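In Hadoop's SequenceFile, RECORD compression compresses each value individually, while BLOCK compression buffers many records and compresses them together, which is why block.seq usually comes out much smaller than record.seq for data like this. The stdlib-only sketch below illustrates the effect with java.util.zip.Deflater; it is not Hadoop's codec path, and CompressionSketch and its methods are hypothetical names. For simplicity it compresses only the "helloworld" + i values, not the keys.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionSketch {
    // Deflate a byte array and return the number of compressed bytes
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: compress every value separately and sum the sizes
    static int recordStyle(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += deflatedSize(("helloworld" + i).getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // BLOCK-style: concatenate all values into one buffer and compress it once
    static int blockStyle(int n) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (int i = 1; i <= n; i++) {
            byte[] v = ("helloworld" + i).getBytes(StandardCharsets.UTF_8);
            block.write(v, 0, v.length);
        }
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println("RECORD-style bytes: " + recordStyle(1000));
        System.out.println("BLOCK-style bytes:  " + blockStyle(1000));
    }
}
```

Short values gain little from being compressed one at a time, because each compressed stream carries its own header and checksum and has no shared history to exploit; compressing a whole block lets the repeated "helloworld" prefix be encoded once and referenced afterwards.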

2. Testing sort

package hadoop.sequencefile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.junit.Test;

import java.util.Random;

/**
 * Tests for sorting
 */
public class TestSeqFileSort {

    /**
     * Create a file of unordered key-value pairs
     */
    @Test
    public void testWriteRandom() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("E:/test/random.seq");
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, p, IntWritable.class, Text.class, SequenceFile.CompressionType.RECORD);
        // Initialize the random generator
        Random r = new Random();
        for (int i = 1; i < 100000; i++) {
            // Pick a random value in 0-99999 (duplicates are possible)
            int j = r.nextInt(100000);
            IntWritable key = new IntWritable(j);
            Text value = new Text("helloworld" + j);
            writer.append(key, value);
        }
        writer.close();
    }

    /**
     * Test sorting a SequenceFile by key
     */
    @Test
    public void testSort() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path pin = new Path("E:/test/random.seq");
        Path pout = new Path("E:/test/sort.seq");
        SequenceFile.Sorter sorter = new SequenceFile.Sorter(fs, IntWritable.class, Text.class, conf);
        sorter.sort(pin, pout);
    }

    /**
     * Test reading the sorted sequence file
     */
    @Test
    public void testReadSeq() throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("E:/test/sort.seq");
        // try-with-resources closes the reader when the loop finishes
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
            // Two reusable Writable objects that next() fills in
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                long position = reader.getPosition();
                System.out.println("key: " + key.get() + ", val: " + value + ", pos: " + position);
            }
        }
    }
}
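Conceptually, sorting random.seq means reordering its records by key while keeping every record, including the duplicate keys that r.nextInt(100000) inevitably produces over 99999 draws. The in-memory sketch below shows only that contract using plain Java collections; it is not how SequenceFile.Sorter works internally (the real sorter can spill sorted runs to disk and merge them), and SortSketch, randomRecords and sortByKey are hypothetical names. A fixed seed is used so runs are repeatable.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SortSketch {
    // Generate n random (key, value) records, mirroring testWriteRandom
    static List<Map.Entry<Integer, String>> randomRecords(int n, long seed) {
        Random r = new Random(seed);
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int j = r.nextInt(100000);
            records.add(new AbstractMap.SimpleEntry<>(j, "helloworld" + j));
        }
        return records;
    }

    // Sort by key only; records with equal keys are all kept, never merged
    static List<Map.Entry<Integer, String>> sortByKey(List<Map.Entry<Integer, String>> in) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>(in);
        out.sort(Map.Entry.comparingByKey());
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> sorted = sortByKey(randomRecords(99999, 42L));
        System.out.println("records: " + sorted.size());
        for (int i = 0; i < 5; i++) {
            System.out.println(sorted.get(i).getKey() + " -> " + sorted.get(i).getValue());
        }
    }
}
```

The record count printed by main matches the number of records written, which is a quick sanity check to repeat against sort.seq when reading it back with testReadSeq.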

3. Testing merge

package hadoop.sequencefile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.junit.Test;

/**
 * Tests for merging files; the files being merged must use the same compression type
 */
public class TestSeqFileMerge {

    /**
     * Test writing a sequence file.
     * Creates two files with keys 1-100 and 101-200: run once with block1.seq and
     * the 1-100 loop, then again with block2.seq and the 101-200 loop.
     */
    @Test
    public void testWriteSeq() throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
//        Path path = new Path("E:/test/block1.seq");
        Path path = new Path("E:/test/block2.seq");
        // Block compression
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class, SequenceFile.CompressionType.BLOCK);
//        for (int i = 1; i <= 100; i++) {
        for (int i = 101; i <= 200; i++) {
            IntWritable key = new IntWritable(i);
            Text value = new Text("helloworld" + i);
            writer.append(key, value);
        }
        writer.close();
    }

    /**
     * Test merging files. Sorter.merge combines inputs that are each already
     * sorted by key, so the merged output is sorted as well.
     */
    @Test
    public void testMerge() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path pin1 = new Path("E:/test/block1.seq");
        Path pin2 = new Path("E:/test/block2.seq");
        Path pout = new Path("E:/test/merge.seq");
        SequenceFile.Sorter sorter = new SequenceFile.Sorter(fs, IntWritable.class, Text.class, conf);
        Path[] p = {pin1, pin2};
        sorter.merge(p, pout);
    }

    /**
     * Test reading the merged sequence file
     */
    @Test
    public void testReadSeq() throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("E:/test/merge.seq");
        // try-with-resources closes the reader when the loop finishes
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
            // Two reusable Writable objects that next() fills in
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                long position = reader.getPosition();
                System.out.println("key: " + key.get() + ", val: " + value + ", pos: " + position);
            }
        }
    }
}
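Merging sorted inputs into one sorted output is the classic k-way merge: keep the current head of every input in a min-heap, repeatedly emit the smallest key, and refill from the input it came from. The sketch below shows that technique on plain integer keys with java.util.PriorityQueue; it illustrates the idea behind merging block1.seq and block2.seq and is not Hadoop's Sorter code. MergeSketch and merge are hypothetical names, and each input list is assumed to be sorted ascending already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {
    // K-way merge of key lists that are each already sorted ascending
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {key, runIndex, positionInRun}, ordered by key
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int run = top[1];
            int next = top[2] + 1;
            // Refill the heap from the run the smallest key came from
            if (next < runs.get(run).size()) {
                heap.add(new int[]{runs.get(run).get(next), run, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same key ranges as block1.seq and block2.seq
        List<Integer> run1 = new ArrayList<>();
        List<Integer> run2 = new ArrayList<>();
        for (int i = 1; i <= 100; i++) run1.add(i);
        for (int i = 101; i <= 200; i++) run2.add(i);
        List<Integer> merged = merge(Arrays.asList(run1, run2));
        System.out.println(merged.size() + " keys, first=" + merged.get(0)
                + ", last=" + merged.get(merged.size() - 1));
    }
}
```

Because each input is read strictly front to back, a merge like this needs memory only for one head record per input, which is what makes it suitable for files too large to sort in memory.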

4. Converting a log file into a sequence file

package hadoop.sequencefile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.BufferedReader;
import java.io.FileReader;

/**
 * Convert a log file into a sequence file.
 * To view the compressed SequenceFile on Windows:
 * hdfs dfs -text file:///E:/test/access.seq
 */
public class Log2Seq {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use the local file system
        conf.set("fs.defaultFS", "file:///");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("E:/test/access.seq");
        // No compression
//        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, NullWritable.class, Text.class, SequenceFile.CompressionType.NONE);
        // Record compression
//        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, NullWritable.class, Text.class, SequenceFile.CompressionType.RECORD);
        // Block compression
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, NullWritable.class, Text.class, SequenceFile.CompressionType.BLOCK);
        // Each log line becomes one record: a NullWritable key and the line as a Text value
        try (BufferedReader br = new BufferedReader(new FileReader("E:/file/access.log1"))) {
            String line;
            while ((line = br.readLine()) != null) {
                writer.append(NullWritable.get(), new Text(line));
            }
        }
        writer.close();
    }
}
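Log2Seq reads the log line by line and appends each line as a value under a NullWritable key, since a log line has no natural key. The stdlib-only sketch below isolates that loop so it can be run without Hadoop: the returned list stands in for writer.append, the missing key stands in for NullWritable, and try-with-resources guarantees the reader is closed. LogLinesSketch and toRecordValues are hypothetical names, and the two log lines in main are made-up sample input, not real access-log data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LogLinesSketch {
    // Turn every line of a log into one record value; there is no key,
    // mirroring the NullWritable key used by Log2Seq
    static List<String> toRecordValues(Reader source) throws IOException {
        List<String> values = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                values.add(line);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input standing in for E:/file/access.log1
        String log = "sample line 1\nsample line 2\n";
        for (String v : toRecordValues(new StringReader(log))) {
            System.out.println("key: (null), val: " + v);
        }
    }
}
```

Taking a Reader rather than a file name keeps the loop testable with in-memory input; in Log2Seq the same loop runs over a FileReader.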
