by HIMANSHU ARORA on OCTOBER 16, 2012

http://www.thegeekstuff.com/2012/10/15-linux-split-and-join-command-examples-to-manage-large-files/

The Linux split and join commands are very helpful when you are manipulating large files. This article explains how to use both commands, with descriptive examples.

Join and split command syntax:

join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]

Linux Split Command Examples

1. Basic Split Example

Here is a basic example of split command.

$ split split.zip 

$ ls
split.zip xab xad xaf xah xaj xal xan xap xar xat xav xax xaz xbb xbd xbf xbh xbj xbl xbn
xaa xac xae xag xai xak xam xao xaq xas xau xaw xay xba xbc xbe xbg xbi xbk xbm xbo

So we see that the file split.zip was split into smaller files named x**, where ** is the two-character suffix that is added by default. Also, by default, each x** file contains 1000 lines (the last one holds whatever remains).

$ wc -l *
40947 split.zip
1000 xaa
1000 xab
1000 xac
1000 xad
1000 xae
1000 xaf
1000 xag
1000 xah
1000 xai
...
...
...

So the output above confirms that by default each x** file contains 1000 lines.
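The pieces can be put back together with cat, because the shell expands x* in alphabetical order, which is the order in which split wrote them. The following is only a minimal sketch and is not part of the original example: the file name numbers.txt, the scratch directory and the use of cat and cmp are assumptions made for illustration.

```shell
# Work in a scratch directory so the x* pieces do not mix with other files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > numbers.txt

# Default split: 1000 lines per piece, named xaa, xab, xac.
split numbers.txt
pieces=$(ls x* | wc -l | tr -d ' ')
echo "pieces: $pieces"

# Concatenate the pieces in alphabetical order and compare with the original.
cat x* > rebuilt.txt
if cmp -s numbers.txt rebuilt.txt; then
    status=identical
else
    status=different
fi
echo "rebuilt file is $status"
```

The same cat x* approach should work for pieces created with -b or -n, since split always names them in increasing order.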

2. Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. But this can be changed by using -a option.

The following example uses a suffix of length 5 for the split files.

$ split -a5 split.zip
$ ls
split.zip xaaaac xaaaaf xaaaai xaaaal xaaaao xaaaar xaaaau xaaaax xaaaba xaaabd xaaabg xaaabj xaaabm
xaaaaa xaaaad xaaaag xaaaaj xaaaam xaaaap xaaaas xaaaav xaaaay xaaabb xaaabe xaaabh xaaabk xaaabn
xaaaab xaaaae xaaaah xaaaak xaaaan xaaaaq xaaaat xaaaaw xaaaaz xaaabc xaaabf xaaabi xaaabl xaaabo
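A two-letter suffix allows 26 × 26 = 676 different names, so a longer suffix leaves room for more pieces. Here is a small sketch on a tiny file so the result is easy to inspect; the file name lines.txt is hypothetical, and the -l option used to force one line per piece is the one covered in example 7.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input with four lines.
seq 1 4 > lines.txt

# One line per piece (-l 1) and three-character suffixes (-a 3).
split -a 3 -l 1 lines.txt

# Expect four pieces: xaaa, xaab, xaac, xaad.
piece_count=$(ls x* | wc -l | tr -d ' ')
ls x*
```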

Note: Earlier we also discussed other file manipulation utilities: tac, rev, paste.

3. Customize Split File Size using -b option

Size of each output split file can be controlled using -b option.

In this example, the split files were created with a size of 200000 bytes.

$ split -b200000 split.zip 

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu 4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xar
...
...
...
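When the file size is not an exact multiple of the -b value, the last piece simply holds the leftover bytes. A minimal sketch, assuming a hypothetical 2500-byte file named blob.bin filled with zero bytes:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2500-byte input.
head -c 2500 /dev/zero > blob.bin

# Pieces of at most 1000 bytes: xaa and xab are full, xac gets the remainder.
split -b 1000 blob.bin

size_first=$(wc -c < xaa | tr -d ' ')
size_last=$(wc -c < xac | tr -d ' ')
echo "first piece: $size_first bytes, last piece: $size_last bytes"
```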

4. Create Split Files with Numeric Suffix using -d option

As seen in the examples above, the output files have the format x**, where ** are letters. You can change the suffix to digits using the -d option.

Here is an example. This has numeric suffix on the split files.

$ split -d split.zip
$ ls
split.zip x01 x03 x05 x07 x09 x11 x13 x15 x17 x19 x21 x23 x25 x27 x29 x31 x33 x35 x37 x39
x00 x02 x04 x06 x08 x10 x12 x14 x16 x18 x20 x22 x24 x26 x28 x30 x32 x34 x36 x38 x40
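The PREFIX argument from the syntax line at the top of the article can be combined with -d to get more descriptive names than x00, x01 and so on. This is just a sketch: the input name lines.txt and the prefix part_ are made up for the illustration.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input with six lines.
seq 1 6 > lines.txt

# Two lines per piece, numeric suffixes, custom prefix "part_".
split -d -l 2 lines.txt part_

# Expect part_00, part_01 and part_02.
part_count=$(ls part_* | wc -l | tr -d ' ')
ls part_*
```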

5. Customize the Number of Split Chunks using -n option

To get control over the number of chunks, use the -n option.

This example creates 50 chunks of split files.

$ split -n50 split.zip
$ ls
split.zip xac xaf xai xal xao xar xau xax xba xbd xbg xbj xbm xbp xbs xbv
xaa xad xag xaj xam xap xas xav xay xbb xbe xbh xbk xbn xbq xbt xbw
xab xae xah xak xan xaq xat xaw xaz xbc xbf xbi xbl xbo xbr xbu xbx
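Note that -n followed by a plain number divides the file by byte count, so a chunk may end in the middle of a line. GNU split also accepts the form -n l/N, which makes N chunks without breaking lines. The sketch below assumes GNU coreutils and uses made-up names (lines.txt, byte_, line_); it is not taken from the original example.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input with ten short lines.
seq 1 10 > lines.txt

# Four chunks cut purely by byte count.
split -n 4 lines.txt byte_

# Four chunks cut only at line boundaries (GNU extension).
split -n l/4 lines.txt line_

byte_count=$(ls byte_* | wc -l | tr -d ' ')
line_count=$(ls line_* | wc -l | tr -d ' ')
echo "byte chunks: $byte_count, line chunks: $line_count"
```

Either set of chunks concatenates back to the original file; the difference is only where the cuts fall.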

6. Avoid Zero Sized Chunks using -e option

When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.

Here is an example:

$ split -n50 testfile

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
...
...
...

So we see that lots of zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:

$ split -n50 -e testfile
$ ls
split.zip testfile xaa xab xac xad xae xaf

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa

So we see that no zero sized chunk was produced in the above output.
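The effect is easy to reproduce by counting the files that each variant leaves behind. A minimal sketch, assuming a hypothetical 3-byte file tiny.txt and the made-up prefixes all_ and kept_:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: three bytes, no trailing newline.
printf 'abc' > tiny.txt

# Ask for 5 chunks: without -e every chunk file is created, even the empty ones.
split -n 5 tiny.txt all_

# With -e the empty chunk files are not written.
split -n 5 -e tiny.txt kept_

all_count=$(ls all_* | wc -l | tr -d ' ')
kept_count=$(ls kept_* | wc -l | tr -d ' ')
echo "without -e: $all_count files, with -e: $kept_count files"
```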

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l option.

As seen in the example below, split files are created with 20000 lines.

$ split -l20000 split.zip

$ ls
split.zip testfile xaa xab xac

$ wc -l x*
20000 xaa
20000 xab
947 xac
40947 total
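split can also read from standard input when INPUT is given as a single dash, which is handy for splitting the output of another command. The sketch below is an illustration only; the prefix chunk_ and the use of seq as the data source are assumptions.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# "-" tells split to read standard input; "chunk_" is a hypothetical PREFIX.
seq 1 45 | split -l 20 - chunk_

# Expect 20 lines in chunk_aa and chunk_ab, and the remaining 5 in chunk_ac.
first_lines=$(wc -l < chunk_aa | tr -d ' ')
last_lines=$(wc -l < chunk_ac | tr -d ' ')
echo "first chunk: $first_lines lines, last chunk: $last_lines lines"
```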

Get Detailed Information using --verbose option

To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.

$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'

Linux Join Command Examples

8. Basic Join Example

By default, the join command compares the first field of each line in the two input files and joins the lines whose first fields match.

Here is an example:

$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto

So we see that a file containing countries was joined with another file containing capitals on the basis of first field.
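By default only lines whose key appears in both files make it into the output. A small sketch with made-up file names (countries.txt, capitals.txt) and an extra key 4 that exists only in the first file:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs, both sorted on the first field.
printf '1 India\n2 US\n3 Ireland\n4 Japan\n' > countries.txt
printf '1 NewDelhi\n2 Washington\n3 Dublin\n' > capitals.txt

# Key 4 has no partner in capitals.txt, so that line is dropped.
joined=$(join countries.txt capitals.txt)
printf '%s\n' "$joined"
```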

9. Join works on Sorted List

If either of the two files supplied to the join command is not sorted on the join field, join prints a warning and the affected entry is not joined.

In this example, since the input file testfile1 is not sorted, a warning/error message is displayed.

$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto
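One way to avoid the warning is to sort the file on the join field before joining. This is a sketch rather than part of the original example: the use of sort, the output name testfile1.sorted and the choice of the C locale (so that sort and join agree on the ordering) are assumptions. Keep in mind that join compares keys as strings, not as numbers.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Use the same collation order for both sort and join.
LC_ALL=C
export LC_ALL

# Same data as above: testfile1 has its last two lines out of order.
printf '1 India\n2 US\n3 Ireland\n5 Canada\n4 UK\n' > testfile1
printf '1 NewDelhi\n2 Washington\n3 Dublin\n4 London\n5 Toronto\n' > testfile2

# Sort on the first field only, then join the sorted copy.
sort -k 1,1 testfile1 > testfile1.sorted
joined=$(join testfile1.sorted testfile2)
printf '%s\n' "$joined"
```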

10. Ignore Case using -i option

When comparing fields, the difference in case can be ignored using -i option as shown below.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland Dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
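The difference between the two runs can be captured in a script by comparing the outputs. A minimal sketch with a shortened version of the data above; the variable names strict and relaxed are hypothetical, and the C locale is set only to make the comparison predictable.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

LC_ALL=C
export LC_ALL

# Key "b" is lowercase in one file and uppercase in the other.
printf 'a India\nb US\nc Ireland\n' > testfile1
printf 'a NewDelhi\nB Washington\nc Dublin\n' > testfile2

# Case-sensitive join: the b/B pair is not matched.
strict=$(join testfile1 testfile2 2>/dev/null) || true

# Case-insensitive join: the b/B pair is matched.
relaxed=$(join -i testfile1 testfile2)

printf 'without -i:\n%s\n' "$strict"
printf 'with -i:\n%s\n' "$relaxed"
```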

11. Verify that Input is Sorted using --check-order option

Here is an example. Since testfile1 is unsorted towards the end, an error is produced in the output.

$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
join: testfile1:6: is not sorted: e Canada
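In a script, the exit status of join --check-order can be used to detect unsorted input before trusting the result. The following is a sketch with made-up file names (unsorted.txt, capitals.txt) and a hypothetical verdict variable:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# unsorted.txt has keys in the order a, c, b.
printf 'a India\nc Ireland\nb US\n' > unsorted.txt
printf 'a NewDelhi\nb Washington\nc Dublin\n' > capitals.txt

# join exits with a non-zero status when --check-order finds unsorted input.
if join --check-order unsorted.txt capitals.txt > /dev/null 2>&1; then
    verdict=sorted
else
    verdict=unsorted
fi
echo "input looks $verdict"
```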

12. Do not Check the Sort Order using --nocheck-order option

This is the opposite of the previous example. The sort order of the input is not checked, and no error message is displayed.

$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London

13. Print Unpairable Lines using -a option

If the lines of the two input files cannot all be mapped one to one, the -a[FILENUM] option also prints the lines that could not be paired. FILENUM is the file number (1 or 2).

In the following example, we see that using -a1 produced the last line in testfile1 (f Australia), which had no pair in testfile2.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
f Australia
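The -a option can be given once for each file, so that unpaired lines from both sides are kept. A small sketch with made-up data in which each file has one key the other lacks (f only in testfile1, g only in testfile2):

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs, both sorted on the first field.
printf 'a India\nb US\nf Australia\n' > testfile1
printf 'a NewDelhi\nb Washington\ng Lima\n' > testfile2

# Keep unpaired lines from file 1 and from file 2.
joined=$(join -a 1 -a 2 testfile1 testfile2)
printf '%s\n' "$joined"
```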

14. Print Only Unpaired Lines using -v option

In the above example, both paired and unpaired lines were produced in the output. If only the unpaired lines are desired, use the -v option as shown below.

$ join -v1 testfile1 testfile2
f Australia
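The same option works for the second file: -v 2 prints only the lines of the second file that have no partner in the first. A minimal sketch with made-up data, where key g exists only in testfile2:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs, both sorted on the first field.
printf 'a India\nb US\nf Australia\n' > testfile1
printf 'a NewDelhi\nb Washington\ng Lima\n' > testfile2

# Lines that appear only in the second file.
only_in_second=$(join -v 2 testfile1 testfile2)
printf '%s\n' "$only_in_second"
```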

15. Join Based on Different Columns from Both Files using -1 and -2 option

By default, the first column of each file is used for the comparison before joining. You can change this behavior using the -1 and -2 options.

In the following example, the first column of testfile1 was compared with the second column of testfile2 to produce the join command output.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
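The mirror case works the same way: here the key sits in the second column of the first file and in the first column of the second file. This is only a sketch with made-up data. Each file still has to be sorted on its own join column, and join prints the key first, followed by the remaining fields of each file.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Key is column 2 in testfile1 and column 1 in testfile2.
printf 'India a\nUS b\nIreland c\n' > testfile1
printf 'a NewDelhi\nb Washington\nc Dublin\n' > testfile2

# -1 2: join on field 2 of file 1; -2 1: join on field 1 of file 2.
joined=$(join -1 2 -2 1 testfile1 testfile2)
printf '%s\n' "$joined"
```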
