File systems do have limits. That's no surprise. ext3 had a limit of 16 TB on file system size. If you needed more space you'd have to use another file system, for instance XFS or JFS, or split the capacity into multiple mount points.

ext4 was designed to allow far larger file systems than ext3. According to Wikipedia, ext4 has a maximum file system size of 1 EiB (1024 PiB, or roughly one exabyte).

Now if you try to create one single large ext4 file system on any Linux distribution out there (including OEL 6.1; as of 18 August 2011), you will end up with:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

This post is about how to solve the issue.

The demo system

My demo system consists of one large LUN of 18 TB, encapsulated in LVM with a logical volume of 17 TB, on Oracle Enterprise Linux (OEL 5.5):

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@localhost ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 19791.2 GB, 19791209299968 bytes
255 heads, 63 sectors/track, 2406144 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdb doesn't contain a valid partition table
[root@localhost ~]# vgdisplay iscsi
--- Volume group ---
VG Name               iscsi
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  2
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                1
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               18.00 TB
PE Size               4.00 MB
Total PE              4718591
Alloc PE / Size       4456448 / 17.00 TB
Free  PE / Size       262143 / 1024.00 GB
VG UUID               tdi4f2-3ZYr-c1P0-NuSl-i3w2-5qQl-K75guj
[root@localhost ~]# lvdisplay iscsi
--- Logical volume ---
LV Name                /dev/iscsi/test
VG Name                iscsi
LV UUID                8q1UrT-ludC-FEkT-NExO-4Gzd-cn5H-FYJcB1
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                17.00 TB
Current LE             4456448
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:2
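
For reference, the underlying LVM layout could be created roughly like this. This is a hedged sketch reconstructed from the output above; the actual commands were not part of the original setup log:

pvcreate /dev/sdb              # put the ~18 TB LUN under LVM control
vgcreate iscsi /dev/sdb        # volume group "iscsi" (4 MB extents by default)
lvcreate -L 17T -n test iscsi  # 17 TB logical volume -> /dev/iscsi/test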

Creating file systems larger than 16 TB with ext4:

If you try to create an ext4 file system on the 17 TB logical volume:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

OK, maybe with ext4dev:

[root@localhost ~]# mkfs.ext4dev /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4dev: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

Nope, no success. The reason is that the e2fsprogs (or, as they are called on OEL, e4fsprogs) are not able to deal with file systems larger than ~16 TB.

To be specific: Even with the most recent e2fsprogs 1.41.14 there is no way to create file systems larger than 16 TB.

But according to this post it should have worked since June:

It’s taken way too long, but I’ve finally finished integrating the 64-bit patches into e2fsprogs’s mainline repository. All of the necessary patches should now be in the master branch for e2fsprogs. The big change from before is that I replaced Val’s changes for fixing up how mke2fs picked the correct fs-type profile from mke2fs.conf with something that I think works much better and leaves the code much cleaner. With this change you need to add the following to your /etc/mke2fs.conf file if you want to enable the 64-bit feature flag automatically for a big disk:

[fs_types]
    ext4 = {
        features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
        auto_64-bit_support = 1    # <---- add this line
        inode_size = 256
    }

Alternatively you can change the features line to include the feature “64bit”; this will force the use of the 64-bit fields, and double the size of the block group descriptors, even for smaller file systems that don’t require the 64-bit support. (This was one of my problems with Val’s implementation; it forced the mke2fs.conf file to always enable the 64-bit feature flag, which would cause backwards compatibility issues.) This might be a good thing to do for debugging purposes, though, so this is an option which I left open, but the better way of doing things is to use the auto_64-bit-support flag.
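
For completeness, the alternative he mentions, putting "64bit" directly into the features line, would look roughly like this in /etc/mke2fs.conf. This is a sketch based on the quoted description, with the caveat that it forces 64-bit group descriptors even for small file systems:

[fs_types]
    ext4 = {
        features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
        inode_size = 256
    }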

So the change must be there. A short look at the 'WIP' (work-in-progress) branch of e2fsprogs confirmed the integration.

So I tried to build the most recent e2fsprogs (remember: these are *development* tools; use at your OWN RISK):

[root@vm-mkmoel ~]# git clone git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
[root@vm-mkmoel ~]# cd e2fsprogs
[root@vm-mkmoel e2fsprogs]# mkdir build ; cd build/
[root@vm-mkmoel build]# ../configure
[root@vm-mkmoel build]# make
[root@vm-mkmoel build]# make install
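
The freshly built binaries live in the build tree (and, after make install, usually under the configure prefix), so it is worth making sure you are actually calling the new tools and not the distribution's e4fsprogs. A quick sanity check, assuming the build directory used above:

cd e2fsprogs/build/misc
./mke2fs -V        # should report 1.42-WIP, not the distribution's 1.41.x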

So let's try to create a file system:

[root@vm-mkmoel misc]# ./mke2fs -O 64bit,has_journal,extent,huge_file,flex_bg, \
uninit_bg,dir_nlink,extra_isize -i 4194304 /dev/iscsi/test
mke2fs 1.42-WIP (02-Jul-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
4456448 inodes, 4563402752 blocks
228170137 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=6710886400
139264 block groups
32768 blocks per group, 32768 fragments per group
32 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 0 mounts or 0 days,
whichever comes first.  Use tune2fs -c or -i to override.

OK, that seems to have worked. Let's check it:

[root@vm-mkmoel misc]# mount /dev/iscsi/test /mnt
[root@vm-mkmoel misc]# df -h
Filesystem            Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00     18G  2.6G   14G  16% /
/dev/sda1              99M 13M  82M  14% /boot
tmpfs                 502M 0  502M   0% /dev/shm
/dev/mapper/iscsi-test      17T  229M   17T   1% /mnt
[root@vm-mkmoel misc]# mount | grep mnt
/dev/mapper/iscsi-test on /mnt type ext4 (rw)
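
If the file system is supposed to be mounted at boot, a typical /etc/fstab entry could look like this (a sketch with plain default mount options; adjust mount point and options to your needs):

/dev/mapper/iscsi-test  /mnt  ext4  defaults  0  2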

As you can see: with the most recent development e2fsprogs it is possible to create ext4 file systems larger than 16 TB.

I even tried it with a 50 TB file system (because that's what I needed in my use case):

[root@vm-mkmoel misc]# df -h
Filesystem                          Size Used Avail Use% Mounted on
/dev/mapper/iscsi-test  50T  237M   48T   1% /mnt

Update:

Today I tested some more user-space tools.

fsck

Maybe the most important tool in case journaling fails. I copied some data to the file system (roughly 2 TB) and had 73% of my 6.5 million inodes (one inode per 8 MB) allocated. Running fsck on my demo system with 1 GB of memory yields:

[root@vm-mkmoel ~]# fsck.ext4 -f /dev/iscsi/test
e2fsck 1.42-WIP (02-Jul-2011)
Pass 1: Checking inodes, blocks, and sizes
Error allocating block bitmap (4): Memory allocation failed

fsck is rather memory-hungry. Increasing the memory to 8 GB did it; while running fsck I noticed a memory consumption of up to 3.4 GB! So large file systems require a lot of memory for fscking, and they require even more memory with more inodes.
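
If you cannot simply throw more RAM at the problem, e2fsck can be told to keep some of its bookkeeping on disk instead of in memory via the [scratch_files] stanza in /etc/e2fsck.conf (see e2fsck.conf(5)). A hedged sketch, assuming your e2fsprogs build supports it; this trades speed for memory:

mkdir -p /var/cache/e2fsck
cat >> /etc/e2fsck.conf <<'EOF'
[scratch_files]
        directory = /var/cache/e2fsck
EOF
fsck.ext4 -f /dev/iscsi/test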

resize2fs

After fscking my file system I tried to resize it:

[root@localhost sbin]# lvresize -l +7199 /dev/iscsi/test
  Extending logical volume test to 50.00 TB
  Logical volume test successfully resized
[root@localhost sbin]# resize2fs /dev/iscsi/test
resize2fs 1.42-WIP (02-Jul-2011)
resize2fs: New size too large to be expressed in 32 bits

As you can see, resizing the file system is not yet supported/implemented. So it would be wise to create the file system with its final size from the start, since growing it is NOT possible (see the sketch below).
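
In practice that means allocating the logical volume at its final size before the first mkfs. A minimal sketch, assuming the same volume group and the development mke2fs built above:

lvcreate -l 100%FREE -n test iscsi     # use all free extents right away
./mke2fs -O 64bit,has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize \
         -i 4194304 /dev/iscsi/test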

tune2fs

tune2fs seems to work; at least it dumps the superblock contents:

[root@localhost sbin]# tune2fs -l /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   <none>
Last mounted on:          /mnt/mnt
Filesystem UUID:          a754e947-8b89-415d-909d-000e6c95c44a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6550000
Block count:              13414400000
Reserved block count:     670720000
Free blocks:              13394134177
Free inodes:              1484526
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16
Inode blocks per group:   1
Flex block group size:    16
Filesystem created:       Wed Oct 19 17:09:06 2011
Last mount time:          Wed Oct 19 18:45:47 2011
Last write time:          Wed Oct 19 18:45:47 2011
Mount count:              1
Maximum mount count:      20
Last checked:             Wed Oct 19 18:35:36 2011
Check interval:           0 (<none>)
Lifetime writes:          2511 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ea117174-a04a-412e-a067-7972804f83d7
Journal backup:           inode blocks

Setting properties works as well:

[root@localhost sbin]# tune2fs -L test /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
[root@localhost sbin]# tune2fs -l /dev/iscsi/test | head -10
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   test
Last mounted on:          /mnt/mnt
[...]

e4defrag

e4defrag is a new tool to defragment the ext4 file system. According to the man page:

e4defrag  reduces  fragmentation of extent based file. The file targeted by e4defrag is created on ext4 filesystem made with “-O extent” option (see  mke2fs(8)).   The  targeted  file gets more contiguous blocks and improves the file access speed.

I am not yet sure how this affects file systems used for Oracle datafiles. All I can say is that e4defrag seems to work with >16 TB file systems:

[root@localhost sbin]# e4defrag /mnt/
ext4 defragmentation for directory(/mnt/)
[....]
        Success:                        [ 4772040/5065465 ]
        Failure:                        [ 293425/5065465 ]

The failures are from directories which cannot be defragmented.
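
Before actually rewriting files it can be useful to only measure the fragmentation; e4defrag has a -c switch for that (a sketch; the output will obviously differ per file system):

[root@localhost sbin]# e4defrag -c /mnt/     # report the fragmentation score only, do not defragment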

Conclusion

With the most recent e2fsprogs (1.42-WIP) it is possible to create ext4 file systems larger than 16 TB.

If you do so remember the following:

  • the tools are still in development; use at your own risk!
  • tune the values for autocheck (after x mounts / after y days), as shown in the sketch after this list
  • adjust the "-i" switch, which defines the bytes/inode ratio; in the example above one inode is created for every 8 MB
  • the more inodes you create the longer fsck takes and the more memory it needs
  • Resizing the file system (growing / shrinking) is NOT possible at the moment
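
For the autocheck values, something along these lines would do (hypothetical numbers; -c sets the mount count, -i the time interval between forced checks):

tune2fs -c 50 -i 180d /dev/iscsi/test    # force a check every 50 mounts or 180 days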

http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/
