ALUA and SRM
A couple important (ALUA and SRM) notes
There was some internal dialog today on our “VMware Champions” and “CLARiiON Champions” EMC distribution lists, and I want to share it with the outside world (customers/partners) because it’s pertinent. While the second point (SRM and FLARE 29) is CLARiiON-specific, the first point (ALUA and vSphere) is pertinent for multiple vendors (though I have written it up with a CLARiiON-specific bent and notes).
If you’re a CLARiiON customer and using either vSphere or SRM: firstly, thank you for being a VMware and EMC customer! Secondly, please read on…
First topic: Native Multipathing, Round Robin and ALUA configurations with CLARiiON and vSphere ESX/ESXi 4
This is discussed in the CLARiiON vSphere applied tech guide here (which I can’t stress strongly enough should be mandatory reading for people with CLARiiONs supporting VMware environments), but I’m going to provide a bit more of the WHY here. Note that what follows describes the CLARiiON case.
CLARiiON arrays are, in VMware’s nomenclature, an “active/passive” array: they have a LUN ownership model where internally a LUN is “owned” by one storage processor or the other. This is common in mid-range (“two brain” aka “storage processors”) arrays whose core architecture was born in the 1990s (and is similar to NetApp’s Fibre Channel, but not iSCSI, implementation, HP EVA and others). In storage processor failure cases, in a variety of ways the LUNs become active on the secondary brain. On a CLARiiON, this is called a “trespass”. On these arrays, in the VI client (or vSphere client) a LUN shows as “active” on ports behind the owning storage processor, and “standby” on the non-owning storage processor.
For this reason, in VI3 and earlier, the default failover behavior on these arrays was Most Recently Used (MRU): when a storage processor fails (its paths go to “dead”), the LUN trespasses to the other storage processor (with a CLARiiON, ESX 3 actually issues a “trespass” command), the paths change to an Active state (ESX issues what’s called a “test unit ready” or TUR command to check LUN state), and then I/O continues. MRU then doesn’t fail back to the original paths when the failed storage processor is fixed and those paths transition from “dead” to “standby” (because that storage processor is no longer the LUN owner). The “Fixed” path policy, by contrast, would revert to the original path (which would trigger another trespass).
This meant you could get into a “race condition” if you didn’t use MRU, where the ESX host is chasing a constantly moving LUN (sometimes called “path thrashing”).
Does that make sense?
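To make the MRU versus Fixed difference concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: one host, one LUN, two storage processors, and at least one storage processor always up. The names `Lun` and `count_trespasses` are made up for illustration and are not part of any VMware or EMC tool.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    """A LUN on an active/passive array, owned by one storage processor."""
    owner: str = "SPA"
    trespasses: int = 0

    def serve_io_from(self, sp):
        # I/O arriving on the non-owning SP forces a trespass in this toy model.
        if self.owner != sp:
            self.owner = sp
            self.trespasses += 1


def count_trespasses(policy, events, preferred="SPA"):
    """Replay (sp, is_up) events and count trespasses for 'mru' or 'fixed'.

    Assumes at least one storage processor is up after every event.
    """
    lun = Lun(owner=preferred)
    alive = {"SPA": True, "SPB": True}
    current = preferred  # SP whose paths the host is currently using
    for sp, is_up in events:
        alive[sp] = is_up
        if not alive[current]:
            # Current paths are dead: fail over to the surviving SP.
            current = "SPB" if current == "SPA" else "SPA"
        elif policy == "fixed" and alive[preferred]:
            # Fixed reverts to the preferred path as soon as it is back.
            current = preferred
        # MRU simply stays on the current paths while they are alive.
        lun.serve_io_from(current)
    return lun.trespasses


# SPA fails and is repaired three times in a row.
flapping = [("SPA", False), ("SPA", True)] * 3
print("MRU trespasses:  ", count_trespasses("mru", flapping))
print("Fixed trespasses:", count_trespasses("fixed", flapping))
```

In this model MRU trespasses once and then stays put, while Fixed trespasses on every failure and again on every repair, which is the "chasing a moving LUN" behavior described above.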
Now, trespasses can occur VERY fast, and are transparent to the guest. This is a VERY VERY reliable mechanism. There are very, very (VERY) rare cases where a LUN trespass does not complete because the second storage processor is shut down before the trespasses finish (I’ve been involved in one case amongst the more than a hundred thousand CLARiiONs out there).
So – if this works well, and is mature, what’s new?
The answer is that vSphere supports Asymmetric Logical Unit Access (ALUA); note that there is no support for this in VI3.x. ALUA is a SCSI standard, and is widely implemented across mid-range storage arrays, including the CLARiiON. With ALUA, the LUN is reachable across both storage processors at the same time. Paths for the “non-owning” storage processor take I/O and transit it across the internal interconnect architecture of the mid-range arrays (bandwidth and behavior varies). In the example in the diagram below, the paths on SPA advertise as “active (non-optimized)”, and aren’t used for I/O unless the active I/O paths are not working.
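The path preference just described (use the optimized paths, and fall back to the non-optimized ones only when no optimized path is working) can be sketched in a few lines of Python. This is an illustration of the rule, not the actual NMP implementation; the function name, the state strings and the path names are all assumptions made for the example.

```python
def paths_for_io(paths):
    """Pick the paths that should carry I/O under the ALUA preference rule.

    paths maps a path name to one of the assumed state strings
    'active_optimized', 'active_non_optimized' or 'dead'.
    """
    optimized = [name for name, state in paths.items()
                 if state == "active_optimized"]
    if optimized:
        return optimized
    # No optimized path is usable: fall back to the non-optimized paths,
    # which reach the LUN through the non-owning storage processor.
    return [name for name, state in paths.items()
            if state == "active_non_optimized"]


# Hypothetical path names; SPB owns the LUN, as in the example above.
healthy = {
    "SPB-port0": "active_optimized",
    "SPB-port1": "active_optimized",
    "SPA-port0": "active_non_optimized",
    "SPA-port1": "active_non_optimized",
}
spb_paths_down = {**healthy, "SPB-port0": "dead", "SPB-port1": "dead"}

print(paths_for_io(healthy))
print(paths_for_io(spb_paths_down))
```

With all paths healthy only the SPB ports are selected; once they are marked dead, the SPA ports take over without the LUN becoming unreachable.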
When you use ALUA, these mid-range arrays can work with the Fixed and Round Robin Path Selection Plugin (PSP) multipathing options in vSphere 4’s Native Multipathing Plugin (NMP). You can then use the Fixed and Round Robin policies without worrying about path thrashing.
The ALUA standard can be implemented with SCSI-2 commands OR SCSI-3 commands. In vSphere, by default, the SCSI-3 command set for the reservations required to handle ALUA behavior is implemented, NOT SCSI-2.
To make this even more clear (I hope), don’t be confused: a device can be a SCSI-3 device without using SCSI-3 for ALUA. vSphere requires that all SCSI devices are SCSI-3 devices (whether iSCSI, FC, or FCoE), otherwise they don’t show up to the ESX host.
CLARiiONs as of FLARE 26 implement SCSI-2 reservation mechanisms for ALUA. As of FLARE 28.5 they support both SCSI-2 and SCSI-3 mechanisms.
FLARE 28.5 and FLARE 29 are supported on CLARiiON CX4, but are currently not supported on the older CLARiiON CX3. Ergo, a CX4 can support ALUA with vSphere, a CX3 cannot.
With CLARiiON, to configure a host for ALUA mode failover behavior, simply run the Failover wizard in Navisphere, and switch to Failover Mode = 4 (ALUA mode).
Making this change, in my experience, means using VMware maintenance mode to either bounce the ESX host non-disruptively, or remove/re-add the devices with a manual VMotion to make it non-disruptive. Personally, I would recommend the maintenance mode approach, because it’s hard to make an error that would cause an outage.
If you are running a version of FLARE earlier than FLARE 28.5, and change the failover mode, the storage devices will not be visible to the ESX host – so remember, CLARiiON CX4, FLARE 28.5 and later only!
Once you make this change, you can change NMP from MRU to Round Robin (NMP RR) – either one device at a time in the GUI or the CLI. Note: I’m going to start standardizing on the vMA CLI syntax, as I think (personally) that’s the way to go – and applies equally to ESX and ESXi:
esxcli --server=<SERVERNAME> nmp device setpolicy --device <device UID> --psp <PSP type>
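If you have many devices, the per-device command above is easy to script. The Python sketch below only builds the command strings (it does not run them), using exactly the flags shown above. The server name and device UIDs are hypothetical placeholders, and `VMW_PSP_RR` is the name NMP uses for the Round Robin PSP.

```python
import shlex


def setpolicy_commands(server, device_uids, psp="VMW_PSP_RR"):
    """Build one 'esxcli nmp device setpolicy' command line per device."""
    commands = []
    for uid in device_uids:
        parts = [
            "esxcli",
            f"--server={shlex.quote(server)}",
            "nmp", "device", "setpolicy",
            "--device", shlex.quote(uid),
            "--psp", shlex.quote(psp),
        ]
        commands.append(" ".join(parts))
    return commands


# Hypothetical host and device UIDs, for illustration only.
devices = ["naa.60060160a0b0c0d0", "naa.60060160a0b0c0d1"]
for line in setpolicy_commands("esx01.example.com", devices):
    print(line)
```

Printing the commands first and reviewing them before pasting them into a vMA session is a deliberate choice here: it keeps the sketch side-effect free.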
Alternatively, you can change the PSP assigned to the SATP (a single mass change for an ESX host) using this command:
esxcli --server=<SERVERNAME> nmp satp setdefaultpsp --psp <PSP type> --satp <SATP type>
The Round Robin policy doesn’t issue I/Os in a simple “round robin” between paths in the way many expect. By default RR sends 1000 commands down each path before moving to the next path; this is called the IOOperationLimit. In configurations where a LUN queue is busy, this limit doesn't demonstrate much path aggregation, because quite often some of the thousand commands will have completed before the last command is sent. That means the paths aren't full (even though the queue at the storage array might be). When using 1Gbit iSCSI, the physical path is quite often the limiting factor on throughput, and making use of multiple paths at the same time shows better throughput.
You can reduce the number of commands issued down a particular path before moving on to the next path all the way to 1, thus ensuring that each subsequent command is sent down a different path. You can make this change by using this command:
esxcli --server=<SERVERNAME> nmp roundrobin setconfig --device <lun ID> --iops 1 --type iops
Note that cutting down the number of IOPS per path does present some potential problems with storage arrays where caching is done per path. By spreading the requests across multiple paths, you are defeating any caching optimization at the storage end and could end up hurting your performance. Luckily, most modern storage systems (this is true of CLARiiON) don't cache per port. There's still a minor path-switch penalty in ESX, so switching this often probably represents a little more CPU overhead on the host.
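The effect of the IOOperationLimit is easier to see with a small simulation. The Python sketch below is a toy model, not the real PSP code: it only records which path each command would be sent down. The path names and the figure of 32 commands in flight are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import cycle


def paths_chosen(paths, num_commands, iops_limit=1000):
    """Return the path used for each command, switching to the next path
    only after iops_limit commands have gone down the current one."""
    path_iter = cycle(paths)
    current = next(path_iter)
    sent_on_current = 0
    chosen = []
    for _ in range(num_commands):
        if sent_on_current == iops_limit:
            current = next(path_iter)
            sent_on_current = 0
        chosen.append(current)
        sent_on_current += 1
    return chosen


PATHS = ["path_a", "path_b"]  # hypothetical path names
OUTSTANDING = 32              # assumed number of commands in flight at once

default_rr = paths_chosen(PATHS, 1500)              # default limit of 1000
tuned_rr = paths_chosen(PATHS, 1500, iops_limit=1)  # equivalent of --iops 1

print("default split:", Counter(default_rr))
print("paths used by first", OUTSTANDING, "commands (default):",
      len(set(default_rr[:OUTSTANDING])))
print("paths used by first", OUTSTANDING, "commands (limit 1):",
      len(set(tuned_rr[:OUTSTANDING])))
```

With the default limit, any batch of 32 in-flight commands sits almost entirely on one path, so the second path is idle most of the time. With a limit of 1 the same batch is spread across both paths.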
There are some cases where RR isn’t recommended; more to come on that in a follow-up around iSCSI and vSphere 4 (though they apply to all protocols).
PowerPath/VE is an MPP (a full NMP substitute). It improves path discovery, upgrades path selection from basic round robin to adaptive in general and predictive with EMC arrays, and also adds ALUA reservation support using SCSI-2. This means you can use EMC PowerPath/VE with vSphere regardless of whether the array uses SCSI-2 or SCSI-3. In fact, PP/VE provides the benefit of full path utilization without any configuration needed other than simply loading the vmkernel module.
Each array behaves differently – so check with your storage vendor, and don’t assume anything here (for better or worse for them or for EMC) applies to others.
Netting it out:
- vSphere 4 supports ALUA, but does so with the SCSI-3 reservation mechanism.
- CLARiiON supports ALUA as of FLARE 26 using the SCSI-2 reservation mechanism, and both the SCSI-2 and SCSI-3 reservation mechanisms as of FLARE 28.5.
- FLARE 28.5 and later (FLARE 29 is the most current firmware rev) are supported on CLARiiON CX4 only, not older CX3, or AX.
- If you are a CLARiiON customer and want to drive all paths to devices using the free ALUA NMP Round Robin, you need to be on a CX4 running FLARE 28.5 or later.
- If you want to drive all paths to devices, and want new path discovery to be automated and the I/O distribution to be predictive, you can use any EMC array (and some HDS, HP and IBM arrays) with EMC PowerPath/VE.
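The CLARiiON rules in the list above can be condensed into a small check. This Python sketch encodes only what is stated in this post (FLARE 26 adds SCSI-2 ALUA reservations, FLARE 28.5 adds SCSI-3, FLARE 28.5 and later run only on CX4, and vSphere 4 NMP needs SCSI-3). The function names are invented for illustration, and other arrays will behave differently.

```python
def parse_flare(version):
    """Turn a FLARE release string such as '28.5' or '29' into a tuple."""
    major, _, minor = version.partition(".")
    return (int(major), int(minor or 0))


def alua_reservation_modes(flare):
    """Reservation mechanisms a CLARiiON offers for ALUA at this FLARE rev."""
    release = parse_flare(flare)
    if release >= (28, 5):
        return {"SCSI-2", "SCSI-3"}
    if release >= (26, 0):
        return {"SCSI-2"}
    return set()


def can_use_nmp_alua(model, flare):
    """True if vSphere 4 NMP (Fixed or Round Robin) can use ALUA here.

    vSphere 4 uses the SCSI-3 mechanism, and FLARE 28.5 and later
    are supported on the CX4 only.
    """
    return model == "CX4" and "SCSI-3" in alua_reservation_modes(flare)


for model, flare in [("CX3", "26"), ("CX4", "28.5"), ("CX4", "29")]:
    print(model, "FLARE", flare, "->", can_use_nmp_alua(model, flare))
```

PowerPath/VE is deliberately left out of the check, since it handles ALUA reservations with SCSI-2 and so is not bound by the same FLARE requirement.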
Second topic: VM-Aware Navisphere (FLARE 29) and Site Recovery Manager
So, VM-Aware Navisphere (FLARE 29) is out there. If you want to know more about it, check out this post here.
A case came up late last night from an EMC partner, who reached out and pinged me. The current MirrorView SRA (1.3.0.8) doesn’t work with FLARE 29 due to minor API incompatibilities (you can see his post here: http://vmjunkie.wordpress.com/2009/09/15/srm-warning-flare-4-29-mirrorview-incompatible-with-current-sra).
The new SRA (1.4) is just finishing up the SRM qual, and will be immediately available when the next SRM release is out (which will be shortly). This one also has a treat in store for MirrorView customers (a free tool called MirrorView Insight) that adds a TON of SRM-related stuff, including failback. More on that shortly.
Netting it out:
- If you’re using a CLARiiON with MirrorView and Site Recovery Manager, hold off the FLARE 29 upgrade just for a BIT longer – otherwise it will break Site Recovery Manager.