The Mirantis NFV initiative aims to create an NFV ecosystem for OpenStack, with validated hardware at the bottom; a hardened, configuration-optimized OpenStack as the platform in the middle; and validated VNFs and other NFV software and application components at the top. As the pure play OpenStack company, we know that OpenStack is the best way to create an NFV infrastructure (NFVi), but we also know that our NFV clients, both telcos and enterprises, need more than just the OpenStack platform. They need a complete NFVi solution that answers the whole stack of architectural challenges presented by NFV (in compute, networking, storage, availability, scale and performance) and that reliably provides the network functions, orchestration and management functionality carriers need.

To provide this solution, Mirantis is integrating and optimizing OpenStack itself, and working with an ever-growing number of partners. In this article, we’ll talk about one important innovation that will help turn OpenStack into NFVi: Single Root I/O Virtualization, or SR-IOV.

SR-IOV is a PCI Special Interest Group (PCI-SIG) specification for virtualizing network interfaces. Each physical resource is represented as a configurable entity called a Physical Function (PF), and multiple virtual interfaces with limited configurability, called Virtual Functions (VFs), are created on top of it. Doing so requires support from the system BIOS and, conventionally, from the host OS or hypervisor as well. Among other benefits, SR-IOV makes it possible to run a very large number of network-traffic-handling VMs per compute node without increasing the number of physical NICs or ports. It also pushes the processing for this traffic down into the hardware layer, off-loading the hypervisor and significantly improving both throughput and the determinism of network performance. That’s why it’s an NFV must-have.

We first talked about SR-IOV at the OpenStack Summit in Vancouver, in a session with an unofficial title that might as well have been “Run, Forrest, run!” because the main idea of SR-IOV is to get data to VMs more quickly. Now, we’re going to look at actually using SR-IOV with Mirantis OpenStack.

SR-IOV can be complicated. Note: On Intel NICs, the PF cannot support promiscuous mode when SR-IOV is enabled, so it cannot do L2 bridging. Because of this, you shouldn’t enable SR-IOV on interfaces that have standard Fuel networks assigned to them. (One way to get around this problem is to use nova host aggregates and different flavors for normal and SR-IOV enabled instances, but that is out of scope for this article; if you’d like to hear more about it, let us know in the comments, and we’ll do a separate blog post.)

You should note that SR-IOV has a couple of limitations in the Kilo release of OpenStack. Most notably, migration of instances with SR-IOV attached ports is not supported. Also, iptables-based filtering is not usable with SR-IOV NICs, because SR-IOV bypasses the normal network stack, so security groups cannot be used with SR-IOV enabled ports (though you can still use security groups for normal ports).

So now that we know what we’re talking about, let’s look at how to enable and use SR-IOV. While you can use Fuel to deploy a Mirantis OpenStack cloud that includes all of the pieces for SR-IOV, SR-IOV itself still needs to be configured separately.

Enabling SR-IOV

To enable SR-IOV, you need to configure it on both the compute and the controller nodes. Let’s start with the compute nodes.

Configure SR-IOV on Compute nodes

To enable SR-IOV, perform the following steps only on Compute nodes that will be used for running instances with SR-IOV virtual NICs:

  1. Ensure that your compute nodes are capable of PCI passthrough and SR-IOV. Your hardware must provide VT-d and SR-IOV capabilities, and these extensions may need to be enabled in the BIOS. VT-d options are usually configured in the “Chipset Configuration/North Bridge/IIO Configuration” section of the BIOS, while SR-IOV support is configured in “PCIe/PCI/PnP Configuration.”
    If your system supports VT-d, you should see DMAR-related messages in the dmesg output:

        # grep -i dmar /var/log/dmesg
        [ 0.000000] ACPI: DMAR 0000000079d31860 000140 (v01 ALASKA A M I 00000001 INTL 20091013)
        [ 0.061993] dmar: Host address width 46
        [ 0.061996] dmar: DRHD base: 0x000000fbffc000 flags: 0x0
        [ 0.062004] dmar: IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
        [ 0.062007] dmar: DRHD base: 0x000000c7ffc000 flags: 0x1
        [ 0.062012] dmar: IOMMU 1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020de
        [ 0.062014] dmar: RMRR base: 0x0000007bc94000 end: 0x0000007bca2fff

    This is just an example, of course; your output may differ.

    If your system supports SR-IOV, you should see an SR-IOV capability section for each NIC PF, and the “Total VFs” value should be non-zero:

        # lspci -vvv | grep -i "initial vf"
        Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
        Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 01
        Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
        Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01

  2. Check that VT-d is enabled in the kernel using this command:

        # grep -i "iommu.*enabled" /var/log/dmesg

    If you don’t see a response similar to:

        [0.000000] Intel-IOMMU: enabled

    then it’s not yet enabled. Enable it by editing /etc/default/grub and adding intel_iommu=on to the GRUB_CMDLINE_LINUX line, for example:

        GRUB_CMDLINE_LINUX=" console=ttyS0,9600 console=tty0 net.ifnames=0 biosdevname=0 rootdelay=90 nomodeset root=UUID=d2b06335-bf6d-44b8-a0a4-a54224bdc7f8 intel_iommu=on"

    Next, update grub and reboot for the changes to take effect:

        # update-grub
        # reboot

    Then repeat the check. For new environments, you may want to add these kernel parameters before deploying so that they are applied to all nodes of the environment. You can do that from the Fuel interface in the “Kernel Parameters” section of the “Settings” tab.

    NOTE: If you have an AMD motherboard, you need to check for “AMD-Vi” in the output of the dmesg command and pass the options “iommu=pt iommu=1” to the kernel (but we haven’t yet tested that).
  3. Enable the number of virtual functions required on the SR-IOV interface.
    NOTE: Do not enable more VFs than you need, since this might degrade performance. Depending on the kernel and NIC driver version, you might get more queues on each PF with fewer VFs (usually, fewer than 32).

    First, enable the interface:

        ip link set eth1 up

    Next, from the command line, get the maximum number of functions that could potentially be enabled for your NIC:

        cat /sys/class/net/eth1/device/sriov_totalvfs

    Then, finally, enable the desired number of virtual functions for your NIC:

        echo 31 > /sys/class/net/eth1/device/sriov_numvfs

    NOTE: These settings aren’t saved across reboots. To make them persistent, add both commands to /etc/rc.local, above the final “exit 0” line if the file has one:

        ip link set eth1 up
        echo 31 > /sys/class/net/eth1/device/sriov_numvfs
  4. Check to make sure that SR-IOV is enabled:

        # ip link show eth1 | grep vf
        vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 1 MAC c2:cd:57:9b:6c:7d, spoof checking on, link-state auto
        ...

    If you don’t see “link-state auto” in the output, then your installation will require an SR-IOV agent. You can install and start it like so:

        # apt-get install neutron-plugin-sriov-agent
        # nohup neutron-sriov-nic-agent --debug --log-file /tmp/sriov_agent \
            --config-file /etc/neutron/neutron.conf \
            --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini &
  5. Edit /etc/nova/nova.conf:

        pci_passthrough_whitelist={"devname": "eth1", "physical_network":"physnet2"}

  6. Edit /etc/neutron/plugins/ml2/ml2_conf_sriov.ini:

        [sriov_nic]
        physical_device_mappings = physnet2:eth1

  7. Restart the compute service:

        # restart nova-compute
  8. Get the PCI vendor and product ID of the virtual functions; you’ll need it to configure SR-IOV on the controller nodes.
    NOTE: This is just an example of the output. The actual value may differ on your hardware.

        # lspci -nn | grep -e "Ethernet.*Virtual"
        06:10.1 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
        06:10.3 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
        ...

    Write down the vendor and product ID, which is the vendor:product pair in the last set of square brackets (8086:10ed in this example).
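
Step 3 above can be wrapped in a small guard so that a typo never requests more VFs than the NIC advertises. The following is a minimal sketch rather than Mirantis tooling: the function name set_vfs and its optional third argument (an alternate sysfs root, handy for a dry run against a scratch directory) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: enable a requested number of VFs on an interface, refusing
# values above the maximum the NIC reports in sriov_totalvfs.
# set_vfs and its third argument are hypothetical, for illustration only.
set_vfs() {
    iface=$1
    wanted=$2
    root=${3:-/sys/class/net}
    dev="$root/$iface/device"

    if [ ! -r "$dev/sriov_totalvfs" ]; then
        echo "$iface: no sriov_totalvfs found; is SR-IOV available?" >&2
        return 1
    fi

    max=$(cat "$dev/sriov_totalvfs")
    if [ "$wanted" -gt "$max" ]; then
        echo "$iface: requested $wanted VFs, NIC supports at most $max" >&2
        return 1
    fi

    # The kernel refuses to change a non-zero VF count directly,
    # so reset to 0 before writing the new value.
    echo 0 > "$dev/sriov_numvfs"
    echo "$wanted" > "$dev/sriov_numvfs"
    echo "$iface: enabled $wanted of $max VFs"
}
```

Run as root, `set_vfs eth1 31` would apply the same setting as the manual commands above, while `set_vfs eth1 65` on a 64-VF NIC would stop with an error instead of writing to sysfs.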

Configure SR-IOV on the Controller nodes

  1. Edit /etc/neutron/plugins/ml2/ml2_conf.ini.
    Change the mechanism_drivers line so that it includes sriovnicswitch:

        mechanism_drivers = openvswitch,l2population,sriovnicswitch

    and add a new section at the end of the file, using the vendor and product ID you noted on the compute nodes as the value for supported_pci_vendor_devs:

        [ml2_sriov]
        supported_pci_vendor_devs = 8086:10ed

  2. Edit /etc/nova/nova.conf so that PciPassthroughFilter is in the scheduler filter list (the value goes on a single line):

        [DEFAULT]
        scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter

  3. Restart services:

        # restart neutron-server
        # restart nova-api
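
Copying the vendor and product ID by hand from the compute node to the controller is easy to get wrong. As a rough sketch, the pair can be pulled out of the lspci output and printed in the form ml2_conf.ini expects. The function names vf_pci_ids and ml2_sriov_section are made up for this example, and the pattern assumes lspci lines shaped like the sample shown for the compute nodes.

```shell
#!/bin/sh
# Sketch: read `lspci -nn` output on stdin and print the unique
# vendor:product pairs of Ethernet virtual functions, one per line.
# Function names are hypothetical, for illustration only.
vf_pci_ids() {
    grep -i 'Ethernet.*Virtual Function' |
        sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p' |
        sort -u
}

# Print an [ml2_sriov] section built from the IDs found on stdin.
ml2_sriov_section() {
    ids=$(vf_pci_ids | paste -sd, -)
    if [ -z "$ids" ]; then
        echo "no SR-IOV virtual functions found" >&2
        return 1
    fi
    printf '[ml2_sriov]\nsupported_pci_vendor_devs = %s\n' "$ids"
}
```

On a compute node, `lspci -nn | ml2_sriov_section` would print the section to paste into the controller’s ml2_conf.ini; review the result before using it, since a host with several NIC models will list more than one pair.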

Using SR-IOV

Now you’re ready to actually use SR-IOV.

  1. A recommended practice for using SR-IOV is to create a separate host aggregate for SR-IOV enabled computes:

        nova aggregate-create sriov
        nova aggregate-set-metadata sriov sriov=true
        nova aggregate-create normal
        nova aggregate-set-metadata normal sriov=false

    … and add some hosts to them:

        nova aggregate-add-host sriov node-9.domain.tld
        nova aggregate-add-host normal node-10.domain.tld
  2. Create a new flavor for VMs that require SR-IOV support:

        nova flavor-create m1.small.sriov auto 2048 20 2
        nova flavor-key m1.small.sriov set aggregate_instance_extra_specs:sriov=true

    You should update all other flavors so that they will start only on hosts without SR-IOV support:

        openstack flavor list -f csv | grep -v sriov | cut -f1 -d, | tail -n +2 | \
            xargs -I% nova flavor-key % set aggregate_instance_extra_specs:sriov=false

    To use an SR-IOV port, you need to create an instance with ports that use the vnic-type “direct”. For now, you’ll need to do this via the command line. Because the default Cirros image does not include the Intel NIC drivers, we’ll use an Ubuntu cloud image to test SR-IOV.

  3. Prepare the Ubuntu cloud image (the disk1.img cloud image is in qcow2 format):

        # glance image-create --name trusty --disk-format qcow2 --container-format bare \
            --is-public True \
            --location https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img

    You can only log in to this instance by using an SSH public key, so let’s go ahead and create a keypair. You can do this from the Horizon interface, but we’ll do it from the command line, like so:

        # nova keypair-add key1 > key1.pem
        # chmod 600 key1.pem
  4. Create a port for the instance:

        # neutron port-create net04 --binding:vnic-type direct --device_owner nova-compute --name sriov-port1

  5. Spawn the instance, using the SR-IOV flavor so that it is scheduled onto an SR-IOV enabled host:

        # port_id=`neutron port-list | grep sriov-port1 | awk '{print $2}'`
        # nova boot --flavor m1.small.sriov --image trusty --key_name key1 \
            --nic port-id=$port_id sriov-vm1
  6. Get the instance’s IP address:

        # nova list | grep sriov-vm1 | awk '{print $12}'
        net04=192.168.111.5
  7. Connect to the instance to check that everything is up and running.
    First, find the controllers that host a DHCP namespace with access to the instance:

        # neutron dhcp-agent-list-hosting-net -f csv -c host net04 --quote none | tail -n+2
        node-7.domain.tld
        node-9.domain.tld

    Then connect to the instance (run this command on one of the controllers found in the previous step):

        # ip netns exec `ip netns show|grep qdhcp-$(neutron net-list | grep 'net04 ' | awk '{print$2}')` ssh -i key1.pem ubuntu@192.168.111.5

    And that should be it!
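
One caveat about the port lookup in step 5: grep matches substrings, so a later port named sriov-port10 would also match sriov-port1 and the boot command would receive two IDs. A stricter lookup could compare the name column exactly. This is only a sketch: the function name port_id_by_name is an assumption, and it assumes the table layout printed by neutron port-list, with the ID in the first column and the name in the second.

```shell
#!/bin/sh
# Sketch: read a `neutron port-list` table on stdin and print the ID of
# the port whose name matches $1 exactly. Assumes columns are
# "| id | name | ...". port_id_by_name is a hypothetical helper.
port_id_by_name() {
    awk -F'|' -v name="$1" '
        {
            id = $2
            n = $3
            # Trim the padding spaces around both cells.
            gsub(/^ +| +$/, "", id)
            gsub(/^ +| +$/, "", n)
        }
        n == name { print id }
    '
}
```

It would be used as `port_id=$(neutron port-list | port_id_by_name sriov-port1)` in place of the grep and awk pipeline.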

Troubleshooting

Sometimes something goes wrong. Here are some common problems and solutions.

  • If you see this error in /var/log/nova/nova-compute.log on the compute host:

        libvirtError: internal error: missing IFLA_VF_INFO in netlink response

    … you should install a newer version of libnl3.

  • If you see:

        libvirtError: unsupported configuration: host doesn't support passthrough of host PCI devices

    … in /var/log/nova/nova-compute.log, it means that VT-d is not supported or not enabled.

  • If you see:

        NovaException: Unexpected vif_type=binding_failed

    … you should enable the SR-IOV agent or, if you’ve already done so, check that it’s running:

        # neutron agent-list | grep sriov-nic-agent
        | dfa4edcf-63c1-4af7-a291-ec139a16f346 | NIC Switch agent | node-16.domain.tld | :-) | True | neutron-sriov-nic-agent |

    Otherwise, examine the log file /tmp/sriov_agent for clues to what else might be wrong.
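
The three messages above can also be checked for in one pass. The sketch below scans a log on standard input and prints a one-line hint for each known message it finds; the function name diagnose_sriov_log and the hint wording are assumptions for this example, and it covers only the errors listed in this section.

```shell
#!/bin/sh
# Sketch: scan a nova-compute log on stdin for the SR-IOV errors listed
# above and print one hint per problem found. Exits non-zero when none
# of them appear. diagnose_sriov_log is a hypothetical helper.
diagnose_sriov_log() {
    awk '
        /missing IFLA_VF_INFO in netlink response/           { libnl = 1 }
        /host doesn.t support passthrough of host PCI devices/ { vtd = 1 }
        /Unexpected vif_type=binding_failed/                  { agent = 1 }
        END {
            if (libnl) print "libnl3: install a newer version"
            if (vtd)   print "VT-d: not supported or not enabled"
            if (agent) print "agent: enable neutron-sriov-nic-agent or check that it is running"
            exit !(libnl || vtd || agent)
        }
    '
}
```

For example, `diagnose_sriov_log < /var/log/nova/nova-compute.log` on a compute host would list whichever of the three causes show up in that log.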

Conclusion

For now, configuring Mirantis OpenStack for SR-IOV is still relatively complex, which makes it error-prone and potentially challenging on large clusters. During the Mitaka cycle, we’ll be making improvements to current configurations, doing deeper testing, and working on automating configuration and deployment of SR-IOV via Fuel.

http://dev-vpierre-plugindev.pantheon.io/carrier-grade-mirantis-openstack-the-mirantis-nfv-initiative-part-1-single-root-io-virtualization-sr-iov/
