iOS VideoToolbox hardware encoding with H.265 (HEVC) and H.264 (AVC): 1 Overview

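This article tries hardware H.265 (HEVC) encoding with Video Toolbox, using the iPhone rear camera as the video source. Last year I finished hardware H.264 decoding but never did encoding, which left a gap in my skills. I just noticed that the CMVideoCodecType enum in CMFormatDescription.h defines kCMVideoCodecType_HEVC, so I decided to push my luck and try hardware HEVC encoding on iOS 9.2.

Conclusion: on iOS 9.2, H.265 (HEVC) encoding is not available to developers; H.264 (AVC) is.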

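1. Reading the iPhone rear camera

Note: the iPhone cannot run the front and rear cameras at the same time, because the SoC currently has, as a rule, only one video channel. When two AVCaptureSession instances are started one after the other, the first stops automatically and the second keeps running. Adding both cameras to a single AVCaptureSession as AVCaptureDeviceInput objects raises an exception instead. I have tried both approaches.

On iOS 8 and later, opening the camera requires user authorization.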

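1.1 Selecting the camera

My test device is an iPhone 6 Plus, which has two cameras, so the one to use must be specified. Here the rear camera is the data source.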

AVCaptureDevice *avCaptureDevice = nil;
NSArray *cameras = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
for (AVCaptureDevice *device in cameras) {
    if (device.position == AVCaptureDevicePositionBack) {
        avCaptureDevice = device; // keep the rear camera and stop searching
        break;
    }
}

If you only need the rear camera, the code above can be simplified, because the default video device is the rear camera.

AVCaptureDevice *avCaptureDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];

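1.2 Opening the camera

The whole capture process is managed by the AVCaptureSession class, which keeps the programming simple. The input is the camera; the output is whatever destination the user needs, such as the screen.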

NSError *error = nil;
AVCaptureDeviceInput *videoInput = [AVCaptureDeviceInput deviceInputWithDevice:avCaptureDevice error:&error];
if (!videoInput) {
    NSLog(@"Failed to create the camera input: %@", error);
    return;
}
AVCaptureSession *avCaptureSession = [[AVCaptureSession alloc] init];
avCaptureSession.sessionPreset = AVCaptureSessionPresetHigh; // AVCaptureSessionPresetHigh is the default, so this line is optional
[avCaptureSession addInput:videoInput];

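With the input configured, the next step is the output, that is, the format of the data the camera delivers. AVCaptureDevice.formats lists the pixel formats the current device supports; on the iPhone 6 there are only two default formats, 420f and 420v. To get 32BGRA output, set kCVPixelBufferPixelFormatTypeKey in the videoSettings of AVCaptureVideoDataOutput. The values I have tested successfully are:

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, i.e. 420v
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, i.e. 420f
kCVPixelFormatType_32BGRA, for which iOS converts YUV to BGRA internally

As far as I know, YUV420 is generally used for standard-definition video and YUV422 for high-definition video, so this restriction surprised me. Under the same conditions, however, YUV420 costs less than YUV422 in both computation and data transfer.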

AVCaptureVideoDataOutput *avCaptureVideoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
NSDictionary *settings = @{(__bridge id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange)};
avCaptureVideoDataOutput.videoSettings = settings;
dispatch_queue_t queue = dispatch_queue_create("com.github.michael-lfx.back_camera_io", NULL);
[avCaptureVideoDataOutput setSampleBufferDelegate:self queue:queue];
[avCaptureSession addOutput:avCaptureVideoDataOutput];

Add a preview layer.

AVCaptureVideoPreviewLayer *previewLayer = [AVCaptureVideoPreviewLayer layerWithSession:avCaptureSession];
previewLayer.frame = self.view.bounds;
previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;
[self.view.layer addSublayer:previewLayer];

Start the session.

[avCaptureSession startRunning];

Launch the app and the live camera image appears.

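1.3 Getting camera data from the callback

By default the iPhone 6 Plus captures at 30 fps, which means the following method is called 30 times per second. To start, simply print some information about the camera output.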

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    if (CVPixelBufferIsPlanar(pixelBuffer)) {
        NSLog(@"kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer");
    }
    CMVideoFormatDescriptionRef desc = NULL;
    CMVideoFormatDescriptionCreateForImageBuffer(NULL, pixelBuffer, &desc);
    if (desc) {
        CFDictionaryRef extensions = CMFormatDescriptionGetExtensions(desc);
        NSLog(@"extensions = %@", extensions);
        CFRelease(desc); // the Create call returns an owned reference
    }
}

The output is:

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer
extensions = {
    CVBytesPerRow = 2904;
    CVImageBufferColorPrimaries = "ITU_R_709_2";
    CVImageBufferTransferFunction = "ITU_R_709_2";
    CVImageBufferYCbCrMatrix = "ITU_R_709_2";
    Version = 2;
}

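From my limited knowledge of video, ITU_R_709_2 is the scheme for HD video, generally used with YUV422, and its YUV-to-RGB conversion matrix differs from the one for SD video (generally ITU_R_601_4).

CVPixelBufferGetPixelFormatType() returns the pixel format of the camera output, which matches the format specified earlier.

Running on an iPhone 6 with sessionPreset set to AVCaptureSessionPreset640x480 gives the following output.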

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer
extensions = {
    CVBytesPerRow = 964;
    CVImageBufferColorPrimaries = "ITU_R_709_2";
    CVImageBufferTransferFunction = "ITU_R_709_2";
    CVImageBufferYCbCrMatrix = "ITU_R_601_4";
    Version = 2;
}

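Now consider CVBytesPerRow. Its value, 964, matches the return value of CVPixelBufferGetBytesPerRow. From the preset, the Y plane is 640 pixels wide, which matches the values returned by CVPixelBufferGetWidth and by CVPixelBufferGetWidthOfPlane for plane 0.

CVPixelBufferGetBytesPerRow documentation:
The number of bytes per row of the image data. For planar buffers, this function returns a rowBytes value such that bytesPerRow * height covers the entire image, including all planes.

One reading of this documentation is that, for a planar buffer, CVPixelBufferGetBytesPerRow returns the sum of the plane widths, here the Y and UV planes: Y + U + V = 640 + (640/2 + 640/2) = 1280. That calculation is wrong, of course. Under YUV420 sampling, one pixel takes 8 + 2 + 2 = 12 bits, so each image row actually holds

640 x 12 / 8 = 960

bytes, which still does not equal CVBytesPerRow. Next, compute the image size from the theoretical formula. CVPixelBufferGetHeight returns a height of 480, so the image size is

640 x 480 + ((640/2) x (480/2)) + ((640/2) x (480/2))

= 640 x 480 x 3/2

= 460800

bytes, whereas CVPixelBufferGetDataSize returns 462728, which is clearly different. FFmpeg, for example, adds padding to AVFrame.data to speed up memory access, so that AVFrame.linesize >= AVFrame.width. Does CVPixelBuffer do something similar?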

size_t extraColumnsOnLeft;
size_t extraColumnsOnRight;
size_t extraRowsOnTop;
size_t extraRowsOnBottom;
CVPixelBufferGetExtendedPixels(pixelBuffer,
                               &extraColumnsOnLeft,
                               &extraColumnsOnRight,
                               &extraRowsOnTop,
                               &extraRowsOnBottom);
// %zu is the format specifier for size_t
NSLog(@"extra (left, right, top, bottom) = (%zu, %zu, %zu, %zu)",
      extraColumnsOnLeft,
      extraColumnsOnRight,
      extraRowsOnTop,
      extraRowsOnBottom);

All four values are 0, so there are no extended pixels. This question remains open.
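The size arithmetic in this section can be reproduced with a few lines of plain C, which also compiles inside an Objective-C file. This is only an illustrative sketch: plane_t and the three functions are hypothetical helpers, not Core Video API, and the strides are whatever the caller supplies. On a device the per-plane values would come from CVPixelBufferGetBytesPerRowOfPlane and CVPixelBufferGetHeightOfPlane; the measurements above did not record them, so the sketch quantifies the extra bytes without claiming to explain them.

```c
#include <stddef.h>

/* One plane described by its stride (bytes per row, possibly padded) and row count. */
typedef struct {
    size_t bytes_per_row;
    size_t rows;
} plane_t;

/* Bi-planar 4:2:0 frame: plane 0 is Y, plane 1 is interleaved CbCr. */
static size_t biplanar_420_bytes(plane_t y, plane_t cbcr) {
    return y.bytes_per_row * y.rows + cbcr.bytes_per_row * cbcr.rows;
}

/* Tightly packed layout for the given dimensions (no padding anywhere). */
static size_t tight_420_bytes(size_t width, size_t height) {
    plane_t y    = { width, height };
    plane_t cbcr = { 2 * (width / 2), height / 2 };  /* Cb and Cr bytes alternate */
    return biplanar_420_bytes(y, cbcr);
}

/* Bytes in a reported buffer size that the tight layout does not account for. */
static size_t unexplained_bytes(size_t reported, size_t width, size_t height) {
    size_t tight = tight_420_bytes(width, height);
    return reported > tight ? reported - tight : 0;
}
```

For the 640x480 case, tight_420_bytes(640, 480) gives 460800, and unexplained_bytes(462728, 640, 480) gives 1928, the gap between the theoretical size and the value reported by CVPixelBufferGetDataSize.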

2. Trying HEVC and AVC encoding with VideoToolbox

The H.264 (AVC) profiles and levels that iOS supports for hardware encoding are declared in VTCompressionProperties.h. In summary:

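Baseline

1.3
3.0, 3.1, 3.2
4.0, 4.1, 4.2
5.0, 5.1, 5.2
AutoLevel (the level is chosen automatically)

Main

3.0, 3.1, 3.2
4.0, 4.1, 4.2
5.0, 5.1, 5.2
AutoLevel (the level is chosen automatically)

Extended

5.0
AutoLevel (the level is chosen automatically)

High

3.0, 3.1, 3.2
4.0, 4.1, 4.2
5.0, 5.1, 5.2
AutoLevel (the level is chosen automatically)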

Encoding with VideoToolbox takes four steps:

Create the compression session
Prepare to encode
Encode frame by frame
Finish encoding

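2.1 Creating the compression session

// Get the width and height of the camera output image
size_t width = CVPixelBufferGetWidth(pixelBuffer);
size_t height = CVPixelBufferGetHeight(pixelBuffer);
static VTCompressionSessionRef compressionSession;
OSStatus status = VTCompressionSessionCreate(NULL,                             // allocator
                                             (int32_t)width, (int32_t)height,
                                             kCMVideoCodecType_H264,
                                             NULL,                             // encoderSpecification
                                             NULL,                             // sourceImageBufferAttributes
                                             NULL,                             // compressedDataAllocator
                                             &compressionOutputCallback,
                                             NULL,                             // outputCallbackRefCon
                                             &compressionSession);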

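Changing kCMVideoCodecType_H264 to kCMVideoCodecType_HEVC makes the call fail on both an iPhone 6 Plus and an iPhone 6s Plus running iOS 9.2.1 with error -12908, kVTCouldNotFindVideoEncoderErr: the encoder cannot be found. It appears that iOS 9.2 does not expose an HEVC encoder.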
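A raw number such as -12908 is hard to read in a log, so a small lookup helper can make failures like this one easier to spot. The following plain C sketch is only an illustration: vt_status_name is a hypothetical helper, and it covers just a few of the codes declared in VTErrors.h, leaving everything else as "unknown".

```c
#include <stdint.h>

/* Maps a few VideoToolbox OSStatus values to their constant names.
 * Deliberately incomplete: unlisted codes are reported as "unknown". */
static const char *vt_status_name(int32_t status) {
    switch (status) {
        case 0:      return "noErr";
        case -12902: return "kVTParameterErr";
        case -12903: return "kVTInvalidSessionErr";
        case -12908: return "kVTCouldNotFindVideoEncoderErr";
        default:     return "unknown";
    }
}
```

Logging vt_status_name(status) next to the numeric value shows at a glance that the HEVC attempt fails because no encoder was found rather than because of a bad parameter.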

The encode callback is defined as follows:

static void compressionOutputCallback(void * CM_NULLABLE outputCallbackRefCon,
                                      void * CM_NULLABLE sourceFrameRefCon,
                                      OSStatus status,
                                      VTEncodeInfoFlags infoFlags,
                                      CM_NULLABLE CMSampleBufferRef sampleBuffer) {
    if (status != noErr) {
        NSLog(@"%s with status(%d)", __FUNCTION__, (int)status);
        return;
    }
    // infoFlags is a bit mask, so test the bit instead of comparing with ==
    if (infoFlags & kVTEncodeInfo_FrameDropped) {
        NSLog(@"%s with frame dropped.", __FUNCTION__);
        return;
    }
    /* ------ debugging aid ------ */
    CMFormatDescriptionRef fmtDesc = CMSampleBufferGetFormatDescription(sampleBuffer);
    CFDictionaryRef extensions = CMFormatDescriptionGetExtensions(fmtDesc);
    NSLog(@"extensions = %@", extensions);
    CMItemCount count = CMSampleBufferGetNumSamples(sampleBuffer);
    NSLog(@"samples count = %ld", (long)count);
    /* ====== debugging aid ====== */
    // Stream the data or write it to a file
}
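VTEncodeInfoFlags is a bit mask, so a frame-dropped check is safer as a mask test than as an equality comparison: equality only matches when no other bit is set. The plain C sketch below illustrates the difference. The two constants are local stand-ins that mirror the bit positions of kVTEncodeInfo_Asynchronous and kVTEncodeInfo_FrameDropped, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VTEncodeInfoFlags bits declared in VTErrors.h. */
enum {
    ENCODE_INFO_ASYNCHRONOUS  = 1u << 0,
    ENCODE_INFO_FRAME_DROPPED = 1u << 1
};

/* Equality test: true only if FRAME_DROPPED is the one and only bit set. */
static bool dropped_by_equality(uint32_t flags) {
    return flags == ENCODE_INFO_FRAME_DROPPED;
}

/* Mask test: true whenever the FRAME_DROPPED bit is set, whatever else is. */
static bool dropped_by_mask(uint32_t flags) {
    return (flags & ENCODE_INFO_FRAME_DROPPED) != 0;
}
```

With flags equal to ENCODE_INFO_ASYNCHRONOUS | ENCODE_INFO_FRAME_DROPPED, the equality test misses the dropped frame while the mask test reports it.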

When encoding succeeds, the callback logs the following:

extensions = {
    FormatName = "H.264";
    SampleDescriptionExtensionAtoms =     {
        avcC = <014d0028 ffe1000b 274d0028 ab603c01 13f2a001 000428ee 3c30>;
    };
}
samples count = 1

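A sample count of 1 does not mean the slice count is 1. So far I have not found an encoder setting that produces a multi-slice bitstream (several I or P slices per frame). A detailed dump of sampleBuffer looks like this: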

CMSampleBuffer 0x126e9fd80 retainCount: 1 allocator: 0x1a227cb68
    invalid = NO
    dataReady = YES
    makeDataReadyCallback = 0x0
    makeDataReadyRefcon = 0x0
    formatDescription = <CMVideoFormatDescription 0x126e9fd50 [0x1a227cb68]> {
    mediaType:'vide'
    mediaSubType:'avc1'
    mediaSpecific: {
        codecType: 'avc1'
        dimensions: 1920 x 1080
    }
    extensions: {<CFBasicHash 0x126e9eae0 [0x1a227cb68]>{type = immutable dict, count = 2, entries =>
    0 : <CFString 0x19dd523e0 [0x1a227cb68]>{contents = "SampleDescriptionExtensionAtoms"} = <CFBasicHash 0x126e9e090 [0x1a227cb68]>{type = immutable dict, count = 1, entries =>
    2 : <CFString 0x19dd57c20 [0x1a227cb68]>{contents = "avcC"} = <CFData 0x126e9e1b0 [0x1a227cb68]>{length = 26, capacity = 26, bytes = 0x014d0028ffe1000b274d0028ab603c01 ... a001000428ee3c30} }
    2 : <CFString 0x19dd52440 [0x1a227cb68]>{contents = "FormatName"} = H.264} } }
    sbufToTrackReadiness = 0x0
    numSamples = 1
    sampleTimingArray[1] = {
        {PTS = {196709596065916/1000000000 = 196709.596}, DTS = {INVALID}, duration = {INVALID}},
    }
    sampleSizeArray[1] = {
        sampleSize = 5707,
    }
    sampleAttachmentsArray[1] = {
        sample 0:
            DependsOnOthers = false
    }
    dataBuffer = 0x126e9fc50

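To make debugging easier, the H.264 data can be written to a file and analyzed with tools such as VLC. That is the subject of the second article in this series:
iOS VideoToolbox hardware encoding with H.265 (HEVC) and H.264 (AVC): 2 Writing H.264 data to a file.

The role of avcC is described next.

avcC is placed in a CFDictionaryRef and passed to CMVideoFormatDescriptionCreate to create a video format description. A decompression session is then created from that description, and decoding can begin.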
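To see what the avcC blob carries, it helps to decode the 26 bytes logged by the callback. avcC follows the AVCDecoderConfigurationRecord layout: a version byte, profile, compatibility and level bytes, the NAL length size, and then length-prefixed SPS and PPS entries. The plain C sketch below is an illustration under simplifying assumptions: avcc_info, read_param_set and parse_avcc are hypothetical names, only the first SPS and first PPS are returned, and any trailing extension fields are ignored.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t profile;          /* AVCProfileIndication, e.g. 77 = Main */
    uint8_t level;            /* AVCLevelIndication, e.g. 40 = level 4.0 */
    uint8_t nal_length_size;  /* bytes in each NAL length prefix (1, 2 or 4) */
    const uint8_t *sps;       /* first SPS only */
    size_t sps_len;
    const uint8_t *pps;       /* first PPS only */
    size_t pps_len;
} avcc_info;

/* Reads one parameter set with a 16-bit big-endian length prefix at *pos. */
static int read_param_set(const uint8_t *p, size_t n, size_t *pos,
                          const uint8_t **data, size_t *len) {
    if (*pos + 2 > n) return -1;
    size_t l = ((size_t)p[*pos] << 8) | p[*pos + 1];
    *pos += 2;
    if (*pos + l > n) return -1;
    *data = p + *pos;
    *len = l;
    *pos += l;
    return 0;
}

/* Returns 0 on success, -1 if the record is truncated or lacks an SPS/PPS. */
static int parse_avcc(const uint8_t *p, size_t n, avcc_info *out) {
    if (n < 7 || p[0] != 1) return -1;            /* configurationVersion must be 1 */
    out->profile = p[1];
    out->level = p[3];
    out->nal_length_size = (uint8_t)((p[4] & 0x03) + 1);
    size_t pos = 5;
    size_t sps_count = p[pos++] & 0x1f;           /* low 5 bits hold the SPS count */
    if (sps_count == 0) return -1;
    if (read_param_set(p, n, &pos, &out->sps, &out->sps_len) != 0) return -1;
    for (size_t i = 1; i < sps_count; i++) {      /* skip any further SPS entries */
        const uint8_t *skip;
        size_t skip_len;
        if (read_param_set(p, n, &pos, &skip, &skip_len) != 0) return -1;
    }
    if (pos >= n) return -1;
    size_t pps_count = p[pos++];
    if (pps_count == 0) return -1;
    return read_param_set(p, n, &pos, &out->pps, &out->pps_len);
}
```

Applied to the bytes in the log, the record decodes to profile 77 (Main), level 40 (4.0), a 4-byte NAL length prefix, an 11-byte SPS starting with 0x27 and a 4-byte PPS starting with 0x28.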

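This also shows that VideoToolbox outputs H.264 in avcC format when encoding, and that VideoToolbox only accepts H.264 in avcC format. If the H.264 data arrives from the network in Annex-B format (usually called a raw H.264 stream or an Elementary Stream), it is more convenient to create the video format description with CMVideoFormatDescriptionCreateFromH264ParameterSets, and the Annex-B data must be converted to avcC before decoding. This is why WWDC 2014 session 513, "Direct Access to Video Encoding and Decoding", says that VideoToolbox only supports H.264 data as carried in an MP4 container. As far as I know, when the stream is written to MP4, the start code used by Annex-B is replaced by a length field. This is the point where VideoToolbox hardware decoding most easily goes wrong; last year my hardware decoding work took a long time precisely because I did not understand H.264 well enough and kept running into errors.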
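The start-code-to-length rewrite can be sketched in plain C. This is an illustration under stated assumptions, not the conversion any particular framework performs: find_start_code and annexb_to_avcc are hypothetical names, the length prefix is fixed at 4 bytes, and the whole input is assumed to be in memory. Zero bytes directly before a start code are treated as part of the start code or as padding, not as NAL payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the next 00 00 01 pattern at or after `from`, or n. */
static size_t find_start_code(const uint8_t *p, size_t n, size_t from) {
    for (size_t i = from; i + 3 <= n; i++) {
        if (p[i] == 0 && p[i + 1] == 0 && p[i + 2] == 1) return i;
    }
    return n;
}

/* Rewrites each Annex-B NAL unit as a 4-byte big-endian length plus payload.
 * Returns the bytes written, or 0 if `out` is too small or no NAL was found. */
static size_t annexb_to_avcc(const uint8_t *in, size_t n,
                             uint8_t *out, size_t cap) {
    size_t written = 0;
    size_t sc = find_start_code(in, n, 0);
    while (sc < n) {
        size_t nal_start = sc + 3;
        size_t next = find_start_code(in, n, nal_start);
        size_t nal_end = next;
        /* Drop the leading zero of a 4-byte start code and any zero padding. */
        while (nal_end > nal_start && in[nal_end - 1] == 0) nal_end--;
        size_t len = nal_end - nal_start;
        if (len > 0) {
            if (written + 4 + len > cap) return 0;
            out[written + 0] = (uint8_t)(len >> 24);
            out[written + 1] = (uint8_t)(len >> 16);
            out[written + 2] = (uint8_t)(len >> 8);
            out[written + 3] = (uint8_t)len;
            memcpy(out + written + 4, in + nal_start, len);
            written += 4 + len;
        }
        sc = next;
    }
    return written;
}
```

For example, the input 00 00 00 01 27 4d 00 28 00 00 01 28 ee 3c 30 becomes 00 00 00 04 27 4d 00 28 00 00 00 04 28 ee 3c 30: the payload bytes are unchanged and only the framing differs.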

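2.2 Preparing to encode

Before encoding starts, settings such as the H.264 profile, level and key-frame interval can be configured. They end up in the SPS and PPS, which tell the decoder how to decode.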

VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
// ... set any further properties here
OSStatus status = VTCompressionSessionPrepareToEncodeFrames(compressionSession);
if (status != noErr) {
    // FAILED.
}

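The second article in this series, iOS VideoToolbox hardware encoding with H.265 (HEVC) and H.264 (AVC): 2 Writing H.264 data to a file, explains SPS and PPS in more detail.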

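2.3 Encoding frame by frame

Before encoding, the base address of the pixel buffer is usually locked, and it is unlocked again once the frame has been submitted. The presentation timestamp and duration must also be supplied.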

if (CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    // FAILED.
}
CMTime presentationTimeStamp = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetOutputDuration(sampleBuffer);
status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, NULL, pixelBuffer, NULL);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

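Unlike decoding, where VTDecodeFrameFlags can request synchronous operation, encoding offers no such flag, so the encode callback is asynchronous. Asynchronous operation makes the code more efficient, but it also brings extra work such as restoring frame order, and it complicates tasks such as encoding audio in sync.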

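2.4 Finishing encoding

When encoding ends, call VTCompressionSessionCompleteFrames to stop encoding and tell the encoder how to handle frames that are already encoded or still pending.
Then call VTCompressionSessionInvalidate to end the session; otherwise the hardware tends to misbehave and the phone has to be restarted.
Finally, release the VTCompressionSession.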

3. Discussion

WWDC 2014 session 513, "Direct Access to Video Encoding and Decoding", mentions that when real-time performance is not critical, multi-pass encoding gives better results. I have not tried it.