A First Look at iOS Core ML and Vision

The code is available at:
http://www.demodashi.com/demo/11715.html

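"The way of teaching prizes focus. Long ago, Mencius's mother chose her neighbors with care; when her son would not study, she cut the cloth on her loom." (from the Three Character Classic)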

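With Apple's new iPhone X announced, the official release of iOS 11 is about to be pushed out. Curious, I installed the beta ahead of the release, and it feels pretty good. WWDC 2017 introduced two interesting things: the machine learning framework and ARKit. I had planned to start with AR, but my phone happens not to be among the supported devices... painful. So machine learning it is. Let's get to the point.

First, a rough look at the result.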

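What is machine learning?

Before Core ML, machine learning was fairly hard to get into. Core ML lowers that barrier considerably, which shows how much effort Apple has put into it. So what is machine learning? Put simply, large amounts of data are used to learn the characteristic features of objects, and those features are stored in a model. At use time we query the model to quickly tell which class the current object belongs to, what properties it has, and so on. What Core ML actually does is take a model trained in advance, run a prediction on the input at use time, and return the result. Because prediction happens on the device, it does not depend on the network and can also cut processing time. In short, Core ML makes it easier to use trained models in an app, while Vision gives us easy access to Apple's models for detecting faces, facial landmarks, text, rectangles, barcodes, and objects.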

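Using Core ML and Vision

Before you start, make sure your environment is Xcode 9.0 with iOS 11. Then you can download Core ML models from the official site. There are currently six models, as follows.

Their descriptions tell us what each one does: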

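MobileNet: roughly, detects the dominant object in an image from a set of 1000 categories, such as trees, animals, food, vehicles, people, and so on.

SqueezeNet: same as above.

Places205-GoogLeNet: roughly, detects the scene of an image from 205 categories, such as airport terminal, bedroom, forest, coast, and so on.

ResNet50: roughly, detects the dominant object in an image from a set of 1000 categories, such as trees, animals, food, vehicles, people, and so on.

Inception v3: same as above.

VGG16: same as above.

These are all models provided by Apple. If you have a model of your own, you can convert it with the conversion tools; see the documentation.

Now that we know what the models do, we can download the ones we want. I downloaded four of them.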

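Drag the downloaded model straight into the project. One thing to watch here: check whether the model is listed at the location shown in the screenshot. I don't know whether it is a bug in my version of Xcode, but when I dragged the model in, it did not show up there, so I had to add it manually. After that, check that the model class has been generated: click the model you want to use and see whether an arrow appears at the location shown below.

Once that arrow is there, we can start writing code.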

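The code

Before writing any code, we need to know what methods the generated model class offers. Taking Resnet50 as an example, import its header in ViewController with #import "Resnet50.h"; autocompletion kicks in as soon as you type Res, and other models are imported the same way. The Resnet50 header declares three classes: Resnet50Input, Resnet50Output, and Resnet50. As the names suggest, they are the input, the output, and the main class you work with.

Resnet50 has three methods: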

- (nullable instancetype)initWithContentsOfURL:(NSURL *)url error:(NSError * _Nullable * _Nullable)error;

/**
    Make a prediction using the standard interface
    @param input an instance of Resnet50Input to predict from
    @param error If an error occurs, upon return contains an NSError object that describes the problem. If you are not interested in possible errors, pass in NULL.
    @return the prediction as Resnet50Output
*/
- (nullable Resnet50Output *)predictionFromFeatures:(Resnet50Input *)input error:(NSError * _Nullable * _Nullable)error;

/**
    Make a prediction using the convenience interface
    @param image Input image of scene to be classified as color (kCVPixelFormatType_32BGRA) image buffer, 224 pixels wide by 224 pixels high:
    @param error If an error occurs, upon return contains an NSError object that describes the problem. If you are not interested in possible errors, pass in NULL.
    @return the prediction as Resnet50Output
*/
- (nullable Resnet50Output *)predictionFromImage:(CVPixelBufferRef)image error:(NSError * _Nullable * _Nullable)error;

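The first is evidently an initializer, and the other two return the output object. At this point I couldn't wait to get started. As the saying goes, haste makes waste, and sure enough I ran into one pitfall after another. Let me walk through them.

At first my initialization looked like this: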

    Resnet50* resnet50 = [[Resnet50 alloc] initWithContentsOfURL:[NSURL fileURLWithPath:[[NSBundle mainBundle] pathForResource:@"Resnet50" ofType:@"mlmodel"]] error:nil];

At first glance it looks perfect. Reality is cruel, though: it crashed.

The log:

 Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSURL initFileURLWithPath:]: nil string parameter'

To pin down the location I added a global breakpoint and confidently ran again, with the same result. Annoyed, I fell back to just the plain initializer:

Resnet50* resnet50 = [[Resnet50 alloc] init];

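This time it did not crash; instead, execution stopped at the screen shown below.

By chance this gave me a look at how Resnet50 is implemented internally, and the first thing that caught my eye was the mlmodelc type... you can probably guess the rest! But why did execution stop here at all? Luckily, continuing past the breakpoint twice was enough, so I ventured a guess that the breakpoint itself was the cause. I removed it and ran again, and sure enough everything went smoothly... by then I was in tears.

This episode shows two things:

1. The model's extension inside the app bundle is mlmodelc (Xcode compiles the .mlmodel file at build time). Looking up Resnet50 with the type mlmodel therefore returns a nil path, which is what triggered the crash above.

2. You can disable the breakpoint while debugging so you don't have to click through it again and again; turn it back on if you want to look at the internals.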
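The first lesson can be sketched as a nil-safe lookup. This is a minimal sketch, not the demo's code: the helper names CompiledModelURL and LoadCompiledModel are hypothetical, and it assumes the compiled model sits in the main bundle. It uses MLModel directly so that it does not depend on a particular generated class.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreML/CoreML.h>

// Hypothetical helper: URL of "<name>.mlmodelc" in the main bundle,
// or nil when no such resource exists.
static NSURL *CompiledModelURL(NSString *name) {
    return [[NSBundle mainBundle] URLForResource:name withExtension:@"mlmodelc"];
}

// Hypothetical helper: loads the compiled model and logs failures
// instead of letting a nil path raise NSInvalidArgumentException.
static MLModel *LoadCompiledModel(NSString *name) {
    NSURL *url = CompiledModelURL(name);
    if (url == nil) {
        NSLog(@"%@.mlmodelc was not found in the app bundle", name);
        return nil;
    }
    NSError *error = nil;
    MLModel *model = [MLModel modelWithContentsOfURL:url error:&error];
    if (model == nil) {
        NSLog(@"Loading %@ failed: %@", name, error);
    }
    return model;
}
```

Calling LoadCompiledModel(@"Resnet50") should then either return a model or log why it could not; the same URL can also be handed to the generated initWithContentsOfURL:error: initializer.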

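With initialization working, the next question is the output. As seen above there are two methods, one taking a Resnet50Input and one taking a CVPixelBufferRef. Resnet50Input in turn has this initializer: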

- (instancetype)initWithImage:(CVPixelBufferRef)image;

So it seems the CVPixelBufferRef is indispensable.

I found a method for creating one online:

// Draws the CGImage into a newly created pixel buffer of the same size.
// The caller owns the returned buffer and must balance it with CVPixelBufferRelease.
- (CVPixelBufferRef)pixelBufferFromCGImage:(CGImageRef)image{
    NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                             [NSNumber numberWithBool:YES], kCVPixelBufferCGImageCompatibilityKey,
                             [NSNumber numberWithBool:YES], kCVPixelBufferCGBitmapContextCompatibilityKey,
                             nil];
    CVPixelBufferRef pxbuffer = NULL;
    size_t frameWidth = CGImageGetWidth(image);
    size_t frameHeight = CGImageGetHeight(image);
    CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault,
                                          frameWidth,
                                          frameHeight,
                                          kCVPixelFormatType_32ARGB,
                                          (__bridge CFDictionaryRef) options,
                                          &pxbuffer);
    NSParameterAssert(status == kCVReturnSuccess && pxbuffer != NULL);

    // Lock the buffer so its memory can back a bitmap context.
    CVPixelBufferLockBaseAddress(pxbuffer, 0);
    void *pxdata = CVPixelBufferGetBaseAddress(pxbuffer);
    NSParameterAssert(pxdata != NULL);

    CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(pxdata,
                                                 frameWidth,
                                                 frameHeight,
                                                 8,
                                                 CVPixelBufferGetBytesPerRow(pxbuffer),
                                                 rgbColorSpace,
                                                 (CGBitmapInfo)kCGImageAlphaNoneSkipFirst);
    NSParameterAssert(context);

    // Render the image into the buffer's memory.
    CGContextConcatCTM(context, CGAffineTransformIdentity);
    CGContextDrawImage(context, CGRectMake(0,
                                           0,
                                           frameWidth,
                                           frameHeight),
                       image);
    CGColorSpaceRelease(rgbColorSpace);
    CGContextRelease(context);
    CVPixelBufferUnlockBaseAddress(pxbuffer, 0);
    return pxbuffer;
}

With that method written, I fleshed out the earlier code and got the following:

- (NSString *)predictionWithResnet50:(CVPixelBufferRef)buffer
{
    Resnet50 *resnet50 = [[Resnet50 alloc] init];
    NSError *predictionError = nil;
    Resnet50Output *resnet50Output = [resnet50 predictionFromImage:buffer error:&predictionError];
    // A nil output signals failure; the error then describes the problem.
    if (resnet50Output == nil) {
        return predictionError.description;
    }
    // Look up the probability of the winning label.
    NSNumber *probability = resnet50Output.classLabelProbs[resnet50Output.classLabel];
    return [NSString stringWithFormat:@"Result: %@, confidence: %.2f", resnet50Output.classLabel, probability.floatValue];
}

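Full of excitement, I added an image view and a label, plus the following code:

    CGImageRef cgImageRef = [imageview.image CGImage];
    CVPixelBufferRef buffer = [self pixelBufferFromCGImage:cgImageRef];
    lable.text = [self predictionWithResnet50:buffer];
    CVPixelBufferRelease(buffer); // created with CVPixelBufferCreate, so release it here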

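Run...

The result left me too dejected to speak for a while. Luckily there was a log. Read it closely and it appears the image size is wrong: the message says the image needs to be 224. Fine, let's resize it and see.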

- (UIImage *)scaleToSize:(CGSize)size image:(UIImage *)image {
    UIGraphicsBeginImageContext(size);
    [image drawInRect:CGRectMake(0, 0, size.width, size.height)];
    UIImage* scaledImage = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return scaledImage;
}

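    UIImage *scaledImage = [self scaleToSize:CGSizeMake(224, 224) image:imageview.image];
    CGImageRef cgImageRef = [scaledImage CGImage];
    CVPixelBufferRef buffer = [self pixelBufferFromCGImage:cgImageRef];
    lable.text = [self predictionWithResnet50:buffer];
    CVPixelBufferRelease(buffer); // created with CVPixelBufferCreate, so release it here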

Run again...

Finally it works! The result is reasonably satisfying; after all, Kevin Garnett, the "Wolf King", does play basketball, haha.
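The prediction code above reports only the single best label, but classLabelProbs holds a probability for every class. As a rough sketch, the dictionary can be sorted to list the top few candidates; TopLabels is a hypothetical helper name and not part of the generated class.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns up to `count` labels, most probable first.
static NSArray<NSString *> *TopLabels(NSDictionary<NSString *, NSNumber *> *probs, NSUInteger count) {
    NSArray<NSString *> *sorted = [probs keysSortedByValueUsingComparator:^NSComparisonResult(NSNumber *a, NSNumber *b) {
        return [b compare:a]; // reversed comparison gives descending order
    }];
    return [sorted subarrayWithRange:NSMakeRange(0, MIN(count, sorted.count))];
}
```

For example, TopLabels(resnet50Output.classLabelProbs, 3) would give the three most likely labels, which is handy when the best guess alone looks doubtful.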

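I then tried the other classes. I had assumed the size was always 224, but Inceptionv3 reported that it needs 299. So I took a closer look at the generated class code and found that this is already documented there: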

/// Input image to be classified as color (kCVPixelFormatType_32BGRA) image buffer, 299 pixels wide by 299 pixels high

/// Input image of scene to be classified as color (kCVPixelFormatType_32BGRA) image buffer, 224 pixels wide by 224 pixels high

It then occurred to me that the model view we looked at earlier states this too, in the column describing the inputs.
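Rather than hard-coding 224 or 299, the expected size can also be read from the model at run time. The following is a minimal sketch that assumes the model has an image input; ExpectedImageSize is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <CoreML/CoreML.h>

// Hypothetical helper: size required by an image input of the model,
// or CGSizeZero when the model declares no image input.
static CGSize ExpectedImageSize(MLModel *model) {
    NSDictionary<NSString *, MLFeatureDescription *> *inputs = model.modelDescription.inputDescriptionsByName;
    for (NSString *name in inputs) {
        MLImageConstraint *constraint = inputs[name].imageConstraint;
        if (constraint != nil) {
            return CGSizeMake(constraint.pixelsWide, constraint.pixelsHigh);
        }
    }
    return CGSizeZero;
}
```

With this, the resize step could be written as [self scaleToSize:ExpectedImageSize(resnet50.model) image:imageview.image], so the same code would serve models with different input sizes.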

There is still one framework we haven't used: Vision. So how do Vision and Core ML work together?

Using Vision

@interface VNCoreMLModel : NSObject

- (instancetype) init  NS_UNAVAILABLE;

/*!
	@brief Create a model container to be used with VNCoreMLRequest based on a Core ML model. This can fail if the model is not supported. Examples for a model that is not supported is a model that does not take an image as any of its inputs.
	@param model	The MLModel from CoreML to be used.
	@param	error	Returns the error code and description, if the model is not supported.
 */
+ (nullable instancetype) modelForMLModel:(MLModel*)model error:(NSError**)error;

@end

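In the VNCoreMLModel class above, the only way to create an instance is modelForMLModel, since init is unavailable. modelForMLModel takes an MLModel argument, and the generated Core ML model class exposes just such a property, so the two can be connected through it.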

@interface Resnet50 : NSObject

@property (readonly, nonatomic, nullable) MLModel * model;

Scrolling further down in the header that declares VNCoreMLModel, we find the class VNCoreMLRequest:

@interface VNCoreMLRequest : VNImageBasedRequest

/*!
 @brief The model from CoreML wrapped in a VNCoreMLModel.
 */
@property (readonly, nonatomic, nonnull) VNCoreMLModel *model;

@property (nonatomic)VNImageCropAndScaleOption imageCropAndScaleOption;

/*!
	@brief Create a new request with a model.
	@param model		The VNCoreMLModel to be used.
 */
- (instancetype) initWithModel:(VNCoreMLModel *)model;

/*!
	@brief Create a new request with a model.
	@param model		The VNCoreMLModel to be used.
	@param	completionHandler	The block that is invoked when the request has been performed.
 */
- (instancetype) initWithModel:(VNCoreMLModel *)model completionHandler:(nullable VNRequestCompletionHandler)completionHandler NS_DESIGNATED_INITIALIZER;

- (instancetype) init  NS_UNAVAILABLE;
- (instancetype) initWithCompletionHandler:(nullable VNRequestCompletionHandler)completionHandler NS_UNAVAILABLE;

@end

Its initWithModel method ties it to the VNCoreMLModel class, which leads to the following code:

- (void)predictionWithResnet50WithImage:(CIImage * )image
{
    // Either initializer works.
//    Resnet50* resnet50 = [[Resnet50 alloc] initWithContentsOfURL:[NSURL fileURLWithPath:[[NSBundle mainBundle] pathForResource:@"Resnet50" ofType:@"mlmodelc"]] error:nil];
    Resnet50* resnet50 = [[Resnet50 alloc] init];
    NSError *error = nil;
    // Wrap the Core ML model in a VNCoreMLModel.
    VNCoreMLModel *vnCoreMMModel = [VNCoreMLModel modelForMLModel:resnet50.model error:&error];
    // Create the request; the completion handler is still empty at this stage.
    VNCoreMLRequest *request = [[VNCoreMLRequest alloc] initWithModel:vnCoreMMModel completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
    }];
}

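Something is still missing here: the image isn't connected to anything yet. After digging through references I found the most important class, VNImageRequestHandler, and in it a very important method: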

- (BOOL)performRequests:(NSArray<VNRequest *> *)requests error:(NSError **)error;

That immediately ties in VNCoreMLRequest, because VNCoreMLRequest ultimately inherits from VNRequest. With help from the documentation, I ended up with the following code:

- (void)predictionWithResnet50WithImage:(CIImage * )image
{
    // Either initializer works.
//    Resnet50* resnet50 = [[Resnet50 alloc] initWithContentsOfURL:[NSURL fileURLWithPath:[[NSBundle mainBundle] pathForResource:@"Resnet50" ofType:@"mlmodelc"]] error:nil];
    Resnet50* resnet50 = [[Resnet50 alloc] init];
    NSError *error = nil;
    // Wrap the Core ML model in a VNCoreMLModel; stop if the model is not supported.
    VNCoreMLModel *vnCoreMMModel = [VNCoreMLModel modelForMLModel:resnet50.model error:&error];
    if (vnCoreMMModel == nil) {
        NSLog(@"%@",error.localizedDescription);
        return;
    }
    // Create the request handler that carries the image.
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCIImage:image options:@{}];
    NSLog(@"Handler: %@",handler);
    // Create the request; the completion handler keeps the most confident classification.
    VNCoreMLRequest *request = [[VNCoreMLRequest alloc] initWithModel:vnCoreMMModel completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
        VNConfidence confidence = 0.0f;
        VNClassificationObservation *tempClassification = nil;
        for (VNClassificationObservation *classification in request.results) {
            if (classification.confidence > confidence) {
                confidence = classification.confidence;
                tempClassification = classification;
            }
        }
        self.descriptionLable.text = [NSString stringWithFormat:@"Result: %@, confidence: %.2f",tempClassification.identifier,tempClassification.confidence];
    }];
    // Perform the recognition request; a NO return value means it failed.
    if (![handler performRequests:@[request] error:&error]) {
        NSLog(@"%@",error.localizedDescription);
    }
}

With this approach we no longer need to worry about the image size; Vision takes care of the scaling and the query for us.
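How Vision fits the image to the model's input is governed by the imageCropAndScaleOption property seen in the VNCoreMLRequest header. The sketch below sets it explicitly and also moves the synchronous performRequests call off the main thread. PerformClassification is a hypothetical helper name, and the choice of centre cropping is an assumption, not something the demo specifies.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreImage/CoreImage.h>
#import <Vision/Vision.h>

// Hypothetical helper: runs `request` against `image` on a background queue.
static void PerformClassification(VNCoreMLRequest *request, CIImage *image) {
    // Assumption: centre cropping suits the picture; the alternatives are
    // VNImageCropAndScaleOptionScaleFit and VNImageCropAndScaleOptionScaleFill.
    request.imageCropAndScaleOption = VNImageCropAndScaleOptionCenterCrop;
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCIImage:image options:@{}];
        NSError *error = nil;
        if (![handler performRequests:@[request] error:&error]) {
            NSLog(@"Vision request failed: %@", error.localizedDescription);
        }
    });
}
```

Note that the request's completion handler then runs on that background queue as well, so any label update inside it would need to be wrapped in dispatch_async(dispatch_get_main_queue(), ...).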

A few questions remain at this point:

- (instancetype)initWithCIImage:(CIImage *)image options:(NSDictionary<VNImageOption, id> *)options;

/*!
 @brief initWithCIImage:options:orientation creates a VNImageRequestHandler to be used for performing requests against the image passed in as a CIImage.
 @param image A CIImage containing the image to be used for performing the requests. The content of the image cannot be modified.
 @param orientation The orientation of the image/buffer based on the EXIF specification. For details see kCGImagePropertyOrientation. The value has to be an integer from 1 to 8. This superceeds every other orientation information.
 @param options A dictionary with options specifying auxilary information for the buffer/image like VNImageOptionCameraIntrinsics
 @note:  Request results may not be accurate in simulator due to CI's inability to render certain pixel formats in the simulator
 */
- (instancetype)initWithCIImage:(CIImage *)image orientation:(CGImagePropertyOrientation)orientation options:(NSDictionary<VNImageOption, id> *)options;

Namely, VNImageRequestHandler has many more initializers, some with extra parameters. I haven't looked into them yet and will add more here once I have.
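As one possible starting point for the orientation parameter: a UIImage carries a UIImageOrientation, whose raw values differ from the EXIF-based CGImagePropertyOrientation that the initializer expects. A name-by-name mapping might look like the sketch below; CGOrientationFromUIOrientation is a hypothetical helper, not an API of either framework.

```objective-c
#import <UIKit/UIKit.h>
#import <ImageIO/ImageIO.h>

// Hypothetical helper: maps each UIKit orientation to the EXIF orientation
// of the same name (the underlying integer values are not the same).
static CGImagePropertyOrientation CGOrientationFromUIOrientation(UIImageOrientation orientation) {
    switch (orientation) {
        case UIImageOrientationUp:            return kCGImagePropertyOrientationUp;
        case UIImageOrientationUpMirrored:    return kCGImagePropertyOrientationUpMirrored;
        case UIImageOrientationDown:          return kCGImagePropertyOrientationDown;
        case UIImageOrientationDownMirrored:  return kCGImagePropertyOrientationDownMirrored;
        case UIImageOrientationLeft:          return kCGImagePropertyOrientationLeft;
        case UIImageOrientationLeftMirrored:  return kCGImagePropertyOrientationLeftMirrored;
        case UIImageOrientationRight:         return kCGImagePropertyOrientationRight;
        case UIImageOrientationRightMirrored: return kCGImagePropertyOrientationRightMirrored;
    }
    return kCGImagePropertyOrientationUp; // fallback for unexpected values
}
```

The handler could then be created with initWithCIImage:orientation:options:, passing CGOrientationFromUIOrientation(imageview.image.imageOrientation), so that photos taken in portrait would be classified the right way up.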

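Before running

Because of the size limit on demos, the model bundled with this demo has been replaced with GoogLeNetPlaces. To see how the other models perform, download them from the official site and add them to the project following the steps above.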


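Note: Copyright of this article belongs to the author. It is published by demo大师 on the author's behalf. Reproduction is not permitted without the author's authorization.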