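Understanding the problem

A taxi fare depends not only on distance but also on factors such as the number of passengers and whether a credit card is used (the taxis here are New York City taxis). So this is not a simple single-variable equation.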

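Preparing the data

Create a console application project, add a new Data folder, and place the taxi-fare-train.csv and taxi-fare-test.csv files in it. Remember to set their Copy to Output Directory property to Copy if newer. Then add the Microsoft.ML NuGet package.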
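If you prefer to edit the project file directly, the Copy if newer setting corresponds to the PreserveNewest value. The fragment below is a minimal sketch that assumes an SDK-style .csproj and the Data folder layout described above; older project formats use Include rather than Update.

```xml
<ItemGroup>
  <!-- "Copy if newer" in the Properties window is stored as PreserveNewest -->
  <None Update="Data\taxi-fare-train.csv">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
  <None Update="Data\taxi-fare-test.csv">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```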

Loading the data

Create an MLContext object and a TextLoader object. The TextLoader is used to read data from a file.

// A fixed seed keeps randomized steps repeatable between runs.
MLContext mlContext = new MLContext(seed: 0);

// Describe the CSV layout: comma-separated, one header row, seven columns.
_textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
{
    Separator = ",",
    HasHeader = true,
    Column = new[]
    {
        new TextLoader.Column("VendorId", DataKind.Text, 0),
        new TextLoader.Column("RateCode", DataKind.Text, 1),
        new TextLoader.Column("PassengerCount", DataKind.R4, 2),
        new TextLoader.Column("TripTime", DataKind.R4, 3),
        new TextLoader.Column("TripDistance", DataKind.R4, 4),
        new TextLoader.Column("PaymentType", DataKind.Text, 5),
        new TextLoader.Column("FareAmount", DataKind.R4, 6)
    }
});

Extracting features

The dataset files have seven columns. The first six serve as features and the last one is the label.

// Input row: one field per CSV column, mapped by column index.
public class TaxiTrip
{
    [Column("0")]
    public string VendorId;

    [Column("1")]
    public string RateCode;

    [Column("2")]
    public float PassengerCount;

    [Column("3")]
    public float TripTime;

    [Column("4")]
    public float TripDistance;

    [Column("5")]
    public string PaymentType;

    [Column("6")]
    public float FareAmount;
}

// Output row: the trainer writes its prediction to the Score column.
public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}

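Training the model

First read the training dataset, then build the pipeline. The first step of the pipeline copies the FareAmount column to the Label column, which serves as the label. The second step uses OneHotEncoding to convert the three string columns VendorId, RateCode, and PaymentType into numeric columns. The third step concatenates the six data columns into a single Features column. The final step selects FastTreeRegressionTrainer as the training algorithm.

With the pipeline complete, train the model.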

IDataView dataView = _textLoader.Read(dataPath);

// Label copy -> one-hot encode the string columns -> concatenate features -> FastTree trainer.
var pipeline = mlContext.Transforms.CopyColumns("FareAmount", "Label")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("VendorId"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("RateCode"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentType"))
    .Append(mlContext.Transforms.Concatenate("Features", "VendorId", "RateCode", "PassengerCount", "TripTime", "TripDistance", "PaymentType"))
    .Append(mlContext.Regression.Trainers.FastTree());

var model = pipeline.Fit(dataView);
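To make the second pipeline step more concrete, here is a small conceptual sketch of one-hot encoding in plain C#. It is not ML.NET's implementation: the OneHotSketch class and its Encode method are hypothetical names used only for illustration, and the "CSH" value is an assumed second payment type alongside the "CRD" value used later in this tutorial.

```csharp
using System;
using System.Collections.Generic;

static class OneHotSketch
{
    // Assigns each distinct string an index in first-seen order,
    // then turns every value into a vector with a single 1 at that index.
    public static float[][] Encode(IReadOnlyList<string> values)
    {
        var index = new Dictionary<string, int>();
        foreach (var value in values)
        {
            if (!index.ContainsKey(value))
                index.Add(value, index.Count);
        }

        var encoded = new float[values.Count][];
        for (int i = 0; i < values.Count; i++)
        {
            encoded[i] = new float[index.Count];
            encoded[i][index[values[i]]] = 1f;
        }
        return encoded;
    }

    static void Main()
    {
        // Illustrative payment types, not rows from the dataset.
        var paymentTypes = new[] { "CRD", "CSH", "CRD" };
        foreach (var row in Encode(paymentTypes))
            Console.WriteLine(string.Join(" ", row));
        // prints:
        // 1 0
        // 0 1
        // 1 0
    }
}
```

Each string column becomes a group of numeric slots this way, which is what allows the later Concatenate step to merge it with the numeric columns into one Features vector.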

Evaluating the model

This step uses the test dataset and the regression Evaluate method.

IDataView dataView = _textLoader.Read(_testDataPath);

// Score the test rows, then compare the Score column against the Label column.
var predictions = model.Transform(dataView);
var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

Console.WriteLine();
Console.WriteLine($"*************************************************");
Console.WriteLine($"*       Model quality metrics evaluation         ");
Console.WriteLine($"*------------------------------------------------");
Console.WriteLine($"*       R2 Score:      {metrics.RSquared:0.##}");
Console.WriteLine($"*       RMS loss:      {metrics.Rms:#.##}");
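For readers unfamiliar with the two metrics printed above, the sketch below computes them by hand using the standard textbook definitions: R squared is 1 minus the ratio of the residual sum of squares to the total sum of squares, and RMS is the square root of the mean squared error. The RegressionMetricsSketch class and the sample numbers are made up for illustration, and this is an assumption about what the metrics mean rather than the library's own code.

```csharp
using System;
using System.Linq;

static class RegressionMetricsSketch
{
    // Sum of squared differences between actual and predicted values.
    static double ResidualSumOfSquares(double[] actual, double[] predicted) =>
        actual.Zip(predicted, (y, yHat) => (y - yHat) * (y - yHat)).Sum();

    // Root mean squared error: sqrt(mean((y - yHat)^2)). Lower is better.
    public static double Rms(double[] actual, double[] predicted) =>
        Math.Sqrt(ResidualSumOfSquares(actual, predicted) / actual.Length);

    // Coefficient of determination: 1 - SS_res / SS_tot. Closer to 1 is better.
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double totalSumOfSquares = actual.Sum(y => (y - mean) * (y - mean));
        return 1.0 - ResidualSumOfSquares(actual, predicted) / totalSumOfSquares;
    }

    static void Main()
    {
        // Made-up fares, not taken from the dataset.
        var actual = new[] { 10.0, 20.0, 30.0 };
        var predicted = new[] { 12.0, 18.0, 30.0 };

        // SS_res = 4 + 4 + 0 = 8, SS_tot = 100 + 0 + 100 = 200
        Console.WriteLine($"R2:  {RSquared(actual, predicted):0.##}"); // 1 - 8/200 = 0.96
        Console.WriteLine($"RMS: {Rms(actual, predicted):0.##}");      // sqrt(8/3), about 1.63
    }
}
```

An R squared near 1 together with an RMS that is small relative to typical fares suggests the model explains most of the variation in the test data.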

Saving the model

The trained model can be saved as a zip file for later use.

// Write the trained model to _modelPath, overwriting any existing file.
using (var fileStream = new FileStream(_modelPath, FileMode.Create, FileAccess.Write, FileShare.Write))
    mlContext.Model.Save(model, fileStream);

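Using the model

First load the saved model. Then create a prediction function object, with TaxiTrip as its input type and TaxiTripFarePrediction as its output type. Finally call the Predict method, passing in the data to score.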

ITransformer loadedModel;
using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    loadedModel = mlContext.Model.Load(stream);
}

// Typed wrapper around the model: TaxiTrip in, TaxiTripFarePrediction out.
var predictionFunction = loadedModel.MakePredictionFunction<TaxiTrip, TaxiTripFarePrediction>(mlContext);

var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // To predict. Actual/Observed = 15.5
};

var prediction = predictionFunction.Predict(taxiTripSample);

Console.WriteLine($"**********************************************************************");
Console.WriteLine($"Predicted fare: {prediction.FareAmount:0.####}, actual fare: 15.5");
Console.WriteLine($"**********************************************************************");

Complete sample code

using Microsoft.ML;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Runtime.Data;
using System;
using System.IO;

namespace TaxiFarePredictor
{
    // TaxiTrip and TaxiTripFarePrediction are the classes shown in the
    // "Extracting features" section.
    class Program
    {
        static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
        static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
        static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");
        static TextLoader _textLoader;

        static void Main(string[] args)
        {
            MLContext mlContext = new MLContext(seed: 0);

            // Describe the CSV layout once; the loader is reused for training and test data.
            _textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
            {
                Separator = ",",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("VendorId", DataKind.Text, 0),
                    new TextLoader.Column("RateCode", DataKind.Text, 1),
                    new TextLoader.Column("PassengerCount", DataKind.R4, 2),
                    new TextLoader.Column("TripTime", DataKind.R4, 3),
                    new TextLoader.Column("TripDistance", DataKind.R4, 4),
                    new TextLoader.Column("PaymentType", DataKind.Text, 5),
                    new TextLoader.Column("FareAmount", DataKind.R4, 6)
                }
            });

            var model = Train(mlContext, _trainDataPath);
            Evaluate(mlContext, model);
            TestSinglePrediction(mlContext);

            Console.Read();
        }

        // Builds the pipeline, fits it on the training data, and saves the result.
        public static ITransformer Train(MLContext mlContext, string dataPath)
        {
            IDataView dataView = _textLoader.Read(dataPath);

            var pipeline = mlContext.Transforms.CopyColumns("FareAmount", "Label")
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("VendorId"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("RateCode"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentType"))
                .Append(mlContext.Transforms.Concatenate("Features", "VendorId", "RateCode", "PassengerCount", "TripTime", "TripDistance", "PaymentType"))
                .Append(mlContext.Regression.Trainers.FastTree());

            var model = pipeline.Fit(dataView);
            SaveModelAsFile(mlContext, model);
            return model;
        }

        private static void SaveModelAsFile(MLContext mlContext, ITransformer model)
        {
            using (var fileStream = new FileStream(_modelPath, FileMode.Create, FileAccess.Write, FileShare.Write))
                mlContext.Model.Save(model, fileStream);
        }

        // Scores the test dataset and prints the regression metrics.
        private static void Evaluate(MLContext mlContext, ITransformer model)
        {
            IDataView dataView = _textLoader.Read(_testDataPath);
            var predictions = model.Transform(dataView);
            var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

            Console.WriteLine();
            Console.WriteLine($"*************************************************");
            Console.WriteLine($"*       Model quality metrics evaluation         ");
            Console.WriteLine($"*------------------------------------------------");
            Console.WriteLine($"*       R2 Score:      {metrics.RSquared:0.##}");
            Console.WriteLine($"*       RMS loss:      {metrics.Rms:#.##}");
        }

        // Reloads the saved model and predicts the fare for one sample trip.
        private static void TestSinglePrediction(MLContext mlContext)
        {
            ITransformer loadedModel;
            using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                loadedModel = mlContext.Model.Load(stream);
            }

            var predictionFunction = loadedModel.MakePredictionFunction<TaxiTrip, TaxiTripFarePrediction>(mlContext);

            var taxiTripSample = new TaxiTrip()
            {
                VendorId = "VTS",
                RateCode = "1",
                PassengerCount = 1,
                TripTime = 1140,
                TripDistance = 3.75f,
                PaymentType = "CRD",
                FareAmount = 0 // To predict. Actual/Observed = 15.5
            };

            var prediction = predictionFunction.Predict(taxiTripSample);

            Console.WriteLine($"**********************************************************************");
            Console.WriteLine($"Predicted fare: {prediction.FareAmount:0.####}, actual fare: 15.5");
            Console.WriteLine($"**********************************************************************");
        }
    }
}

Output when the program runs:

*************************************************
*       Model quality metrics evaluation
*------------------------------------------------
*       R2 Score:      0.92
*       RMS loss:      2.81
**********************************************************************
Predicted fare: 15.7855, actual fare: 15.5
**********************************************************************

The predicted fare of 15.7855 is fairly close to the actual value of 15.5.

ML.NET Tutorial: Taxi Fare Prediction (Regression)