Preface

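I am planning to translate Microsoft's machinelearning-samples into Chinese over the coming weeks. My skills are limited, so if you spot any errors or omissions, please point them out.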

If you are interested in this, you are welcome to join me: https://github.com/feiyun0112/machinelearning-samples.zh-cn

Taxi Fare Prediction

ML.NET version	API type	Status	App type	Data type	Scenario	ML task	Algorithms
v0.7	Dynamic API	Up-to-date	Console app	.csv files	Price prediction	Regression	Sdca Regression

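In this introductory sample, you will see how to use ML.NET to predict taxi fares. In machine learning, this type of prediction is known as regression.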

Problem

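This problem is centered on predicting the fare of a taxi trip in New York City. At first glance, the fare may seem to depend only on the distance traveled. However, New York taxi vendors also charge different amounts depending on other factors, such as additional passengers or paying by credit card instead of cash. Taxi vendors could use this prediction to give riders and drivers an estimate of the fare.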

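To solve this problem, we will build an ML model that takes the following inputs:

Vendor ID
Rate code
Passenger count
Trip time
Trip distance
Payment type

and predicts the fare of the trip.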

ML task - Regression

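The general regression problem is to predict some continuous value from a set of given parameters, for example:

Predict the price of a house based on the number of rooms, location, year built, etc.
Predict a car's fuel consumption based on fuel type and car parameters.
Predict the time needed to fix an issue based on the issue's attributes.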

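What all these examples have in common is that the parameter we want to predict can take any numeric value within a certain range. In other words, the value is represented by an integer or a float/double, not by an enum or boolean type.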
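Here is a minimal plain-C# sketch that makes the idea of a continuous target concrete. It uses no ML.NET. It fits a straight line, fare = intercept + slope * distance, to four made-up trips by ordinary least squares, then predicts the fare for a 3.75-mile trip. The data points are hypothetical. Closed-form least squares is a stand-in chosen for brevity, not the SDCA trainer this sample uses. The point is only that the prediction is an arbitrary double rather than one of a fixed set of labels.

```csharp
using System;
using System.Linq;

// Hypothetical (distance in miles, fare in dollars) pairs, made up for illustration.
double[] distance = { 1.0, 2.0, 3.0, 4.0 };
double[] fare = { 5.0, 7.0, 9.0, 11.0 };

// Closed-form ordinary least squares for: fare = intercept + slope * distance.
double meanX = distance.Average();
double meanY = fare.Average();
double covXY = distance.Zip(fare, (x, y) => (x - meanX) * (y - meanY)).Sum();
double varX = distance.Sum(x => (x - meanX) * (x - meanX));
double slope = covXY / varX;
double intercept = meanY - slope * meanX;

// The prediction is a continuous number, not an enum or a boolean.
double predicted = intercept + slope * 3.75;
Console.WriteLine($"fare = {intercept:0.##} + {slope:0.##} * distance; 3.75 miles -> {predicted:0.##}");
```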

Solution

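To solve this problem, we first build an ML model. Then we train the model on existing data and evaluate how good it is. Finally, we use the model to predict taxi fares.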

1. Build model

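Building the model involves two parts. First, load the data (taxi-fare-train.csv, read with TextLoader). Then transform it so that the ML algorithm (in this case "StochasticDualCoordinateAscent") can use it effectively: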

```csharp
//Create ML Context with seed for repeatable/deterministic results
MLContext mlContext = new MLContext(seed: 0);

// STEP 1: Common data loading configuration
TextLoader textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
{
    Separator = ",",
    HasHeader = true,
    Column = new[]
    {
        new TextLoader.Column("VendorId", DataKind.Text, 0),
        new TextLoader.Column("RateCode", DataKind.Text, 1),
        new TextLoader.Column("PassengerCount", DataKind.R4, 2),
        new TextLoader.Column("TripTime", DataKind.R4, 3),
        new TextLoader.Column("TripDistance", DataKind.R4, 4),
        new TextLoader.Column("PaymentType", DataKind.Text, 5),
        new TextLoader.Column("FareAmount", DataKind.R4, 6)
    }
});

IDataView baseTrainingDataView = textLoader.Read(TrainDataPath);
IDataView testDataView = textLoader.Read(TestDataPath);
```
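The TextLoader configuration above specifies three things. Fields are comma-separated, and the first line is a header. Each column is read from a fixed index with a fixed type: DataKind.Text is read as text and DataKind.R4 as a 32-bit float.

The sketch below is a rough plain-C# illustration of that mapping, not of how ML.NET's TextLoader is implemented. It parses in-memory lines into named tuples. The header row and the first data row are the sample record quoted later in this post; the second data row is made up. The naive Split(',') assumes there are no quoted fields.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Same column order as taxi-fare-train.csv; the first line is the header.
// The second data row is made up for illustration.
string[] lines =
{
    "vendor_id,rate_code,passenger_count,trip_time_in_secs,trip_distance,payment_type,fare_amount",
    "VTS,1,1,1140,3.75,CRD,15.5",
    "CMT,1,2,300,0.9,CSH,5.0",
};

// HasHeader = true -> skip the first line; Separator = "," -> split on commas.
// Indices 0..6 map to the same names as the TextLoader columns
// (DataKind.Text -> string, DataKind.R4 -> float).
var trips = lines
    .Skip(1)
    .Select(line => line.Split(','))
    .Select(f => (
        VendorId: f[0],
        RateCode: f[1],
        PassengerCount: float.Parse(f[2], CultureInfo.InvariantCulture),
        TripTime: float.Parse(f[3], CultureInfo.InvariantCulture),
        TripDistance: float.Parse(f[4], CultureInfo.InvariantCulture),
        PaymentType: f[5],
        FareAmount: float.Parse(f[6], CultureInfo.InvariantCulture)))
    .ToList();

foreach (var t in trips)
    Console.WriteLine($"{t.VendorId}: {t.TripDistance} miles, fare {t.FareAmount}");
```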

```csharp
//Sample code of removing extreme data like "outliers" for FareAmounts higher than $150 and lower than $1 which can be error-data
// Row count before filtering (for a quick sanity check)
var cnt = baseTrainingDataView.GetColumn<float>(mlContext, "FareAmount").Count();
IDataView trainingDataView = mlContext.Data.FilterByColumn(baseTrainingDataView, "FareAmount", lowerBound: 1, upperBound: 150);
// Row count after filtering
var cnt2 = trainingDataView.GetColumn<float>(mlContext, "FareAmount").Count();
```
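The filtering step keeps only the training rows whose FareAmount lies within the given bounds. The two Count() calls record how many rows there were before and after filtering.

A plain-C# sketch of the same idea on an in-memory array of made-up fares is shown below. It treats the lower bound as inclusive and the upper bound as exclusive. That is an assumption of this sketch, so check how the ML.NET version you use handles the bounds.

```csharp
using System;
using System.Linq;

// Made-up fare amounts; 0.0 and 320.0 play the role of suspicious outliers.
float[] fareAmounts = { 15.5f, 0.0f, 7.0f, 320.0f, 52.0f, 1.0f };

const float lowerBound = 1f;   // assumed inclusive
const float upperBound = 150f; // assumed exclusive

int cnt = fareAmounts.Length;
float[] filtered = fareAmounts
    .Where(f => f >= lowerBound && f < upperBound)
    .ToArray();
int cnt2 = filtered.Length;

Console.WriteLine($"Kept {cnt2} of {cnt} rows");
```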

```csharp
// STEP 2: Common data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.CopyColumns("FareAmount", "Label")
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("VendorId", "VendorIdEncoded"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("RateCode", "RateCodeEncoded"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentType", "PaymentTypeEncoded"))
                .Append(mlContext.Transforms.Normalize(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Normalize(inputName: "TripTime", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Normalize(inputName: "TripDistance", mode: NormalizerMode.MeanVariance))
                .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PaymentTypeEncoded", "PassengerCount", "TripTime", "TripDistance"));
```
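STEP 2 turns each row into a single numeric Features vector. The text columns are one-hot encoded, the numeric columns are rescaled with mean-variance normalization, and everything is concatenated. The sketch below reproduces that idea by hand in plain C# on a few made-up rows, trimmed to two categorical columns and one numeric column.

Two details are assumptions of this sketch, not a description of ML.NET internals:

- The category vocabulary is simply the distinct training values, in order of first appearance.
- Normalization is textbook standardization, (x - mean) / standard deviation, using the population standard deviation. ML.NET's mean-variance normalizer has its own options (for example, whether zero must stay zero), so its exact output may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up training rows: (VendorId, PaymentType, TripDistance).
var rows = new List<(string VendorId, string PaymentType, float TripDistance)>
{
    ("VTS", "CRD", 3.75f),
    ("CMT", "CSH", 1.0f),
    ("VTS", "CSH", 2.0f),
    ("CMT", "CRD", 5.25f),
};

// One-hot vocabularies: distinct values in order of first appearance (assumption).
string[] vendorVocab = rows.Select(r => r.VendorId).Distinct().ToArray();
string[] paymentVocab = rows.Select(r => r.PaymentType).Distinct().ToArray();

float[] OneHot(string value, string[] vocab) =>
    vocab.Select(v => v == value ? 1f : 0f).ToArray();

// Normalization statistics are computed once, from the training rows.
float mean = rows.Average(r => r.TripDistance);
float std = MathF.Sqrt(rows.Average(r => (r.TripDistance - mean) * (r.TripDistance - mean)));

float Normalize(float x) => (x - mean) / std;

// Concatenate the encoded and normalized pieces into one Features vector.
float[] Features((string VendorId, string PaymentType, float TripDistance) r) =>
    OneHot(r.VendorId, vendorVocab)
        .Concat(OneHot(r.PaymentType, paymentVocab))
        .Append(Normalize(r.TripDistance))
        .ToArray();

foreach (var r in rows)
    Console.WriteLine(string.Join(", ", Features(r).Select(v => v.ToString("0.###"))));
```

The same vocabularies and statistics have to be reused for any later row. This is why, in ML.NET, the transforms are part of the fitted pipeline rather than being recomputed per prediction.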

```csharp
// STEP 3: Set the training algorithm, then create and config the modelBuilder - Selected Trainer (SDCA Regression algorithm)
var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(labelColumn: "Label", featureColumn: "Features");
var trainingPipeline = dataProcessPipeline.Append(trainer);
```

2. Train model

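Training the model is the process of running the chosen algorithm on the training data (whose fares are known) to tune the model's parameters. It is implemented in the Fit() API. To perform training, we simply call this method and pass in the DataView.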

```csharp
var trainedModel = trainingPipeline.Fit(trainingDataView);
```
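Fit() runs the SDCA (stochastic dual coordinate ascent) trainer selected in STEP 3. The sketch below gives a feel for what that algorithm does. It is a textbook-style version in plain C# for L2-regularized squared loss, i.e. ridge regression.

Each update picks one training example i and moves its dual variable by the closed-form optimal amount:

delta_i = (y_i - w·x_i - alpha_i) / (1 + ||x_i||^2 / (lambda * n))

It then keeps the weights in sync through w = (1 / (lambda * n)) * sum_i alpha_i * x_i.

This is an illustration under simplifying assumptions:

- a fixed number of passes
- cyclic rather than random order
- a constant feature of 1.0 standing in for the bias
- an arbitrarily chosen lambda

ML.NET's trainer differs in many details, such as convergence checking and its regularization options. The data are made up.

```csharp
using System;

// Made-up training data generated from y = 3x + 1. The trailing 1.0 in each row
// is a constant feature standing in for a bias term (a simplification of this sketch).
double[][] features =
{
    new[] { 0.0, 1.0 },
    new[] { 1.0, 1.0 },
    new[] { 2.0, 1.0 },
    new[] { 3.0, 1.0 },
};
double[] labels = { 1.0, 4.0, 7.0, 10.0 };

double[] weights = SdcaRidge(features, labels, lambda: 1e-3, epochs: 2000);
Console.WriteLine($"w = [{weights[0]:0.###}, {weights[1]:0.###}]");

// Textbook SDCA for L2-regularized squared loss:
//   minimize (1/n) * sum_i 0.5 * (w . x_i - y_i)^2 + (lambda / 2) * ||w||^2
static double[] SdcaRidge(double[][] x, double[] y, double lambda, int epochs)
{
    int n = x.Length, d = x[0].Length;
    double lambdaN = lambda * n;
    var alpha = new double[n]; // one dual variable per training example
    var w = new double[d];     // kept equal to (1 / (lambda * n)) * sum_i alpha_i * x_i

    for (int epoch = 0; epoch < epochs; epoch++)
    {
        // Cyclic order for reproducibility; SDCA is usually described with random sampling of i.
        for (int i = 0; i < n; i++)
        {
            double dot = 0, sqNorm = 0;
            for (int j = 0; j < d; j++)
            {
                dot += w[j] * x[i][j];
                sqNorm += x[i][j] * x[i][j];
            }

            // Closed-form step that maximizes the dual objective along coordinate i.
            double delta = (y[i] - dot - alpha[i]) / (1.0 + sqNorm / lambdaN);
            alpha[i] += delta;

            // Keep the primal weights in sync with the updated dual variable.
            for (int j = 0; j < d; j++)
                w[j] += delta * x[i][j] / lambdaN;
        }
    }
    return w;
}
```

Because of the L2 penalty, the weights converge to the ridge solution, which sits slightly off the exact [3, 1] used to generate the data.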

3. Evaluate model

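We need this step to determine how accurate our model is on new data. To do so, the model from the previous step is run against another dataset that was not used in training (taxi-fare-test.csv). This dataset also contains known fares. Regression.Evaluate() calculates various metrics that describe the differences between the known fares and the fares predicted by the model.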

```csharp
IDataView predictions = trainedModel.Transform(testDataView);
var metrics = mlContext.Regression.Evaluate(predictions, label: "Label", score: "Score");
Common.ConsoleHelper.PrintRegressionMetrics(trainer.ToString(), metrics);
```
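Regression.Evaluate() compares the Label column with the Score column of the predictions. The sketch below computes the usual regression metrics by hand in plain C# on made-up label/score pairs:

- mean absolute error
- mean squared error
- root mean squared error
- R² (the coefficient of determination, 1 - SS_res / SS_tot)

In ML.NET 0.x these appear to correspond to the L1, L2, Rms and RSquared properties of the returned metrics object, but check the version you use. The numbers here are not results from this sample.

```csharp
using System;
using System.Linq;

// Made-up known fares (Label) and model predictions (Score).
double[] label = { 15.5, 7.0, 52.0, 9.5 };
double[] score = { 14.0, 8.0, 49.0, 10.5 };

double[] error = label.Zip(score, (l, s) => l - s).ToArray();

double mae = error.Average(e => Math.Abs(e));   // mean absolute error
double mse = error.Average(e => e * e);         // mean squared error
double rmse = Math.Sqrt(mse);                   // root mean squared error

double meanLabel = label.Average();
double ssRes = error.Sum(e => e * e);                              // residual sum of squares
double ssTot = label.Sum(l => (l - meanLabel) * (l - meanLabel));  // total sum of squares
double rSquared = 1 - ssRes / ssTot;                               // coefficient of determination

Console.WriteLine($"MAE={mae:0.###} MSE={mse:0.###} RMSE={rmse:0.###} R2={rSquared:0.###}");
```

Lower error values and an R² closer to 1 indicate a better fit. RMSE is in the same unit as the fare, which makes it easy to read as "typical error in dollars".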

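To learn more about how to interpret the metrics, see the Machine Learning glossary in the ML.NET Guide, or consult any material on data science and machine learning.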

If you are not satisfied with the quality of the model, there are a variety of ways to improve it, which will be covered in the examples category.

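Keep in mind that the quality of this sample is lower than it could be, because the dataset was reduced in size for performance reasons. You can significantly improve the quality by using the original dataset, which is referenced in the dataset README.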

4. Consume model

After the model has been trained, we can use the Predict() API to predict the fare for a given trip.

```csharp
//Sample:
//vendor_id,rate_code,passenger_count,trip_time_in_secs,trip_distance,payment_type,fare_amount
//VTS,1,1,1140,3.75,CRD,15.5
var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // To predict. Actual/Observed = 15.5
};

// Load the trained model from ModelPath
ITransformer trainedModel;
using (var stream = new FileStream(ModelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    trainedModel = mlContext.Model.Load(stream);
}

// Create prediction engine related to the loaded trained model
var predFunction = trainedModel.MakePredictionFunction<TaxiTrip, TaxiTripFarePrediction>(mlContext);

//Score
var resultprediction = predFunction.Predict(taxiTripSample);

Console.WriteLine($"**********************************************************************");
Console.WriteLine($"Predicted fare: {resultprediction.FareAmount:0.####}, actual fare: 15.5");
Console.WriteLine($"**********************************************************************");
```
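SDCA regression learns a linear model. For a single trip, the score that Predict() returns therefore comes down to two steps: apply the transforms that were fitted on the training data, then compute bias + w · features. The loaded ITransformer presumably carries both the fitted transforms and the weights, which is why the raw TaxiTrip object can be passed straight to the prediction function.

The sketch below spells that out in plain C# with a toy model. The vocabularies, normalization statistics, weights and bias are all HYPOTHETICAL values invented for illustration, not what the sample learns. Only three of the six inputs are used.

```csharp
using System;
using System.Linq;

// HYPOTHETICAL "trained" state, invented for illustration only. In ML.NET the
// corresponding values live inside the ITransformer loaded from ModelPath.
string[] vendorVocab = { "CMT", "VTS" };
string[] paymentVocab = { "CRD", "CSH" };
(float Mean, float Std) distanceStats = (2.9f, 3.4f);
float[] weights = { 0.4f, -0.4f, 0.3f, -0.3f, 7.5f }; // one per feature slot (hypothetical)
float bias = 11.0f;                                   // hypothetical

float[] OneHot(string value, string[] vocab) =>
    vocab.Select(v => v == value ? 1f : 0f).ToArray();

// The sample trip from this post, reduced to three of its six inputs.
string vendorId = "VTS", paymentType = "CRD";
float tripDistance = 3.75f;

// Apply the same transforms as in training, then the linear model: score = bias + w . x
float[] features = OneHot(vendorId, vendorVocab)
    .Concat(OneHot(paymentType, paymentVocab))
    .Append((tripDistance - distanceStats.Mean) / distanceStats.Std)
    .ToArray();
float score = bias + weights.Zip(features, (w, x) => w * x).Sum();

Console.WriteLine($"Predicted fare (toy model): {score:0.####}");
```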

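Finally, you can use the PlotRegressionChart() method to plot the distribution of the test predictions and see how the regression performs, as shown in the screenshot below: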


ML.NET Examples: Regression - Price Prediction
