0 Neural Networks: Learning
1 Cost Function and BackPropagation
2 Backpropagation in Pratice
3 Autonomous Driving
Neural Network-Based Autonomous Driving. 1992 11 23

As for the back propagation algorithm, the formula given by the teacher is really useful.

But you don't understand why you're doing this, including what delta means. And the best way to do that is to actually compute a small neural network, using the chain rule for derivatives. Calculate each θ once. Then put them together to understand how to use vectorization implementation.

There are no meanings. There are just laws of arithmetic.

0 Neural Networks: Learning

In Week 5, you will be learning how to train Neural Networks. The Neural Network is one of the most powerful learning algorithms (when a linear classifier doesn't work, this is what I usually turn to), and this week's videos explain the 'backpropagation' algorithm for training these models. In this week's programming assignment, you'll also get to implement this algorithm and see it work for yourself.

The Neural Network programming exercise will be one of the more challenging ones of this class. So please start early and do leave extra time to get it done, and I hope you'll stick with it until you get it to work! As always, if you get stuck on the quiz and programming assignment, you should post on the Discussions to ask for help. (And if you finish early, I hope you'll go there to help your fellow classmates as well.)-- by Andrew NG

1 Cost Function and BackPropagation

1.1 Cost Function

Let's first define a few variables that we will need to use:

L = total number of layers in the network
\(s_l\) = number of units (not counting bias unit) in layer l
K = number of output units/classes
Binary classification: y = 0 or y = 1, K=1;
Multi-class classification: K>=3;

\[h_\Theta(x)\in \mathbb{R}^K\\y\in \mathbb{R}^K
\]

Recall that in neural networks, we may have many output nodes. We denote \(h_\Theta(x)_k\) as being a hypothesis that results in the \(k^{th}\) output. Our cost function for neural networks is going to be a generalization of the one we used for logistic regression. Recall that the cost function for regularized logistic regression was:

\[J(\theta) = - \frac{1}{m} \sum_{i=1}^m [ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2
\]

For neural networks, it is going to be slightly more complicated:

\[\begin{gather*} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2\end{gather*}
\]

We have added a few nested summations to account for our multiple output nodes. In the first part of the equation, before the square brackets, we have an additional nested summation that loops through the number of output nodes.

With the explanation of the regularization part, the lectures are not as same as what theacher says. So I do some corrections.

Teacher

In the regularization part, Completely, we don't sum over the terms responding to where i is equal to 0. And so this is kinda like a bias unit and by analogy to what we were doing for logistic progression, we won't sum over those terms in our regularization term because we don't want to regularize them and string their values as zero. But this is just one possible convention, and even if you were to sum over i equals 0 up to Sl, it would work about the same and doesn't make a big difference. But maybe this convention of not regularizing the bias term is just slightly more common. Corresponds to the formula above.

Lecture

\[\begin{gather*} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=0}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2\end{gather*}
\]

In the regularization part, after the square brackets, we must account for multiple theta matrices. The number of columns in our current theta matrix is equal to the number of nodes in our current layer (including the bias unit). The number of rows in our current theta matrix is equal to the number of nodes in the next layer (excluding the bias unit). As before with logistic regression, we square every term.

Note:

the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
the triple sum simply adds up the squares of all the individual Θs in the entire network.
the i in the triple sum does not refer to training example i

1.2 Backpropagation Algorithm

"Backpropagation" is neural-network terminology for minimizing our cost function, just like what we were doing with gradient descent in logistic and linear regression. Our goal is to compute:

\[\min_\Theta J(\Theta)
\]

That is, we want to minimize our cost function J using an optimal set of parameters in theta. In this section we'll look at the equations we use to compute the partial derivative of J(Θ):

\[\dfrac{\partial}{\partial \Theta_{i,j}^{(l)}}J(\Theta)
\]

To do so, we use the following algorithm:

Back propagation Algorithm

Given training set \(\lbrace (x^{(1)}, y^{(1)}) \cdots (x^{(m)}, y^{(m)})\rbrace\)

Set \(\Delta^{(l)}_{i,j}\) := 0 for all (l,i,j), (hence you end up having a matrix full of zeros)

For training example t =1 to m:

Set \(a^{(1)} := x^{(t)}\)
Perform forward propagation to compute \(a^{(l)}\) for l=2,3,…,L
Using \(y^{(t)}\), compute \(\delta^{(L)} = a^{(L)} - y^{(t)}\)

Where L is our total number of layers and \(a^{(L)}\) is the vector of outputs of the activation units for the last layer. So our "error values" for the last layer are simply the differences of our actual results in the last layer and the correct outputs in y. To get the delta values of the layers before the last layer, we can use an equation that steps us back from right to left:

Compute \(\delta^{(L-1)}, \delta^{(L-2)},\dots,\delta^{(2)}\) using \(\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)})\ .*\ a^{(l)}\ .*\ (1 - a^{(l)})\)

The delta values of layer l are calculated by multiplying the delta values in the next layer with the theta matrix of layer l. We then element-wise multiply that with a function called g', or g-prime, which is the derivative of the activation function g evaluated with the input values given by \(z^{(l)}\).

The g-prime derivative terms can also be written out as:

\[g'(z^{(l)}) = a^{(l)}\ .*\ (1 - a^{(l)})
\]

\(\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a_j^{(l)} \delta_i^{(l+1)}\) or with vectorization, \(\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T\)

Hence we update our new \(\Delta\) matrix.

\(D^{(l)}_{i,j} := \dfrac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda\Theta^{(l)}_{i,j}\right)\) , if j≠0.
\(D^{(l)}_{i,j} := \dfrac{1}{m}\Delta^{(l)}_{i,j}\) If j=0.

The capital-delta matrix D is used as an "accumulator" to add up our values as we go along and eventually compute our partial derivative. Thus we get \(\frac \partial {\partial \Theta_{ij}^{(l)}} J(\Theta)\)

1.3 Backpropagation Intuition

Note: [4:39, the last term for the calculation for \(z^3_1\) (three-color handwritten formula) should be \(a^2_2\) instead of \(a^2_1\). 6:08 - the equation for cost(i) is incorrect. The first term is missing parentheses for the log() function, and the second term should be \((1-y^{(i)})\log(1-h{_\theta}{(x^{(i)}}))\). 8:50 - \(\delta^{(4)} = y - a^{(4)}\) is incorrect and should be \(\delta^{(4)} = a^{(4)} - y\).]

Recall that the cost function for a neural network is:

\[\begin{gather*}J(\Theta) = - \frac{1}{m} \sum_{t=1}^m\sum_{k=1}^K \left[ y^{(t)}_k \ \log (h_\Theta (x^{(t)}))_k + (1 - y^{(t)}_k)\ \log (1 - h_\Theta(x^{(t)})_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_l+1} ( \Theta_{j,i}^{(l)})^2\end{gather*}
\]

If we consider simple non-multiclass classification (k = 1) and disregard regularization, the cost is computed with:

\[cost(t) =y^{(t)} \ \log (h_\Theta (x^{(t)})) + (1 - y^{(t)})\ \log (1 - h_\Theta(x^{(t)}))
\]

Intuitively, \(\delta_j^{(l)}\) is the "error" for \(a^{(l)}_j\) (unit j in layer l). More formally, the delta values are actually the derivative of the cost function:

\[\delta_j^{(l)} = \dfrac{\partial}{\partial z_j^{(l)}} cost(t)
\]

Recall that our derivative is the slope of a line tangent to the cost function, so the steeper the slope the more incorrect we are. Let us consider the following neural network below and see how we could calculate some \(\delta_j^{(l)}\):

In the image above, to calculate \(\delta_2^{(2)}\), we multiply the weights \(\Theta_{12}^{(2)}\) and \(\Theta_{22}^{(2)}\) by their respective \(\delta\) values found to the right of each edge. So we get \(\delta_2^{(2)}\) = \(\Theta_{12}^{(2)}\) * \(\delta_1^{(3)}\) +\(\Theta_{22}^{(2)}\) * \(\delta_2^{(3)}\). To calculate every single possible \(\delta_j^{(l)}\), we could start from the right of our diagram. We can think of our edges as our \(\Theta_{ij}\). Going from right to left, to calculate the value of \(\delta_j^{(l)}\), you can just take the over all sum of each weight times the \(\delta\) it is coming from. Hence, another example would be \(\delta_2^{(3)}\) = \(\Theta_{12}^{(3)}\) * \(\delta_1^{(4)}\).

2 Backpropagation in Pratice

2.1 Implementation Note: Unrolling Parameters

With neural networks, we are working with sets of matrices:

\[\begin{align*} \Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}, \dots \newline D^{(1)}, D^{(2)}, D^{(3)}, \dots \end{align*}
\]

In order to use optimizing functions such as "fminunc()", we will want to "unroll" all the elements and put them into one long vector:

thetaVector = [ Theta1(:); Theta2(:); Theta3(:); ]

deltaVector = [ D1(:); D2(:); D3(:) ]

If the dimensions of Theta1 is 10x11, Theta2 is 10x11 and Theta3 is 1x11, then we can get back our original matrices from the "unrolled" versions as follows:

Theta1 = reshape(thetaVector(1:110),10,11)

Theta2 = reshape(thetaVector(111:220),10,11)

Theta3 = reshape(thetaVector(221:231),1,11)

To summarize:

2.2 Gradient Checking

Gradient checking will assure that our backpropagation works as intended. We can approximate the derivative of our cost function with:

\[\dfrac{\partial}{\partial\Theta}J(\Theta) \approx \dfrac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}
\]

With multiple theta matrices, we can approximate the derivative with respect to \(Θ_j\) as follows:

\[\dfrac{\partial}{\partial\Theta_j}J(\Theta) \approx \dfrac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}
\]

A small value for \({\epsilon}\) (epsilon) such as \({\epsilon = 10^{-4}}\), guarantees that the math works out properly. If the value for ϵ is too small, we can end up with numerical problems.

Hence, we are only adding or subtracting epsilon to the \(\Theta_j\) matrix. In octave we can do it as follows:

epsilon = 1e-4;

for i = 1:n,

  thetaPlus = theta;

  thetaPlus(i) += epsilon;

  thetaMinus = theta;

  thetaMinus(i) -= epsilon;

  gradApprox(i) = (J(thetaPlus) - J(thetaMinus))/(2*epsilon)

end;

We previously saw how to calculate the deltaVector. So once we compute our gradApprox vector, we can check that gradApprox ≈ deltaVector.

Once you have verified once that your backpropagation algorithm is correct, you don't need to compute gradApprox again. The code to compute gradApprox can be very slow.

2.3 Random Initialization

When you're running an algorithm of gradient descent, or also the advanced optimization algorithms, we need to pick some initial value for the parameters theta. So for the advanced optimization algorithm, it assumes you will pass it some initial value for the parameters theta.

optTheta = fminunc(@costFunction, initialTheta, options)

Is it possible to set the initial value of theta to the vector of all zeros.Whereas this worked okay when we were using logistic regression, initializing all of your parameters to zero actually does not work when you are trading on your own network.

神经网络是多个函数作用在一起，形成非线性的函数做出预测。隐藏层的每一个节点都对应一个不同的函数，也就是不同的参数。例如一个100个节点的隐藏层就计算了100个不同的参数。

一旦所有初始参数都相同，那么所有的隐藏层节点计算的函数只有一个。整个神经网络就从100个节点变成了一个节点，无论是前向传播还是反向传播，都只是一个函数，怎么更新都是一样的。

同时注意到，老师所讲的，神经网络的梯度下降常常会得到一个局部最优解而不是全局最优解，也就是说，代价函数不是一个convex function。这里很耐人寻味。因为逻辑回归的代价函数与神经网络的代价函数基本上是一样的，但是神经网络的代价函数里嵌套了多个逻辑回归函数。逻辑回归是一个convex function, 神经网络的就不是了。

Initializing all theta weights to zero does not work with neural networks. When we backpropagate, all nodes will update to the same value repeatedly. Instead we can randomly initialize our weights for our Θ matrices using the following method:

Hence, we initialize each \(\Theta^{(l)}_{ij}\) to a random value between \([-\epsilon,\epsilon]\). Using the above formula guarantees that we get the desired bound. The same procedure applies to all the \(\Theta\)'s. Below is some working code you could use to experiment.

If the dimensions of Theta1 is 10x11, Theta2 is 10x11 and Theta3 is 1x11.

Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

rand(x,y) is just a function in octave that will initialize a matrix of random real numbers between 0 and 1.

(Note: the epsilon used above is unrelated to the epsilon from Gradient Checking)

2.4 Putting it Together

First, pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.

Number of input units = dimension of features \(x^{(i)}\)
Number of output units = number of classes
Number of hidden units per layer = usually more the better (must balance with cost of computation as it increases with more hidden units)
Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer.

Training a Neural Network

Randomly initialize the weights
Implement forward propagation to get \(h_\Theta(x^{(i)})\) for any \(x^{(i)}\)
Implement the cost function
Implement backpropagation to compute partial derivatives
Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.

When we perform forward and back propagation, we loop on every training example:

Training a Neural Network

for i = 1:m,

   Perform forward propagation and backpropagation using example (x(i),y(i))

   (Get activations a(l) and delta terms d(l) for l = 2,...,L

The following image gives us an intuition of what is happening as we are implementing our neural network:

And by the way, for neural networks, this cost function j of theta is non-convex, or is not convex and so it can theoretically be susceptible to local minima, and in fact algorithms like gradient descent and the advance optimization methods can, in theory, get stuck in local optima, but it turns out that in practice this is not usually a huge problem and even though we can't guarantee that these algorithms will find a global optimum, usually algorithms like gradient descent will do a very good job minimizing this cost function j of theta and get a very good local minimum, even if it doesn't get to the global optimum. Finally, gradient descents for a neural network might still seem a little bit magical. So, let me just show one more figure to try to get that intuition about what gradient descent for a neural network is doing.

This was actually similar to the figure that I was using earlier to explain gradient descent. So, we have some cost function, and we have a number of parameters in our neural network. Right here I've just written down two of the parameter values.

So what gradient descent does is we'll start from some random initial point like that one over there, and it will repeatedly go downhill.

And so what back propagation is doing is computing the direction of the gradient, and what gradient descent is doing is it's taking little steps downhill until hopefully it gets to, in this case, a pretty good local optimum.

So, when you implement back propagation and use gradient descent or one of the advanced optimization methods, this picture sort of explains what the algorithm is doing. It's trying to find a value of the parameters where the output values in the neural network closely matches the values of the y(i)'s observed in your training set. So, hopefully this gives you a better sense of how the many different pieces of neural network learning fit together.

Ideally, you want \(h_\Theta(x^{(i)}) \approx y^{(i)}\). This will minimize our cost function. However, keep in mind that \(J(\Theta)\) is not convex and thus we can end up in a local minimum instead.

3 Autonomous Driving

In this video, I'd like to show you a fun and historically important example of neural networks learning of using a neural network for autonomous driving. That is getting a car to learn to drive itself.

The video that I'll showed a minute was something that I'd gotten from Dean Pomerleau, who was a colleague who works out in Carnegie Mellon University out on the east coast of the United States. And in part of the video you see visualizations like this. And I want to tell what a visualization looks like before starting the video.

Down here on the lower left is the view seen by the car of what's in front of it. And so here you kinda see a road that's maybe going a bit to the left, and then going a little bit to the right.

And up here on top, this first horizontal bar shows the direction selected by the human driver. And this location of this bright white band that shows the steering direction selected by the human driver where you know here far to the left corresponds to steering hard left, here corresponds to steering hard to the right. And so this location which is a little bit to the left, a little bit left of center means that the human driver at this point was steering slightly to the left. And this second bar here corresponds to the steering direction selected by the learning algorithm and again the location of this sort of white band means that the neural network was here selecting a steering direction that's slightly to the left.

And in fact before the neural network starts leaning initially, you see that the network outputs a grey band, like a grey, like a uniform grey band throughout this region and sort of a uniform gray fuzz corresponds to the neural network having been randomly initialized. And initially having no idea how to drive the car. Or initially having no idea of what direction to steer in. And is only after it has learned for a while, that will then start to output like a solid white band in just a small part of the region corresponding to choosing a particular steering direction. And that corresponds to when the neural network becomes more confident in selecting a band in one particular location, rather than outputting a sort of light gray fuzz, but instead outputting a white band that's more constantly selecting one's steering direction.

Neural Network-Based Autonomous Driving. 1992 11 23

ALVINN is a system of artificial neural networks that learns to steer by watching a person drive. ALVINN is designed to control the NAVLAB 2, a modified Army Humvee who had put sensors, computers, and actuators for autonomous navigation experiments.

The initial step in configuring ALVINN is creating a network just here. During training, a person drives the vehicle while ALVINN watches. Once every two seconds, ALVINN digitizes a video image of the road ahead, and records the person's steering direction.

This training image is reduced in resolution to 30 by 32 pixels and provided as input to ALVINN's three layered network. Using the back propagation learning algorithm,ALVINN is training to output the same steering direction as the human driver for that image.

Initially the network steering response is random. After about two minutes of training the network learns to accurately imitate the steering reactions of the human driver. This same training procedure is repeated for other road types. After the networks have been trained the operator pushes the run switch and ALVINN begins driving.

Twelve times per second, ALVINN digitizes the image and feeds it to its neural networks. Each network, running in parallel, produces a steering direction, and a measure of its' confidence in its' response.

The steering direction, from the most confident network, in this network training for the one lane road, is used to control the vehicle.

Suddenly an intersection appears ahead of the vehicle. As the vehicle approaches the intersection the confidence of the lone lane network decreases. As it crosses the intersection and the two lane road ahead comes into view, the confidence of the two lane network rises.

When its' confidence rises the two lane network is selected to steer. Safely guiding the vehicle into its lane onto the two lane road.

So that was autonomous driving using the neural network. Of course there are more recently more modern attempts to do autonomous driving. There are few projects in the US and Europe and so on, that are giving more robust driving controllers than this, but I think it's still pretty remarkable and pretty amazing how instant neural network trained with backpropagation can actually learn to drive a car somewhat well.

Machine Learning Week_5 Cost Function and BackPropagation的更多相关文章

Machine Learning/Introducing Logistic Function
Machine Learning/Introducing Logistic Function 打算写点关于Machine Learning的东西, 正好也在cnBlogs上新开了这个博客, 也就更新在 ...
CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)
1. Feedforward and cost function; 2.Regularized cost function: 3.Sigmoid gradient The gradient for t ...
白话machine learning之Loss Function
转载自:http://eletva.com/tower/?p=186 有关Loss Function(LF),只想说,终于写了一.Loss Function 什么是Loss Function?wik ...
machine learning 之 Neural Network 2
整理自Andrew Ng的machine learning 课程 week5. 目录: Neural network and classification Cost function Backprop ...
Course Machine Learning Note
Machine Learning Note Introduction Introduction What is Machine Learning? Two definitions of Machine ...
Machine Learning - 第5周（Neural Networks: Learning）
The Neural Network is one of the most powerful learning algorithms (when a linear classifier doesn't ...
[Machine Learning] 浅谈LR算法的Cost Function
了解LR的同学们都知道,LR采用了最小化交叉熵或者最大化似然估计函数来作为Cost Function,那有个很有意思的问题来了,为什么我们不用更加简单熟悉的最小化平方误差函数(MSE)呢? 我个人理解 ...
machine learning(11) -- classification: advanced optimization 去求cost function最小值的方法
其它的比gradient descent快, 在某些场合得到广泛应用的求cost function的最小值的方法 when have a large machine learning problem, ...
machine learning(10) -- classification:logistic regression cost function 和使用 gradient descent to minimize cost function
logistic regression cost function(single example) 图像分布 logistic regression cost function(m examples) ...
[machine learning] Loss Function view
[machine learning] Loss Function view 有关Loss Function(LF),只想说,终于写了一.Loss Function 什么是Loss Function? ...

随机推荐

PHP 字符串大小写操作
PHP为我们提供了字符串中大小写字母转换的函数, strtoupper()将指定的字符全部转换为大写: strtolower()将北定的字符都转换成小写: ucwords()将指定字符串中每个单词的首 ...
DolphinScheduler分布式集群部署指南(小白版)
官方文档地址:https://dolphinscheduler.apache.org/zh-cn/docs/3.1.9 DolphinScheduler简介摘自官网:Apache DolphinSc ...
Spring启动报8080端口被占用问题
1.window下关闭8080端口 win+R:输入cmd,回车在黑窗口中输入指令:netstat -ano | findstr 8080 指令的意思是找出占用8080端口的进程pid ...
dubbo序列化问题（二）hession2与kryo切换转
dubbo提供了好几种序列化方式,一般我们都是用的是默认的hession2,而dubbox为我们增加了kryo和fst许了方式,主要体现在速度快,占用内存小,然后我们将序列化配置改为是用kryo: & ...
mongodb 中嵌套数组的且查询
如果在mongodb中存在如下数据 { audit:{ experts:[{expertId:"1",result:"success",......} {exp ...
Camera | 7.瑞芯微rk3568平台摄像头控制器MIPI-CSI驱动架构梳理
因为有拍照.录制视频.直播等刚需,现在手机的摄像头基本都是高清,支持高清摄像头的SoC都支持MIPI-CSI. 不同SoC的MIPI-CSI在实现上有一定差别,即使同一厂家设计生产的芯片也都不尽相同. ...
MAC地址格式详解
以太网编址在数据链路层,数据帧通常依赖于MAC地址来进行数据交换,它如同公网IP地址一样要求具有全球唯一性,这样才可以识别每一台主机.那么MAC地址如何做到这点?它的格式又是什么? MAC地址,英文 ...
使用 Dependify 工具探索 .NET 应用程序依赖项
在大型项目中,由于各种组件的复杂性和互连性,管理依赖项可能变得具有挑战性.如果没有适当的工具或文档,可能很难浏览项目并对依赖项做出假设.以下是在大型项目中难以导航项目依赖项的几个原因: 复杂性:大型项 ...
五分钟入门Webworker
Webworker是基于HTML5提出的一种技术,允许主线程创建Worker线程,将一些任务分配给Worker运行,主线程运行同时,Worker线程在后台运行,互不干扰.等Worker线程完成计算任务 ...
RxJS 系列 – Join Operators
前言前几篇介绍过了 Creation Operators Filtering Operators Join Creation Operators Error Handling Operators T ...

Machine Learning Week_5 Cost Function and BackPropagation