[Original] Coursera: Andrew Ng Machine Learning, Week 1 Quiz: Linear Regression with One Variable
Question 1
Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.
Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x)=θ0+θ1x, and we use m to denote the number of training examples.
| x | y |
|---|---|
| 5 | 4 |
| 3 | 4 |
| 0 | 1 |
| 4 | 3 |
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Answer:
4
Question 2
Consider the following training set of m=4 training examples:
| x | y |
|---|---|
| 1 | 0.5 |
| 2 | 1 |
| 4 | 2 |
| 0 | 0 |
Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
θ0=0.5,θ1=0
θ0=0.5,θ1=0.5
θ0=1,θ1=0.5
θ0=0,θ1=0.5
θ0=1,θ1=1
Answer:
θ0=0,θ1=0.5
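Because the data can be fit perfectly, J(θ0,θ1)=0, so y = hθ(x) = θ0 + θ1x holds for every row of the table. Any two rows determine the parameters: (0, 0) gives θ0=0, and then (1, 0.5) gives θ1=0.5.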
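The same answer can be checked numerically. The following is a minimal sketch of batch gradient descent on the Question 2 training set; the function name `gradient_descent`, the starting point, the learning rate and the iteration count are all illustrative assumptions, not values from the quiz.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x (illustrative settings)."""
    m = len(xs)
    theta0, theta1 = 1.0, 1.0  # arbitrary starting point (assumption)
    for _ in range(iterations):
        # Prediction error h(x) - y for each training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1 / 2m) * sum(error^2).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


xs = [1, 2, 4, 0]
ys = [0.5, 1, 2, 0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

With these settings the printed values should land very close to θ0=0 and θ1=0.5, matching the answer obtained by solving from two rows.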
Question 3
Suppose we set θ0=−1,θ1=0.5. What is hθ(4)?
Answer:
Setting x = 4, we have hθ(x)=θ0+θ1x = -1 + (0.5)(4) = 1
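As a quick check of the arithmetic, the hypothesis can be written as a one-line function. The name `h` is just an illustrative choice.

```python
def h(theta0, theta1, x):
    """Hypothesis of single-variable linear regression: theta0 + theta1 * x."""
    return theta0 + theta1 * x


result = h(-1, 0.5, 4)
print(result)  # 1.0
```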
Question 4
Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).
If the learning rate is too small, then gradient descent may take a very long time to converge.
If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.
If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.
Answers:
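| True or False | Statement | Explanation |
|---|---|---|
| True | If the learning rate is too small, then gradient descent may take a very long time to converge. | If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so convergence is slow. |
| True | If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values. | At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters. |
| False | Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1). | If the learning rate is too large, one step of gradient descent can overshoot the minimum and increase the value of f(θ0,θ1). |
| False | If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1. | The updates to θ0 and θ1 are different (even though they are performed simultaneously), because the partial derivatives of f with respect to θ0 and θ1 generally differ. |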
Other Options:
| True or False | Statement | Explanation |
|---|---|---|
| True | If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate to too large a value. | If α were small enough, gradient descent should always successfully take a tiny downhill step and decrease f(θ0,θ1). |
| False | No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution. | This is not true: depending on the initial condition, gradient descent may end up at different local optima. |
| False | Setting the learning rate to be very small is not harmful, and can only speed up the convergence of gradient descent. | If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, which slows convergence down rather than speeding it up. |
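Several of the statements in the tables above can be tried out on a toy function. The sketch below uses f(θ0,θ1) = θ0² + 2θ1², a hypothetical choice made only for illustration; it is convex, so it does not show the local-optima behaviour. The names `f`, `grad` and `step` are likewise assumptions.

```python
def f(t0, t1):
    """Toy objective (assumption): f = t0^2 + 2 * t1^2, minimum at (0, 0)."""
    return t0 ** 2 + 2 * t1 ** 2


def grad(t0, t1):
    """Partial derivatives of the toy objective."""
    return 2 * t0, 4 * t1


def step(t0, t1, alpha):
    """One gradient descent iteration with a simultaneous update."""
    g0, g1 = grad(t0, t1)
    return t0 - alpha * g0, t1 - alpha * g1


start = (1.0, 1.0)

# Large learning rate: one step overshoots and f increases.
big = step(*start, alpha=1.0)
print("alpha=1.0:", f(*start), "->", f(*big))

# Equal initial values do not stay equal, because the partial derivatives differ.
moderate = step(*start, alpha=0.1)
print("after one step from (1, 1):", moderate)

# Starting at the minimum: the gradient is zero, so nothing changes.
at_min = step(0.0, 0.0, alpha=0.1)
print("step from (0, 0):", at_min)

# Very small learning rate: after 100 steps f has barely moved.
slow = start
for _ in range(100):
    slow = step(*slow, alpha=0.001)
print("alpha=0.001 after 100 steps:", f(*slow))
```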
Question 5
Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0.
Which of the statements below must then be true? (Check all that apply.)
For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.
Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0
Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
Answers:
| True or False | Statement | Explanation |
|---|---|---|
| False | For this to be true, we must have y(i)=0 for every value of i=1,2,…,m. | So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all of our examples. |
| False | Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum. | None |
| False | For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0 | If J(θ0,θ1)=0, that means the line defined by the equation “y = θ0 + θ1x” perfectly fits all of our data. There’s no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples). |
| True | Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line. | If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data. |
| False | We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.) | None |
| False | This is not possible: By the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0 | None |
| True | For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i)) | J(θ0,θ1) is a sum of squared errors, so it can be 0 only if every term (hθ(x(i))−y(i))² is 0, i.e. hθ(x(i))=y(i) for every training example. |
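To make the Question 5 reasoning concrete, here is a small sketch of the squared-error cost evaluated on a made-up training set whose points lie exactly on the line y = 1 + 2x. The data and the function name `cost` are hypothetical, chosen only to show that J can be 0 while neither the parameters nor the targets are 0.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = (1 / 2m) * sum((theta0 + theta1 * x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical data lying exactly on y = 1 + 2x (not from the quiz).
xs = [1, 2, 3]
ys = [3, 5, 7]

j_fit = cost(1, 2, xs, ys)   # parameters of the line the data lie on
j_zero = cost(0, 0, xs, ys)  # theta0 = theta1 = 0 does not fit this data

print(j_fit, j_zero)
```

Here `j_fit` is 0 with nonzero θ0, θ1 and nonzero y(i), while `j_zero` is positive, which is consistent with the False rows in the table.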