Question 1

Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.

Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).

Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x)=θ0+θ1x, and we use m to denote the number of training examples.

x    y
5    4
3    4
0    1
4    3

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

Answer:
4

Question 2

Consider the following training set of m=4 training examples:

x    y
1    0.5
2    1
4    2
0    0

Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1
that you would expect to obtain upon running gradient descent on this
model? (Linear regression will be able to fit this data perfectly.)

    • θ0=0.5,θ1=0

    • θ0=0.5,θ1=0.5

    • θ0=1,θ1=0.5

    • θ0=0,θ1=0.5

    • θ0=1,θ1=1

Answer:
θ0=0,θ1=0.5

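Since linear regression fits this data perfectly, J(θ0,θ1)=0 and y = hθ(x) = θ0 + θ1x holds for every training example. Using any two rows of the table, solve for θ0 and θ1: the row (x=0, y=0) gives θ0=0, and the row (x=1, y=0.5) then gives θ1=0.5.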
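The two-row approach can be sketched in a few lines of Python. This is only an illustration: the helper name `line_through` is hypothetical, and the data is the Question 2 training set.

```python
# Training set from Question 2 as (x, y) pairs.
data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]

def line_through(p, q):
    """Return (theta0, theta1) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    theta1 = (y2 - y1) / (x2 - x1)   # slope
    theta0 = y1 - theta1 * x1        # intercept
    return theta0, theta1

# Solve using the first two rows, then check that every row lies on the line.
theta0, theta1 = line_through(data[0], data[1])
fits_all = all(theta0 + theta1 * x == y for x, y in data)

print(theta0, theta1, fits_all)  # 0.0 0.5 True
```

Any other pair of rows with distinct x values should give the same line, since all four points are collinear.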

Question 3

Suppose we set θ0=−1,θ1=0.5. What is hθ(4)?

Answer:

Setting x = 4, we have hθ(x) = θ0 + θ1x = -1 + (0.5)(4) = 1
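The same arithmetic as a small Python check. The function name `h` is just an illustrative stand-in for hθ.

```python
def h(x, theta0, theta1):
    """Linear regression hypothesis: theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(4, theta0=-1, theta1=0.5))  # 1.0
```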

Question 4

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

    • Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).

    • If the learning rate is too small, then gradient descent may take a very long time to converge.

    • If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.

    • If θ0 and θ1 are initialized so that θ0=θ1,
      then by symmetry (because we do simultaneous updates to the two
      parameters), after one iteration of gradient descent, we will still have
      θ0=θ1.

Answers:

True: If the learning rate is too small, then gradient descent may take a very long time to converge.
Explanation: If the learning rate is small, gradient descent takes an extremely small step on each iteration, and therefore can take a long time to converge.

True: If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.
Explanation: At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.

False: Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).
Explanation: If the learning rate is too large, one step of gradient descent can vastly “overshoot” and actually increase the value of f(θ0,θ1).

False: If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.
Explanation: The updates to θ0 and θ1 are different (even though we’re doing simultaneous updates), so there’s no particular reason to expect them to be the same after one iteration of gradient descent.
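These four answers can be checked numerically with a minimal sketch. The function f(θ0,θ1) = θ0² + 3θ1² is an assumption chosen only for illustration (the quiz leaves f unspecified), and the names `f`, `gradient` and `step` are hypothetical.

```python
def f(theta0, theta1):
    """Illustrative smooth function (an assumption, not from the quiz)."""
    return theta0 ** 2 + 3 * theta1 ** 2

def gradient(theta0, theta1):
    """Partial derivatives of f with respect to theta0 and theta1."""
    return 2 * theta0, 6 * theta1

def step(theta, alpha):
    """One gradient descent iteration with a simultaneous update."""
    g0, g1 = gradient(*theta)
    return theta[0] - alpha * g0, theta[1] - alpha * g1

start = (1.0, 1.0)                 # initialized with theta0 == theta1
small = step(start, alpha=0.1)     # modest learning rate: f goes down
large = step(start, alpha=1.0)     # large learning rate: f goes up (overshoot)

print(f(*start), f(*small), f(*large))
print(small[0] == small[1])         # False: the two updates differ
print(step((0.0, 0.0), alpha=0.1))  # at the minimum the gradient is zero, so no change
```

With this particular f, the two partial derivatives differ even when θ0=θ1, which is why the parameters drift apart after a single step.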

Other Options:

True: If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate to too large a value.
Explanation: If alpha were small enough, then gradient descent should always successfully take a tiny step downhill and decrease f(θ0,θ1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!).

False: No matter how θ0 and θ1 are initialized, so long as the learning rate is sufficiently small, we can safely expect gradient descent to converge to the same solution.
Explanation: This is not true. Depending on the initial condition, gradient descent may end up at different local optima.

False: Setting the learning rate to be very small is not harmful, and can only speed up the convergence of gradient descent.
Explanation: If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
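The last two answers can be illustrated with a rough sketch. It assumes a made-up function f(θ0,θ1) = (θ0² − 1)² + θ1², which has two separate minima at (1, 0) and (−1, 0); the function, the stopping tolerance and the name `descend` are all illustrative choices, not part of the quiz.

```python
def f(theta0, theta1):
    """Illustrative function with two minima, at (1, 0) and (-1, 0)."""
    return (theta0 ** 2 - 1) ** 2 + theta1 ** 2

def gradient(theta0, theta1):
    """Partial derivatives of f with respect to theta0 and theta1."""
    return 4 * theta0 * (theta0 ** 2 - 1), 2 * theta1

def descend(theta, alpha, tol=1e-6, max_iters=100_000):
    """Run gradient descent until both partial derivatives are below tol.

    Returns the final parameters and the number of iterations used.
    """
    for iters in range(max_iters):
        g0, g1 = gradient(*theta)
        if max(abs(g0), abs(g1)) < tol:
            return theta, iters
        theta = (theta[0] - alpha * g0, theta[1] - alpha * g1)
    return theta, max_iters

# Same learning rate, different starting points: different minima.
right, fast_iters = descend((0.5, 0.5), alpha=0.1)
left, _ = descend((-0.5, 0.5), alpha=0.1)

# Same starting point, much smaller learning rate: many more iterations.
_, slow_iters = descend((0.5, 0.5), alpha=0.001)

print(right, left)
print(fast_iters, slow_iters)
```

Under these assumptions the first run should settle near θ0=1 and the second near θ0=−1, and the run with α=0.001 should need far more iterations than the run with α=0.1.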

Question 5

Suppose that for some linear regression problem (say, predicting
housing prices as in the lecture), we have some training set, and for
our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0.

Which of the statements below must then be true? (Check all that apply.)

    • For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.

    • Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.

    • For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0

    • Our training set can be fit perfectly by a straight line, i.e.,
      all of our training examples lie perfectly on some straight line.

Answers:

False: For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.
Explanation: So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all our examples.

False: Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
Explanation: (none given)

False: For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0
Explanation: If J(θ0,θ1)=0, that means the line defined by the equation “y = θ0 + θ1x” perfectly fits all of our data. There’s no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples).

True: Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
Explanation: If J(θ0,θ1)=0, that means the line defined by the equation “y = θ0 + θ1x” perfectly fits all of our data.

False: We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
Explanation: (none given)

False: This is not possible: By the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0
Explanation: (none given)

True: For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i))
Explanation: J(θ0,θ1) is a sum of squared errors, so it can equal 0 only if every term is 0, i.e., hθ(x(i))=y(i) for every training example.
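A small sketch of the Question 5 reasoning, assuming the usual squared-error cost J(θ0,θ1) = (1/(2m)) Σ (hθ(x(i)) − y(i))². The training set below is hypothetical, chosen to lie exactly on the line y = 1 + 2x.

```python
def cost(theta0, theta1, examples):
    """Squared-error cost J = (1 / (2m)) * sum of (h(x) - y)^2."""
    m = len(examples)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in examples) / (2 * m)

# Hypothetical training set lying exactly on y = 1 + 2x.
examples = [(0, 1), (1, 3), (2, 5)]

theta0, theta1 = 1, 2
residuals = [theta0 + theta1 * x - y for x, y in examples]

print(cost(theta0, theta1, examples))  # 0.0
print(residuals)                       # [0, 0, 0]
print(cost(0, 0, examples) > 0)        # True
```

Here J is zero even though neither the parameters nor the y values are zero, and every individual prediction matches its target exactly.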
