Question 1

Consider the problem of predicting how well a student does in her second year of college/university, given how well they did in their first year. Specifically, let x be equal to the number of "A" grades (including A-. A and A+ grades) that a student receives in their first year of college (freshmen year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).

Questions 1 through 4 will use the following training set of a small sample of different students' performances. Here each row is one training example. Recall that in linear regression, our hypothesis is $hθ(x)=θ_0+θ_1x$, and we use m to denote the number of training examples.

x y
5 4
3 4
0 1
4 3

For the training set given above, what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

Answer

m is the number of training examples. In this example, we have m=4 examples.

4

Question 2

Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemists obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released. examples.

You would like to use linear regression (hθ(x)=θ_0+θ_1x) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for θ_0 and θ_1? You should be able to select the right answer without actually implementing linear regression.

Answer

Since the carbon atoms (x) increase and the released heat (y) decreases, θ_1 has to be negative. θ_0 functionas as the offset. Looking at the table: a few θ_0 should be higher than -1000

  • θ_0=−1780.0,θ_1=−530.9
  • θ_0=−569.6,θ_1=−530.9
  • θ_0=−1780.0,θ_1=530.9
  • θ_0=−569.6,θ_1=530.9

Question Explanation

We can give an approximate estimate of the θ0 and θ1 values observing the trend of the data in the training set. We see that the y values decrease quite regularly when the x values increase, then θ1 must be negative. θ0 is the value that the hypothesis takes when x is equal to zero, therefore it must be superior to y(1) in order to satisfy the decreasing trend of the data. Among the proposed answers, the only one that meets both the conditions is hθ(x)=−569.6−530.9x. We can better appreciate these considerations observing the graph of the training data and the linear regression (below):


Question 3

Suppose we set θ_0=−1,θ_1=0.5. What is hθ(4)?

Answer

hθ(x) = θ_0 + θ_1x
hθ(x) = -1 + 0.5x
hθ(4) = -1 + 0.5 * 4
hθ_θ(4) = 1

Question 4

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

Answer

  • If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.

    • The updates to θ0 and θ1 are different (even though we're doing simultaneous updates), so there's no particular reason to expect them to be the same after one iteration of gradient descent.
  • Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
    • If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
  • If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
    • If alpha were small enough, then gradient descent should always successfully take a tiny small downhill and decrease f(\theta_0,\theta_1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!).
  • If the learning rate is too small, then gradient descent may take a very long time to converge.
    • If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.

Question 5

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0. Which of the statements below must then be true? (Check all that apply.)

Answer

  • For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0

    • If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data. There's no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples).
  • Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
    • If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data.
  • For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.
    • So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all of our examples.
  • We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
    • Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future/on houses that we have not yet seen.

【Coursera - machine learning】 Linear regression with one variable-quiz的更多相关文章

  1. machine learning (2)-linear regression with one variable

    machine learning- linear regression with one variable(2) Linear regression with one variable = univa ...

  2. Machine Learning #Lab1# Linear Regression

    Machine Learning Lab1 打算把Andrew Ng教授的#Machine Learning#相关的6个实验一一实现了贴出来- 预计时间长度战线会拉的比較长(毕竟JOS的7级浮屠还没搞 ...

  3. 【cs229-Lecture2】Linear Regression with One Variable (Week 1)(含测试数据和源码)

    从Ⅱ到Ⅳ都在讲的是线性回归,其中第Ⅱ章讲得是简单线性回归(simple linear regression, SLR)(单变量),第Ⅲ章讲的是线代基础,第Ⅳ章讲的是多元回归(大于一个自变量). 本文的 ...

  4. 【机器学习Machine Learning】资料大全

    昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...

  5. CheeseZH: Stanford University: Machine Learning Ex1:Linear Regression

    (1) How to comput the Cost function in Univirate/Multivariate Linear Regression; (2) How to comput t ...

  6. Machine learning(2-Linear regression with one variable )

    1.Model representation Our Training Set [训练集]: We will start with this ''Housing price prediction'' ...

  7. 【Machine Learning】机器学习及其基础概念简介

    机器学习及其基础概念简介 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...

  8. 【Machine Learning】决策树案例:基于python的商品购买能力预测系统

    决策树在商品购买能力预测案例中的算法实现 作者:白宁超 2016年12月24日22:05:42 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本 ...

  9. 【Machine Learning】KNN算法虹膜图片识别

    K-近邻算法虹膜图片识别实战 作者:白宁超 2017年1月3日18:26:33 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...

随机推荐

  1. Nginx均衡负载(IP_HASH)未生效

    由于公司业务的发展,单台服务器已经无法满足并发和用户的需求,所以只能通过水平拓展的方式加机器来解决,线上采用的是Nginx+Tomcat集群的方式来解决.由于当前业务量不是很大,而且由于之前代码的问题 ...

  2. DDD分层架构之值对象(介绍篇)

    DDD分层架构之值对象(介绍篇) 前面介绍了DDD分层架构的实体,并完成了实体层超类型的开发,同时提供了验证方面的支持.本篇将介绍另一个重要的构造块——值对象,它是聚合中的主要成分. 如果说你已经在使 ...

  3. DDD分层架构之领域实体(验证篇)

    DDD分层架构之领域实体(验证篇) 在应用程序框架实战十四:DDD分层架构之领域实体(基础篇)一文中,我介绍了领域实体的基础,包括标识.相等性比较.输出实体状态等.本文将介绍领域实体的一个核心内容—— ...

  4. Spring IOC 之ApplicationContext的其他功能

    正如上面章节所介绍的那样, org.springframework.beans.factory 包提供了管理和操作beans的 基本功能. org.springframework.context包增加 ...

  5. [译]Java垃圾回收是如何工作的

    说明:这篇文章来翻译来自于Javapapers 的How Java Garbage Collection Works 这部分教程是为了理解Java垃圾回收的基础以及它是如何工作的.这是垃圾回收系列教程 ...

  6. 深入了解 Java HelloWorld

    Java 的 Hello World 代码 public class HelloWorld { /** * * @param args */ public static void main(Strin ...

  7. easyui datagrid显示进度条控制操作

    在当我们需要控制时间前台实际项目页面datagrid显示进度条的数据加载时运行,和datagrid默认情况下只在有url加载运行时的数据显示方式的进度条.下面的代码手动控制: 打开一个进度条: $(' ...

  8. vector成员函数解析

    vector线性集装箱,其元素颜格排序根据线性序列,和动态数组很阶段似,像阵列,它的元素被存储在连续的存储空间,这也意味着,我们不仅能够使用迭代器(iterator)访问元素,也可以用一个指针访问偏移 ...

  9. C# 6.0 功能预览

    C# 6.0 功能预览 (一) 一.索引的成员和元素初始化 1.1 原始初始化集合 Dictionary 1.2 键值初始化集合 Dictionary 1.3 运算符 $ 初始化集合 Dictiona ...

  10. WCF入门教程(图文)VS2012

    WCF入门教程(图文)VS2012 上一遍到现在已经有一段时间了,先向关注本文的各位“挨踢”同仁们道歉了.小生自认为一个ITer如果想要做的更好,就需要将自己的所学.所用积极分享出来,接收大家的指导和 ...