<Hands-on ML with Sklearn & TF>  Chapter 1

  1. what is ml
    1. from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  2. what problems to solve
    1. exist solution but a lot of hand-tuning/rules
    2. no good solutions using a traditional approach
    3. fluctuating environment
    4. get insight about conplex problem and large data
  3. type
    1. whether or not trained with human supervision(supervised, unsupervised, semisupervised, reinforcement)
    2. whether or not learn incrementally on the fly(online, batch)
    3. whether or not work by simply comparing new data point vs know data point,or instance detect pattern in training data and build a prediction model(instace-based, model-based)
  4. (un)supervision learning
    1. supervision : include the desired solution called labels

      • classification,K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural network
    2. unsupervision : without labels
      • Clustering : k-means, HCA, ecpectation maximization
      • Viualization and dimensionality reducation : PCA, kernal PCA, LLE, t-SNE
      • Association rule learning : Apriori, Eclat
    3. semisupervision
      • unsupervision --> supervision
    4. reinforcement : an agent in context
      1. observe the environment
      2. select and perform action
      3. get rewards in return
  5. batch/online learning
    1. batch : offline, to known new data need to train a new version from scratch one the full dataset
    2. online : incremental learning : challenge is bad data
  6. instance-based/model-based
    1. instance-based : the system learns the examples by heart, then the generalizes to the new cases using a similarity measure
    2. model-based : studied the data; select a model; train it on the training data; applied the model to make predictions on new cases
  7. Challenge
    1. insufficient quantity of training data
    2. nonrepresentative training data
    3. poor-quality data
    4. irrelevant features : feature selection; feature extraction; creating new feature by gathering new data
    5. overfitting : regularization -> hyperparameter
    6. underfitting : powerful model; better feature; reduce construct
  8. Testing and Validating
    1. 80% of data for training 20% for testing
    2. validating : best model and hyperparameter for training set unliking perform as well on new data
      1. train multiple models with various hyperparameters using training data
      2. to get generatlization error , select the model and hyperparamaters that perform best on the validation set
    3. cross-validating : the training set is split into complementary subsets, ans each model is trained against a different conbination of thse subsets and validated against the remain parts.

   Example 1-1:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model #load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv",thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv",thousands=',',delimiter='\t',encoding='latin1',na_values='n/a') #prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
#get the pandas dataframe of GDP per capita and Life satisfaction
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
gdp_per_capita.rename(columns={"": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
return full_country_stats[["GDP per capita", 'Life satisfaction']] country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
#regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv',encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]] #Visualize the data
country_stats.plot(kind='scatter',x='GDP per capita',y='Life satisfaction') #Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression() #Train the model
lin_reg_model.fit(X, Y) #plot Regression model
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X = np.linspace(0, 110000, 1000)
plt.plot(X, t0 + t1 * X, "k")
plt.show() #Make a prediction for Cyprus
X_new=[[22587]]
print(lin_reg_model.predict(X_new))

      

课后练习挺好的

Notes : <Hands-on ML with Sklearn & TF> Chapter 1的更多相关文章

  1. Notes : <Hands-on ML with Sklearn & TF> Chapter 5

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  2. Notes : <Hands-on ML with Sklearn & TF> Chapter 7

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  3. Notes : <Hands-on ML with Sklearn & TF> Chapter 6

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  4. Notes : <Hands-on ML with Sklearn & TF> Chapter 4

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  5. Notes : <Hands-on ML with Sklearn & TF> Chapter 3

    Chapter 3-Classification .caret, .dropup > .btn > .caret { border-top-color: #000 !important; ...

  6. Book : <Hands-on ML with Sklearn & TF> pdf/epub

    非常好的书,最近发现了pdf版本,链接:http://www.finelybook.com/hands-on-machine-learning-with-scikit-learn-and-tensor ...

  7. H5 Notes:PostMessage Cross-Origin Communication

    Javascript中要实现跨域通信,主要有window.name,jsonp,document.domain,cors等方法.不过在H5中有一种新的方法postMessage可以安全实现跨域通信,并 ...

  8. H5 Notes:Navigator Geolocation

    H5的地理位置API可以帮助我们来获取用户的地理位置,经纬度.海拔等,因此我们可以利用该API做天气应用.地图服务等. Geolocation对象是我们获取地理位置用到的对象. 首先判断浏览器是否支持 ...

  9. notes:spm多重比较校正

    SPM做完统计后,statistical table中的FDRc实际上是在该p-uncorrected下,可以令FDR-correcred p<=0.05的最小cluster中的voxel数目: ...

随机推荐

  1. ADB文件及文件夹操作

    1.创建文件夹: adb shell mkdir /data/local/tmp/local多级的一次只能创建一级adb shell mkdir /data/local/tmp/local/tmp 2 ...

  2. 1-hadoop、mr

    1.HDFS的优缺点: 优点: ① 高容错 ② 可扩展 ③ 适合大文件存储 ④ 可构建在廉价的机器上 缺点: ① 高延迟 ② 文件不能修改 ③ 不适合小文件存储 2.HDFS架构(类似于文件系统): ...

  3. clear 属性

    clear属性:规定元素的哪一侧不允许有其他的浮动元素 Example: <html> <head> <style type="text/css"&g ...

  4. bower install的时候报错

    安装错误提示:C:\Scott>bower install bootstrap bower not-cached git://github.com/twbs/bootstrap.git#* bo ...

  5. Java内存泄漏定位

    Java虚拟机内存分为五个区域:方法区,堆,虚拟机栈,本地方法栈,程序计数器.其中方法区和堆是java虚拟机共享的内存区域,虚拟机栈,本地方法栈,程序计数器是线程私有的. 程序计数器(Program ...

  6. Source Code Pro 编程字体

    Source Code Pro :是 Adobe 公司号称最佳的编程字体,而且还是开源的 它非常适合用于阅读代码,支持 Linux.Mac OS X 和 Windows 等操作系统,而且无论商业或个人 ...

  7. C++学习基础十六-- 函数学习笔记

    C++ Primer 第七章-函数学习笔记 一步一个脚印.循序渐进的学习. 一.参数传递 每次调用函数时,都会重新创建函数所有的形参,此时所传递的实参将会初始化对应的形参. 「如果形参是非引用类型,则 ...

  8. spring学习1

    1.<context:property-placeholder/> :用于从外部属性文件中获取Bean的配置 <context:property-placeholder locati ...

  9. leetcode42

    class Solution: def calLeft(self,height,rightval,left,right): if left>=right: return 0 sums = 0 m ...

  10. spring的ioc与aop原理

    ioc(反向控制) 原理:    在编码阶段,既没有实例化对象,也没有设置依赖关系,而把它交给Spring,由Spring在运行阶段实例化.组装对象.这种做法颠覆了传统的写代码实例化.组装对象.然后一 ...