<Hands-on ML with Sklearn & TF>  Chapter 1

  1. What is ML
    1. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
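  2. What problems ML is good for
    1. problems for which existing solutions require a lot of hand-tuning or long lists of rules
    2. complex problems for which there is no good solution using a traditional approach
    3. fluctuating environments (an ML system can adapt to new data)
    4. getting insights about complex problems and large amounts of data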
  3. Types of ML systems
    1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement learning)
    2. whether or not they can learn incrementally on the fly (online vs. batch learning)
    3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based vs. model-based learning)
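  4. Supervised/unsupervised learning
    1. supervised: the training data fed to the algorithm includes the desired solutions, called labels
      • typical tasks: classification, regression (predicting a target numeric value)
      • algorithms: k-Nearest Neighbors, Linear Regression, Logistic Regression, SVMs, Decision Trees and Random Forests, Neural networks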
    2. unsupervised: the training data is unlabeled
      • Clustering: k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization
      • Visualization and dimensionality reduction: PCA, Kernel PCA, Locally-Linear Embedding (LLE), t-SNE
      • Association rule learning: Apriori, Eclat
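    3. semisupervised: a lot of unlabeled data plus a little labeled data
      • usually combines unsupervised and supervised algorithms (unsupervised --> supervised)
    4. reinforcement: a learning system, called an agent, acting in an environment
      1. observes the environment
      2. selects and performs actions
      3. gets rewards (or penalties) in return, and learns by itself the best strategy, called a policy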
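  5. Batch/online learning
    1. batch: offline; the system cannot learn incrementally, so to learn about new data it must be retrained from scratch on the full dataset
    2. online: incremental learning, feeding data instances sequentially, either individually or in small mini-batches; a big challenge is that bad data fed to the system gradually degrades its performance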
  6. Instance-based/model-based learning
    1. instance-based: the system learns the examples by heart, then generalizes to new cases using a similarity measure
    2. model-based: study the data; select a model; train it on the training data; apply the model to make predictions on new cases
  7. Challenges
    1. insufficient quantity of training data
    2. nonrepresentative training data
    3. poor-quality data
    4. irrelevant features: feature engineering (feature selection; feature extraction; creating new features by gathering new data)
    5. overfitting: constrain the model with regularization, whose amount is controlled by a hyperparameter
    6. underfitting: select a more powerful model; feed better features; reduce the constraints on the model
  8. Testing and validating
    1. a common split is 80% of the data for training and 20% for testing
    2. validation: a model and hyperparameters tuned to perform best on the test set are unlikely to perform as well on new data, so hold out a separate validation set
      1. train multiple models with various hyperparameters using the training set
      2. select the model and hyperparameters that perform best on the validation set, then run a single final test against the test set to estimate the generalization error
    3. cross-validation: the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts
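Items 7 and 8 can be tied together in a short scikit-learn sketch. It makes several assumptions that are not in these notes: the data is synthetic, Ridge regression stands in for "a model with a regularization hyperparameter" (its `alpha`), and the candidate `alpha` values and 5 folds are arbitrary choices. The training/test split is 80/20. Cross-validation on the training set chooses `alpha`, and the test set is used exactly once at the end.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data, for illustration only: y is roughly 3*x plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=200)

# 80% training / 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Candidate regularization strengths (assumed values)
alphas = [0.01, 1.0, 100.0]

# 5-fold cross-validation on the training set only: each fold is held out
# once as the validation part while the model trains on the other four
cv_scores = {}
for alpha in alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train,
                             cv=5, scoring="neg_mean_squared_error")
    cv_scores[alpha] = -scores.mean()  # mean validation MSE

best_alpha = min(cv_scores, key=cv_scores.get)

# Retrain on the whole training set, then run a single final test
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_mse = np.mean((final_model.predict(X_test) - y_test) ** 2)
print(f"best alpha={best_alpha}, test MSE={test_mse:.3f}")
```

The test MSE printed at the end is the generalization-error estimate. If the test set were also consulted while choosing `alpha`, that estimate would be optimistic.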

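Batch and online learning (item 5) can be contrasted on the same data. In this sketch, `LinearRegression` is trained once on the full dataset, and `SGDRegressor` is updated one mini-batch at a time through `partial_fit`. Each call refines the existing model rather than retraining it. The synthetic data, the mini-batch size of 50, the constant learning rate `eta0=0.01` and the 5 passes are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# Synthetic data (assumed): y is roughly 2*x + 0.5 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = 2.0 * X[:, 0] + 0.5 + rng.normal(0, 0.1, size=1000)

# Batch learning: one fit on the whole dataset; new data would mean refitting from scratch
batch_model = LinearRegression().fit(X, y)

# Online learning: feed mini-batches of 50; partial_fit updates the current weights
online_model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
for epoch in range(5):
    for start in range(0, len(X), 50):
        online_model.partial_fit(X[start:start + 50], y[start:start + 50])

print("batch: ", batch_model.coef_, batch_model.intercept_)
print("online:", online_model.coef_, online_model.intercept_)
```

Because each `partial_fit` call moves the weights toward whatever data it is given, a stream of bad data would steadily pull the online model off course. This is the challenge noted in item 5.
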
   Example 1-1:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',',
                             delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build one DataFrame holding GDP per capita and life satisfaction per country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]  # keep the "total" rows only
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # The key below must be the header of the GDP column in gdp_per_capita.csv (left blank in these notes)
    gdp_per_capita.rename(columns={"": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", "Life satisfaction"]]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# Optional, not applied here: drop some countries by position
# remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, y)

# Plot the fitted line: life satisfaction = t0 + t1 * GDP per capita
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X_line = np.linspace(0, 110000, 1000)  # separate name so the training X is not overwritten
plt.plot(X_line, t0 + t1 * X_line, "k")
plt.show()

# Make a prediction for Cyprus (GDP per capita = 22587)
X_new = [[22587]]
print(lin_reg_model.predict(X_new))
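Swapping the linear model for k-Nearest Neighbors regression turns Example 1-1 into instance-based learning. The OECD CSV files may not be at hand, so this sketch generates a synthetic stand-in for the GDP/life-satisfaction data. These numbers are NOT the OECD figures. `n_neighbors=3` is an assumed choice. The query reuses the Cyprus value 22587 from the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (made up, not the OECD dataset)
rng = np.random.default_rng(1)
gdp = rng.uniform(10_000, 60_000, size=(30, 1))
satisfaction = 5.0 + 4e-5 * gdp[:, 0] + rng.normal(0, 0.3, size=30)

X_new = [[22587]]  # same query as in Example 1-1

# Model-based: learn the parameters t0 and t1, then discard the examples
model_based = LinearRegression().fit(gdp, satisfaction)

# Instance-based: keep the examples; predict the mean target of the
# 3 countries whose GDP is closest to the query (the similarity measure)
instance_based = KNeighborsRegressor(n_neighbors=3).fit(gdp, satisfaction)

print("linear:", model_based.predict(X_new))
print("3-NN:  ", instance_based.predict(X_new))
```

The linear model's prediction depends only on `t0` and `t1`. The 3-NN prediction depends directly on which stored examples happen to sit closest to the query.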

      

The end-of-chapter exercises are quite good.
