<Hands-on ML with Sklearn & TF>  Chapter 1

  1. what is ml
    1. from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  2. what problems to solve
    1. exist solution but a lot of hand-tuning/rules
    2. no good solutions using a traditional approach
    3. fluctuating environment
    4. get insight about conplex problem and large data
  3. type
    1. whether or not trained with human supervision(supervised, unsupervised, semisupervised, reinforcement)
    2. whether or not learn incrementally on the fly(online, batch)
    3. whether or not work by simply comparing new data point vs know data point,or instance detect pattern in training data and build a prediction model(instace-based, model-based)
  4. (un)supervision learning
    1. supervision : include the desired solution called labels

      • classification,K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural network
    2. unsupervision : without labels
      • Clustering : k-means, HCA, ecpectation maximization
      • Viualization and dimensionality reducation : PCA, kernal PCA, LLE, t-SNE
      • Association rule learning : Apriori, Eclat
    3. semisupervision
      • unsupervision --> supervision
    4. reinforcement : an agent in context
      1. observe the environment
      2. select and perform action
      3. get rewards in return
  5. batch/online learning
    1. batch : offline, to known new data need to train a new version from scratch one the full dataset
    2. online : incremental learning : challenge is bad data
  6. instance-based/model-based
    1. instance-based : the system learns the examples by heart, then the generalizes to the new cases using a similarity measure
    2. model-based : studied the data; select a model; train it on the training data; applied the model to make predictions on new cases
  7. Challenge
    1. insufficient quantity of training data
    2. nonrepresentative training data
    3. poor-quality data
    4. irrelevant features : feature selection; feature extraction; creating new feature by gathering new data
    5. overfitting : regularization -> hyperparameter
    6. underfitting : powerful model; better feature; reduce construct
  8. Testing and Validating
    1. 80% of data for training 20% for testing
    2. validating : best model and hyperparameter for training set unliking perform as well on new data
      1. train multiple models with various hyperparameters using training data
      2. to get generatlization error , select the model and hyperparamaters that perform best on the validation set
    3. cross-validating : the training set is split into complementary subsets, ans each model is trained against a different conbination of thse subsets and validated against the remain parts.

   Example 1-1:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model #load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv",thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv",thousands=',',delimiter='\t',encoding='latin1',na_values='n/a') #prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
#get the pandas dataframe of GDP per capita and Life satisfaction
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
gdp_per_capita.rename(columns={"": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
return full_country_stats[["GDP per capita", 'Life satisfaction']] country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
#regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv',encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]] #Visualize the data
country_stats.plot(kind='scatter',x='GDP per capita',y='Life satisfaction') #Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression() #Train the model
lin_reg_model.fit(X, Y) #plot Regression model
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X = np.linspace(0, 110000, 1000)
plt.plot(X, t0 + t1 * X, "k")
plt.show() #Make a prediction for Cyprus
X_new=[[22587]]
print(lin_reg_model.predict(X_new))

      

课后练习挺好的

Notes : <Hands-on ML with Sklearn & TF> Chapter 1的更多相关文章

  1. Notes : <Hands-on ML with Sklearn & TF> Chapter 5

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  2. Notes : <Hands-on ML with Sklearn & TF> Chapter 7

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  3. Notes : <Hands-on ML with Sklearn & TF> Chapter 6

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  4. Notes : <Hands-on ML with Sklearn & TF> Chapter 4

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

  5. Notes : <Hands-on ML with Sklearn & TF> Chapter 3

    Chapter 3-Classification .caret, .dropup > .btn > .caret { border-top-color: #000 !important; ...

  6. Book : <Hands-on ML with Sklearn & TF> pdf/epub

    非常好的书,最近发现了pdf版本,链接:http://www.finelybook.com/hands-on-machine-learning-with-scikit-learn-and-tensor ...

  7. H5 Notes:PostMessage Cross-Origin Communication

    Javascript中要实现跨域通信,主要有window.name,jsonp,document.domain,cors等方法.不过在H5中有一种新的方法postMessage可以安全实现跨域通信,并 ...

  8. H5 Notes:Navigator Geolocation

    H5的地理位置API可以帮助我们来获取用户的地理位置,经纬度.海拔等,因此我们可以利用该API做天气应用.地图服务等. Geolocation对象是我们获取地理位置用到的对象. 首先判断浏览器是否支持 ...

  9. notes:spm多重比较校正

    SPM做完统计后,statistical table中的FDRc实际上是在该p-uncorrected下,可以令FDR-correcred p<=0.05的最小cluster中的voxel数目: ...

随机推荐

  1. 四、Html列表、块、布局

  2. autocomplete input

    <html> <head> <title>jQuery UI Autocomplete - Combobox</title> <link rel= ...

  3. [Unity优化]UI优化(二):Mask组件分析

    参考链接: https://www.sohu.com/a/211665096_99940808 1.Mask组件实现原理 使用模板测试,一方面使Mask对象所在区域的模板缓冲值置为1,另一方面使被Ma ...

  4. [UnityShader基础]04.ColorMask

    语法如下: ColorMask RGB | A | 0 | 其他R,G,B,A的组合 ColorMask R,意思是输出颜色中只有R通道会被写入 ColorMask 0,意思是不会输出任何颜色 默认值 ...

  5. jquery与原生JS实现增加、减小字号功能

    预览效果: 实现代码: <!DOCTYPE html> <html lang="en"> <head> <meta charset=&qu ...

  6. js 菜单收起和展开

  7. ansible自动化运维详细教程及playbook详解

    前言 当下有许多的运维自动化工具( 配置管理 ),例如:Ansible.SaltStack.Puppet.Fabric 等. Ansible 一种集成 IT 系统的配置管理.应用部署.执行特定任务的开 ...

  8. ajax----tomact服务器运行

    一.菜鸟教程的代码本地运行 <!DOCTYPE html> <html> <head> <meta charset="utf-8"> ...

  9. Ubuntu安装pyucharm的专业版本

    看到了不错的教程,亲测有效. https://www.cnblogs.com/huozf/p/9304396.html

  10. ELK填坑总结和优化过程

    做了几周的测试,踩了无数的坑,总结一下,全是干货,给大家分享~ 一.elk 实用知识点总结 1.编码转换问题(主要就是中文乱码) (1)input 中的codec => plain 转码 cod ...