<Hands-on ML with Sklearn & TF>  Chapter 1

  1. What is ML
    1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
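  2. What problems to solve
    1. problems whose existing solutions require a lot of hand-tuning or long lists of rules
    2. complex problems with no good solution using a traditional approach
    3. fluctuating environments (an ML system can adapt to new data)
    4. getting insights about complex problems and large amounts of data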
  3. Types
    1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
    2. whether or not they can learn incrementally on the fly (online, batch)
    3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
  4. (Un)supervised learning
    1. supervised : the training data includes the desired solutions, called labels
      • Typical tasks : classification, regression
      • Algorithms : k-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
    2. unsupervised : the training data has no labels
      • Clustering : k-Means, HCA, Expectation Maximization
      • Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
      • Association rule learning : Apriori, Eclat
    3. semisupervised
      • a lot of unlabeled data plus a little labeled data; most algorithms combine unsupervised and supervised steps (unsupervised --> supervised)
    4. reinforcement : an agent interacting with an environment
      1. observe the environment
      2. select and perform an action
      3. get a reward (or penalty) in return
  5. Batch/online learning
    1. batch : offline; to learn about new data, a new version must be trained from scratch on the full dataset
    2. online : incremental learning on a stream of instances or mini-batches; the challenge is bad data, which gradually degrades performance
  6. Instance-based/model-based
    1. instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
    2. model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
  7. Challenges
    1. insufficient quantity of training data
    2. nonrepresentative training data
    3. poor-quality data
    4. irrelevant features : feature selection; feature extraction; creating new features by gathering new data
    5. overfitting : constrain the model with regularization, whose amount is controlled by a hyperparameter
    6. underfitting : use a more powerful model; feed it better features; reduce the constraints on the model
  8. Testing and Validating
    1. use 80% of the data for training and hold out 20% for testing
    2. validating : a model and hyperparameters tuned against the test set are unlikely to perform as well on new data, so hold out a separate validation set
      1. train multiple models with various hyperparameters on the training data
      2. select the model and hyperparameters that perform best on the validation set, then run a single final test on the test set to estimate the generalization error
    3. cross-validating : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts.
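
The testing and validating steps above can be sketched with scikit-learn roughly as follows. This is only a minimal illustration, not code from the book: the synthetic data, the choice of `Ridge` as the model, and the candidate `alpha` values are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=1.0, size=200)

# 80% of the data for training, 20% held out for the final test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Compare hyperparameter values with 5-fold cross-validation,
# using the training set only (the test set stays untouched)
candidate_alphas = [0.01, 1.0, 100.0]
cv_scores = {}
for alpha in candidate_alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5)
    cv_scores[alpha] = scores.mean()

best_alpha = max(cv_scores, key=cv_scores.get)

# Retrain on the full training set, then evaluate once on the test set
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_r2 = final_model.score(X_test, y_test)
print(best_alpha, test_r2)
```

The test set is scored exactly once, after the hyperparameter has been chosen, so `test_r2` remains an honest estimate of how the model generalizes.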

   Example 1-1:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',',
                             delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build a DataFrame of GDP per capita and life satisfaction, indexed by country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Assumption: the GDP value column is named "2015", as in the book's dataset
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", "Life satisfaction"]]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# Row positions to drop from the data (defined but not applied in this example)
remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line (Y is 2-D, so intercept_ and coef_ are arrays)
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
x_line = np.linspace(0, 110000, 1000)
plt.plot(x_line, t0 + t1 * x_line, "k")
plt.show()

# Make a prediction for Cyprus
X_new = [[22587]]
print(lin_reg_model.predict(X_new))
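
Example 1-1 is model-based learning. For contrast, an instance-based learner can be swapped in with very little change. The sketch below is not part of the original example: it uses a handful of toy values invented purely for illustration (not the OECD data) and assumes scikit-learn's `KNeighborsRegressor` with `n_neighbors=3`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Toy (GDP per capita, life satisfaction) pairs, invented for illustration only
X = np.array([[10000], [20000], [30000], [40000], [50000]])
y = np.array([5.0, 5.5, 6.0, 6.5, 7.0])

X_new = [[22587]]  # same query value as in Example 1-1

# Model-based: fit the parameters of a line, then predict from them
linear = LinearRegression().fit(X, y)
linear_pred = linear.predict(X_new)[0]

# Instance-based: average the targets of the 3 closest training instances
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
knn_pred = knn.predict(X_new)[0]

print(linear_pred, knn_pred)
```

On this toy data the k-NN prediction is 5.5, the mean of the three nearest targets (5.0, 5.5, 6.0), while the linear model evaluates its fitted line at the query point.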

      

The end-of-chapter exercises are quite good.
