Notes ： <Hands-on ML with Sklearn & TF> Chapter 1


what is ml
1. a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
what problems to solve
1. a solution exists but requires a lot of hand-tuning or long lists of rules
2. no good solution exists using a traditional approach
3. fluctuating environments, where the system has to adapt to new data
4. getting insights about complex problems and large amounts of data
types
1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
2. whether or not they can learn incrementally on the fly (online, batch)
3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
(un)supervised learning
1. supervised : the training data includes the desired solutions, called labels
  - typical tasks are classification and regression; algorithms: K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
2. unsupervised : the training data has no labels
  - Clustering : k-means, HCA (Hierarchical Cluster Analysis), Expectation Maximization
  - Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
  - Association rule learning : Apriori, Eclat
3. semisupervised : a lot of unlabeled data plus a little labeled data
  - usually a combination of unsupervised and supervised algorithms
4. reinforcement : an agent learns by acting in an environment
  1. observe the environment
  2. select and perform action
  3. get rewards in return
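The difference between supervised and unsupervised learning can be sketched on a toy dataset. This is only an illustration on synthetic points (the blobs, seeds and parameter values are assumptions, not from the book): the classifier is handed the labels, while k-means sees only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated blobs of 2-D points (synthetic, for illustration only)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
X = np.vstack([blob_a, blob_b])
y = np.array([0] * 20 + [1] * 20)  # labels, used only by the supervised model

# Supervised: the classifier is trained on features AND labels
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = clf.predict([[0.2, -0.1], [4.8, 5.3]])

# Unsupervised: k-means receives only X and must find the groups itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

print(pred)         # predicted labels for the two new points
print(cluster_ids)  # cluster index assigned to each training point
```

The cluster indices found by k-means are arbitrary (cluster 0 need not correspond to label 0); only the grouping is meaningful.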
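batch/online learning
1. batch : offline; to learn about new data, a new version of the system must be trained from scratch on the full dataset
2. online : incremental learning from individual instances or mini-batches; the main challenge is bad data, which gradually degrades performance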
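A minimal sketch of online learning, assuming scikit-learn's SGDRegressor and a synthetic stream (the true relation y = 3x + 1, the batch size and the learning rate are all made up for illustration). Each call to partial_fit updates the model with one mini-batch, without retraining on earlier data.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=42)

# Simulate a data stream: each iteration delivers one fresh mini-batch
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(32, 1))
    y_batch = 3.0 * X_batch[:, 0] + 1.0 + rng.normal(scale=0.1, size=32)
    model.partial_fit(X_batch, y_batch)  # incremental update, old batches are not revisited

slope, intercept = model.coef_[0], model.intercept_[0]
print(slope, intercept)  # should end up near the assumed 3.0 and 1.0
```

If the stream started delivering bad data, the same incremental updates would pull the parameters away from good values, which is the challenge noted above.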
instance-based/model-based
1. instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
2. model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
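The two approaches can be contrasted on a tiny made-up dataset where y = 2x. This is a sketch only: KNeighborsRegressor stands in for an instance-based learner and LinearRegression for a model-based one, and the data is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Toy training data following y = 2x (invented for illustration)
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Instance-based: stores the examples and averages the most similar ones
knn = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)

# Model-based: fits the parameters of a line, then predicts from the line
lin = LinearRegression().fit(X_train, y_train)

X_new = [[10.0]]
knn_pred = knn.predict(X_new)[0]  # mean of the two nearest targets (x=4, x=5) -> 9.0
lin_pred = lin.predict(X_new)[0]  # extrapolates along the fitted line, close to 20.0
print(knn_pred, lin_pred)
```

The nearest-neighbour prediction is bounded by the stored examples, while the linear model extrapolates beyond them; which behaviour is preferable depends on the problem.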
Challenge
1. insufficient quantity of training data
2. nonrepresentative training data
3. poor-quality data
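4. irrelevant features : feature selection; feature extraction; creating new features by gathering new data
5. overfitting : constrain the model with regularization, whose strength is controlled by a hyperparameter; simplifying the model or gathering more data also helps
6. underfitting : use a more powerful model; feed it better features; reduce the constraints on the model (e.g. lower the regularization hyperparameter)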
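The effect of the regularization hyperparameter can be sketched with Ridge regression. This is an assumption-laden illustration, not the book's code: the data is synthetic and the alpha values are arbitrary. A larger alpha constrains the model more, which shrinks the learned slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Small noisy dataset (synthetic, for illustration only)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(15, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=15)

plain = LinearRegression().fit(X, y)   # no regularization
weak = Ridge(alpha=0.1).fit(X, y)      # light constraint
strong = Ridge(alpha=100.0).fit(X, y)  # heavy constraint, slope pushed toward zero

for name, m in [("plain", plain), ("weak", weak), ("strong", strong)]:
    print(name, m.coef_[0])
```

Too little regularization risks overfitting the noise; too much flattens the model into underfitting, which is why alpha is tuned as a hyperparameter.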
Testing and Validating
1. split the data: 80% for training, 20% for testing
2. validating : a model and hyperparameters tuned against one particular set are unlikely to perform as well on new data, so hold out a separate validation set
  1. train multiple models with various hyperparameters using the training data
  2. select the model and hyperparameters that perform best on the validation set, then run a single final test on the test set to estimate the generalization error
3. cross-validating : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts.
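The workflow above can be sketched with scikit-learn's train_test_split and cross_val_score. The dataset, the Ridge model and the candidate alpha values are assumptions chosen for illustration: the test set is set aside first, hyperparameters are compared by cross-validation on the training set only, and the test set is touched once at the end.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression data (for illustration only)
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# 80% training / 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# Compare hyperparameter candidates with 5-fold cross-validation on the training set
candidate_alphas = [0.01, 1.0, 100.0]
cv_scores = {
    alpha: cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5).mean()
    for alpha in candidate_alphas
}
best_alpha = max(cv_scores, key=cv_scores.get)

# Retrain on the full training set, then estimate the generalization error once
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_r2 = final_model.score(X_test, y_test)
print(best_alpha, test_r2)
```

Because the test set plays no part in choosing alpha, its score is a less biased estimate of how the model would do on new data.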

Example 1-1:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',', delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build a DataFrame of GDP per capita and life satisfaction, indexed by country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # The source column name was lost in these notes; it is assumed to be "2015", as in the book's code
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", 'Life satisfaction']]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# Not applied here: remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')

X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line on top of the scatter plot
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X_line = np.linspace(0, 110000, 1000)  # separate name so the training X is not overwritten
plt.plot(X_line, t0 + t1 * X_line, "k")
plt.show()

# Make a prediction for Cyprus (GDP per capita of 22587)
X_new = [[22587]]
print(lin_reg_model.predict(X_new))

The end-of-chapter exercises are quite good.
