Notes: <Hands-on ML with Sklearn & TF> Chapter 1

What is ML
1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
What problems to solve
1. problems whose existing solutions require a lot of hand-tuning or long lists of rules
2. complex problems with no good solution using a traditional approach
3. fluctuating environments, where the system must adapt to new data
4. getting insights about complex problems and large amounts of data
Types
1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
2. whether or not they can learn incrementally on the fly (online, batch)
3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
(Un)supervised learning
1. supervised : the training data include the desired solutions, called labels
  - typical tasks are classification and regression; algorithms : k-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
2. unsupervised : the training data are unlabeled
  - Clustering : k-Means, HCA (Hierarchical Cluster Analysis), Expectation Maximization
  - Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
  - Association rule learning : Apriori, Eclat
3. semisupervised
  - often combines unsupervised and supervised algorithms (an unsupervised step first, then a supervised one)
4. reinforcement : an agent learns by interacting with an environment
  1. observe the environment
  2. select and perform action
  3. get rewards in return
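As a minimal sketch of the unsupervised case listed above, the following pure-Python k-means groups unlabeled 1-D points into two clusters. The data, the starting centroids, and the function name `k_means_1d` are made up for illustration; in practice a library implementation such as Scikit-Learn's `KMeans` would be the usual choice.

```python
def k_means_1d(points, centroids, n_iter=10):
    """Plain k-means on 1-D data (illustrative helper, not a library API)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two obvious groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means_1d(points, centroids=[0.0, 6.0])
print(centroids)
print(clusters)
```

No labels are passed in anywhere: the grouping comes only from the distances between the points.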
batch/online learning
1. batch : offline learning; to learn about new data, a new version of the system must be trained from scratch on the full dataset
2. online : incremental learning; a major challenge is bad data being fed to the system
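The online case can be sketched as a model that is updated one instance at a time instead of being retrained on the full dataset. The example below is a hand-written stochastic gradient step for a one-feature linear model; the data stream, the learning rate, and the names `sgd_step` and `stream` are assumptions for illustration (Scikit-Learn's `SGDRegressor.partial_fit` offers incremental training of this kind).

```python
def sgd_step(theta0, theta1, x, y, learning_rate):
    """One online update of the linear model y ~ theta0 + theta1 * x."""
    error = (theta0 + theta1 * x) - y
    theta0 -= learning_rate * error
    theta1 -= learning_rate * error * x
    return theta0, theta1


def stream(n_instances=5000):
    """Hypothetical noise-free data stream following y = 2x + 1."""
    for i in range(n_instances):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


theta0, theta1 = 0.0, 0.0
for x, y in stream():
    # Each instance is seen once and can then be discarded.
    theta0, theta1 = sgd_step(theta0, theta1, x, y, learning_rate=0.1)

print(theta0, theta1)  # should end up close to 1 and 2
```

The learning rate decides how strongly each new instance moves the parameters, which is also why bad data in the stream can degrade an online system.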
instance-based/model-based
1. instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
2. model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
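To contrast the two approaches, here is a small sketch that predicts the same new point with a k-nearest-neighbors average (instance-based) and with a least-squares line (model-based). The toy data and the helper names `knn_predict` and `fit_line` are made up for this note; Scikit-Learn's `KNeighborsRegressor` and `LinearRegression` are the corresponding library estimators.

```python
def knn_predict(X, y, x_new, k=3):
    """Instance-based: average the targets of the k most similar training points."""
    nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x_new))[:k]
    return sum(y[i] for i in nearest) / k


def fit_line(X, y):
    """Model-based: fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(X) / len(X)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in X)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data (one feature, one target).
X_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.0, 6.0, 8.0, 10.0]
x_new = 2.4

knn_prediction = knn_predict(X_train, y_train, x_new, k=3)
intercept, slope = fit_line(X_train, y_train)
line_prediction = intercept + slope * x_new
print(knn_prediction, line_prediction)
```

The k-NN prediction depends only on the stored neighbors of `x_new`, while the linear model summarizes all the data in two parameters and no longer needs the training set at prediction time.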
Challenge
1. insufficient quantity of training data
2. nonrepresentative training data
3. poor-quality data
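4. irrelevant features : feature selection; feature extraction; creating new features by gathering new data
5. overfitting : constrain the model with regularization; the amount of regularization is controlled by a hyperparameter
6. underfitting : select a more powerful model; feed better features to the learning algorithm; reduce the constraints on the model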
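The link between regularization and its hyperparameter can be sketched with a one-feature linear model. The example assumes an L2 (ridge-style) penalty on the slope, which is one common choice and not something these notes specify; the data and the name `ridge_slope` are hypothetical. For centered data the penalized slope has the closed form S_xy / (S_xx + alpha).

```python
def ridge_slope(X, y, alpha):
    """Slope of a 1-feature linear fit with an L2 penalty alpha on the slope."""
    mean_x = sum(X) / len(X)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in X)
    # alpha = 0 gives the ordinary least-squares slope;
    # larger alpha shrinks the slope toward 0 (a simpler, more constrained model).
    return s_xy / (s_xx + alpha)


# Hypothetical noisy data around y = 2x.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_unregularized = ridge_slope(X, y, alpha=0.0)
slope_regularized = ridge_slope(X, y, alpha=10.0)
slope_heavily_regularized = ridge_slope(X, y, alpha=1000.0)
print(slope_unregularized, slope_regularized, slope_heavily_regularized)
```

A very large `alpha` flattens the line almost completely, which trades overfitting for underfitting; reducing it is one way to "reduce the constraints" mentioned in item 6.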
Testing and Validating
1. use 80% of the data for training and hold out 20% for testing
2. validation : a model and hyperparameters chosen because they look best on the test set are unlikely to perform as well on new data, so hold out a separate validation set
  1. train multiple models with various hyperparameters using the training set
  2. select the model and hyperparameters that perform best on the validation set, then run a single final test on the test set to estimate the generalization error
3. cross-validation : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts
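The splitting scheme above can be sketched with plain index lists: an 80/20 hold-out split followed by k-fold cross-validation on the training part. The function names, the seed, and the choice of 5 folds are assumptions for illustration; Scikit-Learn provides `train_test_split` and `cross_val_score` for the same purpose.

```python
import random


def train_test_split_indices(n_samples, test_ratio=0.2, seed=42):
    """Shuffle the sample indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k):
    """Yield (train, validation) index lists; each fold is the validation set once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx))
print([(len(train), len(validation)) for train, validation in folds])
```

Each candidate model would be trained on `train` and scored on `validation` for every fold, and the test indices stay untouched until the final evaluation.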

Example 1-1:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',', delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build a DataFrame of GDP per capita and life satisfaction, indexed by country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # The column name to rename was empty in these notes; "2015" is assumed here
    # (the year of the GDP figures), otherwise the column selection below fails.
    gdp_per_capita = gdp_per_capita.rename(columns={"2015": "GDP per capita"})
    gdp_per_capita = gdp_per_capita.set_index("Country")
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", "Life satisfaction"]]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# regularization: remove_indices = [0, 1, 6, 8, 33, 34, 35]  (left commented out)
country_stats.to_csv('country_stats.csv', encoding='utf-8')

# np.c_ turns each column into a 2-D array of shape (n_samples, 1)
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line on top of the scatter plot
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
x_line = np.linspace(0, 110000, 1000)  # separate name so the training data X is not overwritten
plt.plot(x_line, t0 + t1 * x_line, "k")
plt.show()

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))

The end-of-chapter exercises are quite good.
