Notes : <Hands-on ML with Sklearn & TF> Chapter 1
- What is ML
  - A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
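- What problems to solve
  - problems whose existing solutions require a lot of hand-tuning or long lists of rules
  - complex problems with no good solution using a traditional approach
  - fluctuating environments (an ML system can adapt to new data)
  - getting insights about complex problems and large amounts of data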
- Types
  - whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
  - whether or not they can learn incrementally on the fly (online vs. batch)
  - whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based vs. model-based)
- Supervised/unsupervised learning
  - supervised: the training data includes the desired solutions, called labels
    - tasks: classification, regression; algorithms: k-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
  - unsupervised: the training data has no labels
    - Clustering: k-means, Hierarchical Cluster Analysis (HCA), Expectation Maximization
    - Visualization and dimensionality reduction: PCA, Kernel PCA, LLE, t-SNE
    - Association rule learning: Apriori, Eclat
  - semisupervised: a lot of unlabeled data plus a little labeled data
    - usually a combination of unsupervised and supervised algorithms (unsupervised --> supervised)
  - reinforcement: an agent in an environment
    - observes the environment
    - selects and performs actions
    - gets rewards in return
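To make the labeled/unlabeled distinction concrete, here is a minimal sketch, assuming scikit-learn is installed. The six 2-D points and their labels are made up for illustration and do not come from the book. The classifier is given the labels; k-means only sees the features and has to find the two groups on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up): two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: the classifier is fit on features AND labels.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
pred = clf.predict([[1.1, 0.9], [5.1, 5.0]])
print("k-NN predictions:", pred)

# Unsupervised: k-means sees only the features and groups them itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)
print("k-means cluster ids:", cluster_ids)
```

The cluster ids returned by k-means are arbitrary (group "0" may be either blob); only the grouping itself is meaningful.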
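- Batch/online learning
  - batch (offline): to learn about new data, a new version of the system must be trained from scratch on the full dataset
  - online (incremental learning): the system is trained on data instances sequentially, individually or in mini-batches; the main challenge is bad data, which gradually degrades performance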
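A rough sketch of online learning, assuming scikit-learn's `SGDRegressor` and its `partial_fit` method. The data stream is simulated (the relation y = 3x + 2, the batch size and the noise level are all made up for illustration). Each mini-batch updates the existing model instead of retraining from scratch.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
model = SGDRegressor(random_state=42)

# Simulated stream: 200 mini-batches of 20 samples from y = 3x + 2 + noise.
for _ in range(200):
    X_batch = rng.normal(size=(20, 1))
    y_batch = 3.0 * X_batch[:, 0] + 2.0 + rng.normal(scale=0.1, size=20)
    model.partial_fit(X_batch, y_batch)  # incremental update, old batches are not needed

slope, intercept = model.coef_[0], model.intercept_[0]
print("learned slope and intercept:", slope, intercept)
```

Because every batch nudges the parameters, a run of corrupted batches would pull the model away from the true relation, which is the "bad data" risk noted above.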
- Instance-based/model-based learning
  - instance-based: the system learns the examples by heart, then generalizes to new cases using a similarity measure
  - model-based: study the data; select a model; train it on the training data; apply the model to make predictions on new cases
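The two approaches can be compared on the same toy data. This is a minimal sketch assuming scikit-learn; the five points on the line y = 2x are invented for illustration. k-Nearest Neighbors (instance-based) averages the targets of the most similar stored examples, while linear regression (model-based) fits parameters and extrapolates with them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Toy data (made up): points lying exactly on y = 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X_new = np.array([[6.0]])

# Model-based: learn slope and intercept, then apply them to the new case.
lin = LinearRegression().fit(X, y)
lin_pred = lin.predict(X_new)[0]   # about 12, the fitted line extended to x = 6

# Instance-based: average the targets of the 3 closest stored examples.
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
knn_pred = knn.predict(X_new)[0]   # 8.0, the mean of y at x = 3, 4, 5

print(lin_pred, knn_pred)
```

Outside the range of the training data the two disagree sharply: the instance-based model can only reuse targets it has already seen.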
- Challenges
  - insufficient quantity of training data
  - nonrepresentative training data
  - poor-quality data
  - irrelevant features: feature selection; feature extraction; creating new features by gathering new data
  - overfitting: constrain the model with regularization; the amount of regularization is controlled by a hyperparameter
  - underfitting: select a more powerful model; feed better features; reduce the constraints on the model
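As an illustration of regularization as a constraint, the sketch below fits the same degree-8 polynomial features with and without an L2 penalty. It assumes scikit-learn; the noisy linear data, the polynomial degree and `alpha=1.0` are arbitrary choices for the example, and Ridge is just one possible regularized model. The `alpha` hyperparameter controls how strongly the coefficients are shrunk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data (made up): a noisy straight line, deliberately few points.
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 15).reshape(-1, 1)
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=15)

def fit_poly(regressor):
    """Fit an over-flexible degree-8 polynomial model with the given regressor."""
    model = make_pipeline(PolynomialFeatures(degree=8, include_bias=False), regressor)
    model.fit(X, y)
    return model

plain = fit_poly(LinearRegression())   # unconstrained: free to chase the noise
ridge = fit_poly(Ridge(alpha=1.0))     # alpha is the regularization hyperparameter

# Compare the size of the learned coefficient vectors.
plain_norm = np.linalg.norm(plain.steps[-1][1].coef_)
ridge_norm = np.linalg.norm(ridge.steps[-1][1].coef_)
print("coefficient norm without / with regularization:", plain_norm, ridge_norm)
```

Raising `alpha` constrains the model further (risking underfitting); lowering it toward zero moves back to the unconstrained fit.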
- Testing and validating
  - use 80% of the data for training and hold out 20% for testing
  - validating: a model and hyperparameters tuned to score best on one particular set are unlikely to perform as well on new data, so hold out a separate validation set
    - train multiple models with various hyperparameters using the training set
    - select the model and hyperparameters that perform best on the validation set, then run a single final test on the test set to estimate the generalization error
  - cross-validation: the training set is split into complementary subsets, and each model is trained on a different combination of these subsets and validated against the remaining parts
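The workflow above could be sketched as follows, assuming scikit-learn. The synthetic dataset, the Ridge model and the three candidate `alpha` values are placeholders chosen for the example. The test set is touched only once, after the hyperparameter has been chosen by 5-fold cross-validation on the training set.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (made up): 100 samples, 3 features, linear target plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

# 80% for training, 20% held out for the final test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Cross-validate each candidate hyperparameter on the training set only.
candidate_alphas = [0.01, 1.0, 100.0]
cv_means = {}
for alpha in candidate_alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5)
    cv_means[alpha] = scores.mean()  # mean R^2 over the 5 validation folds

# Keep the best candidate, retrain on the full training set, test once.
best_alpha = max(cv_means, key=cv_means.get)
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_r2 = final_model.score(X_test, y_test)
print("best alpha:", best_alpha, "test R^2:", test_r2)
```

Cross-validation avoids setting aside a fixed validation set, at the cost of training each candidate several times.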
Example 1-1:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',',
                             delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build a DataFrame of GDP per capita and life satisfaction, indexed by country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # The GDP column is named "2015" in the book's dataset
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", 'Life satisfaction']]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# (disabled) remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line on top of the scatter plot
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X_line = np.linspace(0, 110000, 1000)
plt.plot(X_line, t0 + t1 * X_line, "k")
plt.show()

# Make a prediction for Cyprus (GDP per capita: 22587)
X_new = [[22587]]
print(lin_reg_model.predict(X_new))

The end-of-chapter exercises are quite good.