Notes : <Hands-on ML with Sklearn & TF> Chapter 1
<Hands-on ML with Sklearn & TF> Chapter 1
- what is ml
- from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
- what problems to solve
- exist solution but a lot of hand-tuning/rules
- no good solutions using a traditional approach
- fluctuating environment
- get insight about conplex problem and large data
- type
- whether or not trained with human supervision(supervised, unsupervised, semisupervised, reinforcement)
- whether or not learn incrementally on the fly(online, batch)
- whether or not work by simply comparing new data point vs know data point,or instance detect pattern in training data and build a prediction model(instace-based, model-based)
- (un)supervision learning
- supervision : include the desired solution called labels
- classification,K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural network
- unsupervision : without labels
- Clustering : k-means, HCA, ecpectation maximization
- Viualization and dimensionality reducation : PCA, kernal PCA, LLE, t-SNE
- Association rule learning : Apriori, Eclat
- semisupervision
- unsupervision --> supervision
- reinforcement : an agent in context
- observe the environment
- select and perform action
- get rewards in return
- supervision : include the desired solution called labels
- batch/online learning
- batch : offline, to known new data need to train a new version from scratch one the full dataset
- online : incremental learning : challenge is bad data
- instance-based/model-based
- instance-based : the system learns the examples by heart, then the generalizes to the new cases using a similarity measure
- model-based : studied the data; select a model; train it on the training data; applied the model to make predictions on new cases
- Challenge
- insufficient quantity of training data
- nonrepresentative training data
- poor-quality data
- irrelevant features : feature selection; feature extraction; creating new feature by gathering new data
- overfitting : regularization -> hyperparameter
- underfitting : powerful model; better feature; reduce construct
- Testing and Validating
- 80% of data for training 20% for testing
- validating : best model and hyperparameter for training set unliking perform as well on new data
- train multiple models with various hyperparameters using training data
- to get generatlization error , select the model and hyperparamaters that perform best on the validation set
- cross-validating : the training set is split into complementary subsets, ans each model is trained against a different conbination of thse subsets and validated against the remain parts.
Example 1-1:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model #load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv",thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv",thousands=',',delimiter='\t',encoding='latin1',na_values='n/a') #prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
#get the pandas dataframe of GDP per capita and Life satisfaction
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
gdp_per_capita.rename(columns={"": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
return full_country_stats[["GDP per capita", 'Life satisfaction']] country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
#regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv',encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]] #Visualize the data
country_stats.plot(kind='scatter',x='GDP per capita',y='Life satisfaction') #Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression() #Train the model
lin_reg_model.fit(X, Y) #plot Regression model
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X = np.linspace(0, 110000, 1000)
plt.plot(X, t0 + t1 * X, "k")
plt.show() #Make a prediction for Cyprus
X_new=[[22587]]
print(lin_reg_model.predict(X_new))

课后练习挺好的
Notes : <Hands-on ML with Sklearn & TF> Chapter 1的更多相关文章
- Notes : <Hands-on ML with Sklearn & TF> Chapter 5
.caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...
- Notes : <Hands-on ML with Sklearn & TF> Chapter 7
.caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...
- Notes : <Hands-on ML with Sklearn & TF> Chapter 6
.caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...
- Notes : <Hands-on ML with Sklearn & TF> Chapter 4
.caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...
- Notes : <Hands-on ML with Sklearn & TF> Chapter 3
Chapter 3-Classification .caret, .dropup > .btn > .caret { border-top-color: #000 !important; ...
- Book : <Hands-on ML with Sklearn & TF> pdf/epub
非常好的书,最近发现了pdf版本,链接:http://www.finelybook.com/hands-on-machine-learning-with-scikit-learn-and-tensor ...
- H5 Notes:PostMessage Cross-Origin Communication
Javascript中要实现跨域通信,主要有window.name,jsonp,document.domain,cors等方法.不过在H5中有一种新的方法postMessage可以安全实现跨域通信,并 ...
- H5 Notes:Navigator Geolocation
H5的地理位置API可以帮助我们来获取用户的地理位置,经纬度.海拔等,因此我们可以利用该API做天气应用.地图服务等. Geolocation对象是我们获取地理位置用到的对象. 首先判断浏览器是否支持 ...
- notes:spm多重比较校正
SPM做完统计后,statistical table中的FDRc实际上是在该p-uncorrected下,可以令FDR-correcred p<=0.05的最小cluster中的voxel数目: ...
随机推荐
- ADB文件及文件夹操作
1.创建文件夹: adb shell mkdir /data/local/tmp/local多级的一次只能创建一级adb shell mkdir /data/local/tmp/local/tmp 2 ...
- 1-hadoop、mr
1.HDFS的优缺点: 优点: ① 高容错 ② 可扩展 ③ 适合大文件存储 ④ 可构建在廉价的机器上 缺点: ① 高延迟 ② 文件不能修改 ③ 不适合小文件存储 2.HDFS架构(类似于文件系统): ...
- clear 属性
clear属性:规定元素的哪一侧不允许有其他的浮动元素 Example: <html> <head> <style type="text/css"&g ...
- bower install的时候报错
安装错误提示:C:\Scott>bower install bootstrap bower not-cached git://github.com/twbs/bootstrap.git#* bo ...
- Java内存泄漏定位
Java虚拟机内存分为五个区域:方法区,堆,虚拟机栈,本地方法栈,程序计数器.其中方法区和堆是java虚拟机共享的内存区域,虚拟机栈,本地方法栈,程序计数器是线程私有的. 程序计数器(Program ...
- Source Code Pro 编程字体
Source Code Pro :是 Adobe 公司号称最佳的编程字体,而且还是开源的 它非常适合用于阅读代码,支持 Linux.Mac OS X 和 Windows 等操作系统,而且无论商业或个人 ...
- C++学习基础十六-- 函数学习笔记
C++ Primer 第七章-函数学习笔记 一步一个脚印.循序渐进的学习. 一.参数传递 每次调用函数时,都会重新创建函数所有的形参,此时所传递的实参将会初始化对应的形参. 「如果形参是非引用类型,则 ...
- spring学习1
1.<context:property-placeholder/> :用于从外部属性文件中获取Bean的配置 <context:property-placeholder locati ...
- leetcode42
class Solution: def calLeft(self,height,rightval,left,right): if left>=right: return 0 sums = 0 m ...
- spring的ioc与aop原理
ioc(反向控制) 原理: 在编码阶段,既没有实例化对象,也没有设置依赖关系,而把它交给Spring,由Spring在运行阶段实例化.组装对象.这种做法颠覆了传统的写代码实例化.组装对象.然后一 ...