Notes : <Hands-on ML with Sklearn & TF> Chapter 1
- What is ML
  - A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E (Tom Mitchell, 1997).
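- What problems to solve
  - problems whose existing solutions require a lot of hand-tuning or long lists of rules
  - complex problems with no good solution using a traditional approach
  - fluctuating environments, where the system has to adapt to new data
  - getting insights about complex problems and large amounts of data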
- Types
  - whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
  - whether or not they can learn incrementally on the fly (online, batch)
  - whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
- Supervised/unsupervised learning
  - supervised : the training data includes the desired solutions, called labels
    - typical tasks : classification, regression
    - algorithms : K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
  - unsupervised : the training data has no labels
    - Clustering : k-means, HCA (Hierarchical Cluster Analysis), Expectation Maximization
    - Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
    - Association rule learning : Apriori, Eclat
  - semisupervised : a lot of unlabeled data plus a little labeled data
    - most algorithms combine the two: an unsupervised step --> a supervised step
  - reinforcement : the learning system, called an agent in this context
    - observes the environment
    - selects and performs actions
    - gets rewards in return
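A minimal sketch of the supervised/unsupervised distinction, written in plain Python so it runs without dependencies. The toy data and the helper names `knn_classify` and `kmeans_1d` are invented here for illustration and are not code from the book: the first function needs labels, the second groups the same points without them.

```python
# Toy 1-D data: two obvious groups, around 1 and around 10 (made up for illustration).
points = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the supervised part


def knn_classify(x, train_x, train_y, k=3):
    """Supervised: predict the majority label among the k nearest labeled points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def kmeans_1d(data, centers, n_iter=10):
    """Unsupervised: k-means on 1-D data; no labels are involved."""
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for x in data:
            closest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[closest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


prediction = knn_classify(9.0, points, labels)    # label for a new point
centers = kmeans_1d(points, centers=[0.0, 5.0])   # group centers found without labels
print(prediction)
print(centers)
```

With scikit-learn, the same roles would typically be played by `KNeighborsClassifier` and `KMeans`.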
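- Batch/online learning
  - batch : offline learning; to learn about new data, a new version of the system must be trained from scratch on the full dataset
  - online : incremental learning on individual instances or mini-batches; the main challenge is that bad data gradually degrades performance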
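Online learning can be sketched as a model that is updated one mini-batch at a time and never revisits old data. The example below is an assumption-laden illustration in plain Python: the `stream` generator is a hypothetical data source producing noiseless samples of y = 2x + 1, and `sgd_step` is a hand-written gradient step, not an API from the book.

```python
def sgd_step(w, b, batch, lr):
    """One incremental update of y = w*x + b on a mini-batch (mean squared error)."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


def stream(n_batches, batch_size=4):
    """Hypothetical data stream: noiseless samples of y = 2x + 1 with x in [0, 1)."""
    for i in range(n_batches):
        xs = [((i * batch_size + j) % 10) / 10 for j in range(batch_size)]
        yield [(x, 2 * x + 1) for x in xs]


w, b = 0.0, 0.0
for batch in stream(n_batches=2000):
    # Each batch is seen once and can be discarded afterwards.
    w, b = sgd_step(w, b, batch, lr=0.3)

print(round(w, 2), round(b, 2))
```

The learning rate `lr` controls how fast the model adapts: a high value follows new data quickly but also forgets old data (and reacts strongly to bad data), while a low value gives the system more inertia. In scikit-learn, estimators such as `SGDRegressor` expose `partial_fit` for this style of training.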
- Instance-based/model-based
  - instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
  - model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
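The two approaches can be compared on the same tiny dataset. This is a sketch under stated assumptions: the training pairs are made up, and `knn_predict` and `fit_line` are illustrative helpers written for this note, standing in for a k-nearest-neighbors regressor and a linear regression.

```python
# Made-up (x, y) training pairs that lie close to y = 2x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]


def knn_predict(x_new, data, k=3):
    """Instance-based: keep the examples, average the targets of the k most similar."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


def fit_line(data):
    """Model-based: pick the slope and intercept that minimize squared error."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

knn_inside = knn_predict(3.0, train)       # query inside the training range
line_inside = slope * 3.0 + intercept
knn_outside = knn_predict(10.0, train)     # query far outside the training range
line_outside = slope * 10.0 + intercept

print(knn_inside, line_inside)
print(knn_outside, line_outside)
```

Inside the training range the two predictions are close. Far outside it, the instance-based prediction stays near the largest targets it has memorized, while the fitted line keeps extrapolating.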
- Challenges
  - insufficient quantity of training data
  - nonrepresentative training data
  - poor-quality data
  - irrelevant features : feature selection; feature extraction; creating new features by gathering new data
  - overfitting : constrain the model with regularization; the amount of regularization is a hyperparameter
  - underfitting : select a more powerful model; feed better features; reduce the constraints on the model
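To make "regularization controlled by a hyperparameter" concrete, here is a minimal sketch of a one-feature ridge regression in plain Python. The function `fit_ridge_line`, the name `alpha`, and the four data points are assumptions chosen for illustration; the penalty is applied to the slope only, not to the intercept.

```python
def fit_ridge_line(data, alpha):
    """Fit y = intercept + slope * x while penalizing slope**2 with weight alpha.

    alpha = 0 gives ordinary least squares; a larger alpha forces a flatter line.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    s_xx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = s_xy / (s_xx + alpha)
    return slope, mean_y - slope * mean_x


# Made-up, noisy training set of only four points.
train = [(1.0, 1.0), (2.0, 3.5), (3.0, 2.5), (4.0, 5.0)]

slopes = {alpha: fit_ridge_line(train, alpha)[0] for alpha in (0.0, 1.0, 10.0)}
for alpha, slope in slopes.items():
    print(f"alpha={alpha:<5} slope={slope:.3f}")
```

The slope shrinks as `alpha` grows. Too much regularization pushes the model toward underfitting, which is why the value has to be tuned rather than learned from the training data.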
- Testing and Validating
  - 80% of the data for training, 20% for testing
  - validating : a model and hyperparameters tuned to do best on the test set are unlikely to perform as well on new data, so hold out a separate validation set
    - train multiple models with various hyperparameters using the training set
    - select the model and hyperparameters that perform best on the validation set
    - run a single final test against the test set to estimate the generalization error
  - cross-validating : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts
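The workflow above (hold out a test set, pick hyperparameters by cross-validation, test once at the end) can be sketched in plain Python. Everything here is illustrative: the data is synthetic, the model is a simple k-nearest-neighbors regressor, and `split_train_test`, `k_fold`, and `mse` are hypothetical helpers written for this note.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out test_ratio of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(rows, n_folds):
    """Yield (train, validation) pairs; every row is validated on exactly once."""
    folds = [rows[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, validation


def knn_predict(x_new, data, k):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(k, train, held_out):
    """Mean squared error of k-NN built on train, measured on held_out."""
    return sum((knn_predict(x, train, k) - y) ** 2 for x, y in held_out) / len(held_out)


# Synthetic data: y = x plus an alternating +/-0.5 disturbance standing in for noise.
data = [(i / 10, i / 10 + (0.5 if i % 2 else -0.5)) for i in range(50)]
train_set, test_set = split_train_test(data)

# Model selection uses only the training set, via 5-fold cross-validation.
cv_error = {}
for k in (1, 3, 5, 9):
    fold_errors = [mse(k, tr, val) for tr, val in k_fold(train_set, n_folds=5)]
    cv_error[k] = sum(fold_errors) / len(fold_errors)
best_k = min(cv_error, key=cv_error.get)

# The test set is touched once, at the very end, to estimate the generalization error.
test_error = mse(best_k, train_set, test_set)
print(best_k, round(test_error, 3))
```

scikit-learn ships ready-made versions of these pieces, for example `train_test_split` and `cross_val_score` in `sklearn.model_selection`.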
Example 1-1:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',',
                             delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build a pandas DataFrame of GDP per capita and Life satisfaction per country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Assumes the GDP column is named "2015", as in the book's dataset
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", "Life satisfaction"]]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line on top of the scatter plot
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X_line = np.linspace(0, 110000, 1000)
plt.plot(X_line, t0 + t1 * X_line, "k")
plt.show()

# Make a prediction for Cyprus (GDP per capita of 22587)
X_new = [[22587]]
print(lin_reg_model.predict(X_new))

The end-of-chapter exercises are quite good.