The scale and unscale functions in R

1. The scale function

R's base package includes scale, a built-in function for standardizing data. Its documentation reads as follows:

Usage

scale(x, center = TRUE, scale = TRUE)

Arguments

x: a numeric matrix (or matrix-like object).

center: either a logical value or a numeric vector of length equal to the number of columns of x.

scale: either a logical value or a numeric vector of length equal to the number of columns of x.

Details

The value of center determines how column centering is performed. If center is a numeric vector with length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.

The value of scale determines how column scaling is performed (after centering). If scale is a numeric vector with length equal to the number of columns of x, then each column of x is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done.

The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)
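The following sketch shows the difference between the standard deviation and the root mean square described above. It runs scale() on a small toy column whose values are arbitrary, then checks the divisors against the formulas from the documentation. It uses only base R.

```r
# Toy 4 x 1 matrix; the values are arbitrary
x <- matrix(c(1, 2, 3, 4), ncol = 1)
n <- nrow(x)

# center = FALSE, scale = TRUE: divide by the root mean square sqrt(sum(x^2) / (n - 1))
by_rms <- scale(x, center = FALSE)
rms <- sqrt(sum(x^2) / (n - 1))
stopifnot(isTRUE(all.equal(as.vector(by_rms), as.vector(x) / rms)))

# Applied to the centered column, the same formula gives the standard deviation
xc <- x - mean(x)
stopifnot(isTRUE(all.equal(sqrt(sum(xc^2) / (n - 1)), sd(x[, 1]))))

# The documented recipe for dividing by the SD without centering
by_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))

cat("RMS divisor:", attr(by_rms, "scaled:scale"), "\n")
cat("SD divisor: ", attr(by_sd, "scaled:scale"), "\n")
```

Without centering, the two divisors differ because the root mean square is computed around 0 rather than around the column mean.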

Value

For scale.default, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes "scaled:center" and "scaled:scale".
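Here is a minimal sketch of reading those attributes back. The two-column data frame is a made-up example. A data frame input comes back as a matrix because scale.default converts its input with as.matrix().

```r
df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
z <- scale(df)

is.matrix(z)                  # the result is a matrix, not a data frame
attr(z, "scaled:center")      # column means used for centering
attr(z, "scaled:scale")       # column standard deviations used for scaling

# When a step is switched off, its attribute is simply not set
is.null(attr(scale(df, center = FALSE, scale = FALSE), "scaled:center"))
```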

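By default, scale performs z-score standardization: it subtracts each column's mean and then divides by that column's standard deviation.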

z-score standardization (zero-mean normalization)

Also called standard-deviation standardization, this method standardizes data using the mean and the standard deviation of the original data.

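After the transformation, each column has mean 0 and standard deviation 1. The data follow a standard normal distribution only if they were normally distributed to begin with, because the transformation shifts and rescales the values without changing the shape of their distribution. The transformation is:

$$z = \frac{x - \mu}{\sigma}$$

where μ is the mean of all sample values and σ is their standard deviation.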
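The formula can be checked directly against scale(). The helper zscore() below is a hypothetical name used only for this illustration. It relies on sd(), which divides by n - 1 (the sample standard deviation), as scale() does, so the two results agree.

```r
# Hypothetical helper: z-score one numeric vector, z = (x - mu) / sigma
zscore <- function(v) (v - mean(v)) / sd(v)

m <- cbind(a = c(10, 20, 30, 40), b = c(1, 1, 2, 6))
manual  <- apply(m, 2, zscore)
builtin <- scale(m)

# Same numbers; scale() additionally records mu and sigma as attributes
stopifnot(isTRUE(all.equal(manual, builtin, check.attributes = FALSE)))

colMeans(builtin)        # each column now has mean 0 (up to rounding error)
apply(builtin, 2, sd)    # and standard deviation 1
```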

2. The unscale function

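The unscale function in the DMwR package restores the original data from an object returned by scale. Load the package with library(DMwR) before calling it.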

Usage

unscale(vals, norm.data, col.ids)

Arguments

vals: A numeric matrix with the values to un-scale

norm.data: A numeric and scaled matrix. This should be an object to which the function scale() was applied.

col.ids: The columns of the vals matrix that are to be un-scaled (defaults to all of them).

Value

An object with the same dimension as the parameter vals
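If DMwR is not available, the same inverse transform takes only a few lines of base R. The function unscale_base() below is a hypothetical stand-in, not the package's actual implementation. It assumes norm.data carries the "scaled:center" and "scaled:scale" attributes set by scale(). For brevity it omits the col.ids argument and always un-scales every column.

```r
# Hypothetical base-R stand-in for DMwR's unscale(); not the package's own code.
# Inverts z = (x - mu) / sigma column by column: x = z * sigma + mu.
unscale_base <- function(vals, norm.data) {
  centers <- attr(norm.data, "scaled:center")
  scales  <- attr(norm.data, "scaled:scale")
  # A step that scale() skipped leaves no attribute; treat it as the identity
  if (is.null(centers)) centers <- rep(0, NCOL(vals))
  if (is.null(scales))  scales  <- rep(1, NCOL(vals))
  out <- as.matrix(vals)
  out <- sweep(out, 2, scales, "*")   # undo the division by sigma
  out <- sweep(out, 2, centers, "+")  # undo the subtraction of mu
  # Drop the bookkeeping attributes carried over from vals
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  out
}

orig <- cbind(x = c(1, 2, 3), y = c(2, 4, 6))
z <- scale(orig)
unscale_base(z, z)
```

Multiplying before adding matters here. It undoes the two steps of scale() in reverse order.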

3. Usage example

> df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
> df
  x y z
1 1 2 3
2 2 4 6
3 3 6 9
> scaledData <- scale(df)
> scaledData
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
[3,]  1  1  1
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
> unscale(scaledData, scaledData)
     x y z
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> ndf <- data.frame(x = c(1, 2), y = c(2, 4), z = c(3, 6))
> ndf
  x y z
1 1 2 3
2 2 4 6
> scale(ndf, center = attr(scaledData, "scaled:center"), scale = attr(scaledData, "scaled:scale"))
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
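The last call in the transcript reuses the parameters computed on df to scale new data. The sketch below uses only base R and the df/ndf values from the transcript. It contrasts that call with scaling ndf on its own, which would use ndf's own mean and standard deviation instead. It then reverses the transformation with sweep().

```r
df  <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
ndf <- data.frame(x = c(1, 2),    y = c(2, 4),    z = c(3, 6))

scaledData <- scale(df)
mu    <- attr(scaledData, "scaled:center")
sigma <- attr(scaledData, "scaled:scale")

# Reuse the parameters from df: the rows map to -1 and 0, as in the transcript
new_scaled <- scale(ndf, center = mu, scale = sigma)

# Scaling ndf by itself would use ndf's own mean and sd, giving different values
self_scaled <- scale(ndf)

# Reverse the transformation with base R: x = z * sigma + mu
restored <- sweep(sweep(new_scaled, 2, sigma, "*"), 2, mu, "+")
```

Keeping the center and scale from the first data set ensures that data processed later lands on the same scale as the original data.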
