在R中编写函数

为啥要用函数

为了代码更加可读,减少重复代码==lazy

You can type less code, saving effort and making your analyses more readable.

You can reuse your code from project to project.

所有的可重复执行的操作均可以放到函数里面

in R function are a type of variable ,so you assign functions using left arrow,just like anything else

args(函数名)

查询函数的参数,可以等同于help

> args(rank)
function (x, na.last = TRUE, ties.method = c("average", "first",
"last", "random", "max", "min"))

创建一个函数的步骤

# Update the function so heads have probability p_head
toss_coin <- function(n_flips,p_head) {
coin_sides <- c("head", "tail")
# Define a vector of weights
weights <- c(p_head,1-p_head)
# Modify the sampling to be weighted
sample(coin_sides, n_flips, replace = TRUE,prob=weights)
} # Generate 10 coin tosses
toss_coin(10,0.8)

think of function as verbs

1.default args

在定义函数的时候就已经赋值了

cut_by_quantile <- function(x, n=5, na.rm, labels, interval_type) {
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
} # Remove the n argument from the call
cut_by_quantile(
n_visits, na.rm = FALSE,
labels = c("very low", "low", "medium", "high", "very high"),
interval_type = "(lo, hi]"
)
n是默认参数

若默认参数存在了,那么调用的时候只给形参传值好了

# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL,
interval_type = c("(lo, hi]", "[lo, hi)")) {
# Match the interval_type argument
interval_type <- match.arg(interval_type)
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
} # Remove the interval_type argument from the call
cut_by_quantile(n_visits)

Passing arguments between functions

# From previous steps
get_reciprocal <- function(x) {
1 / x
}
calc_harmonic_mean <- function(x) {
x %>%
get_reciprocal() %>%
mean() %>%
get_reciprocal()
} std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))
calc_harmonic_mean <- function(x, ...) {
x %>%
get_reciprocal() %>%
mean(...) %>%
get_reciprocal()
}
>
> std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,na.rm=TRUE))
# A tibble: 11 x 2
sector hmean_pe_ratio
<chr> <dbl>
1 Communication Services 17.5
2 Consumer Discretionary 15.2
3 Consumer Staples 19.8
4 Energy 13.7
5 Financials 12.9
6 Health Care 26.6
7 Industrials 18.2
8 Information Technology 21.6
9 Materials 16.3
10 Real Estate 32.5
11 Utilities 23.9

Checking arguments

You can use the assert_*() functions from assertive to check inputs and throw errors when they fail.

calc_harmonic_mean <- function(x, na.rm = FALSE) {
# Assert that x is numeric
assert_is_numeric(x)
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
}

弹出报错

calc_harmonic_mean <- function(x, na.rm = FALSE) {
assert_is_numeric(x)
# Check if any values of x are non-positive
if(any(is_non_positive(x), na.rm = TRUE)) {
# Throw an error
stop("x contains non-positive values, so the harmonic mean makes no sense.")
}
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
} # See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)

固定函数参数

一直存在的且默认

Returning values from functions

Sometimes, you don't need to run through the whole body of a function to get the answer. In that case you can return early from that function using return().

To check if x is divisible by n, you can use is_divisible_by(x, n) from assertive.

Alternatively, use the modulo operator, %%. x %% n gives the remainder when dividing x by n, so x %% n == 0 determines whether x is divisible by n. Try 1:10 %% 3 == 0 in the console.

To solve this exercise, you need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn't a century (like 1904 but not 1900 or 1905).

is_leap_year <- function(year) {
# If year is div. by 400 return TRUE
if(is_divisible_by(year, 400)) {
return(TRUE)
}
# If year is div. by 100 return FALSE
if(is_divisible_by(year, 100)) {
return(FALSE)
}
# If year is div. by 4 return TRUE
if(is_divisible_by(year, 4)) {
return(TRUE)
}
# Otherwise return FALSE
FALSE
}
# Define a pipeable plot fn with data and formula args
pipeable_plot <- function(data, formula) {
# Call plot() with the formula interface
plot(formula, data)
# Invisibly return the input dataset
invisible(data)
} # Draw the scatter plot of dist vs. speed again
plt_dist_vs_speed <- cars %>%
pipeable_plot(dist ~ speed) # Now the plot object has a value
plt_dist_vs_speed

augments

Given an R statistical model or other non-tidy object, add columns to the original dataset such as predictions, residuals and cluster assignments.

> # Use broom tools to get a list of 3 data frames
> list(
# Get model-level values
model = glance(model),
# Get coefficient-level values
coefficients = tidy(model),
# Get observation-level values
observations = augment(model)
)
$model
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
1 18850. 345 -6425. 12864. 12891. 11529. 339 $coefficients
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.09 0.0279 146. 0.
2 genderfemale 0.374 0.0212 17.6 2.18e- 69
3 income($25k,$55k] -0.0199 0.0267 -0.746 4.56e- 1
4 income($55k,$95k] -0.581 0.0343 -16.9 3.28e- 64
5 income($95k,$Inf) -0.578 0.0310 -18.7 6.88e- 78
6 travel(0.25h,4h] -0.627 0.0217 -28.8 5.40e-183
7 travel(4h,Infh) -2.42 0.0492 -49.3 0. $observations
# A tibble: 346 x 12
.rownames n_visits gender income travel .fitted .se.fit .resid .hat
<chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 25 2 female ($95k~ (4h,I~ 1.46 0.0502 -1.24 0.0109
2 26 1 female ($95k~ (4h,I~ 1.46 0.0502 -1.92 0.0109
3 27 1 male ($95k~ (0.25~ 2.88 0.0269 -5.28 0.0129
4 29 1 male ($95k~ (4h,I~ 1.09 0.0490 -1.32 0.00711
5 30 1 female ($55k~ (4h,I~ 1.46 0.0531 -1.92 0.0121
6 31 1 male [$0,$~ [0h,0~ 4.09 0.0279 -10.4 0.0465
7 33 80 female [$0,$~ [0h,0~ 4.46 0.0235 -0.710 0.0479
8 34 104 female ($95k~ [0h,0~ 3.88 0.0261 6.90 0.0332
9 35 55 male ($25k~ (0.25~ 3.44 0.0222 3.85 0.0153
10 36 350 female ($25k~ [0h,0~ 4.44 0.0206 21.5 0.0360
# ... with 336 more rows, and 3 more variables: .sigma <dbl>, .cooksd <dbl>,
# .std.resid <dbl>

Magnificent multi-assignment! Returning many values is as easy as collecting them into a list. The groomed model has data frames that are easy to program against.

# From previous step
groom_model <- function(model) {
list(
model = glance(model),
coefficients = tidy(model),
observations = augment(model)
)
} # Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model) # See these individual variables
mdl; cff; obs

Environments

> # From previous step
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
>
> # Convert the list to an environment
> rsa_env <- list2env(rsa_lst )
>
> # List the structure of each variable
> ls.str(rsa_env)
capitals : Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 2 variables:
$ city : chr "Cape Town" "Bloemfontein" "Pretoria"
$ type_of_capital: chr "Legislative" "Judicial" "Administrative"
national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900

查看父类的环境变量

> # From previous steps
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
> rsa_env <- list2env(rsa_lst)
>
> # Find the parent environment of rsa_env
> parent <- parent.env(rsa_env)
>
> # Print its name
> environmentName(parent )
[1] "R_GlobalEnv"

exists

查看是否定义了一个变量exists packages

> # Does population exist in rsa_env?
> exists("population", envir = rsa_env)
[1] TRUE
>
> # Does population exist in rsa_env, ignoring inheritance?
> exists("population", envir = rsa_env,inherits=FALSE)
[1] FALSE

局部变量和全局变量 这个解释可以参考之前的python学习笔记python 函数学习笔记

这里个人觉得编程语言是在某种程度上面是相通的,还是认真的学习好一门吧。不然最后都会放弃了,加油

单位换算

# Write a function to convert sq. meters to hectares
sq_meters_to_hectares <- function(sq_meters) {
sq_meters / 10000
} # Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
} # Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
}

常见农业单位换算

  • There are 4840 square yards in an acre.
  • There are 36 inches in a yard and one inch is 0.0254 meters.
  • There are 10000 square meters in a hectare.
# Write a function to convert sq. yards to sq. meters
sq_yards_to_sq_meters <- function(sq_yards) {
sq_yards %>%
# Take the square root
sqrt() %>%
# Convert yards to meters
yards_to_meters() %>%
# Square it
raise_to_power(2)
}

单位换算有可以直接用的函数

# Load fns defined in previous steps
load_step3() # Write a function to convert bushels to kg
bushels_to_kgs <- function(bushels, crop) {
bushels %>%
# Convert bushels to lbs for this crop
bushels_to_lbs(crop) %>%
# Convert lbs to kgs
lbs_to_kgs()
}
# Wrap this code into a function
fortify_with_metric_units <- function(data, crop) {
data %>%
mutate(
farmed_area_ha = acres_to_hectares(farmed_area_acres),
yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(
yield_bushels_per_acre,
crop = crop
)
)
} # Try it on the wheat dataset
fortify_with_metric_units(wheat, crop = "wheat")

绘图

# Using corn, plot yield (kg/ha) vs. year
> ggplot(corn, aes(year, yield_kg_per_ha)) +
# Add a line layer, grouped by state
geom_line(aes(group = state)) +
# Add a smooth trend layer
geom_smooth()

inner_join()

进行函数合并简书

inner_join连接后的记录数等于”共有的记录数,也就是求交集,作用和merge的作用是一样的

# Wrap this code into a function
fortify_with_census_region<-function(data){
data %>%
inner_join(usa_census_regions, by = "state")
} # Try it on the wheat dataset
fortify_with_census_region(wheat)
# Wrap this code into a function
plot_yield_vs_year_by_region<-function(data){
plot_yield_vs_year(data) +
facet_wrap(vars(census_region))
} # Try it on the wheat dataset
plot_yield_vs_year_by_region(wheat)

Modeling grain yields


# Wrap the model code into a function
run_gam_yield_vs_year_by_region<-function(data){
gam(yield_kg_per_ha ~ s(year) + census_region, data = corn)
} # Try it on the wheat dataset
run_gam_yield_vs_year_by_region(wheat)

# Make predictions in 2050
predict_this <- data.frame(
year = 2050,
census_region = census_regions
) # Predict the yield
pred_yield_kg_per_ha <- predict(corn_model, predict_this, type = "response") predict_this %>%
# Add the prediction as a column of predict_this
mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha)

reference

Introduction to Writing Functions in R的更多相关文章

  1. 【翻译】Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么?

    0.前言 虽然很早就知道R被微软收购,也很早知道R在统计分析处理方面很强大,开始一直没有行动过...直到 直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和 ...

  2. R统计分析处理

    [翻译]Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么? 阅读目录 0.前言 1.集成开发环境 2.语法 3.数据操作 4.图形显示 5.HTML部件 ...

  3. R工具包一网打尽

    这里有很多非常不错的R包和工具. 该想法来自于awesome-machine-learning. 这里是包的导航清单,看起来更方便 >>>导航清单 通过这些翻译了解这些工具包,以后干 ...

  4. Introduction to dnorm, pnorm, qnorm, and rnorm for new biostatisticians

    原文:Introduction todnorm,pnorm,qnorm, andrnormfor new biostatisticians Today I was in Dan’s office ho ...

  5. Analytic Functions in Oracle

    Contents Overview and IntroductionHow Analytic Functions WorkThe SyntaxExamplesCalculate a running T ...

  6. R工具包

    直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和爬一些R的资料,看着看着就入迷了,这就是个大宝库了,以前怎么没发现,看来还是太狭隘了.直到前几天我看到这个 ...

  7. R2—《R in Nutshell》 读书笔记(连载)

    R in Nutshell 前言 例子(nutshell包) 本书中的例子包括在nutshell的R包中,使用数据,需加载nutshell包 install.packages("nutshe ...

  8. a note of R software write Function

    Functionals “To become significantly more reliable, code must become more transparent. In particular ...

  9. r testifying that your code will behave as you intend.

    https://github.com/stretchr/testify Testify - Thou Shalt Write Tests    Go code (golang) set of pack ...

随机推荐

  1. Android中动态改变Listview中字体的颜色

    效果如下: 账目显示用的是Listview,要实现的功能为使其根据所在Item是“收入”还是“支出”来把数字设置成绿色或红色 方法是自定义适配器,并重写其中getView()函数,实现如下: //自定 ...

  2. vue_day01

    Vue_day01 1. 认识vue 1.1 什么是vue (1)Vue是构建界面的渐进式的js框架 (2)只关注视图层, 采用自底向上增量开发的设计. (3)Vue 的目标是通过尽可能简单的 API ...

  3. 阿里云K8S下玩.NET CORE 3.1

    1. 创建阿里云K8S集群,本文以标准托管集群为例 1.1 创建一个 2台 centos 2core 4G的 k8s 集群 1.2 创建成功的模样 2. 创建 asp.net core webapi项 ...

  4. Java连载87-酒店管理系统练习、哈希表、函数

    一.创建一个酒店的房间管理系统 需求:这个酒店有五层,并且1-2层是标准间,3-4层是双人间,5层是豪华间:我们需要实时的显现各个房间的使用状态,并且我们可以预定某一个房间. package com. ...

  5. 「BZOJ3065」带插入区间K小值 [分块]

    考虑分块,每个块都是用 链表 维护的,并保证 \(size\) 和分块相当. 我们考虑一下怎么去查询,很显然,可以对值域分块,单点修改,记录前缀和,完全ojbk了,对每个块维护一个 \(pre , p ...

  6. MY_0002:导出单位设置

    1,导出模型给canvas工具用时,设置单位为m,再参考模型所占网格大小.

  7. 聊聊智能指针 auto_ptr、shared_ptr、weak_ptr和unique_ptr

    本文为转载:https://www.cnblogs.com/zeppelin5/p/10083597.html,对作者有些地方做了修正. 手写代码是理解C++的最好办法,以几个例子说明C++四个智能指 ...

  8. springboot web - 建立路由

    一. 测试代码 @RestController @RequestMapping("/book") public class BookController { @PostMappin ...

  9. UVA10791-Minimum Sum LCM(唯一分解定理基本应用)

    原题:https://vjudge.net/problem/UVA-10791 基本思路:1.借助唯一分解定理分解数据.2.求和输出 知识点:1.筛法得素数 2.唯一分解定理模板代码 3.数论分析-唯 ...

  10. 题解 AT3717 【[ABC081A] Placing Marbles】

    题目传送门. 分析 我们可以把字符串中所有数加起来输出,那么到底怎么把一个字符变成一个数字呢.我们只需要减去字符\(0\)就可以了. 详细步骤 定义字符串\(s\)并将其输入. string s; c ...