Data Manipulation with dplyr in R

select
The filter and arrange verbs
arrange
filter
fct_relevel {forcats}
Filtering and arranging
Mutate
The count verb
Summarizing
top_n
Selecting
rename
transmute
Grouped mutates
Window functions

select

select(data, variable_names)
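As a small sketch of the call form above: the data frame `toy_counties` below is a hypothetical stand-in for the course's `counties` data, filled with three of the Texas rows that appear later in these notes.

```r
library(dplyr)

# Hypothetical stand-in for the counties dataset (assumption: not the real data)
toy_counties <- tibble(
  state       = c("Texas", "Texas", "Texas"),
  county      = c("Gregg", "Collin", "Dallas"),
  population  = c(123178, 862215, 2485003),
  public_work = c(9.8, 10, 9.5)
)

# Keep only the named columns, in the order they are listed
selected <- toy_counties %>%
  select(state, county, population)

print(selected)
```

The first argument is the data (here supplied through the pipe); every following argument names a column to keep.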

The filter and arrange verbs

arrange

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Add a verb to sort in descending order of public_work
counties_selected %>%
  arrange(desc(public_work))
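To see what the descending sort does on a few rows, here is a minimal sketch. `toy_counties` is a hypothetical table built from four Texas rows quoted later in these notes, not the full `counties` data.

```r
library(dplyr)

# Hypothetical four-row table (assumption: stands in for counties_selected)
toy_counties <- tibble(
  county      = c("Gregg", "Collin", "Dallas", "Harris"),
  public_work = c(9.8, 10, 9.5, 10.1)
)

# arrange() sorts ascending by default; desc() reverses the order
sorted <- toy_counties %>%
  arrange(desc(public_work))

print(sorted$county)
# → "Harris" "Collin" "Gregg" "Dallas"
```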

filter

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
  filter(state == "California",
         population > 1000000)

# Filter on multiple values

filter(id %in% c("a","b","c"...))    # keep rows whose id is in the set

filter(!id %in% c("a","b","c"...))   # keep rows whose id is not in the set
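A runnable sketch of both directions of the membership test. The data frame `toy` and its `id` values are made up for illustration.

```r
library(dplyr)

# Hypothetical data: five rows identified by a letter
toy <- tibble(id = c("a", "b", "c", "d", "e"), value = 1:5)

wanted <- c("a", "b", "c")

# Rows whose id is in the set
present <- toy %>% filter(id %in% wanted)

# Rows whose id is NOT in the set; %in% binds tighter than !,
# so this reads as !(id %in% wanted)
absent <- toy %>% filter(!id %in% wanted)

print(present$id)  # → "a" "b" "c"
print(absent$id)   # → "d" "e"
```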

fct_relevel {forcats}

Reorder factor levels by hand

Useful for sorting when order() does not do the job.

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
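The calls above only change the order of the levels, not the data values. A short sketch that prints the resulting level order for a few of them, assuming a forcats version recent enough to accept a function argument:

```r
library(forcats)

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))

# Named levels are moved to the front by default
front <- levels(fct_relevel(f, "a"))

# after = 2 places "a" behind the first two remaining levels
third <- levels(fct_relevel(f, "a", after = 2))

# after = Inf moves "a" to the end
last <- levels(fct_relevel(f, "a", after = Inf))

# A function is applied to the current levels to produce the new order
sorted <- levels(fct_relevel(f, sort))

print(front)   # → "a" "b" "c" "d"
print(third)   # → "b" "c" "a" "d"
print(last)    # → "b" "c" "d" "a"
print(sorted)  # → "a" "b" "c" "d"
```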

Filtering and arranging

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Filter for Texas and more than 10000 people; sort in descending order of private_work
counties_selected %>%
  filter(state == "Texas", population > 10000) %>%
  arrange(desc(private_work))

# A tibble: 169 x 6
   state county  population private_work public_work self_employed
   <chr> <chr>        <dbl>        <dbl>       <dbl>         <dbl>
 1 Texas Gregg       123178         84.7         9.8           5.4
 2 Texas Collin      862215         84.1        10             5.8
 3 Texas Dallas     2485003         83.9         9.5           6.4
 4 Texas Harris     4356362         83.4        10.1           6.3
 5 Texas Andrews      16775         83.1         9.6           6.8
 6 Texas Tarrant    1914526         83.1        11.4           5.4
 7 Texas Titus        32553         82.5        10             7.4
 8 Texas Denton      731851         82.2        11.9           5.7
 9 Texas Ector       149557         82          11.2           6.7
10 Texas Moore        22281         82          11.7           5.9
# ... with 159 more rows

Mutate

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Sort in descending order of the public_workers column
counties_selected %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

counties %>%
  # Select the five columns
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order
  arrange(desc(proportion_men))
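A worked sketch of the percentage-to-count conversion used in the first mutate above. `toy_counties` is a hypothetical table holding three of the Texas rows from the earlier output; `public_work` is a percentage, so dividing by 100 gives the number of workers.

```r
library(dplyr)

# Hypothetical three-row table (assumption: stands in for counties_selected)
toy_counties <- tibble(
  county      = c("Gregg", "Collin", "Dallas"),
  population  = c(123178, 862215, 2485003),
  public_work = c(9.8, 10, 9.5)   # percent of the population
)

# mutate() adds a column computed row by row from existing columns
result <- toy_counties %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

print(result)
```

Dallas comes first here even though its percentage is the lowest, because the new column measures a count rather than a share.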

The count verb

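# Count counties per region, most common first
counties_selected %>% count(region, sort = TRUE)

# Total citizens per state (wt sums the citizens column instead of counting rows)
counties_selected %>% count(state, wt = citizens, sort = TRUE)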
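The difference between counting rows and counting with a weight is easiest to see on a tiny table. The values below are invented for illustration and do not come from the `counties` data.

```r
library(dplyr)

# Hypothetical data: two Texas counties and one Ohio county
toy <- tibble(
  state    = c("Texas", "Texas", "Ohio"),
  citizens = c(100, 200, 500)
)

# Without wt, n is the number of rows per state
by_rows <- toy %>% count(state, sort = TRUE)

# With wt, n is the sum of the weight column per state
by_weight <- toy %>% count(state, wt = citizens, sort = TRUE)

print(by_rows)    # Texas 2, Ohio 1
print(by_weight)  # Ohio 500, Texas 300
```

In both cases the result column is named `n`, and `sort = TRUE` orders it from largest to smallest.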

Summarizing

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%
  summarize(min_population = min(population),
            max_unemployment = max(unemployment),
            average_income = mean(income))

# Add a density column, then sort in descending order
counties_selected %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            density = total_population / total_area) %>%
  arrange(desc(density))
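The grouped summary relies on the fact that, inside one summarize() call, a later expression can use a summary defined earlier in the same call. A minimal sketch with invented numbers (states "A" and "B" are placeholders):

```r
library(dplyr)

# Hypothetical data: two counties in state A, one in state B
toy <- tibble(
  state      = c("A", "A", "B"),
  population = c(100, 300, 50),
  land_area  = c(10, 10, 25)
)

density_by_state <- toy %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            # uses the two summaries computed just above
            density = total_population / total_area) %>%
  arrange(desc(density))

print(density_by_state)
# state A: 400 / 20 = 20; state B: 50 / 25 = 2
```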

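What I have realized is that, in the end, every one of these steps is a functional relationship: work out the simplest way to express that function. If you cannot write it down, the difficulty is probably the same one as being stuck on word problems back in elementary school.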

top_n

Filter by rank: keep only the top rows according to a chosen column.

# Extract the most populated row for each state
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop)
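A sketch of the same pattern on invented data. After summarizing by state and metro, the result is still grouped by state, so top_n(1, ...) keeps one row per state. Since dplyr 1.0.0, top_n() is superseded by slice_max(), so the sketch shows both; the `.groups = "drop_last"` argument assumes dplyr 1.0.0 or later and only makes the default grouping explicit.

```r
library(dplyr)

# Hypothetical data (assumption: stands in for counties_selected)
toy <- tibble(
  state      = c("A", "A", "A", "B", "B", "B"),
  metro      = c("Metro", "Metro", "Nonmetro", "Metro", "Nonmetro", "Nonmetro"),
  population = c(100, 200, 50, 10, 40, 30)
)

totals <- toy %>%
  group_by(state, metro) %>%
  # drop_last keeps the result grouped by state only
  summarize(total_pop = sum(population), .groups = "drop_last")

# Highest total within each state, using the verb from the notes
with_top_n <- totals %>% top_n(1, total_pop)

# The same selection with the newer slice_max()
with_slice_max <- totals %>% slice_max(total_pop, n = 1)

print(with_top_n)
# state A: Metro 300; state B: Nonmetro 70
```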

Selecting

Using the select verb, we can answer interesting questions about our dataset by focusing on related groups of variables.

The colon (
