Data Manipulation with dplyr in R
Data Manipulation with dplyr in R
select
select(data,变量名)
The filter and arrange verbs
arrange
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
# Add a verb to sort in descending order of public_work
counties_selected %>%arrange(desc(public_work))
filter
counties_selected <- counties %>%
select(state, county, population)
# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
filter(state == "California",
population > 1000000)
#筛选多个变量
filter(id %in% c("a","b","c"...)) 存在
filter(id %in% c("a","b","c"...)) 不存在
fct_relevel {forcats}
Reorder factor levels by hand
排序,order不好使的时候
f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")
# Move to the third position
fct_relevel(f, "a", after = 2)
# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)
# Revel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
Filtering and arranging
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
>
> # Filter for Texas and more than 10000 people; sort in descending order of private_work
> counties_selected %>%filter(state=='Texas',population>10000)%>%arrange(desc(private_work))
# A tibble: 169 x 6
state county population private_work public_work self_employed
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Texas Gregg 123178 84.7 9.8 5.4
2 Texas Collin 862215 84.1 10 5.8
3 Texas Dallas 2485003 83.9 9.5 6.4
4 Texas Harris 4356362 83.4 10.1 6.3
5 Texas Andrews 16775 83.1 9.6 6.8
6 Texas Tarrant 1914526 83.1 11.4 5.4
7 Texas Titus 32553 82.5 10 7.4
8 Texas Denton 731851 82.2 11.9 5.7
9 Texas Ector 149557 82 11.2 6.7
10 Texas Moore 22281 82 11.7 5.9
# ... with 159 more rows
Mutate
counties_selected <- counties %>%
select(state, county, population, public_work)
# Sort in descending order of the public_workers column
counties_selected %>%
mutate(public_workers = public_work * population / 100) %>%arrange(desc(public_workers))
counties %>%
# Select the five columns
select(state, county, population, men, women) %>%
# Add the proportion_men variable
mutate(proportion_men = men / population) %>%
# Filter for population of at least 10,000
filter(population >= 10000) %>%
# Arrange proportion of men in descending order
arrange(desc(proportion_men))
The count verb
counties_selected %>%count(region,sort=TRUE)
counties_selected %>%count(state,wt=citizens,sort=TRUE)
Summarizing
# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%summarize(
min_population=min(population),
max_unemployment=max(unemployment),
average_income=mean(income)
)
# Add a density column, then sort in descending order
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population),
density=total_population/total_area) %>%arrange(desc(density))
发现了,归根到底是一种函数关系,看看该怎样处理这个函数比较简单,如果写不出来,可能和小学的时候应用题写不出来有关系
top_n
按照优先级来筛选
# Extract the most populated row for each state
counties_selected %>%
group_by(state, metro) %>%
summarize(total_pop = sum(population)) %>%
top_n(1, total_pop)
Selecting
Using the select verb, we can answer interesting questions about our dataset by focusing in on related groups of verbs. Data manipulation primitives in R and Python Both R and Python are incredibly good tools to manipula ... dplyr and data.table are amazing packages that make data manipulation in R fun. Both packages have t ... dplyr 0.4.0 January 9, 2015 in Uncategorized I’m very pleased to announce that dplyr 0.4.0 is now av ... 1.错误描写叙述 java.sql.SQLException: Can not issue data manipulation statements with executeQuery(). at c ... 转: Can not issue data manipulation statements with executeQuery()错误解决 2012年03月27日 15:47:52 katalya 阅 ... 2018-02-19 18:03:54 一.数据操纵语言(Data Manipulation Language) 数据操纵语言是指插入,删除和更新语言. 二.视图(View) 数据库三级模式,两级映射 ... 这个错误提示是说无法发行sql语句到指定的位置 错误写法: 正确写法: excuteQuery是查询语句,而我要调用的是更新的语句,所以这样数据库很为难到底要干嘛,实际我想用的是更新,但是我写成了查询 ... Can not issue data manipulation statements with executeQuery() 报错的解决方案: 把“ResultSet rs = statement. ... Hive Data Manipulation Language Hive Data Manipulation Language Loading files into tables Syntax Syn ... MyISAM与InnoDB的区别是什么? 1. 存储结构 MyISAM:每个MyISAM在磁盘上存储成三个文件.第一个文件的名字以表的名字开始,扩展名指出文件类型..frm文件存储表定义.数据文件的扩 ... 今天打开idea,出现了上面的话,试了网上的很多办法,获取注册码的那个方法是最常见的,那个网站现在不提供注册码了. ----两种方法-----**1)把提示框的x点掉,会自动打开idea**按最开始安 ... 一.简介 BloodHound是一款将域内信息可视化的单页的web应用程序,是一款在域内进行信息收集的免费工具: bloodhound通过图与线的形式,将域内用户.计算机.组.会话.ACL以及域内所有 ... ES6介绍 ES6, 全称 ECMAScript 6.0 ,2015.06 发版. let 和 const命令 let命令 let 命令,用来声明变量.它的用法类似于var,区别在于var声明的变量全 ... /* 题目: 将二叉搜索树转化为排序的双向链表,不能创建新的节点, 只能调整节点的指向,返回双向链表的头节点. */ /* 思路: 递归. 二叉搜索树的中序遍历得到的序列是递增序列. 左子树left& ... 题目描述: 解答出来了上一个题目的你现在可是春风得意,你们走向了下一个题目所处的地方 你一看这个题目傻眼了,这明明是一个数学题啊!!!可是你的数学并不好.扭头看向小鱼,小鱼哈哈一笑 ,让你在学校里面不 ... #开启自动挂载服务 systemctl start autofs #设置开机自动挂载 systemctl enable autofs #光盘自动挂载路径/misc/cd “包含repoda ... 示例程序是同步套接字程序,功能很简单,只是客户端发给服务器一条信息,服务器向客户端返回一条信息,是一个简单示例,也是一个最基本的socket编程流程. 简单步骤说明: 1.用指定的port, ip 建 ... 本篇内容有clip_by_value.clip_by_norm.gradient clipping 1.tf.clip_by_value a = tf.range(10) print(a) # if ... [SDOI2010]粟粟的书架 考虑暴力怎么做 显然是提取出来 (x2-x1+1)*(y2-y1+1) 个数字拿出来 然后从大到小排序 然后就可以按次取数了- 然而接下来看数据范围 \(50\%\ r ...
The colon (
Data Manipulation with dplyr in R的更多相关文章
随机推荐