Data Manipulation with dplyr in R

select

select(data, variable_names)

The filter and arrange verbs

arrange

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Add a verb to sort in descending order of public_work
counties_selected %>%
  arrange(desc(public_work))

filter

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
  filter(state == "California",
         population > 1000000)

# Filter on multiple values
filter(id %in% c("a","b","c"...))   # id is in the set
filter(!id %in% c("a","b","c"...))  # id is not in the set
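
A minimal sketch of the membership test on a made-up tibble (the `toy` data, its `id` values, and the `wanted` vector are illustrative assumptions, not part of the counties data). `%in%` keeps rows whose value is in the set; putting `!` in front negates the whole test, because `!` binds more loosely than `%in%` in R.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(id = c("a", "b", "c", "d", "e"), value = 1:5)

wanted <- c("a", "b", "c")

# Rows whose id is in the set
in_set <- toy %>% filter(id %in% wanted)

# Rows whose id is NOT in the set: !id %in% wanted parses as !(id %in% wanted)
not_in_set <- toy %>% filter(!id %in% wanted)

in_set$id      # "a" "b" "c"
not_in_set$id  # "d" "e"
```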

fct_relevel {forcats}

Reorder factor levels by hand

Use this for ordering when order() does not do the job.

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
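
To see what each call actually does, it may help to look at the resulting level order directly. The sketch below reuses the same factor `f` and inspects `levels()`; the `sort()` lines at the end are an added illustration showing that sorting a factor follows its level order, which is why releveling helps when plain ordering gives the wrong result.

```r
library(forcats)

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))

levels(f)                                 # "b" "c" "d" "a"
levels(fct_relevel(f, "a"))               # "a" "b" "c" "d"
levels(fct_relevel(f, "a", after = 2))    # "b" "c" "a" "d"
levels(fct_relevel(f, "a", after = Inf))  # "b" "c" "d" "a"
levels(fct_relevel(f, rev))               # "a" "d" "c" "b"

# Sorting a factor uses the level order, not alphabetical order
as.character(sort(f))                     # "b" "c" "d" "a"
as.character(sort(fct_relevel(f, "a")))   # "a" "b" "c" "d"
```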

Filtering and arranging

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Filter for Texas and more than 10000 people; sort in descending order of private_work
counties_selected %>%
  filter(state == "Texas", population > 10000) %>%
  arrange(desc(private_work))

# A tibble: 169 x 6
   state county  population private_work public_work self_employed
   <chr> <chr>        <dbl>        <dbl>       <dbl>         <dbl>
 1 Texas Gregg       123178         84.7         9.8           5.4
 2 Texas Collin      862215         84.1        10             5.8
 3 Texas Dallas     2485003         83.9         9.5           6.4
 4 Texas Harris     4356362         83.4        10.1           6.3
 5 Texas Andrews      16775         83.1         9.6           6.8
 6 Texas Tarrant    1914526         83.1        11.4           5.4
 7 Texas Titus        32553         82.5        10             7.4
 8 Texas Denton      731851         82.2        11.9           5.7
 9 Texas Ector       149557         82          11.2           6.7
10 Texas Moore        22281         82          11.7           5.9
# ... with 159 more rows

Mutate

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Sort in descending order of the public_workers column
counties_selected %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

counties %>%
  # Select the five columns
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order
  arrange(desc(proportion_men))
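
The first mutate call above turns a percentage into a head count: since public_work is a percentage, multiplying by population and dividing by 100 gives the number of public workers. A small worked sketch with made-up numbers (the `toy` tibble and its values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data; public_work is a percentage
toy <- tibble(
  county      = c("X", "Y"),
  population  = c(1000, 200),
  public_work = c(12.5, 50)
)

with_counts <- toy %>%
  # 12.5% of 1000 = 125 people; 50% of 200 = 100 people
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

with_counts$public_workers  # 125 100
```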

The count verb

counties_selected %>% count(region, sort = TRUE)
counties_selected %>% count(state, wt = citizens, sort = TRUE)
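
The difference between the two calls is easy to miss: without `wt`, count() counts rows; with `wt`, it sums that column within each group. A minimal sketch on made-up data (the `toy` tibble mirrors the `state` and `citizens` column names from the notes, but the values are invented):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  state    = c("A", "A", "B"),
  citizens = c(100, 50, 400)
)

# Unweighted: number of rows per state
by_rows <- toy %>% count(state, sort = TRUE)

# Weighted: sum of citizens per state
by_citizens <- toy %>% count(state, wt = citizens, sort = TRUE)

# The weighted count matches an explicit group_by() + summarize()
by_hand <- toy %>%
  group_by(state) %>%
  summarize(n = sum(citizens)) %>%
  arrange(desc(n))

by_rows$n      # 2 1
by_citizens$n  # 400 150
```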

Summarizing

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%
  summarize(
    min_population = min(population),
    max_unemployment = max(unemployment),
    average_income = mean(income)
  )

# Add a density column, then sort in descending order
counties_selected %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            density = total_population / total_area) %>%
  arrange(desc(density))
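
One detail in the density example is that summarize() lets a later expression use summaries defined earlier in the same call. A small sketch with made-up numbers makes the arithmetic checkable by hand (the `toy` tibble is an assumption for illustration):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  state      = c("A", "A", "B"),
  land_area  = c(10, 30, 5),
  population = c(100, 300, 500)
)

density_by_state <- toy %>%
  group_by(state) %>%
  summarize(
    total_area       = sum(land_area),
    total_population = sum(population),
    # Uses the two summaries defined just above
    density          = total_population / total_area
  ) %>%
  arrange(desc(density))

density_by_state$state    # "B" "A"
density_by_state$density  # 100 10
```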

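What I've realized is that, in the end, all of this comes down to a functional relationship: the task is to work out the simplest way to express that function. If I can't write it down, it probably has something to do with not being able to solve word problems back in primary school.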

top_n

Select rows by rank.

# Extract the most populated row for each state
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop)
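
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The sketch below shows the same pattern with slice_max() on made-up data (the `toy` tibble is an assumption for illustration). It also spells out why the ranking happens per state: after summarize(), the last grouping variable (metro) is dropped, so the result is still grouped by state.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  state      = c("A", "A", "A", "B", "B"),
  metro      = c("Metro", "Metro", "Nonmetro", "Metro", "Nonmetro"),
  population = c(10, 20, 50, 5, 1)
)

top_by_state <- toy %>%
  group_by(state, metro) %>%
  # "drop_last" is the default here; the result stays grouped by state
  summarize(total_pop = sum(population), .groups = "drop_last") %>%
  # Keep the row with the largest total_pop within each state
  slice_max(total_pop, n = 1)

top_by_state$metro      # "Nonmetro" "Metro"
top_by_state$total_pop  # 50 5
```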

Selecting

Using the select verb, we can answer interesting questions about our dataset by focusing on related groups of columns.

The colon (
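
Assuming the colon here means the `:` range operator inside select(), a minimal sketch of picking a contiguous run of columns looks like this (the `toy` tibble and its column names are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(
  state = "A", county = "X",
  professional = 30, service = 20, office = 25, production = 25,
  income = 50000
)

# Select state, county, and every column from professional through production
occupations <- toy %>%
  select(state, county, professional:production)

names(occupations)
# "state" "county" "professional" "service" "office" "production"
```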
