Data Manipulation with dplyr in R

select

select(data,变量名)

The filter and arrange verbs

arrange

counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed) # Add a verb to sort in descending order of public_work
counties_selected %>%arrange(desc(public_work))

filter

counties_selected <- counties %>%
select(state, county, population) # Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
filter(state == "California",
population > 1000000)
#筛选多个变量
filter(id %in% c("a","b","c"...)) 存在
filter(id %in% c("a","b","c"...)) 不存在

fct_relevel {forcats}

Reorder factor levels by hand

排序,order不好使的时候

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a") # Move to the third position
fct_relevel(f, "a", after = 2) # Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3) # Revel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)

Filtering and arranging

 counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
>
> # Filter for Texas and more than 10000 people; sort in descending order of private_work
> counties_selected %>%filter(state=='Texas',population>10000)%>%arrange(desc(private_work))
# A tibble: 169 x 6
state county population private_work public_work self_employed
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Texas Gregg 123178 84.7 9.8 5.4
2 Texas Collin 862215 84.1 10 5.8
3 Texas Dallas 2485003 83.9 9.5 6.4
4 Texas Harris 4356362 83.4 10.1 6.3
5 Texas Andrews 16775 83.1 9.6 6.8
6 Texas Tarrant 1914526 83.1 11.4 5.4
7 Texas Titus 32553 82.5 10 7.4
8 Texas Denton 731851 82.2 11.9 5.7
9 Texas Ector 149557 82 11.2 6.7
10 Texas Moore 22281 82 11.7 5.9
# ... with 159 more rows

Mutate

counties_selected <- counties %>%
select(state, county, population, public_work) # Sort in descending order of the public_workers column
counties_selected %>%
mutate(public_workers = public_work * population / 100) %>%arrange(desc(public_workers))
counties %>%
# Select the five columns
select(state, county, population, men, women) %>%
# Add the proportion_men variable
mutate(proportion_men = men / population) %>%
# Filter for population of at least 10,000
filter(population >= 10000) %>%
# Arrange proportion of men in descending order
arrange(desc(proportion_men))

The count verb

counties_selected %>%count(region,sort=TRUE)
counties_selected %>%count(state,wt=citizens,sort=TRUE)

Summarizing

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%summarize(
min_population=min(population),
max_unemployment=max(unemployment),
average_income=mean(income)
)
# Add a density column, then sort in descending order
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population),
density=total_population/total_area) %>%arrange(desc(density))

发现了,归根到底是一种函数关系,看看该怎样处理这个函数比较简单,如果写不出来,可能和小学的时候应用题写不出来有关系

top_n

按照优先级来筛选

# Extract the most populated row for each state
counties_selected %>%
group_by(state, metro) %>%
summarize(total_pop = sum(population)) %>%
top_n(1, total_pop)

Selecting

Using the select verb, we can answer interesting questions about our dataset by focusing in on related groups of verbs.

The colon (

Data Manipulation with dplyr in R的更多相关文章

  1. Data manipulation primitives in R and Python

    Data manipulation primitives in R and Python Both R and Python are incredibly good tools to manipula ...

  2. Best packages for data manipulation in R

    dplyr and data.table are amazing packages that make data manipulation in R fun. Both packages have t ...

  3. The dplyr package has been updated with new data manipulation commands for filters, joins and set operations.(转)

    dplyr 0.4.0 January 9, 2015 in Uncategorized I’m very pleased to announce that dplyr 0.4.0 is now av ...

  4. java.sql.SQLException: Can not issue data manipulation statements with executeQuery().

    1.错误描写叙述 java.sql.SQLException: Can not issue data manipulation statements with executeQuery(). at c ...

  5. Can not issue data manipulation statements with executeQuery()错误解决

    转: Can not issue data manipulation statements with executeQuery()错误解决 2012年03月27日 15:47:52 katalya 阅 ...

  6. 数据库原理及应用-SQL数据操纵语言(Data Manipulation Language)和嵌入式SQL&存储过程

    2018-02-19 18:03:54 一.数据操纵语言(Data Manipulation Language) 数据操纵语言是指插入,删除和更新语言. 二.视图(View) 数据库三级模式,两级映射 ...

  7. Can not issue data manipulation statements with executeQuery().解决方案

    这个错误提示是说无法发行sql语句到指定的位置 错误写法: 正确写法: excuteQuery是查询语句,而我要调用的是更新的语句,所以这样数据库很为难到底要干嘛,实际我想用的是更新,但是我写成了查询 ...

  8. Can not issue data manipulation statements with executeQuery()的解决方案

     Can not issue data manipulation statements with executeQuery() 报错的解决方案: 把“ResultSet rs = statement. ...

  9. 【转】Hive Data Manipulation Language

    Hive Data Manipulation Language Hive Data Manipulation Language Loading files into tables Syntax Syn ...

随机推荐

  1. springboot + mybatis 支持oracle和mysql切换含源码

    1.springboot 启动类加入bean 如下 // DatabaseIdProvider元素主要是为了支持不同的数据库@Beanpublic DatabaseIdProvider getData ...

  2. MSSQL sqlserver 统计"一个字符串"在"另一个字符串"中出现的次数的方法

    转自 http://www.maomao365.com/?p=9858  摘要: 下文讲述sqlserver中最快获取一个字符串在另一个字符串中出现个数的方法分享 实验环境:sql server 20 ...

  3. mysql常见问题解决方案

    属性顺序错误 一般情况下字段类型要放在前面,限制参数放在后面,UNSIGNEDZEROFILL 之间没有先后顺序,主键 KEY 和 auto_increment 要放在UNSIGNED ZEROFIL ...

  4. CF1230E Kamil and Making a Stream

    题目大意是求 \(\sum_{v,fa,lca(v,fa)=fa}gcd(v \to fa)\) 容易发现 \(\gcd\) 只会变小,所以根据这玩意是从上到下的,每次暴力一下就可以了,\(\gcd\ ...

  5. 面试题32 - III. 从上到下打印二叉树 III

    面试题32 - III. 从上到下打印二叉树 III 请实现一个函数按照之字形顺序打印二叉树,即第一行按照从左到右的顺序打印,第二层按照从右到左的顺序打印,第三行再按照从左到右的顺序打印,其他行以此类 ...

  6. 吴裕雄--天生自然HADOOP操作实验学习笔记:hbase的shell应用v2.0

    HRegion 当表的大小超过设置值的时候,HBase会自动地将表划分为不同的区域,每个区域包含所有行的一个子集.对用户来说,每个表是一堆数据的集合,靠主键来区分.从物理上来说,一张表被拆分成了多块, ...

  7. PHP中根据二维数组中某个字段实现排序

    想要实现二维数组中根据某个字段排序,一般可以通过数组循环对比的方式实现.这里介绍一种更简单的方法,直接通过PHP函数实现.array_multisort() :可以用来一次对多个数组进行排序,或者根据 ...

  8. Linux下快速删除大量小文件引起的磁盘inode(目录索引)过满

    1)首先建立一个空白文件夹. mkdir /tmp/empty 然后安装下rsync yum install -y rsync 2)之后使用以下语句即可快速的删除文件. rsync --delete- ...

  9. UVA10791-Minimum Sum LCM(唯一分解定理基本应用)

    原题:https://vjudge.net/problem/UVA-10791 基本思路:1.借助唯一分解定理分解数据.2.求和输出 知识点:1.筛法得素数 2.唯一分解定理模板代码 3.数论分析-唯 ...

  10. JavaDay9(上)

    Java learning_Day9(上) 本人学习视频用的是马士兵的,也在这里献上 <链接:https://pan.baidu.com/s/1qKNGJNh0GgvlJnitTJGqgA> ...