Importing data in R: Study Notes 1
- Flat files: CSV
- txt files

Flat files: CSV

# Import swimming_pools.csv correctly: pools

pools<-read.csv("swimming_pools.csv",stringsAsFactors=FALSE)
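To see what stringsAsFactors changes, here is a minimal sketch. The file contents and the column names (name, latitude) are made up for illustration and are not the real swimming_pools.csv.

```r
# Write a small, made-up CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,latitude",
             "Pool A,-27.57",
             "Pool B,-27.56"), tmp)

# With stringsAsFactors = FALSE, text columns stay character vectors
pools_chr <- read.csv(tmp, stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE, text columns are converted to factors
pools_fct <- read.csv(tmp, stringsAsFactors = TRUE)

class(pools_chr$name)
class(pools_fct$name)

unlink(tmp)
```

Since R 4.0.0 the default is already stringsAsFactors = FALSE, so the argument mainly matters on older versions of R or when factors are actually wanted.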

txt files

read.delim("name.txt",header=TRUE)

Reading into a table with read.table()

> # Path to the hotdogs.txt file: path
> path <- file.path("data", "hotdogs.txt")
>
> # Import the hotdogs.txt file: hotdogs
> hotdogs <- read.table(path,
                        sep = "\t",
                        col.names = c("type", "calories", "sodium"))
>
> # Call head() on hotdogs
> head(hotdogs)
  type calories sodium
1 Beef      186    495
2 Beef      181    477
3 Beef      176    425
4 Beef      149    322
5 Beef      184    482
6 Beef      190    587
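read.delim() and read.table() can read the same tab-separated file; they only differ in their defaults. The sketch below shows this on made-up data (the three rows are illustrative, not the real hotdogs.txt).

```r
# Made-up tab-separated data without a header row (illustrative only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Beef\t186\t495",
             "Meat\t173\t458",
             "Poultry\t129\t430"), tmp)

cols <- c("type", "calories", "sodium")

# read.table(): the separator has to be given explicitly
a <- read.table(tmp, sep = "\t", col.names = cols)

# read.delim(): sep = "\t" is the default, but so is header = TRUE,
# so header = FALSE is needed for a file without a header row
b <- read.delim(tmp, header = FALSE, col.names = cols)

# Compare the two results
all.equal(a, b)

unlink(tmp)
```

Forgetting header = FALSE in read.delim() would silently turn the first data row into column names, which is an easy mistake to make with headerless files.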

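tibble: a simplified data frame

read_ versus read.

The read_ functions (readr) return a tibble, a simplified data frame, and report the data type of each column.

Package: readr

read_csv()

Reads CSV files.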

read_csv and read_tsv are special cases of the general read_delim. They're useful for reading the most common types of flat file data: comma-separated values and tab-separated values, respectively. read_csv2 uses ; as the separator instead of , which is common in European countries that use , as the decimal separator.
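The relationship between these functions can be sketched as follows. The files and the id/price columns are made up for illustration; only the readr functions themselves come from the notes above.

```r
library(readr)

# Made-up data written in three dialects (illustrative only)
csv_file  <- tempfile(fileext = ".csv")
tsv_file  <- tempfile(fileext = ".tsv")
csv2_file <- tempfile(fileext = ".csv")
writeLines(c("id,price", "1,2.5", "2,3.75"), csv_file)
writeLines(c("id\tprice", "1\t2.5", "2\t3.75"), tsv_file)
writeLines(c("id;price", "1;2,5", "2;3,75"), csv2_file)

# read_csv() is read_delim() with a comma as the delimiter
from_csv   <- read_csv(csv_file)
from_delim <- read_delim(csv_file, delim = ",")

# read_tsv() is read_delim() with a tab as the delimiter
from_tsv <- read_tsv(tsv_file)

# read_csv2() expects ";" between fields and "," as the decimal mark
from_csv2 <- read_csv2(csv2_file)

from_csv2$price

unlink(c(csv_file, tsv_file, csv2_file))
```

All four calls should yield the same price column, which is the point: the file dialect differs, the resulting tibble does not.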

read_tsv()

Reads tab-separated txt files.

> # readr is already loaded
>
> # Column names
> properties <- c("area", "temp", "size", "storage", "method",
                  "texture", "flavor", "moistness")
>
> # Import potatoes.txt: potatoes
> # Read the data and specify the column names
> potatoes <- read_tsv("potatoes.txt", col_names = properties)
Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)
>
> # Call head() on potatoes
> head(potatoes)
# A tibble: 6 x 8
   area  temp  size storage method texture flavor moistness
  <int> <int> <int>   <int>  <int>   <dbl>  <dbl>     <dbl>
1     1     1     1       1      1     2.9    3.2       3
2     1     1     1       1      2     2.3    2.5       2.6
3     1     1     1       1      3     2.5    2.8       2.8
4     1     1     1       1      4     2.1    2.9       2.4
5     1     1     1       1      5     1.9    2.8       2.2
6     1     1     1       2      1     1.8    3         1.7

read_delim()

> # Column names
> properties <- c("area", "temp", "size", "storage", "method",
                  "texture", "flavor", "moistness")
>
> # Import potatoes.txt using read_delim(): potatoes
> potatoes <- read_delim("potatoes.txt", delim = "\t", col_names = properties)
Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)
>
> # Print out potatoes
> potatoes
# A tibble: 160 x 8
    area  temp  size storage method texture flavor moistness
   <int> <int> <int>   <int>  <int>   <dbl>  <dbl>     <dbl>
 1     1     1     1       1      1     2.9    3.2       3
 2     1     1     1       1      2     2.3    2.5       2.6
 3     1     1     1       1      3     2.5    2.8       2.8
 4     1     1     1       1      4     2.1    2.9       2.4
 5     1     1     1       1      5     1.9    2.8       2.2
 6     1     1     1       2      1     1.8    3         1.7
 7     1     1     1       2      2     2.6    3.1       2.4
 8     1     1     1       2      3     3      3         2.9
 9     1     1     1       2      4     2.2    3.2       2.5
10     1     1     1       2      5     2      2.8       1.9
# ... with 150 more rows

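data.table

fread()

If the file has no header row, fread() makes up the column names itself.

It is also more convenient: it works out the separator and the column types automatically.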

> # Import columns 6 and 8 of potatoes.csv: potatoes
> potatoes <- fread("potatoes.csv", select = c(6, 8))
>
> # Plot texture (x) and moistness (y) of potatoes
> plot(potatoes$texture, potatoes$moistness)
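A minimal sketch of the fread() conveniences mentioned above: automatic separator and header detection, made-up column names, and select. Both files and their contents are invented for illustration.

```r
library(data.table)

# Made-up semicolon-separated data with a header row (illustrative only)
with_header <- tempfile(fileext = ".csv")
writeLines(c("area;texture;moistness",
             "1;2.9;3.0",
             "1;2.3;2.6"), with_header)

# The same made-up data without a header row
no_header <- tempfile(fileext = ".csv")
writeLines(c("1;2.9;3.0",
             "1;2.3;2.6"), no_header)

# No sep or header arguments are given: fread() detects both
all_cols <- fread(with_header)

# Without a header row, fread() names the columns V1, V2, ...
auto_named <- fread(no_header)
names(auto_named)

# select keeps only the requested columns, by position or by name
by_pos  <- fread(with_header, select = c(2, 3))
by_name <- fread(with_header, select = c("texture", "moistness"))
names(by_pos)

unlink(c(with_header, no_header))
```

Selecting by name is usually the safer choice, since it keeps working if the column order in the file changes.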

readxl

excel_sheets()

library(readxl)

# Print the names of all worksheets
excel_sheets("urbanpop.xlsx")

# Read all Excel sheets with lapply(): pop_list
pop_list <- lapply(excel_sheets("urbanpop.xlsx"),
                   read_excel,
                   path = "urbanpop.xlsx")

# Display the structure of pop_list
str(pop_list)

read_excel()

# Import the second sheet of urbanpop.xlsx, skipping the first 21 rows: urbanpop_sel
urbanpop_sel <- read_excel("urbanpop.xlsx", sheet = 2, col_names = FALSE, skip = 21)

# Print out the first observation from urbanpop_sel
urbanpop_sel[1, ]

gdata

read.xls()

Reads data in xls format.

> # Column names for urban_pop
> columns <- c("country", paste0("year_", 1967:1974))
>
> # Finish the read.xls call
> urban_pop <- read.xls("urbanpop.xls", sheet = 2,
                        skip = 50, header = FALSE, stringsAsFactors = FALSE,
                        col.names = columns)
>
> # Print first 10 observations of urban_pop
> head(urban_pop, n = 10)
              country   year_1967   year_1968   year_1969   year_1970
1              Cyprus   231929.74   237831.38   243983.34   250164.52
2      Czech Republic  6204409.91  6266304.50  6326368.97  6348794.89
3             Denmark  3777552.62  3826785.08  3874313.99  3930042.97
4            Djibouti    77788.04    84694.35    92045.77    99845.22
5            Dominica    27550.36    29527.32    31475.62    33328.25
6  Dominican Republic  1535485.43  1625455.76  1718315.40  1814060.00
7             Ecuador  2059355.12  2151395.14  2246890.79  2345864.41
8               Egypt 13798171.00 14248342.19 14703858.22 15162858.52
9         El Salvador  1345528.98  1387218.33  1429378.98  1472181.26
10  Equatorial Guinea    75364.50    77295.03    78445.74    78411.07
     year_1971   year_1972   year_1973   year_1974
1    261213.21   272407.99   283774.90   295379.83
2   6437055.17  6572632.32  6718465.53  6873458.18
3   3981360.12  4028247.92  4076867.28  4120201.43
4    107799.69   116098.23   125391.58   136606.25
5     34761.52    36049.99    37260.05    38501.47
6   1915590.38  2020157.01  2127714.45  2238203.87
7   2453817.78  2565644.81  2681525.25  2801692.62
8  15603661.36 16047814.69 16498633.27 16960827.93
9   1527985.34  1584758.18  1642098.95  1699470.87
10    77055.29    74596.06    71438.96    68179.26

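XLConnect

getSheets()

Lists the sheets in an Excel workbook and returns the name of each one. It is an XLConnect function and takes a workbook object created by loadWorkbook().

loadWorkbook()

Loads an Excel file into the R session.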

When working with XLConnect, the first step will be to load a workbook in your R session with loadWorkbook(); this function will build a "bridge" between your Excel file and your R session.

> library("XLConnect")
>
> # Build connection to urbanpop.xlsx: my_book
> my_book <- loadWorkbook("urbanpop.xlsx")
>
> # Print out the class of my_book
> class(my_book)
[1] "workbook"
attr(,"package")
[1] "XLConnect"

readWorksheet()

Reads a worksheet from the loaded workbook.

So the order is always: load the workbook first, then read from it.

# Import columns 3, 4, and 5 from second sheet in my_book: urbanpop_sel
urbanpop_sel <- readWorksheet(my_book, sheet = 2, startCol = 3, endCol = 5)

# Import first column from second sheet in my_book: countries
countries <- readWorksheet(my_book, sheet = 2, startCol = 1, endCol = 1)

# cbind() urbanpop_sel and countries together: selection
selection <- cbind(countries, urbanpop_sel)
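The load-then-read order can be tried end to end without urbanpop.xlsx. This is only a sketch: it assumes XLConnect and a working Java installation are available, and the workbook, the sheet name "demo", and the data frame are all made up for illustration.

```r
library(XLConnect)

# Create a hypothetical workbook on disk so there is something to read
path <- tempfile(fileext = ".xlsx")
wb <- loadWorkbook(path, create = TRUE)
createSheet(wb, "demo")
demo <- data.frame(country = c("A", "B"),
                   y1967 = c(10, 20),
                   y1968 = c(11, 21))
writeWorksheet(wb, demo, sheet = "demo")
saveWorkbook(wb, file = path)

# Step 1: load the workbook; step 2: read from it
wb2 <- loadWorkbook(path)
first_col <- readWorksheet(wb2, sheet = "demo", startCol = 1, endCol = 1)
rest      <- readWorksheet(wb2, sheet = "demo", startCol = 2, endCol = 3)

# Put the two pieces back together, as in the exercise above
selection_demo <- cbind(first_col, rest)
names(selection_demo)

unlink(path)
```

Calling readWorksheet() on a file name instead of a workbook object is the mistake this order guards against: the function needs the object returned by loadWorkbook().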

createSheet()

Creates a new, empty sheet in an existing workbook.

> # Build connection to urbanpop.xlsx
> my_book <- loadWorkbook("urbanpop.xlsx")
>
> # Add a worksheet to my_book, named "data_summary"
> createSheet(my_book, "data_summary")
>
> # Use getSheets() on my_book
> getSheets(my_book)
[1] "1960-1966"    "1967-1974"    "1975-2011"    "data_summary"

writeWorksheet()

Writes data to worksheets of a workbook.

saveWorkbook()

Saves the workbook, i.e. writes it to disk.

# Build connection to urbanpop.xlsx
my_book <- loadWorkbook("urbanpop.xlsx")

# Add a worksheet to my_book, named "data_summary"
createSheet(my_book, "data_summary")

# Create data frame: summ
sheets <- getSheets(my_book)[1:3]
dims <- sapply(sheets, function(x) dim(readWorksheet(my_book, sheet = x)), USE.NAMES = FALSE)
summ <- data.frame(sheets = sheets,
                   nrows = dims[1, ],
                   ncols = dims[2, ])

# Add data in summ to "data_summary" sheet
writeWorksheet(my_book, summ, "data_summary")

# Save workbook as summary.xlsx
saveWorkbook(my_book, "summary.xlsx")

renameSheet()

Renames a sheet.

# Rename "data_summary" sheet to "summary"
renameSheet(my_book, "data_summary", "summary")

# Print out sheets of my_book
getSheets(my_book)

# Save workbook to "renamed.xlsx"
saveWorkbook(my_book, file = "renamed.xlsx")

I've noticed that I really tend to leave out arguments, and then I can't get the call to work no matter what I try... frustrating.

removeSheet()

Removes the specified sheet.

library(XLConnect)

# Build connection to renamed.xlsx: my_book
my_book <- loadWorkbook("renamed.xlsx")

# Remove the fourth sheet
removeSheet(my_book, sheet = "summary")

# Save workbook to "clean.xlsx"
saveWorkbook(my_book, "clean.xlsx")
