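R Language Study Notes (22): Comparing String-Processing Functions (Code Implementation)
Using the basic string-processing functions
Base R functions compared with stringr package functions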
> library(stringr) # needed for the str_*() functions used below
> states <- row.names(USArrests)
> # Extract substrings
> substr(x = states, start = 1, stop = 4)
[1] "Alab" "Alas" "Ariz" "Arka" "Cali" "Colo" "Conn" "Dela" "Flor" "Geor" "Hawa" "Idah" "Illi" "Indi" "Iowa" "Kans" "Kent"
[18] "Loui" "Main" "Mary" "Mass" "Mich" "Minn" "Miss" "Miss" "Mont" "Nebr" "Neva" "New " "New " "New " "New " "Nort" "Nort"
[35] "Ohio" "Okla" "Oreg" "Penn" "Rhod" "Sout" "Sout" "Tenn" "Texa" "Utah" "Verm" "Virg" "Wash" "West" "Wisc" "Wyom"
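The transcript shows only the base R `substr()` here. As a minimal sketch, assuming the stringr package is installed, `str_sub()` is the usual stringr counterpart; the variable names below are illustrative only.

```r
library(stringr)

states <- row.names(USArrests)

# str_sub() takes start/end positions much like substr() takes start/stop
first4_base <- substr(states, start = 1, stop = 4)
first4_stringr <- str_sub(states, start = 1, end = 4)

# Unlike substr(), str_sub() accepts negative positions, counted from the end
last3 <- str_sub("Alabama", start = -3)  # "ama"
```

Negative positions are the main convenience `str_sub()` adds over `substr()` for this kind of task.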
> abbreviate(states, minlength = 5)
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware
"Alabm" "Alask" "Arizn" "Arkns" "Clfrn" "Colrd" "Cnnct" "Delwr"
Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas
"Flord" "Georg" "Hawai" "Idaho" "Illns" "Indin" "Iowa" "Kanss"
Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi
"Kntck" "Lousn" "Maine" "Mryln" "Mssch" "Mchgn" "Mnnst" "Mssss"
Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York
"Missr" "Montn" "Nbrsk" "Nevad" "NwHmp" "NwJrs" "NwMxc" "NwYrk"
North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina
"NrthC" "NrthD" "Ohio" "Oklhm" "Oregn" "Pnnsy" "RhdIs" "SthCr"
South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia
"SthDk" "Tnnss" "Texas" "Utah" "Vrmnt" "Virgn" "Wshng" "WstVr"
Wisconsin Wyoming
"Wscns" "Wymng"
> # Count string length
> nchar(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_count(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_length(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
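The three calls above agree on clean character data. The sketch below, assuming stringr is installed, shows two inputs where `nchar()` and `str_length()` are known to behave differently: a bare `NA` and a factor. The variable names are illustrative.

```r
library(stringr)

# A bare (logical) NA is coerced to the text "NA" by nchar()
na_len_base <- nchar(NA)            # 2
na_len_stringr <- str_length(NA)    # NA

# nchar() refuses factors, while str_length() measures the labels
fac <- factor("Ohio")
fac_len_stringr <- str_length(fac)  # 4
fac_len_base <- tryCatch(nchar(fac), error = function(e) "error")  # "error"
```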
> # Upper and lower case
> tolower(states) # convert to lower case
[1] "alabama" "alaska" "arizona" "arkansas" "california" "colorado" "connecticut"
[8] "delaware" "florida" "georgia" "hawaii" "idaho" "illinois" "indiana"
[15] "iowa" "kansas" "kentucky" "louisiana" "maine" "maryland" "massachusetts"
[22] "michigan" "minnesota" "mississippi" "missouri" "montana" "nebraska" "nevada"
[29] "new hampshire" "new jersey" "new mexico" "new york" "north carolina" "north dakota" "ohio"
[36] "oklahoma" "oregon" "pennsylvania" "rhode island" "south carolina" "south dakota" "tennessee"
[43] "texas" "utah" "vermont" "virginia" "washington" "west virginia" "wisconsin"
[50] "wyoming"
> toupper(states) # convert to upper case
[1] "ALABAMA" "ALASKA" "ARIZONA" "ARKANSAS" "CALIFORNIA" "COLORADO" "CONNECTICUT"
[8] "DELAWARE" "FLORIDA" "GEORGIA" "HAWAII" "IDAHO" "ILLINOIS" "INDIANA"
[15] "IOWA" "KANSAS" "KENTUCKY" "LOUISIANA" "MAINE" "MARYLAND" "MASSACHUSETTS"
[22] "MICHIGAN" "MINNESOTA" "MISSISSIPPI" "MISSOURI" "MONTANA" "NEBRASKA" "NEVADA"
[29] "NEW HAMPSHIRE" "NEW JERSEY" "NEW MEXICO" "NEW YORK" "NORTH CAROLINA" "NORTH DAKOTA" "OHIO"
[36] "OKLAHOMA" "OREGON" "PENNSYLVANIA" "RHODE ISLAND" "SOUTH CAROLINA" "SOUTH DAKOTA" "TENNESSEE"
[43] "TEXAS" "UTAH" "VERMONT" "VIRGINIA" "WASHINGTON" "WEST VIRGINIA" "WISCONSIN"
[50] "WYOMING"
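Only the base R functions appear in this part of the comparison. As a hedged sketch, assuming stringr is installed, the corresponding stringr functions are `str_to_lower()`, `str_to_upper()` and `str_to_title()`; the variable names are illustrative.

```r
library(stringr)

states <- row.names(USArrests)

# Compare the base R and stringr results on the state names
lower_same <- identical(tolower(states), str_to_lower(states))
upper_same <- identical(toupper(states), str_to_upper(states))

# str_to_title() capitalises the first letter of each word
title_case <- str_to_title("new york")  # "New York"
```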
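> # Character replacement (chartr maps both "T" and "t"; the str_replace_all call below only replaces uppercase "T")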
> chartr("Tt", "Uu", "AgCTcctTagct")
[1] "AgCUccuUagcu"
> str_replace_all("AgCTcctTagct", pattern = "T", replacement = "U")
[1] "AgCUcctUagct"
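The two results above differ because `chartr("Tt", "Uu", ...)` translates both cases, while the pattern `"T"` matches only the uppercase letter. One way to get the same result from stringr, sketched here on the assumption that the package is installed, is to pass `str_replace_all()` a named vector of pattern = replacement pairs. The variable names are illustrative.

```r
library(stringr)

dna <- "AgCTcctTagct"

# chartr() maps each character in the first set to the one in the second
via_chartr <- chartr("Tt", "Uu", dna)                         # "AgCUccuUagcu"

# A single pattern only touches the uppercase T
upper_only <- str_replace_all(dna, "T", "U")                  # "AgCUcctUagct"

# A named vector applies several replacements in one call
both_cases <- str_replace_all(dna, c("T" = "U", "t" = "u"))   # "AgCUccuUagcu"
```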
> # String concatenation
> paste("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
> str_c("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
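`paste()` and `str_c()` give the same result on complete data, as shown above. A small sketch of where they part ways, assuming stringr is installed: missing values. The vector `ids` is made up for illustration.

```r
library(stringr)

ids <- c(1L, NA, 3L)

# paste() turns NA into the text "NA"
joined_base <- paste("control", ids, sep = "_")
# "control_1" "control_NA" "control_3"

# str_c() propagates the missing value instead
joined_stringr <- str_c("control", ids, sep = "_")
# "control_1" NA "control_3"
```

Which behaviour is preferable depends on whether a literal "NA" label is acceptable downstream.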
> x <- c("I love R", "I'm fascinated by Statistics", "I")
> # Pattern matching (contains)
> grep(pattern = "love", x = x)
[1] 1
> grep(pattern = "love", x = x, value = TRUE)
[1] "I love R"
> grepl(pattern = "love", x = x)
[1] TRUE FALSE FALSE
> str_detect(string = x, pattern = "love")
[1] TRUE FALSE FALSE
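`str_detect()` corresponds to `grepl()`. For the two `grep()` forms shown above, the stringr counterparts are `str_which()` (indices) and `str_subset()` (matching values). A minimal sketch, assuming stringr is installed, with a vector `x` redefined here so the block runs on its own:

```r
library(stringr)

x <- c("I love R", "I'm fascinated by Statistics", "I")

# Indices of matching elements, like grep(pattern, x)
hit_index <- str_which(x, "love")    # 1

# Matching elements themselves, like grep(pattern, x, value = TRUE)
hit_value <- str_subset(x, "love")   # "I love R"
```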
> # match returns the position of the first exact match
> match(x = "I", table = x)
[1] 3
> "I" %in% x
[1] TRUE
> # String splitting
> text <- "I love R.\nI'm fascinated by Statistics."
> cat(text)
I love R.
I'm fascinated by Statistics.
> strsplit(text, split = " ")
[[1]]
[1] "I" "love" "R.\nI'm" "fascinated" "by" "Statistics."
> strsplit(text, split = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
> str_split(text, pattern = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
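With default arguments `strsplit()` and `str_split()` return the same pieces. The sketch below, assuming stringr is installed, shows two `str_split()` arguments that `strsplit()` does not have: `n` to cap the number of pieces and `simplify` to return a character matrix. The sample string is illustrative.

```r
library(stringr)

sample_text <- "I love R.\nI'm fascinated by Statistics."

# n = 3 stops splitting after the third piece; the rest stays intact
first_three <- str_split(sample_text, pattern = "\\s", n = 3)[[1]]
# "I" "love" "R.\nI'm fascinated by Statistics."

# simplify = TRUE returns a matrix with one row per input string
as_matrix <- str_split(sample_text, pattern = "\\s", simplify = TRUE)
dim(as_matrix)  # 1 7
```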
> # Pattern replacement
> test_vector3 <- c("Without the vowels,We can still read the word.")
> sub(pattern = "[aeiou]",replacement = "-",x = test_vector3)
[1] "W-thout the vowels,We can still read the word."
> gsub(pattern = "[aeiou]",replacement = "-",x = test_vector3)
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
> str_replace_all(string = test_vector3, pattern = "[aeiou]",
+ replacement = "-")
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
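The transcript pairs `gsub()` with `str_replace_all()` but shows no stringr counterpart for `sub()`. As a minimal sketch, assuming stringr is installed, `str_replace()` replaces only the first match, just as `sub()` does. The variable names are illustrative.

```r
library(stringr)

sentence <- "Without the vowels,We can still read the word."

# Both calls replace only the first vowel
first_base <- sub(pattern = "[aeiou]", replacement = "-", x = sentence)
first_stringr <- str_replace(sentence, pattern = "[aeiou]", replacement = "-")
# "W-thout the vowels,We can still read the word."
```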
> # Formatted (wrapped) string output
> string <- "Each character string in the input is first split into\n paragraphs
+ (or lines containing whitespace)"
> strwrap(x = string, width = 30)
[1] "Each character string in the" "input is first split into" "paragraphs (or lines" "containing whitespace)"
> str_wrap(string = string, width = 30)
[1] "Each character string in\nthe input is first split\ninto paragraphs (or lines\ncontaining whitespace)"
> cat(str_wrap(string = string, width = 30))
Each character string in
the input is first split
into paragraphs (or lines
containing whitespace)
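The outputs above differ in shape as well as in where the lines break: `strwrap()` returns one vector element per line, while `str_wrap()` returns a single string with embedded newlines. A hedged sketch of that structural difference, assuming stringr is installed and using a made-up sentence:

```r
library(stringr)

sentence <- "Each character string in the input is first split into paragraphs"

# Base R: a character vector with one element per wrapped line
wrapped_base <- strwrap(sentence, width = 30)

# stringr: one string, with "\n" marking the line breaks
wrapped_stringr <- str_wrap(sentence, width = 30)

# To print the base R result as text, join the elements with newlines
cat(paste(wrapped_base, collapse = "\n"), "\n")
cat(wrapped_stringr, "\n")
```

The two functions do not necessarily break at the same positions for the same `width`, so the printed lines may differ, as they do in the transcript above.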