字符串处理中基本函数的使用

R自带函数与stringr包函数对比

> states <- row.names(USArrests)
> # 提取字符串子集
> substr(x = states, start = 1, stop = 4)
[1] "Alab" "Alas" "Ariz" "Arka" "Cali" "Colo" "Conn" "Dela" "Flor" "Geor" "Hawa" "Idah" "Illi" "Indi" "Iowa" "Kans" "Kent"
[18] "Loui" "Main" "Mary" "Mass" "Mich" "Minn" "Miss" "Miss" "Mont" "Nebr" "Neva" "New " "New " "New " "New " "Nort" "Nort"
[35] "Ohio" "Okla" "Oreg" "Penn" "Rhod" "Sout" "Sout" "Tenn" "Texa" "Utah" "Verm" "Virg" "Wash" "West" "Wisc" "Wyom"
> abbreviate(states, minlength = 5)
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware
"Alabm" "Alask" "Arizn" "Arkns" "Clfrn" "Colrd" "Cnnct" "Delwr"
Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas
"Flord" "Georg" "Hawai" "Idaho" "Illns" "Indin" "Iowa" "Kanss"
Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi
"Kntck" "Lousn" "Maine" "Mryln" "Mssch" "Mchgn" "Mnnst" "Mssss"
Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York
"Missr" "Montn" "Nbrsk" "Nevad" "NwHmp" "NwJrs" "NwMxc" "NwYrk"
North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina
"NrthC" "NrthD" "Ohio" "Oklhm" "Oregn" "Pnnsy" "RhdIs" "SthCr"
South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia
"SthDk" "Tnnss" "Texas" "Utah" "Vrmnt" "Virgn" "Wshng" "WstVr"
Wisconsin Wyoming
"Wscns" "Wymng"
> # 计算字符串长度
> nchar(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_count(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_length(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> # 大写和小写
> tolower(states) # 变为小写
[1] "alabama" "alaska" "arizona" "arkansas" "california" "colorado" "connecticut"
[8] "delaware" "florida" "georgia" "hawaii" "idaho" "illinois" "indiana"
[15] "iowa" "kansas" "kentucky" "louisiana" "maine" "maryland" "massachusetts"
[22] "michigan" "minnesota" "mississippi" "missouri" "montana" "nebraska" "nevada"
[29] "new hampshire" "new jersey" "new mexico" "new york" "north carolina" "north dakota" "ohio"
[36] "oklahoma" "oregon" "pennsylvania" "rhode island" "south carolina" "south dakota" "tennessee"
[43] "texas" "utah" "vermont" "virginia" "washington" "west virginia" "wisconsin"
[50] "wyoming"
> toupper(states) # 变为大写
[1] "ALABAMA" "ALASKA" "ARIZONA" "ARKANSAS" "CALIFORNIA" "COLORADO" "CONNECTICUT"
[8] "DELAWARE" "FLORIDA" "GEORGIA" "HAWAII" "IDAHO" "ILLINOIS" "INDIANA"
[15] "IOWA" "KANSAS" "KENTUCKY" "LOUISIANA" "MAINE" "MARYLAND" "MASSACHUSETTS"
[22] "MICHIGAN" "MINNESOTA" "MISSISSIPPI" "MISSOURI" "MONTANA" "NEBRASKA" "NEVADA"
[29] "NEW HAMPSHIRE" "NEW JERSEY" "NEW MEXICO" "NEW YORK" "NORTH CAROLINA" "NORTH DAKOTA" "OHIO"
[36] "OKLAHOMA" "OREGON" "PENNSYLVANIA" "RHODE ISLAND" "SOUTH CAROLINA" "SOUTH DAKOTA" "TENNESSEE"
[43] "TEXAS" "UTAH" "VERMONT" "VIRGINIA" "WASHINGTON" "WEST VIRGINIA" "WISCONSIN"
[50] "WYOMING"
> # 符号替换
> chartr("Tt", "Uu", "AgCTcctTagct")
[1] "AgCUccuUagcu"
> str_replace_all("AgCTcctTagct", pattern = "T", replacement = "U")
[1] "AgCUcctUagct"
> # 字符串连接
> paste("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
> str_c("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
> x <- c("I love R", "I'm fascinated by Statisitcs", "I")
> # 包含匹配
> grep(pattern = "love", x = x)
[1] 1
> grep(pattern = "love", x = x, value = TRUE)
[1] "I love R"
> grepl(pattern = "love", x = x)
[1] TRUE FALSE FALSE
> str_detect(string = x, pattern = "love")
[1] TRUE FALSE FALSE
> # match返回第一个完全匹配的位置
> match(x = "I",table = x)
[1] 3
> "I" %in% x
[1] TRUE
> # 字符串拆分
> text <- "I love R.\nI'm fascinated by Statisitcs."
> cat(text)
I love R.
I'm fascinated by Statisitcs.
> strsplit(text, split = " ")
[[1]]
[1] "I" "love" "R.\nI'm" "fascinated" "by" "Statisitcs."
> strsplit(text, split = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statisitcs."
> str_split(text, pattern = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statisitcs."
> # 匹配替换
> test_vector3 <- c("Without the vowels,We can still read the word.")
> sub(pattern = "[aeiou]",replacement = "-",x = test_vector3)
[1] "W-thout the vowels,We can still read the word."
> gsub(pattern = "[aeiou]",replacement = "-",x = test_vector3)
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
> str_replace_all(string = test_vector3, pattern = "[aeiou]",
+ replacement = "-")
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
> # 字符串定制输出
> string <- "Each character string in the input is first split into\n paragraphs
+ (or lines containing whitespace)"
> strwrap(x = string, width = 30)
[1] "Each character string in the" "input is first split into" "paragraphs (or lines" "containing whitespace)"
> str_wrap(string = string, width = 30)
[1] "Each character string in\nthe input is first split\ninto paragraphs (or lines\ncontaining whitespace)"
> cat(str_wrap(string = string, width = 30))
Each character string in
the input is first split
into paragraphs (or lines
containing whitespace)

R语言学习笔记(二十二):字符串处理中的函数对比(代码实现)的更多相关文章

  1. R语言学习笔记(十二):零碎知识点(31-35)

    31--round(),floor()和ceiling() round()四舍五入取整 floor()向下取整 ceiling()向上取整 > round(3.5) [1] 4 > flo ...

  2. R语言学习笔记(十五):获取文件和目录信息

    file.info() 参数是表示文件名称的字符串向量,函数会给出每个文件的大小.创建时间.是否为目录等信息. > file.info("z.txt") size isdir ...

  3. R语言学习笔记(十九):字符串处理中预定义字符组(表格介绍)

    R中预定义的字符组 代码 含义说明 [:digit:]或\\d 数字; [0-9] [^[:digit:]]或\\D 非数字; 等价于[^0-9] [:lower:] 小写字母; [a-z] [:up ...

  4. R语言学习笔记(十四):零碎知识点(41-45)

    41--ls( ) ls()可以用来列出现存的所有对象. pattern是一个具名参数,可以列出所有名称中含有字符串"s"的对象. > ls() [1] "s&qu ...

  5. R语言学习笔记(十):零碎知识点(21-25)

    21--assign() assign函数可以通过变量名的字符串来赋值 > assign('a', 1:3) > a [1] 1 2 3 > b <- c('a') > ...

  6. R语言学习笔记(十八):零碎知识点46-50

    seq_along与seq_len函数的使用 在for循环中有用 > seq_along(c(2,3,5)) [1] 1 2 3 > seq_len(3) [1] 1 2 3

  7. R语言学习笔记(十六):构建分割点函数

    选取预测概率的分割点 cutoff<- function(n,p){ pp<-1 i<-0 while (pp>=0.02) { model.predfu<-rep(&q ...

  8. 汇编入门学习笔记 (十二)—— int指令、port

    疯狂的暑假学习之  汇编入门学习笔记 (十二)--  int指令.port 參考: <汇编语言> 王爽 第13.14章 一.int指令 1. int指令引发的中断 int n指令,相当于引 ...

  9. VSTO 学习笔记(十二)自定义公式与Ribbon

    原文:VSTO 学习笔记(十二)自定义公式与Ribbon 这几天工作中在开发一个Excel插件,包含自定义公式,根据条件从数据库中查询结果.这次我们来做一个简单的测试,达到类似的目的. 即在Excel ...

随机推荐

  1. zabbix系列之七——安装后配置二Userparameters

    1User parameters(用户自定义参数) 1.1配置 描述 详细 备注 简介 1执行zabbix中未预定义的agent check时使用 配置 1)    zabbix agent的配置文件 ...

  2. springboot学习网站及博客

    1关于Spring Boot的博客集合https://www.jianshu.com/p/7e2e5e7b32ab 2泥瓦匠BYSocket的Spring Boot系列 https://www.bys ...

  3. 从本机构建Linux应用程序VHD映像

    下图描述了总体的虚拟机映像的VHD生成,上传以及发布到 Azure 镜像市场的全过程: 具体步骤如下: 在本地计算机(Windows平台)上安装Hyper-V,并安装您所需要的虚拟机操作系统 在此操作 ...

  4. mysql-sql-standard

    https://github.com/zhishutech/mysql-sql-standard

  5. django项目设计

    我们以前是只建立一个项目只建立一个app,如果我们要建立多个app的时候 并且这个app要写很多额视图的函数views内函数,要是建立很多种的时候就会造成很冗杂,不美观  我们未来增强解耦性,就把那个 ...

  6. Elasticsearch安装记录

    一 安装部分 1.新建用户 elasticsearch不能使用root身份执行 adduser esuser passwd esuser 2.赋予权限 切换到root chown -R esuser ...

  7. 用字典给Model赋值

    用字典给Model赋值 此篇教程讲述通过runtime扩展NSObject,可以直接用字典给Model赋值,这是相当有用的技术呢. 源码: NSObject+Properties.h 与 NSObje ...

  8. 【转载】http和socket之长连接和短连接

    TCP/IP TCP/IP是个协议组,可分为三个层次:网络层.传输层和应用层. 在网络层有IP协议.ICMP协议.ARP协议.RARP协议和BOOTP协议. 在传输层中有TCP协议与UDP协议. 在应 ...

  9. 企业级NFS网络文件共享服务_【all】

    1.1. 什么是NFS(1台机器提供服务) Network File System(网络文件系统)通过局域网让不同的主机系统之间共享文件或目录. NFS客户端可以通过挂载的方式将NFS服务器端共享的数 ...

  10. controller断点进入失败:包路径问题

    controller 接受前端参数的方法(前端要有传值给controller的方法,后台要有接收值得方法) 1.@RequestParam 接收表单参数 2.@RequestBody 接收json字符 ...