Using basic functions for string processing

Base R functions compared with their stringr counterparts

> library(stringr) # provides the str_* functions used below
> states <- row.names(USArrests)
> # Extract substrings
> substr(x = states, start = 1, stop = 4)
[1] "Alab" "Alas" "Ariz" "Arka" "Cali" "Colo" "Conn" "Dela" "Flor" "Geor" "Hawa" "Idah" "Illi" "Indi" "Iowa" "Kans" "Kent"
[18] "Loui" "Main" "Mary" "Mass" "Mich" "Minn" "Miss" "Miss" "Mont" "Nebr" "Neva" "New " "New " "New " "New " "Nort" "Nort"
[35] "Ohio" "Okla" "Oreg" "Penn" "Rhod" "Sout" "Sout" "Tenn" "Texa" "Utah" "Verm" "Virg" "Wash" "West" "Wisc" "Wyom"
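
The transcript shows only the base R side of substring extraction. As a hedged sketch (assuming stringr is installed), `str_sub()` is the usual stringr counterpart to `substr()`; unlike `substr()`, it also accepts negative positions, which count from the end of the string. The variable names here are illustrative only.

```r
library(stringr)

states <- row.names(USArrests)

# First four characters: base R and stringr agree for this input
base_first4 <- substr(states, start = 1, stop = 4)
stringr_first4 <- str_sub(states, start = 1, end = 4)
all(base_first4 == stringr_first4)

# Negative positions in str_sub() count from the end of each string
last3 <- str_sub(states[1:3], start = -3, end = -1)
last3

# Base R needs the string length to do the same thing
n <- nchar(states[1:3])
base_last3 <- substr(states[1:3], start = n - 2, stop = n)
base_last3
```
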
> abbreviate(states, minlength = 5)
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware
"Alabm" "Alask" "Arizn" "Arkns" "Clfrn" "Colrd" "Cnnct" "Delwr"
Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas
"Flord" "Georg" "Hawai" "Idaho" "Illns" "Indin" "Iowa" "Kanss"
Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi
"Kntck" "Lousn" "Maine" "Mryln" "Mssch" "Mchgn" "Mnnst" "Mssss"
Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York
"Missr" "Montn" "Nbrsk" "Nevad" "NwHmp" "NwJrs" "NwMxc" "NwYrk"
North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina
"NrthC" "NrthD" "Ohio" "Oklhm" "Oregn" "Pnnsy" "RhdIs" "SthCr"
South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia
"SthDk" "Tnnss" "Texas" "Utah" "Vrmnt" "Virgn" "Wshng" "WstVr"
Wisconsin Wyoming
"Wscns" "Wymng"
> # Compute string lengths
> nchar(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_count(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
> str_length(states)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12
[42] 9 5 4 7 8 10 13 9 7
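
`str_count()` agrees with `nchar()` and `str_length()` above only because no pattern was supplied, in which case it counts characters. Given a pattern, it counts matches instead. The following sketch (assuming stringr is installed; the base R line is one possible equivalent, not the only one) illustrates the difference on the first three state names.

```r
library(stringr)

states <- row.names(USArrests)[1:3]  # "Alabama" "Alaska" "Arizona"

# With a pattern, str_count() counts matches rather than characters
a_counts <- str_count(states, pattern = "a")
a_counts

# One base R equivalent: locate every match, then count matches per string
base_counts <- lengths(regmatches(states, gregexpr("a", states)))
base_counts
```
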
> # Upper and lower case
> tolower(states) # convert to lowercase
[1] "alabama" "alaska" "arizona" "arkansas" "california" "colorado" "connecticut"
[8] "delaware" "florida" "georgia" "hawaii" "idaho" "illinois" "indiana"
[15] "iowa" "kansas" "kentucky" "louisiana" "maine" "maryland" "massachusetts"
[22] "michigan" "minnesota" "mississippi" "missouri" "montana" "nebraska" "nevada"
[29] "new hampshire" "new jersey" "new mexico" "new york" "north carolina" "north dakota" "ohio"
[36] "oklahoma" "oregon" "pennsylvania" "rhode island" "south carolina" "south dakota" "tennessee"
[43] "texas" "utah" "vermont" "virginia" "washington" "west virginia" "wisconsin"
[50] "wyoming"
> toupper(states) # convert to uppercase
[1] "ALABAMA" "ALASKA" "ARIZONA" "ARKANSAS" "CALIFORNIA" "COLORADO" "CONNECTICUT"
[8] "DELAWARE" "FLORIDA" "GEORGIA" "HAWAII" "IDAHO" "ILLINOIS" "INDIANA"
[15] "IOWA" "KANSAS" "KENTUCKY" "LOUISIANA" "MAINE" "MARYLAND" "MASSACHUSETTS"
[22] "MICHIGAN" "MINNESOTA" "MISSISSIPPI" "MISSOURI" "MONTANA" "NEBRASKA" "NEVADA"
[29] "NEW HAMPSHIRE" "NEW JERSEY" "NEW MEXICO" "NEW YORK" "NORTH CAROLINA" "NORTH DAKOTA" "OHIO"
[36] "OKLAHOMA" "OREGON" "PENNSYLVANIA" "RHODE ISLAND" "SOUTH CAROLINA" "SOUTH DAKOTA" "TENNESSEE"
[43] "TEXAS" "UTAH" "VERMONT" "VIRGINIA" "WASHINGTON" "WEST VIRGINIA" "WISCONSIN"
[50] "WYOMING"
> # Character replacement
> chartr("Tt", "Uu", "AgCTcctTagct")
[1] "AgCUccuUagcu"
> # Only uppercase "T" is replaced here, so the result differs from chartr() above
> str_replace_all("AgCTcctTagct", pattern = "T", replacement = "U")
[1] "AgCUcctUagct"
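
The two calls above do not return the same string: `chartr()` maps both cases, while the `str_replace_all()` call touches only uppercase `T`. One way to make them equivalent, sketched here under the assumption that stringr is installed, is to pass `str_replace_all()` a named vector of pattern/replacement pairs.

```r
library(stringr)

dna <- "AgCTcctTagct"

# chartr() maps each character in `old` to the character at the same position in `new`
rna_chartr <- chartr(old = "Tt", new = "Uu", x = dna)

# A named vector supplies several pattern = replacement pairs at once
rna_stringr <- str_replace_all(dna, c("T" = "U", "t" = "u"))

# Base R alternative: two nested gsub() calls on literal characters
rna_gsub <- gsub("t", "u", gsub("T", "U", dna, fixed = TRUE), fixed = TRUE)

c(rna_chartr, rna_stringr, rna_gsub)
```
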
> # String concatenation
> paste("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
> str_c("control", 1:3, sep = "_")
[1] "control_1" "control_2" "control_3"
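
`paste()` and `str_c()` give identical results above, but they treat missing values differently. The sketch below (assuming stringr is installed; `ids` is a made-up vector for illustration) shows the difference, along with the `collapse` argument that both functions share.

```r
library(stringr)

ids <- c(1, NA, 3)  # hypothetical input containing a missing value

# paste() converts NA to the literal text "NA"
pasted <- paste("control", ids, sep = "_")
pasted

# str_c() propagates the missing value instead
joined <- str_c("control", ids, sep = "_")
joined

# Both can collapse a vector into a single string
collapsed_base <- paste(c("a", "b", "c"), collapse = "+")
collapsed_stringr <- str_c(c("a", "b", "c"), collapse = "+")
```
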
> x <- c("I love R", "I'm fascinated by Statistics", "I")
> # Test whether a string contains a pattern
> grep(pattern = "love", x = x)
[1] 1
> grep(pattern = "love", x = x, value = TRUE)
[1] "I love R"
> grepl(pattern = "love", x = x)
[1] TRUE FALSE FALSE
> str_detect(string = x, pattern = "love")
[1] TRUE FALSE FALSE
> # match() returns the position of the first exact match
> match(x = "I", table = x)
[1] 3
> "I" %in% x
[1] TRUE
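
The difference between containment (`grepl()`) and exact matching (`match()`, `%in%`) can be seen by searching for the same value both ways. This is a small base R sketch; the variable names are illustrative only.

```r
x <- c("I love R", "I'm fascinated by Statistics", "I")

# Containment: every element contains the letter "I"
contains_I <- grepl("I", x, fixed = TRUE)
contains_I

# Exact match: only the third element is exactly "I"
exactly_I <- x == "I"
exactly_I

# Anchoring the regular expression also forces a whole-string match
anchored_I <- grepl("^I$", x)

# match() gives the first exact position; which() gives all of them
first_pos <- match("I", x)
all_pos <- which(x == "I")
```
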
> # Split strings
> text <- "I love R.\nI'm fascinated by Statistics."
> cat(text)
I love R.
I'm fascinated by Statistics.
> strsplit(text, split = " ")
[[1]]
[1] "I" "love" "R.\nI'm" "fascinated" "by" "Statistics."
> strsplit(text, split = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
> str_split(text, pattern = "\\s")
[[1]]
[1] "I" "love" "R." "I'm" "fascinated" "by" "Statistics."
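
Two details of the splitting calls above are worth a short illustration: the result is a list with one element per input string, and `"\\s"` splits on every single whitespace character, which leaves empty strings when whitespace is repeated. The base R sketch below uses a made-up input, `messy`, to show this.

```r
messy <- "I  love\tR"  # hypothetical input: two spaces, then a tab

# Splitting on a single whitespace character leaves an empty string
# between the two consecutive spaces
one_ws <- strsplit(messy, split = "\\s")[[1]]
one_ws

# Splitting on runs of whitespace avoids the empty string
run_ws <- strsplit(messy, split = "\\s+")[[1]]
run_ws

# strsplit() returns a list; unlist() flattens it to a character vector
words <- unlist(strsplit(c("a b", "c d"), split = " "))
words
```
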
> # Pattern replacement
> test_vector3 <- c("Without the vowels,We can still read the word.")
> sub(pattern = "[aeiou]", replacement = "-", x = test_vector3)
[1] "W-thout the vowels,We can still read the word."
> gsub(pattern = "[aeiou]", replacement = "-", x = test_vector3)
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
> str_replace_all(string = test_vector3, pattern = "[aeiou]",
+ replacement = "-")
[1] "W-th--t th- v-w-ls,W- c-n st-ll r--d th- w-rd."
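
The transcript pairs `gsub()` with `str_replace_all()` but shows no stringr counterpart for `sub()`. A hedged sketch, assuming stringr is installed: `str_replace()` replaces only the first match, like `sub()`. The character class `[aeiou]` is also case-sensitive, so the last two calls show one way to include uppercase vowels in each package.

```r
library(stringr)

sentence <- "Without the vowels,We can still read the word."

# First match only: sub() and str_replace() behave alike here
first_base <- sub("[aeiou]", "-", sentence)
first_stringr <- str_replace(sentence, "[aeiou]", "-")

# Case-insensitive replacement of every vowel
ci_base <- gsub("[aeiou]", "-", "An Apple", ignore.case = TRUE)
ci_stringr <- str_replace_all("An Apple", regex("[aeiou]", ignore_case = TRUE), "-")

c(first_base, first_stringr, ci_base, ci_stringr)
```
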
> # Wrap strings for formatted output
> string <- "Each character string in the input is first split into\n paragraphs
+ (or lines containing whitespace)"
> strwrap(x = string, width = 30)
[1] "Each character string in the" "input is first split into" "paragraphs (or lines" "containing whitespace)"
> str_wrap(string = string, width = 30)
[1] "Each character string in\nthe input is first split\ninto paragraphs (or lines\ncontaining whitespace)"
> cat(str_wrap(string = string, width = 30))
Each character string in
the input is first split
into paragraphs (or lines
containing whitespace)
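
As the output above shows, `strwrap()` returns a character vector with one element per line, whereas `str_wrap()` returns a single string with embedded newlines. The two functions can also break lines at different places. If a single string is needed from base R, the lines can be joined with `paste()`; this is a minimal sketch and is not guaranteed to reproduce the `str_wrap()` line breaks.

```r
string <- "Each character string in the input is first split into\n paragraphs\n(or lines containing whitespace)"

# strwrap() yields one vector element per wrapped line,
# each shorter than `width` characters
wrapped_lines <- strwrap(x = string, width = 30)
length(wrapped_lines)

# Join the lines into one newline-separated string for printing
wrapped_text <- paste(wrapped_lines, collapse = "\n")
cat(wrapped_text, "\n")
```
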
