Oracle Regular Expressions in Practice
Original article: http://oracle-base.com/articles/misc/regular-expressions-support-in-oracle.php
- Introduction
- Example 1 : REGEXP_SUBSTR
- Example 2 : REGEXP_SUBSTR
- Example 3 : REGEXP_SUBSTR
- Example 4 : REGEXP_REPLACE
- Example 5 : REGEXP_INSTR
- Example 6 : REGEXP_LIKE and REGEXP_SUBSTR
- Example 7 : REGEXP_COUNT
- Example 8 : REGEXP_LIKE
Related articles:
- PL/SQL Enhancements in Oracle Database 10g - Regular Expressions
- PL/SQL New Features and Enhancements in Oracle Database 11g Release 1 - Enhancements to Regular Expression Built-in SQL Functions
Introduction
Oracle 10g introduced support for regular expressions in SQL and PL/SQL with the following functions.
- REGEXP_INSTR - Similar to INSTR except it uses a regular expression rather than a literal as the search string.
- REGEXP_LIKE - Similar to LIKE except it uses a regular expression as the search string. REGEXP_LIKE is really an operator, not a function.
- REGEXP_REPLACE - Similar to REPLACE except it uses a regular expression as the search string.
- REGEXP_SUBSTR - Returns the string matching the regular expression. Not really similar to SUBSTR.
Oracle 11g introduced two new features related to regular expressions.
- REGEXP_COUNT - Returns the number of occurrences of the regular expression in the string.
- Sub-expression support was added to REGEXP_INSTR and REGEXP_SUBSTR through an extra parameter that specifies which sub-expression of the pattern match to use.
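As a quick orientation, the following sketch calls each of these functions against a literal string. The sample value 'abc123' and the column aliases are illustrative assumptions, not part of the original article, and REGEXP_COUNT needs 11g or later.

```sql
-- Illustrative only: the literal 'abc123' and the aliases are assumptions.
SELECT REGEXP_INSTR('abc123', '[0-9]+')        AS first_digit_pos, -- where the first run of digits starts
       REGEXP_SUBSTR('abc123', '[0-9]+')       AS digits,          -- the run of digits itself
       REGEXP_REPLACE('abc123', '[0-9]+', '#') AS replaced,        -- each run of digits replaced by '#'
       REGEXP_COUNT('abc123', '[0-9]')         AS digit_count      -- number of single digits (11g onward)
FROM   dual
WHERE  REGEXP_LIKE('abc123', '^[a-z]+[0-9]+$');                    -- REGEXP_LIKE appears as a condition
```

Note that REGEXP_LIKE sits in the WHERE clause rather than the select list, which is the practical consequence of it not being a function.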
Learning to write regular expressions takes a little time. If you don't do it regularly, it can be a voyage of discovery each time. The general rules for writing regular expressions, and the details of Oracle's regular expression support, are linked from the original article.
Rather than trying to repeat the formal definitions, I'll present a number of problems I've been asked to look at over the years, where a solution using a regular expression has been appropriate.
Example 1 : REGEXP_SUBSTR
The data in a column is free text, but may include a 4-digit year.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('FALL 2014');
INSERT INTO t1 VALUES ('2014 CODE-B');
INSERT INTO t1 VALUES ('CODE-A 2014 CODE-D');
INSERT INTO t1 VALUES ('ADSHLHSALK');
INSERT INTO t1 VALUES ('FALL 2004');
INSERT INTO t1 VALUES ('FALL 2015');
COMMIT;
SELECT * FROM t1;
DATA
---------------------------------------------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
ADSHLHSALK
FALL 2004
FALL 2015
6 rows selected.
SQL>
If we needed to return rows containing a specific year we could use the LIKE operator (WHERE data LIKE '%2014%'), but how do we return rows using a comparison (<, <=, >, >=, <>)? One way to approach this is to pull out the 4-digit year and convert it to a number, so we don't accidentally do a character-by-character string comparison. That's pretty easy using regular expressions.
We can identify digits using the "\d" or "[0-9]" operators. We want a group of four of them, which is represented by the "{4}" operator. So our regular expression will be "\d{4}" or "[0-9]{4}". The REGEXP_SUBSTR function returns the string matching the regular expression, so it can be used to extract the text of interest. We then just need to convert it to a number and perform our comparison.
SELECT *
FROM t1
WHERE TO_NUMBER(REGEXP_SUBSTR(data, '\d{4}')) >= 2014;
DATA
---------------------------------------------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
FALL 2015
4 rows selected.
SQL>
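One caveat with "\d{4}" is that it also matches the first four digits of a longer run, so a value such as 'ID 20140001' would be read as the year 2014. The following is a minimal sketch of one way to tighten the match, assuming Oracle 11g or later for the sub-expression parameter. It is a variation on the query above, not the original article's method.

```sql
-- Sketch: only accept four digits that are not part of a longer run of digits.
-- (^|\D) = start of string or a non-digit; (\D|$) = a non-digit or end of string.
-- The final argument (2) returns the second sub-expression, i.e. the four digits.
SELECT *
FROM   t1
WHERE  TO_NUMBER(REGEXP_SUBSTR(data, '(^|\D)(\d{4})(\D|$)', 1, 1, NULL, 2)) >= 2014;
```

Against the sample data this should select the same four rows, because every year there is already delimited by a space or by the start or end of the string.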
Example 2 : REGEXP_SUBSTR
Given a source string, how do we split it up into separate columns, based on changes of case and alpha-to-numeric, such that this:
ArtADB1234567e9876540
becomes this:
Art ADB 1234567 e 9876540
The source data is set up like this.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('ArtADB1234567e9876540');
COMMIT;
The first part of the string is an initcap word, so it starts with a capital letter between "A" and "Z". We identify a single character using the "[]" operator, and ranges are represented using "-", like "A-Z", "a-z" or "0-9". So if we are looking for a single character that is a capital letter, we need to look for "[A-Z]". That needs to be followed by lower case letters, which we now know is "[a-z]", but we need 1 or more of them, which is signified by the "+" operator. So to find an initcap word, we need to search for "[A-Z][a-z]+". Since we want the first occurrence of this, we can use the following.
REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)
The second part of the string is a group of 1 or more uppercase letters. We know we need to use the "[A-Z]+" pattern, but we need to make sure we don't get the first capital letter, so we look for the second occurrence.
REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)
The next part is the first occurrence of a group of numbers.
REGEXP_SUBSTR(data, '[0-9]+', 1, 1)
The next part is a group of lower case letters. We don't want to pick up those from the initcap word, so we must look for the second occurrence of lower case letters.
REGEXP_SUBSTR(data, '[a-z]+', 1, 2)
Finally, we have a group of numbers, which is the second occurrence of this pattern.
REGEXP_SUBSTR(data, '[0-9]+', 1, 2)
Putting that all together, we have the following query, which splits the data into separate columns.
COLUMN col1 FORMAT A15
COLUMN col2 FORMAT A15
COLUMN col3 FORMAT A15
COLUMN col4 FORMAT A15
COLUMN col5 FORMAT A15
SELECT REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1) col1,
REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,
REGEXP_SUBSTR(data, '[0-9]+', 1, 1) col3,
REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,
REGEXP_SUBSTR(data, '[0-9]+', 1, 2) col5
FROM t1;
COL1 COL2 COL3 COL4 COL5
--------- ---------- ---------- ----------- ------------
Art ADB 1234567 e 9876540
1 row selected.
SQL>
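With the sub-expression parameter added in 11g, the same split can also be written as one anchored pattern with a parenthesised group per column. This is a sketch of an alternative, not the article's approach, and it assumes every row follows exactly this five-part layout; rows that do not would return NULL in every column.

```sql
-- Sketch (11g or later): one anchored pattern, one sub-expression per column.
-- The sixth argument selects which parenthesised group is returned.
SELECT REGEXP_SUBSTR(data, '^([A-Z][a-z]+)([A-Z]+)([0-9]+)([a-z]+)([0-9]+)$', 1, 1, NULL, 1) col1,
       REGEXP_SUBSTR(data, '^([A-Z][a-z]+)([A-Z]+)([0-9]+)([a-z]+)([0-9]+)$', 1, 1, NULL, 2) col2,
       REGEXP_SUBSTR(data, '^([A-Z][a-z]+)([A-Z]+)([0-9]+)([a-z]+)([0-9]+)$', 1, 1, NULL, 3) col3,
       REGEXP_SUBSTR(data, '^([A-Z][a-z]+)([A-Z]+)([0-9]+)([a-z]+)([0-9]+)$', 1, 1, NULL, 4) col4,
       REGEXP_SUBSTR(data, '^([A-Z][a-z]+)([A-Z]+)([0-9]+)([a-z]+)([0-9]+)$', 1, 1, NULL, 5) col5
FROM   t1;
```

The trade-off is that the occurrence-based version tolerates extra text around the parts, while the anchored version validates the whole string at the cost of repeating the pattern.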
Example 3 : REGEXP_SUBSTR
We need to pull out a group of characters from a "/" delimited string, optionally enclosed by double quotes. The data looks like this.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('978/955086/GZ120804/10-FEB-12');
INSERT INTO t1 VALUES ('97/95508/BANANA/10-FEB-12');
INSERT INTO t1 VALUES ('97/95508/"APPLE"/10-FEB-12');
COMMIT;
We are looking for 1 or more characters that are not "/", which we do using "[^/]+". The "^" in the brackets represents NOT and "+" means 1 or more. We also want to remove optional double quotes, so we add that as a character we don't want, giving us "[^/"]+". So if we want the data from the third column, we need the third occurrence of this pattern.
SELECT REGEXP_SUBSTR(data, '[^/"]+', 1, 3) AS element3
FROM t1;
ELEMENT3
---------------------------------------------------------------------
GZ120804
BANANA
APPLE
3 rows selected.
SQL>
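Because "[^/"]+" needs at least one character, an empty element is skipped and the occurrence numbers shift. The sketch below shows a commonly used workaround based on a lazy match up to the next delimiter, assuming 11g or later for the sub-expression parameter. The literal 'A//C/D' is made up for illustration.

```sql
-- Illustrative literal 'A//C/D': the second element is empty.
SELECT REGEXP_SUBSTR('A//C/D', '[^/]+', 1, 3)               AS skips_empty, -- the empty element is not counted
       REGEXP_SUBSTR('A//C/D', '(.*?)(/|$)', 1, 3, NULL, 1) AS keeps_empty  -- third field by position
FROM   dual;

-- Applied to t1, with the optional double quotes removed afterwards.
SELECT REPLACE(REGEXP_SUBSTR(data, '(.*?)(/|$)', 1, 3, NULL, 1), '"') AS element3
FROM   t1;
```

In the first query the two columns differ: the first pattern returns 'D', while the second returns 'C', the element that is actually in third position.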
Example 4 : REGEXP_REPLACE
We need to take an initcap string and separate the words. The data looks like this.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('SocialSecurityNumber');
INSERT INTO t1 VALUES ('HouseNumber');
COMMIT;
We need to find each uppercase character "[A-Z]". We want to keep the character we find, so we will make that pattern a sub-expression "([A-Z])", allowing us to refer to it later. For each match, we want to replace it with a space, plus the matching character. The space is pretty obvious, but we need to use "\1" to signify the text matching the first sub-expression. So we will replace the matching pattern with a space and itself, " \1". We don't want to replace the first letter of the string, so we start the search at the second character by passing 2 as the position argument.
SELECT REGEXP_REPLACE(data, '([A-Z])', ' \1', 2) AS hyphen_text
FROM t1;
HYPHEN_TEXT
--------------------------------------------------------------------
Social Security Number
House Number
2 rows selected.
SQL>
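The same back-reference technique works with two sub-expressions. As a sketch of a variation that is not in the original article, the query below converts the initcap strings to lower-case words joined by underscores. The 'c' match parameter is passed explicitly because default case sensitivity depends on the session's NLS settings.

```sql
-- Sketch: insert an underscore at each lower-to-upper boundary, then lower-case.
-- \1 is the lowercase letter before the boundary, \2 the uppercase letter after it.
-- Arguments 4-6: start at position 1, replace all occurrences (0), case-sensitive ('c').
SELECT LOWER(REGEXP_REPLACE(data, '([a-z])([A-Z])', '\1_\2', 1, 0, 'c')) AS snake_text
FROM   t1;
```

No starting offset is needed here, because the first letter of the string has no lowercase letter before it and so never matches.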
Example 5 : REGEXP_INSTR
We have a specific pattern of digits (9 99:99:99) and we want to know the location of the pattern in our data.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('1 01:01:01');
INSERT INTO t1 VALUES ('.2 02:02:02');
INSERT INTO t1 VALUES ('..3 03:03:03');
COMMIT;
We know we are looking for groups of numbers, so we can use "[0-9]" or "\d". We know the number of digits in each group, which we can indicate using the "{n}" operator, so we simply describe the pattern we are looking for.
SELECT REGEXP_INSTR(data, '[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}') AS string_loc_1,
REGEXP_INSTR(data, '\d \d{2}:\d{2}:\d{2}') AS string_loc_2
FROM t1;
STRING_LOC_1 STRING_LOC_2
------------ ------------
1 1
2 2
3 3
3 rows selected.
SQL>
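REGEXP_INSTR can also report where the match ends, through its fifth argument (the return option). The following sketch reuses the same pattern; the column aliases are illustrative.

```sql
-- Sketch: the fifth argument is the return option.
--   0 = position of the first character of the match (the default)
--   1 = position of the character following the match
SELECT data,
       REGEXP_INSTR(data, '\d \d{2}:\d{2}:\d{2}', 1, 1, 0) AS match_start,
       REGEXP_INSTR(data, '\d \d{2}:\d{2}:\d{2}', 1, 1, 1) AS match_end_plus_1
FROM   t1;
```

Subtracting the first value from the second gives the length of the match, which is 10 characters for this pattern.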
Example 6 : REGEXP_LIKE and REGEXP_SUBSTR
We have strings containing parentheses. We want to return the text within the parentheses for those rows that contain parentheses.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('This is some text (with parentheses) in it.');
INSERT INTO t1 VALUES ('This text has no parentheses.');
INSERT INTO t1 VALUES ('This text has (parentheses too).');
COMMIT;
The basic pattern for text between parentheses is "\(.*\)". The "\" characters are escapes for the parentheses, making them literals. Without the escapes they would be assumed to define a sub-expression. That pattern alone is fine to identify the rows of interest using a REGEXP_LIKE operator, but it is not appropriate in a REGEXP_SUBSTR, as it would return the parentheses also. To omit the parentheses we need to include a sub-expression inside the literal parentheses "\((.*)\)". We can then use REGEXP_SUBSTR to return the first sub-expression.
COLUMN with_parentheses FORMAT A20
COLUMN without_parentheses FORMAT A20
SELECT data,
REGEXP_SUBSTR(data, '\(.*\)') AS with_parentheses,
REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1) AS without_parentheses
FROM t1
WHERE REGEXP_LIKE(data, '\(.*\)');
DATA WITH_PARENTHESES WITHOUT_PARENTHESES
-------------------------------------------------- -------------------- --------------------
This is some text (with parentheses) in it. (with parentheses) with parentheses
This text has (parentheses too). (parentheses too) parentheses too
2 rows selected.
SQL>
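Note: in REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1), the 'i' match parameter requests case-insensitive matching, and the final 1 selects which sub-expression's text is returned (valid values are 0 to 9).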
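The ".*" in this pattern is greedy, so when a string holds more than one parenthesised group it runs to the last closing parenthesis. A sketch of one way around that is to exclude ")" from the captured text; the literal below is made up for illustration.

```sql
-- Illustrative literal with two parenthesised groups.
SELECT REGEXP_SUBSTR('a (b) c (d)', '\((.*)\)',    1, 1, NULL, 1) AS greedy,       -- runs to the last ')'
       REGEXP_SUBSTR('a (b) c (d)', '\(([^)]*)\)', 1, 1, NULL, 1) AS first_group,  -- stops at the first ')'
       REGEXP_SUBSTR('a (b) c (d)', '\(([^)]*)\)', 1, 2, NULL, 1) AS second_group  -- second occurrence
FROM   dual;
```

The greedy column comes back as 'b) c (d', while the other two return 'b' and 'd'. The sample rows in t1 each contain a single group, so the original query is unaffected.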
Example 7 : REGEXP_COUNT
We need to know how many times a block of 4 digits appears in text. The data looks like this.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('1234');
INSERT INTO t1 VALUES ('1234 1234');
INSERT INTO t1 VALUES ('1234 1234 1234');
COMMIT;
We can identify digits using "\d" or "[0-9]" and the "{4}" operator signifies 4 of them, so using "\d{4}" or "[0-9]{4}" with the REGEXP_COUNT function seems to be a valid option.
SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,
REGEXP_COUNT(data, '\d{4}') AS pattern_count_2
FROM t1;
PATTERN_COUNT_1 PATTERN_COUNT_2
--------------- ---------------
1 1
2 2
3 3
3 rows selected.
SQL>
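One thing to keep in mind is that the pattern is not tied to word boundaries: matches are counted left to right without overlapping, so a longer run of digits also contributes. The literals in this sketch are illustrative.

```sql
-- Illustrative literals: matches are counted left to right without overlapping.
SELECT REGEXP_COUNT('12345678', '\d{4}') AS eight_digits, -- '1234' and '5678' are both counted
       REGEXP_COUNT('12345', '\d{4}')    AS five_digits   -- only '1234'; the trailing '5' is left over
FROM   dual;
```

The two columns return 2 and 1 respectively, which may or may not be what "a block of 4 digits" is meant to cover in a given data set.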
Example 8 : REGEXP_LIKE
We need to identify invalid email addresses. The data looks like this.
DROP TABLE t1;
CREATE TABLE t1 (
data VARCHAR2(50)
);
INSERT INTO t1 VALUES ('me@example.com');
INSERT INTO t1 VALUES ('me@example');
INSERT INTO t1 VALUES ('@example.com');
INSERT INTO t1 VALUES ('me.me@example.com');
INSERT INTO t1 VALUES ('me.me@ example.com');
INSERT INTO t1 VALUES ('me.me@example-example.com');
COMMIT;
The following test returns the email addresses that fail an approximate check of the email address format.
SELECT data
FROM t1
WHERE NOT REGEXP_LIKE(data, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i');
DATA
--------------------------------------------------
me@example
@example.com
me.me@ example.com
3 rows selected.
SQL>
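The pattern above is not anchored, so a value only has to contain something shaped like an email address to pass; a string such as 'see me@example.com now' would be treated as valid. A sketch of a stricter variant, which is still an approximation, adds "^" and "$" so that the whole value must match.

```sql
-- Sketch: ^ and $ force the whole value to match, not just part of it.
SELECT data
FROM   t1
WHERE  NOT REGEXP_LIKE(data, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$', 'i');
```

Against the sample data this should list the same three rows. Note that "{2,4}" rejects top-level domains longer than four letters, so the upper bound may need raising for real data.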
-----------------------------
Dylan Presents.