leetcode 10 Regular Expression Matching(简单正则表达式匹配)
最近代码写的少了,而leetcode一直想做一个python,c/c++解题报告的专题,c/c++一直是我非常喜欢的,c语言编程练习的重要性体现在linux内核编程以及一些大公司算法上机的要求,python主要为了后序转型数据分析和机器学习,所以今天来做一个难度为hard 的简单正则表达式匹配。
做了很多leetcode题目,我们来总结一下套路:
首先一般是检查输入参数是否正确,然后是处理算法的特殊情况,之后就是实现逻辑,最后就是返回值。
当编程成为一种解决问题的习惯,我们就成为了一名纯粹的程序员
leetcode 10 Regular Expression Matching
(简单正则表达式匹配)
题目描述
Implement regular expression matching with support for ‘.’ and ‘*’.
‘.’ Matches any single character.
‘*’ Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)Some examples:
isMatch(“aa”,”a”) → false
isMatch(“aa”,”aa”) → true
isMatch(“aaa”,”aa”) → false
isMatch(“aa”, “a*”) → true
isMatch(“aa”, “.*”) → true
isMatch(“ab”, “.*”) → true
isMatch(“aab”, “c*a*b”) → true
题目意义及背景
It might seem deceptively easy even you know the general idea, but programming it correctly with all the details require careful thought.
在程序设计中,规划所有的细节问题都需要认真思考。
题目分析以及需要注意的问题
为什么aab可以匹配模式c*a*b呢?
● It is a regular expression.Not a wild card.So the ” * ” does not mean any string.And the cab should be split like this “c * a * b” which means N “c”,N “a” and One “b”.
’ * ’ Matches zero or more of the preceding element, so ” c* ” could match nothing.
此题不能使用贪心法
考虑情况:
s = “ac”, p = “ab*c”
After the first ‘a’, we see that there is no b’s to skip for “b*”. We match the last ‘c’ on both side and conclude that they both match.
It seems that being greedy is good. But how about this case?
s = “abbc”, p = “ab*bbc”
When we see “b*” in p, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.
可能还有人说,如果碰到这种情况可以先看一下*后面的内容,但是碰见下面的情况就不好办了。
s = “abcbcd”, p = “a.*c.*d”
Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p matches the first or second ‘c’ in s?
所以:
Unfortunately, there is no way to tell without using some kind of exhaustive search(穷举搜索).
Hints:
A sample diagram of a deterministic finite state automata (DFA). DFAs are useful for doing lexical analysis and pattern matching. An example is UNIX’s grep tool. Please note that this post does not attempt to descibe a solution using DFA.
什么是DFA?
Solution
主要解决方案是回溯法,使用递归或者dp
We need some kind of backtracking mechanism (回溯法)such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.
The recursion mainly breaks down elegantly to the following two cases: 主要考虑两种递归情况
1. If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.
2. If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.
You would need to consider the base case carefully too. That would be left as an exercise to the reader.
Below is the extremely concise code (Excluding comments and asserts, it’s about 10 lines of code).
解题过程如下:
1、考虑特殊情况即*s字符串或者*p字符串结束。
(1)s字符串结束,要求*p也结束或者间隔‘’ (例如p=”a*b*c……”),否则无法匹配
(2)*s字符串未结束,而*p字符串结束,则无法匹配
2、*s字符串与*p字符串均未结束
(1)(p+1)字符不为’‘,则只需比较s字符与*p字符,若相等则递归到(s+1)字符串与*(p+1)字符串的比较,否则无法匹配。
(2)(p+1)字符为’‘,则p字符可以匹配*s字符串中从0开始任意多(记为i)等于*p的字符,然后递归到(s+i+1)字符串与*(p+2)字符串的比较,
只要匹配一种情况就算完全匹配。
bool isMatch(const char *s,const char *p)
{
//判断参数合法,以及程序正常结束
assert( s && p);
if(*p == '\0') return *s == '\0';
//next char is not '*'; must match current character
if(*(p+1) != '*')
{
assert(*p != '*');//考虑情况isMatch('aa','a*');
return ((*p == *s) ||(*p == '.' && *s != '\0')) && isMatch(s + 1, p + 1);
}
//next char is '*' 继续递归匹配,不能写成*(p+1) == '*' 考虑情况isMatch('ab','.*c')
while((*p == *s)|| (*p == '.' && *s != '\0'))
{
if (isMatch(s, p+2)) return true;
s++;
}
//匹配下一个模式
return isMatch(s,p+2);
}
此代码运行时间:18ms
Further Thoughts:
Some extra exercises to this problem:
1. If you think carefully, you can exploit some cases that the above code runs in exponential complexity. Could you think of some examples? How would you make the above code more efficient?
2. Try to implement partial matching instead of full matching. In addition, add ‘^’ and ‘$’ to the rule. ‘^’ matches the starting position within the string, while ‘$’ matches the ending position of the string.
3. Try to implement wildcard matching where ‘*’ means any sequence of zero or more characters.
For the interested reader, real world regular expression matching (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read this article.
Rating: 4.8/5 (107 votes cast)
Regular Expression Matching, 4.8 out of 5 based on 107 ratings
leetcode的 解题报告提醒我们说:
leetcode的解答报告中说的If you are stuck, recursion is your friend.
// 递归版,时间复杂度O(n),空间复杂度O(1)
class Solution {
public:
bool isMatch(const char *s, const char *p)
{
if (*p == '\0') return *s == '\0';
// next char is not '*', then must match current character
if (*(p + 1) != '*')
{
if (*p == *s || (*p == '.' && *s != '\0'))
return isMatch(s + 1, p + 1);
else
return false;
}
else
{ // next char is '*'
while (*p == *s || (*p == '.' && *s != '\0'))
{
if (isMatch(s, p + 2))
return true;
s++;
}
return isMatch(s, p + 2);
}
}
};
c++解决方案:
My concise recursive and DP solutions with full explanation in C++
●
Please refer to my blog post if you have any comment. Wildcard matching problem can be solved similarly.
class Solution {
public:
bool isMatch(string s, string p) {
if (p.empty()) return s.empty();
if ('*' == p[1])
// x* matches empty string or at least one character: x* -> xx*
// *s is to ensure s is non-empty
return (isMatch(s, p.substr(2)) || !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p));
else
return !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p.substr(1));
}
};
class Solution {
public:
bool isMatch(string s, string p) {
/**
* f[i][j]: if s[0..i-1] matches p[0..j-1]
* if p[j - 1] != '*'
* f[i][j] = f[i - 1][j - 1] && s[i - 1] == p[j - 1]
* if p[j - 1] == '*', denote p[j - 2] with x
* f[i][j] is true iff any of the following is true
* 1) "x*" repeats 0 time and matches empty: f[i][j - 2]
* 2) "x*" repeats >= 1 times and matches "x*x": s[i - 1] == x && f[i - 1][j]
* '.' matches any single character
*/
int m = s.size(), n = p.size();
vector<vector<bool>> f(m + 1, vector<bool>(n + 1, false));
f[0][0] = true;
for (int i = 1; i <= m; i++)
f[i][0] = false;
// p[0.., j - 3, j - 2, j - 1] matches empty iff p[j - 1] is '*' and p[0..j - 3] matches empty
for (int j = 1; j <= n; j++)
f[0][j] = j > 1 && '*' == p[j - 1] && f[0][j - 2];
for (int i = 1; i <= m; i++)
for (int j = 1; j <= n; j++)
if (p[j - 1] != '*')
f[i][j] = f[i - 1][j - 1] && (s[i - 1] == p[j - 1] || '.' == p[j - 1]);
else
// p[0] cannot be '*' so no need to check "j > 1" here
f[i][j] = f[i][j - 2] || (s[i - 1] == p[j - 2] || '.' == p[j - 2]) && f[i - 1][j];
return f[m][n];
}
};
The shortest AC code.
1.’.’ is easy to handle. if p has a ‘.’, it can pass any single character in s except ‘\0’.
2.” is a totally different problem. if p has a ” character, it can pass any length of first-match characters in s including ‘\0’.
class Solution {
public:
bool matchFirst(const char *s, const char *p){
return (*p == *s || (*p == '.' && *s != '\0'));
}
bool isMatch(const char *s, const char *p) {
if (*p == '\0') return *s == '\0'; //empty
if (*(p + 1) != '*') {//without *
if(!matchFirst(s,p)) return false;
return isMatch(s + 1, p + 1);
} else { //next: with a *
if(isMatch(s, p + 2)) return true; //try the length of 0
while ( matchFirst(s,p) ) //try all possible lengths
if (isMatch(++s, p + 2))return true;
}
}
};
a shorter one (14 lines of code) with neatly format:
class Solution {
public:
bool isMatch(const char *s, const char *p) {
for( char c = *p; c != 0; ++s, c = *p ) {
if( *(p+1) != '*' )
p++;
else if( isMatch( s, p+2 ) )
return true;
if( (*s==0) || ((c!='.') && (c!=*s)) )
return false;
}
return *s == 0;
}
};
9-lines 16ms C++ DP Solutions with Explanations
●
This problem has a typical solution using Dynamic Programming. We define the state P[i][j] to be true if s[0..i) matches p[0..j) and false otherwise. Then the state equations are:
a. P[i][j] = P[i - 1][j - 1], if p[j - 1] != ‘*’ && (s[i - 1] == p[j - 1] || p[j - 1] == ‘.’);
b. P[i][j] = P[i][j - 2], if p[j - 1] == ‘*’ and the pattern repeats for 0 times;
c. P[i][j] = P[i - 1][j] && (s[i - 1] == p[j - 2] || p[j - 2] == ‘.’), if p[j - 1] == ‘*’ and the pattern repeats for at least 1 times.
Putting these together, we will have the following code.
class Solution {
public:
bool isMatch(string s, string p) {
int m = s.length(), n = p.length();
vector<vector<bool> > dp(m + 1, vector<bool> (n + 1, false));
dp[0][0] = true;
for (int i = 0; i <= m; i++)
for (int j = 1; j <= n; j++)
if (p[j - 1] == '*')
dp[i][j] = dp[i][j - 2] || (i > 0 && (s[i - 1] == p[j - 2] || p[j - 2] == '.') && dp[i - 1][j]);
else dp[i][j] = i > 0 && dp[i - 1][j - 1] && (s[i - 1] == p[j - 1] || p[j - 1] == '.');
return dp[m][n];
}
};
2 years agoreply quote
python解决方案
使用re库,叼炸天!
import re
class Solution:
# @return a boolean
def isMatch(self, s, p):
return re.match('^' + p + '$', s) != None
# debug
s = Solution()
print (s.isMatch("aaa", ".*"))
Python DP solution in 36 ms
def isMatch(self, s, p):
m = len(s)
n = len(p)
dp = [[True] + [False] * m]
for i in xrange(n):
dp.append([False]*(m+1))
for i in xrange(1, n + 1):
x = p[i-1]
if x == '*' and i > 1:
dp[i][0] = dp[i-2][0]
for j in xrange(1, m+1):
if x == '*':
dp[i][j] = dp[i-2][j] or dp[i-1][j] or (dp[i-1][j-1] and p[i-2] == s[j-1]) or (dp[i][j-1] and p[i-2]=='.')
elif x == '.' or x == s[j-1]:
dp[i][j] = dp[i-1][j-1]
return dp[n][m]
about a year agoreply quote
class Solution(object):
def isMatch(self, s, p, memo={("",""):True}):
if not p and s: return False
if not s and p: return set(p[1::2]) == {"*"} and not (len(p) % 2)
if (s,p) in memo: return memo[s,p]
char, exp, prev = s[-1], p[-1], 0 if len(p) < 2 else p[-2]
memo[s,p] =\
(exp == '*' and ((prev in {char, '.'} and self.isMatch(s[:-1], p, memo)) or self.isMatch(s, p[:-2], memo)))\
or\
(exp in {char, '.'} and self.isMatch(s[:-1], p[:-1], memo))
return memo[s,p]
# 445 / 445 test cases passed.
# Status: Accepted
# Runtime: 72 ms
8ms backtracking solution C++
//regular expression matching
//first solution: using recursive version
class Solution {
public:
bool isMatch(string s, string p) {
int m = s.length(), n = p.length();
return backtracking(s, m, p, n);
}
bool backtracking(string& s, int i, string& p, int j) {
if (i == 0 && j == 0) return true;
if (i != 0 && j == 0) return false;
if (i == 0 && j != 0) {
//in this case only p == "c*c*c*" this pattern can match null string
if (p[j-1] == '*') {
return backtracking(s, i, p, j-2);
}
return false;
}
//now both i and j are not null
if (s[i-1] == p[j-1] || p[j-1] == '.') {
return backtracking(s, i - 1, p, j - 1);
} else if (p[j-1] == '*') {
//two cases: determines on whether p[j-2] == s[i-1]
//first p[j-2]* matches zero characters of p
if (backtracking(s, i, p, j - 2)) return true;
//second consider whether p[j-2] == s[i-1], if true, then s[i-1] is matched, move to backtracking(i - 1, j)
if (p[j-2] == s[i-1] || p[j-2] == '.') {
return backtracking(s, i - 1, p, j);
}
return false;
}
return false;
}
};
c语言参考解决方案:
3ms C solution using O(mn) time and O(n) space
bool isMatch(char *s, char *p){
int i;
int ls = strlen(s);
int lp = strlen(p);
bool* m = malloc((ls + 1) * sizeof(bool));
// init
m[0] = true;
for (i = 1; i <= ls; i++) {
m[i] = false;
}
int ip;
for (ip = 0; ip < lp; ip++) {
if (ip + 1 < lp && p[ip + 1] == '*') {
// do nothing
}
else if (p[ip] == '*') {
char c = p[ip - 1];
for (i = 1; i <= ls; i++) {
m[i] = m[i] || (m[i - 1] && (s[i - 1] == c || c == '.'));
}
}
else {
char c = p[ip];
for (i = ls; i > 0; i--) {
m[i] = m[i - 1] && (s[i - 1] == c || c == '.');
}
m[0] = false;
}
}
bool ret = m[ls];
free(m);
return ret;
}
简短的代码:
bool isMatch(char* s, char* p) {
while (*s) {
if (*p&&*(p+1)=='*') {
if (!(*p==*s||*p=='.')) {p+=2;continue;}
if (!isMatch(s,p+2)) {s++;continue;} else return true;
}
if (*p==*s||*p=='.') {s++;p++;continue;}
return false;
}
while(*p&&*(p+1)=='*') p+=2;
return !*p;
}
参考文献
http://articles.leetcode.com/regular-expression-matching
http://blog.csdn.net/gatieme/article/details/51049244
http://www.jianshu.com/p/85f3e5a9fcda
leetcode 10 Regular Expression Matching(简单正则表达式匹配)的更多相关文章
- 10. Regular Expression Matching[H]正则表达式匹配
题目 Given an input string(s) and a pattern(p), implement regular expression matching with support for ...
- Leetcode 10. Regular Expression Matching(递归,dp)
10. Regular Expression Matching Hard Given an input string (s) and a pattern (p), implement regular ...
- leetcode 10. Regular Expression Matching 、44. Wildcard Matching
10. Regular Expression Matching https://www.cnblogs.com/grandyang/p/4461713.html class Solution { pu ...
- [LeetCode] 10. Regular Expression Matching 正则表达式匹配
Given an input string (s) and a pattern (p), implement regular expression matching with support for ...
- [LeetCode]10. Regular Expression Matching正则表达式匹配
Given an input string (s) and a pattern (p), implement regular expression matching with support for ...
- [leetcode]10. Regular Expression Matching正则表达式的匹配
Given an input string (s) and a pattern (p), implement regular expression matching with support for ...
- LeetCode OJ:Regular Expression Matching(正则表达式匹配)
Implement regular expression matching with support for '.' and '*'. '.' Matches any single character ...
- LeetCode (10): Regular Expression Matching [HARD]
https://leetcode.com/problems/regular-expression-matching/ [描述] Implement regular expression matchin ...
- 蜗牛慢慢爬 LeetCode 10. Regular Expression Matching [Difficulty: Hard]
题目 Implement regular expression matching with support for '.' and '*'. '.' Matches any single charac ...
随机推荐
- sqlmap 实战漏洞平台dvwa进行密码破解
2016-05-24 (1)实验的具体的环境极其思路 首先我们要检测我们的漏洞平台是否有sql注入 ,进行简单的测试发现在用户userid 上存在注入的漏洞 使用抓包工具对其cookie 进行获取如下 ...
- LVM 镜像硬盘更换、数据恢复(centos7.4 redhat7.5)
案例说明 Centos7 VG:vg LV:vg-lvRedhat 7.5VG:vgtest LV:lvtest 目的:模拟硬盘 /dev/sdb损坏.在线添加新硬盘/dev/sdc,lv镜像数据 ...
- Python格式化字符串、占位符、合并数组
合并数组 参考链接:https://www.cnblogs.com/chaihy/p/7243143.html >>> a=[2] >>> b=[3] >&g ...
- Java基础学习总结(8)——super关键字
一.super关键字 在JAVA类中使用super来引用父类的成分,用this来引用当前对象,如果一个类从另外一个类继承,我们new这个子类的实例对象的时候,这个子类对象里面会有一个父类对象.怎么去引 ...
- TOJ 3517 The longest athletic track
3517. The longest athletic track Time Limit: 1.0 Seconds Memory Limit: 65536KTotal Runs: 880 A ...
- C++容器(五):set类型
set类型 map容器是键-值对的集合,好比以任命为键的地址和电话号码.而set容器只是单纯的键的集合.当只想知道一个值是否存在时,使用set容器是最适合. 使用set容器必须包含set头文件: #i ...
- ACM数学知识体系
在盛情收到学弟邀请给他们整理ACM数学方面的知识体系,作为学长非常认真的弄了好久,希望各学弟不辜负学长厚爱!!!非常抱歉因为电脑全盘格式化好多word.PPT都丢失,我尽量具体地给大家找到各知识点学习 ...
- [iOS]iOS获取设备信息经常用法
郝萌主倾心贡献.尊重作者的劳动成果.请勿转载. 假设文章对您有所帮助.欢迎给作者捐赠.支持郝萌主.捐赠数额任意,重在心意^_^ 我要捐赠: 点击捐赠 Cocos2d-X源代码下载:点我传送 游戏官方下 ...
- django 笔记9 分页知识整理
感谢老男孩 自定义分页 XSS:攻击 默认字符串返回 {{page_str|safe}} 前端 from django.utils.safestring import mark_safe page_s ...
- 实时监控Cat之旅~对请求是否正常结束做监控(分布式的消息树)
对基于请求的分布式消息树的分析 在MVC时有过滤器System.Web.Mvc.ActionFilterAttribute,它可以对action执行的整个过程进行拦截,执行前与执行后我们可以注入自己的 ...