To clarify, I’m looking for a decent regular expression to validate URLs that were entered as user input with. I have no interest in parsing a list of URLs from a given string of text (even though some of the regexes on this page are capable of doing that). I also don’t want to allow every possible technically valid URL — quite the opposite. See the URL Standard if you’re looking to parse URLs in the same way that browsers do.

Assume that this regex will be used for a public URL shortener written in PHP, so URLs like http://localhost///foo.bar/://foo.bar/data:text/plain;charset=utf-8,OHAI and tel:+1234567890 shouldn’t pass (even though they’re technically valid). Also, in this case I only want to allow the HTTP, HTTPS and FTP protocols.

Also, single weird leading and/or trailing characters aren’t tested for. Just imagine you’re doing this before testing $url with the regex:

$url = trim($url, '!"#$%&\'()*+,-./@:;<=>[\\]^_`{|}~');

Note that I’ve added the S modifier to all the regexes to speed up the tests. In real-world usage, this modifier can be omitted.

Here’s a plain text list of all the URLs used in the test.

Diego Perini posted his version as a gist.

URL Spoon Library @krijnhoetmer @gruber @gruber v2 @cowboy Jeffrey Friedl @mattfarina @stephenhay @scottgonzales @rodneyrehm @imme_emosol @diegoperini Using filter_var()
These URLs should match (1 → correct)
http://foo.com/blah_blah 1 1 1 1 1 1 1 1 1 1 1 1 1
http://foo.com/blah_blah/ 1 1 1 1 1 1 1 1 1 1 1 1 1
http://foo.com/blah_blah_(wikipedia) 1 1 1 1 1 1 1 1 1 0 1 1 1
http://foo.com/blah_blah_(wikipedia)_(again) 1 1 1 1 1 1 1 1 1 0 1 1 1
http://www.example.com/wpstyle/?p=364 1 1 1 1 1 1 1 1 1 1 1 1 1
https://www.example.com/foo/?bar=baz&inga=42&quux 1 1 1 1 1 1 1 1 1 1 1 1 1
http://✪df.ws/123 0 0 1 1 1 1 0 1 1 1 1 1 0
http://userid:password@example.com:8080 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid:password@example.com:8080/ 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid@example.com 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid@example.com/ 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid@example.com:8080 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid@example.com:8080/ 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid:password@example.com 0 1 1 1 1 0 1 1 1 1 1 1 1
http://userid:password@example.com/ 0 1 1 1 1 0 1 1 1 1 1 1 1
http://142.42.1.1/ 0 1 1 1 1 1 1 1 1 1 1 1 1
http://142.42.1.1:8080/ 0 1 1 1 1 1 1 1 1 1 1 1 1
http://➡.ws/䨹 0 0 1 1 1 0 0 1 1 0 1 1 0
http://⌘.ws 0 0 1 1 1 0 0 1 1 1 1 1 0
http://⌘.ws/ 0 0 1 1 1 0 0 1 1 1 1 1 0
http://foo.com/blah_(wikipedia)#cite-1 1 1 1 1 1 1 1 1 1 1 1 1 1
http://foo.com/blah_(wikipedia)_blah#cite-1 1 1 1 1 1 1 1 1 1 1 1 1 1
http://foo.com/unicode_(✪)_in_parens 1 1 1 1 1 1 0 1 1 1 1 1 0
http://foo.com/(something)?after=parens 1 1 1 1 1 1 1 1 1 1 1 1 1
http://☺.damowmow.com/ 0 1 1 1 1 0 0 1 1 1 1 1 0
http://code.google.com/events/#&product=browser 1 1 1 1 1 1 1 1 1 1 1 1 1
http://j.mp 1 1 1 1 1 1 1 1 1 1 1 1 1
ftp://foo.bar/baz 0 0 1 1 1 1 1 1 1 1 1 1 1
http://foo.bar/?q=Test%20URL-encoded%20stuff 0 1 1 1 1 1 1 1 1 1 1 1 1
http://مثال.إختبار 0 0 1 1 1 0 0 1 1 0 1 1 0
http://例子.测试 0 0 1 1 1 0 0 1 1 0 1 1 0
http://उदाहरण.परीक्षा 0 0 1 1 1 0 0 1 1 0 1 1 0
http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com 0 1 0 1 1 0 0 1 1 1 1 1 1
http://1337.net 1 1 1 1 1 1 1 1 1 1 1 1 1
http://a.b-c.de 1 1 1 1 1 1 1 1 1 1 0 1 1
http://223.255.255.254 0 1 1 1 1 1 1 1 1 1 1 1 1
These URLs should fail (0 → correct)
http:// 0 0 0 0 0 0 0 0 1 0 0 0 0
http://. 0 0 0 0 1 0 1 0 1 0 0 0 0
http://.. 0 0 0 0 1 0 1 0 1 0 0 0 0
http://../ 0 1 1 1 1 0 1 0 1 1 0 0 0
http://? 0 0 0 0 1 0 0 0 1 0 0 0 0
http://?? 0 0 0 0 1 0 0 0 1 0 0 0 0
http://??/ 0 1 1 1 1 0 0 0 1 1 0 0 0
http://# 0 0 0 1 1 0 0 0 1 1 0 0 0
http://## 0 1 0 1 1 0 0 0 1 1 0 0 0
http://##/ 0 1 1 1 1 0 0 0 1 1 0 0 0
http://foo.bar?q=Spaces should be encoded 0 1 1 1 1 1 0 0 1 1 0 0 0
// 0 0 0 0 0 0 0 0 0 0 0 0 0
//a 0 0 0 0 0 0 0 0 0 0 0 0 0
///a 0 0 0 0 0 0 0 0 0 0 0 0 0
/// 0 0 0 0 0 0 0 0 0 0 0 0 0
http:///a 0 1 1 1 1 0 0 0 1 1 0 0 0
foo.com 0 0 0 0 1 0 0 0 0 0 0 0 0
rdar://1234 0 0 1 1 1 0 1 0 1 0 0 0 1
h://test 0 0 1 0 1 0 1 0 1 0 0 0 1
http:// shouldfail.com 0 0 0 0 1 0 0 0 1 1 0 0 0
:// should fail 0 0 0 0 0 0 0 0 0 0 0 0 0
http://foo.bar/foo(bar)baz quux 0 1 1 1 1 1 0 0 1 1 0 0 0
ftps://foo.bar/ 0 0 1 1 1 0 1 0 1 0 0 0 1
http://-error-.invalid/ 0 1 1 1 1 1 1 1 1 1 0 0 0
http://a.b--c.de/ 1 1 1 1 1 1 1 1 1 1 0 0 1
http://-a.b.co 1 1 1 1 1 1 1 1 1 1 0 0 0
http://a.b-.co 1 1 1 1 1 1 1 1 1 1 0 0 1
http://0.0.0.0 0 1 1 1 1 1 1 1 1 1 1 0 1
http://10.1.1.0 0 1 1 1 1 1 1 1 1 1 1 0 1
http://10.1.1.255 0 1 1 1 1 1 1 1 1 1 1 0 1
http://224.1.1.1 0 1 1 1 1 1 1 1 1 1 1 0 1
http://1.1.1.1.1 0 1 1 1 1 1 1 1 1 1 1 0 1
http://123.123.123 0 1 1 1 1 1 1 1 1 1 1 0 1
http://3628126748 0 1 1 1 1 0 1 1 1 1 1 0 1
http://.www.foo.bar/ 0 1 1 1 1 0 1 0 1 1 0 0 0
http://www.foo.bar./ 0 1 1 1 1 1 1 1 1 1 1 0 0
http://.www.foo.bar./ 0 1 1 1 1 0 1 0 1 1 0 0 0
http://10.1.1.1 0 1 1 1 1 1 1 1 1 1 1 0 1
http://10.1.1.254 0 1 1 1 1 1 1 1 1 1 1 0 1

Spoon Library (979 chars)

/(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+(aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mn|mo|mp|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|nom|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ra|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw|arpa)(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?))\b/imuS

@krijnhoetmer (115 chars)

_(^|[\s.:;?\-\]<\(])(https?://[-\w;/?:@&=+$\|\_.!~*\|'()\[\]%#,☺]+[\w/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])_i

@gruber (71 chars)

#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#iS

@gruber v2 (218 chars)

#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))#iS

@cowboy (1241 chars)

~(?:\b[a-z\d.-]+://[^<>\s]+|\b(?:(?:(?:[^\s!@#$%^&*()_=+[\]{}\|;:'",.<>/?]+)\.)+(?:ac|ad|aero|ae|af|ag|ai|al|am|an|ao|aq|arpa|ar|asia|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|biz|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|cat|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|coop|com|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|info|int|in|io|iq|ir|is|it|je|jm|jobs|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mil|mk|ml|mm|mn|mobi|mo|mp|mq|mr|ms|mt|museum|mu|mv|mw|mx|my|mz|name|na|nc|net|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pro|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tel|tf|tg|th|tj|tk|tl|tm|tn|to|tp|travel|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|xn--0zwm56d|xn--11b5bs3a9aj6g|xn--80akhbyknj4f|xn--9t4b11yi5a|xn--deba0ad|xn--g6w251d|xn--hgbk6aj7f53bba|xn--hlcj6aya9esc7a|xn--jxalpdlp|xn--kgbechtv|xn--zckzah|ye|yt|yu|za|zm|zw)|(?:(?:[0-9]|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(?:[0-9]|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]))(?:[;/][^#?<>\s]*)?(?:\?[^#<>\s]*)?(?:#[^<>\s]*)?(?!\w))~iS

Jeffrey Friedl (241 chars)

@\b((ftp|https?)://[-\w]+(\.\w[-\w]*)+|(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+(?: com\b|edu\b|biz\b|gov\b|in(?:t|fo)\b|mil\b|net\b|org\b|[a-z][a-z]\b))(\:\d+)?(/[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*(?:[.!,?]+[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+)*)?@iS

@mattfarina (287 chars)

/^([a-z][a-z0-9\*\-\.]*):\/\/(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*(?:[\w\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@)?(?:(?:[a-z0-9\-\.]|%[0-9a-f]{2})+|(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\]))(?::[0-9]+)?(?:[\/|\?](?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})*)?$/xiS

@stephenhay (38 chars)

@^(https?|ftp)://[^\s/$.?#].[^\s]*$@iS

@scottgonzales (1347 chars)

#([a-z]([a-z]|\d|\+|-|\.)*):(\/\/(((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?((\[(|(v[\da-f]{1,}\.(([a-z]|\d|-|\.|_|~)|[!\$&'\(\)\*\+,;=]|:)+))\])|((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|(([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=])*)(:\d*)?)(\/(([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*|(\/((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)|((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)|((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)){0})(\?((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\xE000-\xF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?#iS

@rodneyrehm (109 chars)

#((https?://|ftp://|www\.|[^\s:=]+@www\.).*?[a-z_\/0-9\-\#=&])(?=(\.|,|;|\?|\!)?("|'|«|»|\[|\s|\r|\n|$))#iS

@imme_emosol (54 chars)

@(https?|ftp)://(-\.)?([^\s/?\.#-]+\.?)+(/[^\s]*)?$@iS

@diegoperini (502 chars)

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

— Mathias

In search of the perfect URL validation regex的更多相关文章

  1. URL validation failed. The error could have been caused through the use of the browser&#39;s navigation

    URL validation failed. The error could have been caused through the use of the browser's navigation ...

  2. java中url正则regex匹配

    String regex = "^(?:https?://)?[\\w]{1,}(?:\\.?[\\w]{1,})+[\\w-_/?&=#%:]*$"; 解释说明: ^ : ...

  3. R12.2 URL Validation failed. The error could have been caused through the use of the browser's navigation buttons

    EBS升级到R12.2.4后,进入系统操作老是报以下错误: 通过谷歌发现有人遇到相同的问题,并提供了解决方案. 原文地址:http://onlineappsdbaoracle.blogspot.com ...

  4. [PHP]正则表达式判断网址

    来源:https://segmentfault.com/q/1010000000584340/a-1020000000584362 Markdown 的作者之一写的正则表达式(原文在这) (?i)\b ...

  5. 使用jquery.validation+jquery.poshytip做表单验证--未完待续

    jqueryValidate的具体使用方法很多,这里就不在赘述,这一次只谈一下怎样简单的实现表单验证. 整片文章目的,通过JQvalidation按表单属性配置规则验证,并将验证结果通过poshyti ...

  6. [转]nginx下的url rewrite

    转:http://zhengdl126.iteye.com/blog/698206 if (!-e $request_filename){rewrite "^/index\.html&quo ...

  7. AngularJS通过$location获取及改变当前页面的URL

    本文中获取与修改的URL以 ‘http://172.16.0.88:8100/#/homePage?id=10&a=100' 这个路径为例: 一. 获取url的相关方法(不修改URL): 1. ...

  8. Angular 通过注入 $location 获取与修改当前页面URL

    //1.获取当前完整的url路径 var absurl = $location.absUrl(); //http://172.16.0.88:8100/#/homePage?id=10&a=1 ...

  9. AngularJS如何修改URL中的参数

    一. 获取url的相关方法(不修改URL): 1.获取当前完整的url路径 var absurl = $location.absUrl(); //http://172.16.0.88:8100/#/h ...

随机推荐

  1. 【java读书笔记】JSTL,高速精通

    JSTL并非什么新颖的技术并且非常easy,甚至有人觉得JSTL已经过时了.可是我觉得它既然存在,就有存在的道理.作为技术人员就应该知道它们是什么,怎么使用,有什么长处. JSTL包括两部分:标签库和 ...

  2. 使用Fiddler抓包调试https下的页面

    众所周知https技术诞生以来,一个很重要的作用就是加密通信内容.所以在项目团队将业务站点实施完https改造以后,原先使用fiddler进行抓包的美好生活到头了.其实fiddler本身是支持对于ht ...

  3. java面试第十天

    JFC:java基础类库(具体的类可以查看API文档) 观察者模式: 事件监听者对事件源进行监听,事件源会发生某些事件,监听者需要对事件作出相应的处理. 事件监听者(Observer): 处理事件 事 ...

  4. Redis学习(7)-通用命令

    keys pattern: 获取所有与pattern匹配的key,返回所有与该key匹配的keys. 通配符: *表示任意一个或多个字符串. ?表示一个字符. 例如: 查询所有的key:keys * ...

  5. 深入PHP内核之全局变量

    在阅读PHP源码的时候,会遇到很多诸如:CG(),EG() ,PG(),FG()这样的宏,如果不了解这些宏的意义,会给理解源码造成很大困难 EG().这个宏可以用来访问符号表,函数,资源信息和常量 C ...

  6. MM 算法与 EM算法概述

    1.MM 算法: MM算法是一种迭代优化方法,利用函数的凸性来寻找它们的最大值或最小值. MM表示 “majorize-minimize MM 算法” 或“minorize maximize MM 算 ...

  7. JavaScript 设计模式之单例模式

    一.单例模式概念解读 1.单例模式概念文字解读 单例就是保证一个类只有一个实例,实现的方法一般是先判断实例存在与否,如果存在直接返回,如果不存在就创建了再返回,这就确保了一个类只有一个实例对象.在Ja ...

  8. 微信小程序四(设置底部导航)

    好了 小程序的头部标题 设置好了,我们来说说底部导航栏是如何实现的. 我们先来看个效果图 这里,我们添加了三个导航图标,因为我们有三个页面,微信小程序最多能加5个. 那他们是怎么出现怎么着色的呢?两步 ...

  9. js LINQ教程

    在说LINQ之前必须先说说几个重要的C#语言特性 一:与LINQ有关的语言特性 1.隐式类型 (1)源起 在隐式类型出现之前, 我们在声明一个变量的时候, 总是要为一个变量指定他的类型 甚至在fore ...

  10. RHCE7 管理II-2 通过grep使用正则表达式

    grep -i:忽略大小写 -n:表示行数 找出含有root的行 # grep root /etc/passwd root:x:::root:/root:/bin/bash ::operator:/r ...