preg_replace

(PHP 3 >= 3.0.9, PHP 4, PHP 5)

preg_replace -- Perform a regular expression search and replace

Description

mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )


Searches subject for matches to pattern and replaces them with replacement. If limit is specified, only limit matches are replaced; if limit is omitted or is -1, all matches are replaced.
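
As a minimal sketch of how limit behaves (the input string and variable names below are made up for illustration):

```php
<?php
// Illustrative input; $subject, $limited and $all are assumed names.
$subject = "a-b-c-d";

// limit = 2: only the first two matches are replaced.
$limited = preg_replace('/-/', '+', $subject, 2);

// limit = -1 (the same as omitting it): every match is replaced.
$all = preg_replace('/-/', '+', $subject, -1);

echo $limited, "\n"; // a+b+c-d
echo $all, "\n";     // a+b+c+d
```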

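replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference is replaced by the text captured by the n'th parenthesized subpattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern.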
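
The numbering of references can be illustrated with a short sketch; the date string and variable names are assumptions chosen for the example:

```php
<?php
// Illustrative input; not taken from the manual.
$date = "2003-04-15";

// Three capturing groups, numbered 1..3 by the position of their "(".
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// $0 is the whole match; $1, $2 and $3 are the individual groups.
$swapped = preg_replace($pattern, '$3/$2/$1 (was $0)', $date);

echo $swapped, "\n"; // 15/04/2003 (was 2003-04-15)
```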

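When working with a replacement pattern where a backreference is immediately followed by another number (i.e. a literal digit placed right after a matched pattern), you cannot use the familiar \\1 notation for the backreference. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference. In this case the solution is to use \${1}1. This creates an isolated $1 backreference and leaves the other 1 as a plain literal.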

Example 1. Using backreferences followed by numeric literals


<?php
$string = "April 15, 2003";
$pattern = "/(\w+) (\d+), (\d+)/i";
$replacement = "\${1}1,\$3";
print preg_replace($pattern, $replacement, $string);

/* Output
   ======

April1,2003

*/
?>

If matches are found, the new subject is returned; otherwise subject is returned unchanged.

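Every parameter to preg_replace() (except limit) can be an array. If both pattern and replacement are arrays, they are processed in the order in which their keys appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, you should call ksort() on each array before calling preg_replace().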

Example 2. Using indexed arrays with preg_replace()


<?php
$string = "The quick brown fox jumped over the lazy dog.";

$patterns[0] = "/quick/";
$patterns[1] = "/brown/";
$patterns[2] = "/fox/";

$replacements[2] = "bear";
$replacements[1] = "black";
$replacements[0] = "slow";

print preg_replace($patterns, $replacements, $string);

/* Output
   ======

The bear black slow jumped over the lazy dog.

*/

/* By ksorting patterns and replacements,
   we should get what we wanted. */

ksort($patterns);
ksort($replacements);

print preg_replace($patterns, $replacements, $string);

/* Output
   ======

The slow black bear jumped over the lazy dog.

*/

?>

If subject is an array, the search and replace is performed on every entry of subject, and an array is returned.
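
A small sketch of the array case, using made-up input values:

```php
<?php
// Illustrative subjects; the values are assumptions for this example.
$subjects = array("red apple", "green apple", "pear");

// Each entry is processed separately and the result is again an array.
$result = preg_replace('/apple/', 'orange', $subjects);

print_r($result);
// $result is array("red orange", "green orange", "pear")
```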

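If pattern and replacement are both arrays, preg_replace() takes a value from each array in turn and uses them to search and replace on subject. If replacement has fewer values than pattern, the empty string is used for the remaining replacement values. If pattern is an array and replacement is a string, this replacement string is used for every value of pattern. The converse would not make sense, though.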
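
The two array rules above can be sketched as follows; the subject and patterns are illustrative assumptions:

```php
<?php
$subject  = "one two three";
$patterns = array('/one/', '/two/', '/three/');

// Fewer replacements than patterns: the missing ones act as "".
$short = preg_replace($patterns, array('1', '2'), $subject);

// A single string replacement is reused for every pattern.
$same = preg_replace($patterns, 'X', $subject);

var_dump($short); // string(4) "1 2 "
var_dump($same);  // string(5) "X X X"
```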

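The /e modifier makes preg_replace() treat the replacement parameter as PHP code after the appropriate references have been substituted. Tip: make sure that replacement constitutes a valid PHP code string, otherwise PHP will report a parse error at the line containing preg_replace().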

Example 3. Replace several values


<?php
$patterns = array ("/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/",
                   "/^\s*{(\w+)}\s*=/");
$replace = array ("\\3/\\4/\\1\\2", "$\\1 =");
print preg_replace ($patterns, $replace, "{startDate} = 1999-5-27");
?>

This example will produce:

$startDate = 5/27/1999

Example 4. Using the /e modifier

<?php
preg_replace ("/(<\/?)(\w+)([^>]*>)/e",
              "'\\1'.strtoupper('\\2').'\\3'",
              $html_body);
?>

This would capitalize all HTML tags in the input text.
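
The same transformation can also be sketched without /e by using preg_replace_callback() (available since PHP 4.0.5), which hands the matches to a function instead of evaluating a code string. The helper name upper_tag and the sample input are assumptions made for this illustration:

```php
<?php
// Hypothetical helper: receives the array of matches for one tag.
function upper_tag($m)
{
    // $m[1] is "<" or "</", $m[2] the tag name, $m[3] the rest up to ">".
    return $m[1] . strtoupper($m[2]) . $m[3];
}

// Illustrative input.
$html_body = '<p class="x">Hello <b>world</b></p>';

$upper = preg_replace_callback('/(<\/?)(\w+)([^>]*>)/', 'upper_tag', $html_body);

echo $upper, "\n"; // <P class="x">Hello <B>world</B></P>
```

Because the matched text is passed as data rather than spliced into code, no quote escaping is needed in the callback.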

Example 5. Convert HTML to text


<?php
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.

$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",           // Strip out HTML tags
                 "'([\r\n])[\s]+'",                 // Strip out white space
                 "'&(quot|#34);'i",                 // Replace HTML entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                    // Evaluate as PHP code

$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");

$text = preg_replace ($search, $replace, $document);
?>

Note: The limit parameter was added after PHP 4.0.1pl2.

See also preg_match(), preg_match_all(), and preg_split().

User Contributed Notes

Sune Rievers
25-May-2006 01:58


Updated version of the link script, since the other version didn't work with links at the beginning of a line, links without http://, or e-mail addresses. Oh, and a bf2:// detection too for all you gamers ;)

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}

sep16 at psu dot edu
19-May-2006 11:28


Re: preg_replace() with the /e modifier; handling escaped quotes.

I was writing a replacement pattern to parse HTML text which sometimes
contained PHP variable-like strings. Various initial solutions yielded
either escaped quotes or fatal errors due to these variable-like strings
being interpreted as poorly formed variables.

"Tim K." and "steven -a-t- acko dot net" provide some detailed
discussion of preg_replace's quote escaping in the comments below,
including the use of str_replace() to remove the preg_replace added
slash-quote.  However, this suggestion is applied to the entire text
AFTER the preg_replace.  This isn't a robust solution in that it is
conceivable that the text unaffected by the preg_replace() may contain
the string \" which should not be fixed.  Furthermore, the addition of
escaped quotes within preg_replaces with multiple patterns/replacements
(with arrays) may break one of the following patterns.

The solution, then, must fix the quote-escaped text BEFORE replacing it
in the target, and possibly before it is passed to a function within the
replacement code.  Since the replacement string is interpreted as PHP
code, just use str_replace('\\"','"','$1') where you need an
unadulterated $1 to appear.  The key is to properly escape the necessary
characters.  Three variations appear in the examples below, as well as a
set of incorrect examples.  I haven't seen this solution posted before,
so hopefully this will be helpful rather than covering old ground.

Try this example code:

<?php
/*
   Using preg_replace with the /e modifier on ANY text, regardless of single
   quotes, double quotes, dollar signs, backslashes, and variable interpolation.

Tested on PHP 5.0.4 (cli), PHP 5.1.2-1+b1 (cli), and PHP 5.1.2 for Win32.

Solution?
       1.  Use single quotes for the replacement string.
       2.  Use escaped single quotes around the captured text variable (\'$1\').
       3.  Use str_replace() to remove the escaped double quotes
           from within the replacement code (\" -> ").
*/

function _prc_function1($var1,$var2,$match) {
   $match = str_replace('\\"','"',$match);
   // ... do other stuff ...
   return $var1.$match.$var2;
}
function _prc_function2($var1,$var2,$match) {
   // ... do other stuff ...
   return $var1.$match.$var2;
}

$v1 = '(';
$v2 = ')';
// Lots of tricky characters:
$text = "<xxx>...\$varlike_text[_'\\\"\"\\\"'...</xxx>";
$pattern = '/<xxx>(.*?)<\/xxx>/e';

echo $text . " Original.<br>\n";

// Example #1 - Processing in place.
// returns (...$varlike_text[_'\""\"'...)
echo preg_replace(
   $pattern,
   '$v1 . str_replace(\'\\"\',\'"\',\'$1\') . $v2',
   $text) . " Escaped double quotes replaced with str_replace. (Good.)<br>\n";

// Example #2 - Processing within a function.
// returns (...$varlike_text[_'\""\"'...)
echo preg_replace(
   $pattern,
   '_prc_function1($v1,$v2,\'$1\')',
   $text) . " Escaped double quotes replaced in a function. (Good.)<br>\n";

// Example #3 - Preprocessing before a function.
// returns (...$varlike_text[_'\""\"'...)
echo preg_replace(
   $pattern,
   '_prc_function2($v1,$v2,str_replace(\'\\"\',\'"\',\'$1\'))',
   $text) . " Escaped double quotes replaced with str_replace before sending match to a function. (Good.)<br>\n";

// Example #4 - INCORRECT implementations
//  a., b., c. return (...$varlike_text[_'\\"\"\\"'...), i.e. with extra slashes
//  d. Causes a syntax+fatal error, unexpected T_BAD_CHARACTER...
echo preg_replace( $pattern, "\$v1 . '$1' . \$v2", $text)," Enclosed in double/single quotes. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, "\$v1 . '\$1' . \$v2", $text)," Enclosed in double/single quotes, $ escaped. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . \'$1\' . $v2', $text)," Enclosed in single/single quotes. (Wrong!  Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . "$1" . $v2', $text)," Enclosed in single quotes. (Wrong!  Dollar sign in text is interpreted as variable interpolation.)<br>\n";

?>

klemens at ull dot at
16-May-2006 05:24


See as well the excellent tutorial at http://www.tote-taste.de/X-Project/regex/index.php

;-) Klemens

robvdl at gmail dot com
21-Apr-2006 08:15


For those of you that have ever had the problem where clients paste text
from msword into a CMS, where word has placed all those fancy quotes
throughout the text, breaking the XHTML validator... I have created a
nice regular expression, that replaces ALL high UTF-8 characters with
HTML entities, such as &#8217;.

Note that most user examples on php.net I have read, only replace
selected characters, such as single and double quotes. This replaces all
high characters, including greek characters, arabian characters,
smilies, whatever.

It took me ages to get it down to just two regular expressions, but it handles all high characters properly.

$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' .
((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128))
. ';'", $text);
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' .
((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) -
128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
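
The arithmetic in the two expressions above can be checked with a small sketch that uses preg_replace_callback() instead of /e. The helper name utf8_seq_to_entity and the sample string are assumptions for illustration, not part of the original note. For the right single quotation mark (bytes E2 80 99) the formula gives (226 - 224) * 4096 + (128 - 128) * 64 + (153 - 128) = 8217.

```php
<?php
// Hypothetical helper: turns one 2- or 3-byte UTF-8 sequence into "&#N;".
function utf8_seq_to_entity($m)
{
    $s = $m[0];
    if (strlen($s) == 2) {
        // 110xxxxx 10xxxxxx
        $code = (ord($s[0]) - 192) * 64 + (ord($s[1]) - 128);
    } else {
        // 1110xxxx 10xxxxxx 10xxxxxx
        $code = (ord($s[0]) - 224) * 4096
              + (ord($s[1]) - 128) * 64
              + (ord($s[2]) - 128);
    }
    return '&#' . $code . ';';
}

// "\xE2\x80\x99" is a right single quotation mark, "\xC3\xA9" is e-acute.
$text = "It\xE2\x80\x99s caf\xC3\xA9";
$text = preg_replace_callback('/[\xc0-\xdf].|[\xe0-\xef]../s',
                              'utf8_seq_to_entity', $text);

echo $text, "\n"; // It&#8217;s caf&#233;
```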

heppa(at)web(dot)de
20-Apr-2006 11:37


I just wanted to give an example for some people that have the problem, that their match is taking away too much of the string.

I wanted to have a function that extracts only wanted parameters out of a
http query string, and they had to be flexible, eg 'updateItem=1'
should be replaced, as well as 'updateCategory=1', but i sometimes ended
up having too much replaced from the query.

example:

my query string: 'updateItem=1&itemID=14'

ended up in a query string like this: '4' , which was not really covering the plan ;)

i was using this regexp:

preg_replace('/&?update.*=1&?/','',$query_string);

i discovered, that preg_replace matches the longest possible string
(quantifiers such as .* are greedy by default), which means that it
replaces everything from the first u up to the 1 after itemID=

I assumed, that it would take the shortest possible match.
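
One possible fix, sketched here under the assumption that parameter names never contain "&" or "=", is to replace .* with a character class that cannot run past the end of the parameter name (a lazy .*? would also work for this input):

```php
<?php
$query_string = 'updateItem=1&itemID=14';

// Greedy: ".*" runs on to the last "=1" it can find, inside "itemID=14".
$greedy = preg_replace('/&?update.*=1&?/', '', $query_string);

// Restricting the name to characters other than "&" and "=" keeps the
// match inside a single parameter.
$fixed = preg_replace('/&?update[^&=]*=1&?/', '', $query_string);

echo $greedy, "\n"; // 4
echo $fixed, "\n";  // itemID=14
```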

Ritter
19-Apr-2006 05:08


for those of you with multiline woes like I was having, try:

$str = preg_replace('/<tag[^>](.*)>(.*)<\/tag>/ims','<!-- edited -->', $str);

Eric
10-Apr-2006 02:54


Here recently I needed a way to replace links (<a
href="blah.com/blah.php">Blah</a>) with their anchor text, in
this case Blah. It might seem simple enough for some..or most, but at
the benefit of helping others:

<?php

$value = '<a href="http://www.domain.com/123.html">123</a>';

echo preg_replace('/<a href="(.*?)">(.*?)<\\/a>/i', '$2', $value);

//Output
// 123

?>

sesha_srinivas at yahoo dot com
08-Apr-2006 04:13


If you have a form element displaying the amounts using "$" and ",". Before posting it to the db you can use the following:

$search = array('/,/','/\$/');

$replace = array('','');

$data['amount_limit'] = preg_replace($search, $replace, $data['amount_limit']);

ciprian dot amariei Mtaiil gmail * com
06-Apr-2006 01:21


I found some situations in which my function below doesn't
perform as expected. Here is the new version.

<?php
function make_links_blank( $text )
{
 return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\'])((?:https?|ftp):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">'
       ),
       $text
   );
}

?>

This function replaces links (http(s)://, ftp://) with respective html anchor tag, and also makes all anchors open in a new window.

ae at instinctive dot de
28-Mar-2006 11:40


Something innovative for a change ;-) For a news system, I have a special format for links:

"Go to the [Blender3D Homepage|http://www.blender3d.org] for more Details"

To get this into a link, use:

$new = preg_replace('/\[(.*?)\|(.*?)\]/', '<a href="$2" target="_blank">$1</a>', $new);

c_stewart0a at yahoo dot com
18-Mar-2006 06:35


In response to elaineseery at hotmail dot com

[quote]if you're new to this function, and getting an error like  'delimiter must not alphanumeric backslash ...[/quote]

Note that if you use arrays for search and replace then you will want to
quote your searching expression with / or you will get this error.

However, if you use a single string to search and replace then you will
not receive this error if you do not quote your regular expression in /

Graham Dawson <graham at imdanet dot com>
16-Mar-2006 06:46


I said there was a better way. There is!

The regexp is essentially the same but now I deal with problems that it
couldn't handle, such as urls, which tended to screw things up, and the
odd placement of a : or ; in the body text, by using functions. This
makes it easier to expand to take account of all the things I know I've
not taken account of. But here it is in its essential glory. Or
mediocrity. Take your pick.

<?php

define('PARSER_ALLOWED_STYLES_',
'text-align,font-family,font-size,text-decoration');

function strip_styles($source=NULL) {
  $exceptions = str_replace(',', '|', @constant('PARSER_ALLOWED_STYLES_'));

/* First we want to fix anything that might potentially break the style stripper, so we try to replace
   * in-text instances of : with its HTML entity replacement.
   */

function Replacer($text) {
   $check = array (
       '@:@s',
   );
   $replace = array(
       '&#58;',
   );

return preg_replace($check, $replace, $text[0]);
  }

$source = preg_replace_callback('@>(.*)<@Us', 'Replacer', $source);

$regexp =

'@([^;"]+)?(?<!'. $exceptions. ')(?<!\>\w):(?!\/\/(.+?)\/|<|>)((.*?)[^;"]+)(;)?@is';

$source = preg_replace($regexp, '', $source);

$source = preg_replace('@[a-z]*=""@is', '', $source);

return $source;
}

?>

rybasso
16-Mar-2006 05:33


"Document contains no data" message in FF and 'This page could not be
found' in IE occurs when you pass too long a subject
string to preg_replace() with the default limit.

Increase the limit to be sure it's larger than the subject length.

Ciprian Amariei
16-Mar-2006 06:50


Here is a function that replaces the links (http(s)://, ftp://) with respective html anchor, and also makes all anchors open in a new window.

function make_links_blank( $text )
{
 
 return  preg_replace( array(
               "/[^\"'=]((http|ftp|https):\/\/[^\s\"']+)/i",
               "/<a([^>]*)target=\"?[^\"']+\"?/i",
               "/<a([^>]+)>/i"
       ),
         array(
               "<a href=\"\\1\">\\1</a>",
               "<a\\1",
               "<a\\1 target=\"_blank\" >"
           ),
       $text
       );
}

felipensp at gmail dot com
13-Mar-2006 01:02


Sorry, I don't know English.

Replacing letters of badword for a definite character.
View example:

<?php

function censured($string, $aBadWords, $sChrReplace) {

foreach ($aBadWords as $key => $word) {

// Regexp for case-insensitive and use the functions
       $aBadWords[$key] = "/({$word})/ie";

}

// to substitue badwords for definite character
   return preg_replace($aBadWords,
                       "str_repeat('{$sChrReplace}', strlen('\\1'))",
                       $string
                       );

}

// To show modifications
print censured('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
               array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
               '*'
               );
  
?>

Graham Dawson graham_at_imdanet_dot_com
07-Mar-2006 05:32


Inspired by the query-string cleaner from greenthumb at 4point-webdesign
dot com and istvan dot csiszar at weblab dot hu. This little bit of
code cleans up any "style" attributes in your tags, leaving behind only
styles that you have specifically allowed. Also conveniently strips out
nonsense styles. I've not fully tested it yet so I'm not sure if it'll
handle features like url(), but that shouldn't be a difficulty.

<?php

/* The string would normally be a form-submitted html file or text string */

$string = '<span
style="font-family:arial; font-size:20pt; text-decoration:underline;
sausage:bueberry;" width="200">Hello there</span> This is some
<div style="display:inline;">test text</div>';

/* Array of styles to allow. */

$except = array('font-family', 'text-decoration');

$allow = implode('|', $except);

/* The monster beast regexp. I was up all night trying to figure this one out. */

$regexp = '@([^;"]+)?(?<!'.$allow.'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';

print str_replace('<', '&lt;', $regexp).'<br/><br/>';

$out = preg_replace($regexp, '', $string);

/* Now lets get rid of any unwanted empty style attributes */

$out = preg_replace('@[a-z]*=""@is', '', $out);

print $out;

?>

This should produce the following:

<span style="font-family:arial; text-decoration:underline;"
width="200">Hello there</span> This is some <div >test
text</div>

Now, I'm a relative newbie at this so I'm sure there's a better way to do it. There's *always* a better way.

elaineseery at hotmail dot com
15-Feb-2006 10:44


if you're new to this function, and getting an error like
'delimiter must not alphanumeric backslash ...

note that whatever is in $pattern (and only $pattern, not $string, or
$replacement) must be enclosed by '/  /' (note the forward slashes)

e.g.
$pattern = '/and/';
$replacement = 'sandy';
$string = 'me and mine';

generates 'me sandy mine'

seems to be obvious to everyone else, but took me a while to figure out!!

jsirovic at gmale dot com
08-Feb-2006 01:23


If the lack of &$count is aggravating in PHP 4.x, try this:

$replaces = 0;

$return .= preg_replace('/(\b' . $substr . ')/ie', '"<$tag>$1<$end_tag>" . (substr($replaces++,0,0))', $s2, $limit);
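
On PHP 5.1.0 and later the workaround is not needed, because preg_replace() accepts a fifth, by-reference argument that receives the number of replacements made. A minimal sketch with made-up input:

```php
<?php
// Illustrative input; variable names are assumptions for this example.
$s2    = 'foo bar foobar';
$count = 0;

// -1 means "no limit"; $count is filled in by preg_replace().
$out = preg_replace('/\bfoo/i', '<b>$0</b>', $s2, -1, $count);

echo $out, "\n";   // <b>foo</b> bar <b>foo</b>bar
echo $count, "\n"; // 2
```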

no-spam@idiot^org^ru
05-Feb-2006 04:21


Decodes the result of IE's escape():

<?php

function unicode_unescape(&$var, $convert_to_cp1251 = false){
   $var = preg_replace(
       '#%u([\da-fA-F]{4})#mse',
       $convert_to_cp1251 ? '@iconv("utf-16","windows-1251",pack("H*","\1"))' : 'pack("H*","\1")',
       $var
   );
}

//

$str = 'to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430';

unicode_unescape($str, true);

echo $str;

?>

leandro[--]ico[at]gm[--]ail[dot]com
05-Feb-2006 01:40


I've found out a really odd error.

When I try to use the 'empty' function in the replacement string (when
using the 'e' modifier, of course) the regexp interpreter gets stuck at
that point.

An example of this failure:

<?php
echo $test = preg_replace( "/(bla)/e", "empty(123)", "bla bla ble" );

# it should print something like:
# "1 1 ble"
?>

Very odd, huh?

04-Feb-2006 12:00


fairly useful script to replace normal html entities with ordinal-value
entities.  Useful for writing to xml documents where entities aren't
defined.
<?php
$p='#(\&[\w]+;)#e';
$r="'&#'.ord(html_entity_decode('$1')).';'";
$text=preg_replace($p,$r,$_POST['data']);
?>

Rebort
03-Feb-2006 03:51


Following up on pietjeprik at gmail dot com's great string to parse [url] bbcode:
<?php
$url = '[url=http://www.foo.org]The link[/url]';
$text = preg_replace("/\[url=(\W?)(.*?)(\W?)\](.*?)\[\/url\]/", '<a href="$2">$4</a>', $url);
?>

This allows for the user to enter variations:

[url=http://www.foo.org]The link[/url]
[url="http://www.foo.org"]The link[/url]
[url='http://www.foo.org']The link[/url]

or even

[url=#http://www.foo.org#]The link[/url]
[url=!http://www.foo.org!]The link[/url]

01-Feb-2006 02:23


Uh-oh. When I looked at the text in the preview, I had to double the number of backslashes to make it look right.
I'll try again with my original text:

$full_text = preg_replace('/\[p=(\d+)\]/e',
  "\"<a href=\\\"./test.php?person=$1\\\">\"
   .get_name($1).\"</a>\"",
   $short_text);

I hope that it comes out correctly this time :-)

leif at solumslekt dot org
01-Feb-2006 12:24


I've found a use for preg_replace. If you've got eg. a database with
persons associated with numbers, you may want to input links in a kind
of shorthand, like [p=12345], and have it expanded to a full url with a
name in it.

This is my solution:

$expanded_text = preg_replace('/\\[p=(\d+)\\]/e',
   "\\"<a href=\\\\\\"./test.php?person=$1\\\\\\">\\".get_name($1).\\"</a&>\\"",
       $short_text);

It took me some time to work out the proper number of quotes and backslashes.

regards, Leif.

SG_01
20-Jan-2006 08:43


Re: wcc at techmonkeys dot org

You could put this in 1 replace for faster execution as well:

<?php

/*
 * Removes all blank lines from a string.
 */
function removeEmptyLines($string)
{
   return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}

?>

05-Jan-2006 06:09


First, I have no real knowledge of regexps; everything here was done through trial and error.
I wrote this function, which tries to clean up crappy MS Word HTML. I use it
to clean code that users paste from MS Word into online WYSIWYG
editors.
There's huge room for improvement. I post it here because after
searching I could not find any pure PHP solution. The best alternative
is tidy, but for those of us who are still using PHP 4 and do
not have access to the server, this could be an alternative. Use it
at your own risk... once again, it was a quickie and I know there can
be much better ways to do this:

function decraper($htm, $delstyles=false) {
   $commoncrap = array('&quot;'
   ,'font-weight: normal;'
   ,'font-style: normal;'
   ,'line-height: normal;'
   ,'font-size-adjust: none;'
   ,'font-stretch: normal;');
   $replace = array("'");
   $htm = str_replace($commoncrap, $replace, $htm);
     $pat = array();
   $rep = array();
   $pat[0] = '/(<table\s.*)(width=)(\d+%)(\D)/i';
   $pat[1] = '/(<td\s.*)(width=)(\d+%)(\D)/i';
   $pat[2] = '/(<th\s.*)(width=)(\d+%)(\D)/i';
   $pat[3] = '/<td( colspan="[0-9]+")?( rowspan="[0-9]+")?( width="[0-9]+")?( height="[0-9]+")?.*?>/i';
   $pat[4] = '/<tr.*?>/i';
   $pat[5] = '/<\/st1:address>(<\/st1:\w*>)?<\/p>[\n\r\s]*<p[\s\w="\']*>/i';
   $pat[6] = '/<o:p.*?>/i';
   $pat[7] = '/<\/o:p>/i';
   $pat[8] = '/<o:SmartTagType[^>]*>/i';
   $pat[9] = '/<st1:[\w\s"=]*>/i';
   $pat[10] = '/<\/st1:\w*>/i';
   $pat[11] = '/<p[^>]*>(.*?)<\/p>/i';
   $pat[12] = '/ style="margin-top: 0cm;"/i';
   $pat[13] = '/<(\w[^>]*) class=([^ |>]*)([^>]*)/i';
   $pat[14] = '/<ul(.*?)>/i';
   $pat[15] = '/<ol(.*?)>/i';
   $pat[17] = '/<br \/>&nbsp;<br \/>/i';
   $pat[18] = '/&nbsp;<br \/>/i';
   $pat[19] = '/<!-.*?>/';
   $pat[20] = '/\s*style=(""|\'\')/';
   $pat[21] = '/ style=[\'"]tab-interval:[^\'"]*[\'"]/i';
   $pat[22] = '/behavior:[^;\'"]*;*(\n|\r)*/i';
   $pat[23] = '/mso-[^:]*:"[^"]*";/i';
   $pat[24] = '/mso-[^;\'"]*;*(\n|\r)*/i';
   $pat[25] = '/\s*font-family:[^;"]*;?/i';
   $pat[26] = '/margin[^"\';]*;?/i';
   $pat[27] = '/text-indent[^"\';]*;?/i';
   $pat[28] = '/tab-stops:[^\'";]*;?/i';
   $pat[29] = '/border-color: *([^;\'"]*)/i';
   $pat[30] = '/border-collapse: *([^;\'"]*)/i';
   $pat[31] = '/page-break-before: *([^;\'"]*)/i';
   $pat[32] = '/font-variant: *([^;\'"]*)/i';
   $pat[33] = '/<span [^>]*><br \/><\/span><br \/>/i';
   $pat[34] = '/" "/';
   $pat[35] = '/[\t\r\n]/';
   $pat[36] = '/\s\s/s';
   $pat[37] = '/ style=""/';
   $pat[38] = '/<span>(.*?)<\/span>/i';
//empty (no attribs) spans
   $pat[39] = '/<span>(.*?)<\/span>/i';
//twice, nested spans
   $pat[40] = '/(;\s|\s;)/';
   $pat[41] = '/;;/';
   $pat[42] = '/";/';
   $pat[43] = '/<li(.*?)>/i';
   $pat[44] = '/(<\/b><b>|<\/i><i>|<\/em><em>|<\/u><u>|<\/strong><strong>)/i';
   $rep[0] = '$1$2"$3"$4';
   $rep[1] = '$1$2"$3"$4';
   $rep[2] = '$1$2"$3"$4';
   $rep[3] = '<td$1$2$3$4>';
   $rep[4] = '<tr>';
   $rep[5] = '<br />';
   $rep[6] = '';
   $rep[7] = '<br />';
   $rep[8] = '';
   $rep[9] = '';
   $rep[10] = '';
   $rep[11] = '$1<br />';
   $rep[12] = '';
   $rep[13] = '<$1$3';
   $rep[14] = '<ul>';
   $rep[15] = '<ol>';
   $rep[17] = '<br />';
   $rep[18] = '<br />';
   $rep[19] = '';
   $rep[20] = '';
   $rep[21] = '';
   $rep[22] = '';
   $rep[23] = '';
   $rep[24] = '';
   $rep[25] = '';
   $rep[26] = '';
   $rep[27] = '';
   $rep[28] = '';
   $rep[29] = '';
   $rep[30] = '';
   $rep[31] = '';
   $rep[32] = '';
   $rep[33] = '<br />';
   $rep[34] = '""';
   $rep[35] = '';
   $rep[36] = '';
   $rep[37] = '';
   $rep[38] = '$1';
   $rep[39] = '$1';
   $rep[40] = ';';
   $rep[41] = ';';
   $rep[42] = '"';
   $rep[43] = '<li>';
   $rep[44] = '';
   if($delstyles===true){
       $pat[50] = '/ style=".*?"/';
       $rep[50] = '';
   }
   ksort($pat);
   ksort($rep);
   // Apply every pattern/replacement pair, in key order, to the input.
   return preg_replace($pat, $rep, $htm);
}

Hope it helps, critics are more than welcome.

kyle at vivahate dot com
23-Dec-2005 04:08


Here is a regular expression to "slashdotify" html links.  This has
worked well for me, but if anyone spots errors, feel free to make
corrections.

<?php
$url = '<a attr="garbage" href="http://us3.php.net/preg_replace">preg_replace - php.net</a>';
$url = preg_replace( '/<.*href="?(.*:\/\/)?([^ \/]*)([^ >"]*)"?[^>]*>(.*)(<\/a>)/', '<a href="$1$2$3">$4</a> [$2]', $url );
?>

Will output:

<a href="http://us3.php.net/preg_replace">preg_replace - php.net</a> [us3.php.net]

istvan dot csiszar at weblab dot hu
21-Dec-2005 05:53


This is an addition to the previously sent removeEvilTags function. If
you don't want to remove the style tag entirely, just certain style
attributes within that, then you might find this piece of code useful:

<?php

function removeEvilStyles($tagSource)
{
   // this will leave everything else, but:
   $evilStyles = array('font', 'font-family', 'font-face', 'font-size', 'font-size-adjust', 'font-stretch', 'font-variant');

$find = array();
   $replace = array();
  
   foreach ($evilStyles as $v)
   {
       $find[]    = "/$v:.*?;/";
       $replace[] = '';
   }
  
   return preg_replace($find, $replace, $tagSource);
}

function removeEvilTags($source)
{
   $allowedTags = '<h1><h2><h3><h4><h5><a><img><label>'.
       '<p><br><span><sup><sub><ul><li><ol>'.
       '<table><tr><td><th><tbody><div><hr><em><b><i>';
   $source = strip_tags(stripslashes($source), $allowedTags);
   return trim(preg_replace('/<(.*?)>/ie', "'<'.removeEvilStyles('\\1').'>'", $source));
}

?>

triphere
18-Dec-2005 01:13


to remove Bulletin Board Code (remove bbcode)

$body = preg_replace("[\[(.*?)\]]", "", $body);

jcheger at acytec dot com
09-Dec-2005 04:16


Escaping quotes may be very tricky. Magic quotes and preg_quote are not
protected against double escaping. This means that an escaped quote will
get a double backslash, or even more. preg_quote ("I\'m using regex")
will return "I\\'m using regex".

The following example escapes only unescaped single quotes:

<?php
$a = "I'm using regex";
$b = "I\'m using regex";

$patt = "/(?<!\\\)\'/";
$repl = "\\'";

print "a:  ".preg_replace ($patt, $repl, $a)."\n";
print "b:  ".preg_replace ($patt, $repl, $b)."\n";
?>

and prints:
a:  I\'m using regex
b:  I\'m using regex

Remark: matching a backslash requires a triple backslash (\\\).

urbanheroes {at} gmail {dot} com
16-Aug-2005 04:00


Here are two functions to trim a string down to a certain size.

"wordLimit" trims a string down to a certain number of words, and adds
an ellipsis after the last word, or returns the string if the limit is
larger than the number of words in the string.

"stringLimit" trims a string down to a certain number of characters, and
adds an ellipsis after the last word, without truncating any words in
the middle (it will instead leave it out), or returns the string if the
limit is larger than the string size. The length of a string will
INCLUDE the length of the ellipsis.

<?php

function wordLimit($string, $length = 50, $ellipsis = '...') {
   return count($words = preg_split('/\s+/', ltrim($string), $length + 1)) > $length ?
       rtrim(substr($string, 0, strlen($string) - strlen(end($words)))) . $ellipsis :
       $string;
}

function stringLimit($string, $length = 50, $ellipsis = '...') {
   return strlen($fragment = substr($string, 0, $length + 1 - strlen($ellipsis))) < strlen($string) + 1 ?
       preg_replace('/\s*\S*$/', '', $fragment) . $ellipsis : $string;
}

echo wordLimit('  You can limit a string to only so many words.', 6);
// Output: "You can limit a string to..."
echo stringLimit('Or you can limit a string to a certain amount of characters.', 32);
// Output: "Or you can limit a string to..."

?>

avizion at relay dot dk
25-Apr-2005 03:04


Just a note for all FreeBSD users wondering why this function is not
present after installing php / mod_php (4 and 5) from ports.

Remember to install:

/usr/ports/devel/php4-pcre (or 5 for -- 5 ;)

That's all... enjoy - and save 30 mins. like I could have used :D

jhm at cotren dot net
19-Feb-2005 06:04


It took me a while to figure this one out, but here is a nice way to use
preg_replace to convert a hex encoded string back to clear text

<?php
   $text = "PHP rocks!";
   $encoded = preg_replace(
           "'(.)'e"
         ,"dechex(ord('\\1'))"
         ,$text
   );
   print "ENCODED: $encoded\n";
?>
ENCODED: 50485020726f636b7321
<?php
   print "DECODED: ".preg_replace(
       "'([\S,\d]{2})'e"
     ,"chr(hexdec('\\1'))"
     ,$encoded)."\n";
?>
DECODED: PHP rocks!

gbaatard at iinet dot net dot au
15-Feb-2005 01:56


on the topic of implementing forum code ([b][/b] to <b></b> etc), i found this worked well...

<?php
$body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
$body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
?>

First line replaces [b] [B] [i] [I] [u] [U] with the appropriate html tags(<b>, <i>, <u>)

Second one does the same for closing tags...

For urls, I use...

<?php
$body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
?>

and for urls starting with www., i use...

<?php
$body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
?>

Pop all these lines into a function that receives and returns the text you want 'forum coded' and away you go:)

tash at quakersnet dot com
30-Jan-2005 08:25


A better way for link & email conversion, i think. :)

<?php
function change_string($str)
   {
     $str = trim($str);
     $str = htmlspecialchars($str);
     $str = preg_replace('#(.*)\@(.*)\.(.*)#','<a href="mailto:\\1@\\2.\\3">Send email</a>',$str);
     $str = preg_replace('=([^\s]*)(www.)([^\s]*)=','<a href="http://\\2\\3" target=\'_new\'>\\2\\3</a>',$str);
     return $str;
   }
?>

jw-php at valleyfree dot com
26-Jan-2005 12:28


note that if you want to replace all backslashes in a string with
double backslashes (like addslashes() does but just for backslashes and
not quotes, etc), you'll need the following:

$new = preg_replace('/\\\\/','\\\\\\\\',$old);

note the pattern uses 4 backslashes and the replacement uses 8!  the
reason for 4 slashes in the pattern part has already been explained on
this page, but nobody has yet mentioned the need for the same logic in
the replacement part in which backslashes are also doubly parsed, once
by PHP and once by the PCRE extension.  so the eight slashes break down
to four slashes sent to PCRE, then two slashes put in the final output.
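
A short sketch that makes the layers visible; the sample path is an assumption chosen for illustration:

```php
<?php
// In a single-quoted PHP literal, "\\" is one backslash,
// so $old is the 12-character string C:\temp\file
$old = 'C:\\temp\\file';

// Pattern:     4 backslashes in source -> 2 reach PCRE -> matches 1 backslash.
// Replacement: 8 backslashes in source -> 4 reach PCRE -> 2 in the output.
$new = preg_replace('/\\\\/', '\\\\\\\\', $old);

echo $old, "\n"; // C:\temp\file
echo $new, "\n"; // C:\\temp\\file
```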

Nick
21-Jan-2005 07:05


Here is a more secure version of the link conversion code which hopefully make cross site scripting attacks more difficult.

<?php
function convert_links($str) {
       $replace = <<<EOPHP
'<a href="'.htmlentities('\\1').htmlentities('\\2').//remove line break
'">'.htmlentities('\\1').htmlentities('\\2').'</a>'
EOPHP;
   $str = preg_replace('#(http://)([^\s]*)#e', $replace, $str);
   return $str;
}
?>
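
The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions the same idea has to go through preg_replace_callback(). The following is only a sketch under that assumption: the function name convert_links_cb() is hypothetical, it needs PHP 5.3 or later for the closure, and ENT_QUOTES is passed explicitly so that both kinds of quotes are encoded.

```php
<?php
// Hypothetical name; same http:// pattern as convert_links() above, without /e.
function convert_links_cb($str)
{
    return preg_replace_callback(
        '#(http://)([^\s]*)#',
        function ($m) {
            // $m[0] is the whole match, handed over as-is (no addslashes() step).
            $url = htmlentities($m[0], ENT_QUOTES);
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $str
    );
}

echo convert_links_cb('see http://example.com/?a=1&b=2 now');
// see <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> now
?>
```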

ignacio paz posse
22-Oct-2004 04:22


I needed to shorten only long URLs and leave shorter ones alone, because
my client preferred to have their complete addresses displayed. Here's
the function I ended up with:

<?php
function auto_url($txt){

  # (1) catch long URLs (71 or more non-space characters after the host)
  $pat = '/(http|ftp)+(?:s)?:(\\/\\/)'
       .'((\\w|\\.)+)(\\/)?(\\S){71,}/i';
  $txt = preg_replace($pat, "<a href=\"\\0\" target=\"_blank\">$1$2$3/...</a>",
                      $txt);

  # (2) replace the other, shorter URLs, provided they are not already inside an HTML tag
  $pat = '/(?<!href=\")(http|ftp)+(s)?:'
       .'(\\/\\/)((\\w|\\.)+)(\\/)?(\\S)/i';
  $txt = preg_replace($pat, "<a href=\"$0\" target=\"_blank\">$0</a> ",
                      $txt);

  return $txt;
}
?>

Note the negative lookbehind expression added in the second
pattern to exempt URLs that are preceded by ' href=" ' (meaning
that they were already put inside appropriate HTML tags by the previous
expression).

(If your copy of either regex has a space before the last parenthesized
groups, remove it; I had to insert one to be able to post this
comment.)

gabe at mudbuginfo dot com
19-Oct-2004 04:39


It is useful to note that the 'limit' parameter, when 'pattern' and
'replacement' are arrays, applies to each individual pattern in the
pattern array, and not to the array as a whole.
<?php

$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

echo preg_replace($pattern, $replace, $subject, 1);
?>

If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three

However, in reality this will actually return:
test uno, one dos, one two three

silasjpalmer at optusnet dot com dot au
19-Mar-2004 10:00


Using preg_replace to return extracts without breaking in the middle of words
(useful for search results)

<?php
$string = "Don't split words";
echo substr($string, 0, 10); // Returns "Don't spli"

$pattern = "/(^.{0,10})(\W+.*$)/";
$replacement = "\${1}";
echo preg_replace($pattern, $replacement, $string); // Returns "Don't"
?>

j-AT-jcornelius-DOT-com
25-Feb-2004 05:02


I noticed that a lot of talk here is about parsing URLs. Try the
parse_url() function in PHP to make things easier.

http://www.php.net/manual/en/function.parse-url.php
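
As a rough illustration of what parse_url() returns, here is a sketch with a hypothetical URL; only the components present in the input appear as keys in the result array.

```php
<?php
// Hypothetical URL, used only for illustration.
$parts = parse_url('http://www.example.com:8080/manual/en/index.php?lang=en#top');

echo $parts['scheme'], "\n";   // http
echo $parts['host'], "\n";     // www.example.com
echo $parts['port'], "\n";     // 8080
echo $parts['path'], "\n";     // /manual/en/index.php
echo $parts['query'], "\n";    // lang=en
echo $parts['fragment'], "\n"; // top
?>
```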

- J.

steven -a-t- acko dot net
09-Feb-2004 01:45


People using the /e modifier with preg_replace should be aware of the
following weird behaviour. It is not a bug per se, but can cause bugs if
you don't know it's there.

In fact, the example in the docs for /e suffers from this mistake.

With /e, the replacement string is a PHP expression. So when you use a
backreference in the replacement expression, you need to put the
backreference inside quotes, or otherwise it would be interpreted as PHP
code. Like the example from the manual for preg_replace:

preg_replace("/(<\/?)(\w+)([^>]*>)/e",
             "'\\1'.strtoupper('\\2').'\\3'",
             $html_body);

To make this easier, the data in a backreference with /e is run through
addslashes() before being inserted in your replacement expression. So if
you have the string

He said: "You're here"

It would become:

He said: \"You\'re here\"

...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:

print ' He said: \"You\'re here\" ';
 Output: He said: \"You're here\"

This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.

Using double-quotes to surround the string/backreference will not help
either, because inside double-quotes, the sequence \' is not recognized
and also output literally. And in fact, if you have any dollar signs in
your data, they would be interpreted as PHP variables. So double-quotes
are not an option.

The 'solution' is to manually fix it in your expression. It is easiest
to use a separate processing function, and do the replacing there (i.e.
use "my_processing_function('\\1')" or something similar as replacement
expression, and do the fixing in that function).

If you surrounded your backreference with single quotes, only the double quotes come out corrupted (as \"), and they can be repaired inside that function with:
$text = str_replace('\"', '"', $text);

People using preg_replace with /e should at least be aware of this.

I'm not sure how it would be best fixed in preg_replace. Because
double-quotes are a really bad idea anyway (due to the variable
expansion), I would suggest that preg_replace's auto-escaping is
modified to suit the placement of backreferences inside single-quotes
(which seemed to be the intention from the start, but was incorrectly
applied).
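
On PHP 5.3 and later, one way to sidestep the escaping problem entirely is preg_replace_callback(), which hands the captured text to a function as an array instead of pasting it into evaluated code. The sketch below reuses the pattern from the manual example without the /e modifier; the input string is hypothetical.

```php
<?php
// Hypothetical input containing both kinds of quotes.
$html_body = '<p title="it\'s">He said: "You\'re here"</p>';

// Same pattern as the manual example, minus the /e modifier.
$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        // $m[1], $m[2], $m[3] hold the raw captured text; no addslashes() is applied.
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $result; // <P title="it's">He said: "You're here"</P>
?>
```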

Peter
02-Nov-2003 09:00


Suppose you want to match '\n' (that's backslash-n, not newline). The
pattern you want is not /\\n/ but /\\\\n/. The reason is that PHP
interprets the \\ as \ before the regex engine ever sees it.
Thus, if you write the first, the regex engine sees \n, which it reads
as a newline. You therefore have to escape your backslashes twice: once
for PHP, and once for the regex engine.
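
A quick check of both patterns against both kinds of input; the variable names are hypothetical.

```php
<?php
// Single quotes: 'a\nb' holds a, backslash, n, b (no newline).
$literal = 'a\nb';
// Double quotes: "a\nb" holds a, newline, b.
$newline = "a\nb";

// '/\\n/' reaches PCRE as /\n/ and matches a newline.
var_dump(preg_match('/\\n/', $literal));   // int(0)
var_dump(preg_match('/\\n/', $newline));   // int(1)

// '/\\\\n/' reaches PCRE as /\\n/ and matches a backslash followed by n.
var_dump(preg_match('/\\\\n/', $literal)); // int(1)
var_dump(preg_match('/\\\\n/', $newline)); // int(0)
?>
```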

Travis
18-Oct-2003 06:37


I spent some time fighting with this, so hopefully this will help someone else.

Escaping a backslash (\) really involves not two, not three, but four backslashes to work properly.

So to match a single backslash, one should use:

preg_replace('/(\\\\)/', ...);

or to, say, escape single quotes not already escaped, one could write:

preg_replace("/([^\\\\])'/", "\$1\'", ...);

Anything else, such as the seemingly correct

preg_replace("/([^\\])'/", "\$1\'", ...);

gets evaluated as escaping the ] and resulting in an unterminated character class.

I'm not exactly clear on this issue of backslash proliferation, but it
seems to involve the combination of PHP string processing and PCRE
processing.
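
A small sketch of the second call above with a concrete subject, since the original elides it. The input string is hypothetical. Note that the pattern needs a character in front of the quote, so a quote at the very start of the string would not be touched.

```php
<?php
// Hypothetical input: the first quote is unescaped, the second is already escaped.
$in = "it's and it\\'s"; // holds: it's and it\'s

// "/([^\\\\])'/" reaches PCRE as /([^\\])'/ : any character except a backslash, then a quote.
// "\$1\'" reaches PCRE as $1\' : the captured character, then a backslash and the quote.
$out = preg_replace("/([^\\\\])'/", "\$1\'", $in);

echo $out; // it\'s and it\'s
?>
```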
