PHP - Manual: mb_ereg_replace

2025-12-06

mb_ereg_replace

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

mb_ereg_replace — Replace regular expression with multibyte support

Description

mb_ereg_replace(
    string $pattern,
    string $replacement,
    string $string,
    ?string $options = null
): string|false|null

Scans string for matches to pattern, then replaces the matched text with replacement.

Parameters

pattern

The regular expression pattern.

Multibyte characters may be used in pattern.

replacement

The replacement text.

string

The string being checked.

options

The search option. See mb_regex_set_options() for explanation.

Return Values

The resultant string on success, or false on error. If string is not valid for the current encoding, null is returned.
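A minimal sketch of a call and of checking its return value, based on the description above. The sample string and the `#` replacement are invented for illustration.

```php
<?php
// Illustrative only: the input string is made up for this sketch.
mb_regex_encoding('UTF-8');

// Replace every run of digits with "#".
$result = mb_ereg_replace('[0-9]+', '#', 'Zimmer 12, Etage 3');

if ($result === false || $result === null) {
    // false: error; null: $string is not valid for the current encoding
    echo "replacement failed\n";
} else {
    echo $result, "\n"; // Zimmer #, Etage #
}
```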

Changelog

Version	Description
8.0.0	`options` is nullable now.
7.1.0	The function checks whether `string` is valid for the current encoding.
7.1.0	The `e` modifier has been deprecated.

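Notes

Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.

Warning

Never use the e modifier when working on untrusted input. No automatic escaping will happen (as known from preg_replace()). Not taking care of this will most likely create remote code execution vulnerabilities in your application.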

See Also

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_eregi_replace() - Replace regular expression with multibyte support ignoring case


User Contributed Notes 17 notes


Pluche ¶

14 years ago

Unlike preg_replace, mb_ereg_replace doesn't use delimiters around the pattern.

Example with preg_replace:

<?php $data = preg_replace("/[^A-Za-z0-9\.\-]/","",$data); ?>

Example with mb_ereg_replace:

<?php $data = mb_ereg_replace("[^A-Za-z0-9\.\-]","",$data); ?>
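A small sketch of the difference, using an invented file name as input. It also shows what happens if preg-style delimiters are passed to mb_ereg_replace by mistake: the slashes are treated as literal characters, so nothing matches.

```php
<?php
mb_regex_encoding('UTF-8');

// Hypothetical sample input, not from the original note.
$data = 'résumé_2024 (final).pdf';

// preg_replace: the pattern is wrapped in delimiters ("/"), "u" enables UTF-8.
$viaPreg = preg_replace('/[^A-Za-z0-9.\-]/u', '', $data);

// mb_ereg_replace: the bare pattern, no delimiters.
$viaMb = mb_ereg_replace('[^A-Za-z0-9.\-]', '', $data);

var_dump($viaPreg === $viaMb); // bool(true), both are "rsum2024final.pdf"

// Delimiters are NOT stripped: this looks for the literal text "/a/".
$unchanged = mb_ereg_replace('/a/', 'x', 'banana');
echo $unchanged, "\n"; // banana
```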


daemoneye at gmail dot com ¶

16 years ago

I got a pretty nasty error while trying to parse table rows (all contents were set to UTF-8) from the database for a dictionary project. The idea was to get all the rows from the first table (a table with a Bulgarian phrase in the first field and its translations into English, French and German in the next fields). I needed to index all the Bulgarian words found in the table to build an intelligent search. And that is where my headache started.

First of all, even with mb_strtolower() many Cyrillic characters were corrupted (e.g. 'т,ъ,у,ф,б,г,з,ж,' etc.). After an hour of trial and error I arrived at this solution:

<?php

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// $db and $commonWords come from the surrounding application.
$rows = $db->getRows();

// Lowercase each Bulgarian phrase and blank out the common words.
$contents = array();
foreach ($rows as $eachRow)
{
    $cleared = str_replace($commonWords, ' ', mb_strtolower(stripslashes($eachRow['bulgarian']), 'UTF-8'));
    if (trim($cleared) != '') $contents[] = trim($cleared);
}

// Collect each distinct word, keeping only lowercase Cyrillic letters.
$list = array();
foreach ($contents as $eachRow)
{
    $exploded = explode(' ', $eachRow);
    foreach ($exploded as $eachExpl)
    {
        // Trim before the duplicate check so the compared and stored values match.
        $eachExpl = trim(mb_ereg_replace('[^а-я ]', ' ', $eachExpl));
        if ($eachExpl != '' && !in_array($eachExpl, $list, true)) $list[] = $eachExpl;
    }
}

?>

For this to work properly I had to set all the internal encoding settings to UTF-8. Otherwise the default Latin-1 left half my database with missing characters.

I am posting this solution in case someone runs into a similar problem. I hope it helps.


trng ¶

13 years ago

You can use \\n (\\1, \\2, ...) to reference a capture group in the replacement.
You can NOT use the $n notation (unlike in preg_replace).
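To make the point concrete, a short sketch with an invented input string. The backslash notation swaps the two words, while the `$n` notation is copied into the result literally.

```php
<?php
mb_regex_encoding('UTF-8');

// '\\2 \\1' in single quotes is the string: \2 \1
$swapped = mb_ereg_replace('(\w+) (\w+)', '\\2 \\1', 'hello world');
echo $swapped, "\n"; // world hello

// $n has no special meaning here, so it is inserted as plain text.
$literal = mb_ereg_replace('(\w+) (\w+)', '$2 $1', 'hello world');
echo $literal, "\n"; // $2 $1
```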


Anonymous ¶

9 years ago

Pluche's comment should REALLY be added to the documentation, preferably under the "$pattern" param description. It is crucial to using this function.


keizo at gomo dot jp ¶

16 years ago

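<?php

$pattern = "([あ-ん]+)[0-9]+";
$string  = "あいう123"; // sample input, added for illustration

// \\1 is the first capture group, \\0 is the whole match
$string = mb_ereg_replace($pattern, '「\\1」:\\0', $string);

?>

You can use \\n for a capture group in the replacement.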


Alexey Khrulev ¶

7 years ago

If the encoding of the PHP script differs from the encoding of the string to be processed by mb_ereg_replace(), you can't just write the pattern in the script. Both $pattern and $replacement must be converted to the same encoding as the string to be processed. In this example the script is in UTF-8 and the file to be processed is in UTF-16LE:

<?php
$file_encoding = 'UTF-16LE';
mb_regex_encoding( $file_encoding );

$pattern     = "aaa";
$replacement = "AAA";
$pattern_encoded     = mb_convert_encoding( $pattern,     $file_encoding, 'UTF-8' );
$replacement_encoded = mb_convert_encoding( $replacement, $file_encoding, 'UTF-8' );

$result = mb_ereg_replace( $pattern_encoded, $replacement_encoded, file_get_contents('UTF-16LE.txt') );
file_put_contents('UTF-16LE-updated.txt', $result);
?>


Anonymous ¶

18 years ago

The 'i' option does not work correctly with multibyte characters. The function does not locate or replace a multibyte string if its case differs from the case of the multibyte needle.
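A quick way to check this behavior on a given installation is sketched below. The Cyrillic sample text is invented here, and no expected output is shown because the result is exactly what the check is meant to reveal.

```php
<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Hypothetical test data: the needle is lowercase, the subject uppercase.
$subject = 'Привет, МИР';

// Case-insensitive matching via the 'i' option ...
$viaOption = mb_ereg_replace('мир', 'world', $subject, 'i');

// ... and via the dedicated case-insensitive function.
$viaEregi = mb_eregi_replace('мир', 'world', $subject);

// Inspect both results to see whether the uppercase word was replaced.
var_dump($viaOption, $viaEregi);
```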


faxe at neostrada dot pl ¶

19 years ago

A simple mb_str_ireplace() implementation, a faster (?) replacement for non-regexp multibyte string replacement:

<?php

// $co = search string, $naCo = replacement, $wCzym = subject
function mb_str_ireplace($co, $naCo, $wCzym)
{
    $wCzymM = mb_strtolower($wCzym);
    $coM    = mb_strtolower($co);
    $offset = 0;

    // Search the lowercased copy, replace in the original.
    while (!is_bool($poz = mb_strpos($wCzymM, $coM, $offset)))
    {
        $offset = $poz + mb_strlen($naCo);
        $wCzym  = mb_substr($wCzym, 0, $poz) . $naCo . mb_substr($wCzym, $poz + mb_strlen($co));
        $wCzymM = mb_strtolower($wCzym);
    }

    return $wCzym;
}

?>



[thiago - EDITOR NOTE: This function has improvements from d-okumura [aat] fi{dot}kyd[dot]co.jp]


Anonymous ¶

2 years ago

Notations to reference captures in the replacement string:

<?php

// (1) \\number notation: (1 to 9, not greater than 9)
echo mb_ereg_replace('(\S*) (\S*) (\S*)', '\\1 jam, \\2 juice, \\3 squash', 'apple orange lemon').'<br>'; // apple jam, orange juice, lemon squash

// (2) \k<number> notation: (also greater than 9) (also as \k'number')
echo mb_ereg_replace('(\S*) (\S*) (\S*)', '\k<1> jam, \k<2> juice, \k<3> squash', 'apple orange lemon').'<br>'; // (same as above)

// (3) \k<word> notation: (also as \k'word')
echo mb_ereg_replace('(?<word1>\S*) (?<word2>\S*) (?<word3>\S*)', '\k<word1> jam, \k<word2> juice, \k<word3> squash', 'apple orange lemon').'<br>'; // (same as above)

// Note: non-named subpatterns like "(\S*)" should not be mixed with named subpatterns like "(?<word>..)", because non-named subpatterns are not captured when named subpatterns exist.

?>


j-fr dot fortier at wanadoo dot fr ¶

6 years ago

Since PHP 5.4, to uppercase or lowercase characters, or to rewrite some URIs, without worrying about the initial encoding, transliteration is easier (and probably the best way): see http://php.net/manual/fr/transliterator.transliterate.php and http://userguide.icu-project.org/transforms/general

For example (with create), on French text: replace all accented characters (é, è, à, î, ï, ù, ç, ...) with ASCII characters:
<?php
$transliterator = Transliterator::create("NFD; [:Nonspacing Mark:] Remove; NFC;");
echo $transliterator->transliterate("Héhé, ça marche !");
?>
// Result: « Hehe, ca marche ! »

To rewrite a phrase as a URI (with createFromRules):
<?php
$transliterator = Transliterator::createFromRules("::Latin-ASCII; ::Lower; [^[:L:][:N:]]+ > '-';");
echo trim($transliterator->transliterate("Héhé, ça marche !"), '-');
?>
// Result : « hehe-ca-marche »


marco at thenetworksolution dot it ¶

11 years ago

To selectively uppercase parts of a string via mb_eregi_replace:

    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper('\\1')", $str, 'e');

Full example: how to fix a manually typed address, uppercasing the first letter of each word while keeping Roman numerals and the letters A, B, C after the house number in uppercase:

function ucAddress($str) {
    // first lowercase all and use the default ucwords
    $str = ucwords(strtolower($str));
    // let's fix the default ucwords...
    // uppercase letters after house number (was lowercased by the strtolower above)
    $str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper('\\1')", $str, 'e');
    // the same for roman numerals
    $str = mb_eregi_replace('\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b', "strtoupper('\\0')", $str, 'e');
    return $str;
}
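The changelog above lists the `e` modifier as deprecated since 7.1.0, so the house-number step can instead be sketched with mb_ereg_replace_callback(), which passes the matches to a PHP callback. The function name `ucAddressCallback` and the sample address are invented for this sketch, and only the house-number rule is shown.

```php
<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Hypothetical rewrite of the house-number step without the 'e' modifier.
function ucAddressCallback(string $str): string
{
    // First lowercase everything and capitalize each word.
    $str = ucwords(strtolower($str));

    // Uppercase the letters that follow a house number, e.g. "12b" -> "12B".
    // The 'i' option makes the match case-insensitive.
    return mb_ereg_replace_callback(
        '\b([0-9]{1,4}[a-z]{1,2})\b',
        function (array $matches): string {
            return mb_strtoupper($matches[0]);
        },
        $str,
        'i'
    );
}

echo ucAddressCallback('via roma 12b'), "\n"; // Via Roma 12B
```

Because the replacement is computed by a closure rather than by evaluating a string as code, matched text is never executed.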


vondrej(at)gmail(dot)com ¶

18 years ago

Are you looking for htmlentities() for multibyte strings? This might help you. It just replaces <, >, " and '.

<?php

/**
 * Multibyte equivalent for htmlentities() [lite version :)]
 *
 * @param string $str
 * @param string $encoding
 * @return string
 */
function mb_htmlentities($str, $encoding = 'utf-8') {
    mb_regex_encoding($encoding);
    $pattern = array('<', '>', '"', '\'');
    $replacement = array('&lt;', '&gt;', '&quot;', '&#39;');

    for ($i = 0; $i < sizeof($pattern); $i++) {
        $str = mb_ereg_replace($pattern[$i], $replacement[$i], $str);
    }

    return $str;
}

?>


mpnicholas [@t] gmail (dot) com ¶

18 years ago

Regarding the mb_str_ireplace() function: I benchmarked it against mb_eregi_replace() for single-character substitution, and it was significantly slower. Although it avoids the ereg call, I think the while loop ends up slowing it down too much for this to be practical.


gmx dot net at ulrich dot mierendorff ¶

16 years ago

If you want to replace characters like "ä" or "ø" you can use mb_ereg_replace, but it is very slow. str_replace is much faster and also works with characters like "ä" or "ø"!

I think this has something to do with the fact that str_replace works at the byte level and does not care about characters.
I hope that helps.
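str_replace compares bytes, and in UTF-8 the byte sequence of one character never appears inside the sequence of another, so a byte-level search for a complete UTF-8 string is safe. A small sketch, assuming the script file and the data are both UTF-8 (the sample sentence is invented):

```php
<?php
mb_regex_encoding('UTF-8');

$text = 'Smørrebrød på bordet';

// Byte-level replacement: no regex engine involved.
$viaStrReplace = str_replace('ø', 'oe', $text);

// Regex-based replacement with the multibyte engine.
$viaMbEreg = mb_ereg_replace('ø', 'oe', $text);

echo $viaStrReplace, "\n";                  // Smoerrebroed på bordet
var_dump($viaStrReplace === $viaMbEreg);    // bool(true)
```

For fixed strings like these, str_replace is the simpler tool. mb_ereg_replace is only needed when the search is an actual pattern.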


squeegee ¶

18 years ago

Well, if you calculated the lengths of the find and replace strings once instead of on every loop iteration, it would likely speed things up a lot.
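A sketch of that suggestion follows. It is not the original author's code: the name `mb_str_ireplace_cached` is invented, the subject is lowercased only once, and the result is built in a single pass. It assumes that lowercasing does not change the number of characters, which holds for most text but not for every Unicode character.

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical variant: lengths and lowercase copies are computed once.
function mb_str_ireplace_cached(string $search, string $replace, string $subject): string
{
    if ($search === '') {
        return $subject; // nothing to search for
    }

    $searchLower  = mb_strtolower($search);
    $searchLen    = mb_strlen($search);
    $subjectLower = mb_strtolower($subject); // assumed same length as $subject

    $result = '';
    $offset = 0;

    // Find matches in the lowercased copy, copy text from the original.
    while (($pos = mb_strpos($subjectLower, $searchLower, $offset)) !== false) {
        $result .= mb_substr($subject, $offset, $pos - $offset) . $replace;
        $offset  = $pos + $searchLen;
    }

    // Append whatever follows the last match.
    return $result . mb_substr($subject, $offset);
}

echo mb_str_ireplace_cached('ÄB', 'x', 'aäbcÄBd'), "\n"; // axcxd
```

Unlike the loop above it, this version never rescans replaced text, so a replacement that contains the search string is not replaced again.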


ms2705335 at gmail dot com ¶

7 years ago

As trng mentioned before, you can use \\n in the replacement, but NOT \\\\n as mentioned in the preg_replace docs. So the string definition looks like this:
$str = '\\1';
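The quoting levels are easy to mix up, so here is a small sketch of which PHP string literals produce the two-character sequence backslash + digit that the replacement needs. The variable names and the sample input are invented for illustration.

```php
<?php
mb_regex_encoding('UTF-8');

$single = '\\1'; // backslash followed by "1" (2 bytes)
$bare   = '\1';  // the same 2 bytes: \1 is not an escape in single quotes
$double = "\\1"; // the same 2 bytes again
$octal  = "\1";  // NOT the same: an octal escape, the single byte chr(1)

var_dump($single === $bare, $single === $double, strlen($octal));
// bool(true), bool(true), int(1)

$wrapped = mb_ereg_replace('(b+)', '[\\1]', 'abbbc');
echo $wrapped, "\n"; // a[bbb]c
```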


Official source: https://www.php.net/manual/en/function.mb-ereg-replace.php
