Hyphen Inside Sets in PHP Regex
The hyphen is also a special character inside [] (outside the brackets it is not). If you need the hyphen itself as a literal character, place it where it cannot be interpreted as a range separator.
Why this is important: you might create a character range without even noticing it. For example, with [:-@] you think you are selecting a colon, a hyphen, and an at sign (@), but you actually get the range of characters from : to @. This range includes the following characters: :, ;, <, =, >, ?, @. The hyphen itself is not among them.
Where did they come from? From the ASCII table: the colon has a lower code (58) than the at sign (64), so a range is created between them. That is, all ranges are based on the order of characters in the ASCII table (you can use this to your advantage if desired).
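To see the range for yourself, here is a minimal sketch. The test string and the variable names are chosen only for illustration; it collects every character that [:-@] matches and prints each one with its ASCII code:

```php
<?php
// Candidate characters: the neighbours of the colon..at-sign range plus a hyphen.
$str = '9:;<=>?@A-';

// Collect every single character that the set [:-@] matches.
preg_match_all('#[:-@]#', $str, $matches);

foreach ($matches[0] as $char) {
    // ord() returns the byte value, which is the ASCII code for these characters.
    echo $char, ' => ', ord($char), PHP_EOL;
}

// The hyphen (code 45) lies outside the range 58..64, so it is not matched.
$hasHyphen = in_array('-', $matches[0], true);
var_dump($hasHyphen);
```

The digit 9 (code 57) and the letter A (code 65) sit just outside the range, so they are skipped as well.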
How to deal with this: place the hyphen where it definitely will not be interpreted as a range symbol, for example, at the beginning or at the end of the set (i.e., right after [ or right before ]).
You can also escape the hyphen; then it will represent itself regardless of its position. For example, instead of [:-@], write [:\-@]. There will be no range, just three characters: colon, hyphen, and at sign (@).
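The three options can be compared side by side. The following is a sketch only, with an illustrative test string and variable names; it runs the same input through a set with the hyphen first, last, and escaped:

```php
<?php
$str = ':;<=>?@-';

// Three spellings of the same set: hyphen first, hyphen last, hyphen escaped.
$patterns = ['#[-:@]#', '#[:@-]#', '#[:\-@]#'];

$results = [];
foreach ($patterns as $pattern) {
    preg_match_all($pattern, $str, $matches);
    // Join the matched characters so the results are easy to compare.
    $results[$pattern] = implode('', $matches[0]);
}

print_r($results);
```

All three patterns should pick out only the colon, the at sign, and the hyphen, leaving the characters in between untouched.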
Example
In the following example, the search pattern is: digit 1, then a letter from 'a' to 'z', then digit 2:
<?php
$str = '1a2 1-2 1c2 1z2';
$res = preg_replace('#1[a-z]2#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! 1-2 ! !'
Example
Let's now escape the hyphen. As a result, the search pattern is: digit 1, then letter 'a', or a hyphen, or letter 'z', then digit 2:
<?php
$str = '1a2 1-2 1c2 1z2';
$res = preg_replace('#1[a\-z]2#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! ! 1c2 !'
Example
You can simply reposition the hyphen without escaping it:
<?php
$str = '1a2 1-2 1c2 1z2';
$res = preg_replace('#1[az-]2#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! ! 1c2 !'
Example
In the following example, the search pattern is: the first character is a lowercase letter or a hyphen '-', followed by two letters 'x':
<?php
$str = 'axx Axx -xx @xx';
$res = preg_replace('#[a-z-]xx#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! Axx ! @xx'
Example
In the following example, the search pattern is: the first character is a lowercase letter, an uppercase letter, or a hyphen '-', followed by two letters 'x':
<?php
$str = 'axx Axx -xx @xx';
$res = preg_replace('#[a-zA-Z-]xx#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! ! ! @xx'
Example
You can place the hyphen between two ranges. There it certainly won't create another range, because a hyphen that comes right after a completed range is treated as a literal character:
<?php
$str = 'axx 9xx -xx @xx';
$res = preg_replace('#[a-z-0-9]xx#', '!', $str);
?>
As a result, the following will be stored in the variable:
'! ! ! @xx'
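A hyphen between two ranges can be hard to read at a glance. As a sketch (the variable names are illustrative), the same set can be written with the hyphen escaped and moved to the end, and the two results compared:

```php
<?php
$str = 'axx 9xx -xx @xx';

// Hyphen placed between two ranges, as in the example above.
$between = preg_replace('#[a-z-0-9]xx#', '!', $str);

// The same set with the hyphen escaped and moved to the end.
$escaped = preg_replace('#[a-z0-9\-]xx#', '!', $str);

// Both spellings should produce the same string.
var_dump($between === $escaped);
```

Which spelling to use is a matter of taste; the escaped form simply makes the intent explicit.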
Practice Tasks
Given a string:
<?php
$str = 'xaz xBz xcz x-z x@z';
?>
Find all substrings matching the following pattern: letter 'x', then an uppercase or lowercase letter or a hyphen, then letter 'z'.
Given a string:
<?php
$str = 'xaz x$z x-z xcz x+z x%z x*z';
?>
Find all substrings matching the following pattern: letter 'x', then either a dollar sign, or a hyphen, or a plus sign, then letter 'z'.