Pockets in Regular Expressions in PHP
The contents of the pockets (capturing groups) are available not only in the replacement string but also inside the regular expression itself: we can put something in a pocket and then, later in the same pattern, require that the content of this pocket appear at that position.
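Inside the pattern, pockets are referenced by their numbers, preceded by a backslash: the first pocket is \1, the second is \2, and so on. Note that \0 works only in the replacement string, where it stands for the whole match; inside a pattern, \0 is read as an octal character code, not as a reference to a pocket.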
This description may still seem abstract, which is not surprising: pockets are one of the harder parts of regular expressions to grasp. Let's figure it out with examples.
Example
Suppose we have the following string:
<?php
$str = 'aa bb cd ef';
?>
Let's find all places in it where two identical letters appear in a row. To solve the problem, we will match any letter, put it in a pocket, and then require that the next character be the content of this pocket:
<?php
$res = preg_replace('#([a-z])\1#', '!', $str);
?>
As a result, the following will be written to the variable:
'! ! cd ef'
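To see exactly what the pattern matched and what ended up in the pocket, the same pattern can be passed to preg_match_all. This is only an illustrative sketch; the $matches variable name is chosen here and is not part of the original example.

```php
<?php
$str = 'aa bb cd ef';

// With the default flags, $matches[0] holds the full matches
// and $matches[1] holds the contents of pocket 1 for each match.
preg_match_all('#([a-z])\1#', $str, $matches);

print_r($matches[0]); // full matches: 'aa' and 'bb'
print_r($matches[1]); // pocket contents: 'a' and 'b'
```

The pairs 'cd' and 'ef' are not matched, because the second letter differs from the one stored in the pocket.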
Example
Suppose we have the following string:
<?php
$str = 'asxca buzxb csgd';
?>
Let's find all words in it where the first and last letters are the same. To solve the problem, we will write the following pattern: a letter, then one or more letters, and then the same letter as the first one:
<?php
$res = preg_replace('#([a-z])[a-z]+\1#', '!', $str);
?>
As a result, the following will be written to the variable:
'! ! csgd'
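Note that the pattern above is not tied to word boundaries, so it can also match a fragment inside a longer word. If whole words are required, one possible refinement is to add \b on both sides. The sketch below compares the two variants on a hypothetical test string that is not part of the original example.

```php
<?php
// Hypothetical input: 'xabcay' contains the fragment 'abca' inside it.
$str = 'xabcay asxca';

// Without boundaries, the fragment 'abca' inside 'xabcay' is matched too.
$loose = preg_replace('#([a-z])[a-z]+\1#', '!', $str);

// With \b on both sides, only whole words qualify.
$strict = preg_replace('#\b([a-z])[a-z]+\1\b#', '!', $str);

var_dump($loose);  // 'x!y !'
var_dump($strict); // 'xabcay !'
```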
Example
Instead of \1, you can write \g1:
<?php
$res = preg_replace('#([a-z])[a-z]+\g1#', '!', $str);
?>
Example
You can also write \g{1}:
<?php
$res = preg_replace('#([a-z])[a-z]+\g{1}#', '!', $str);
?>
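The three notations refer to the same pocket, which can be checked by running them on the string from the earlier example. The sketch below also shows one case where the braces are helpful: when the reference is immediately followed by a literal digit. The second input string, 'aa1 bb2', is made up for illustration.

```php
<?php
$str = 'asxca buzxb csgd';

// All three patterns reference pocket 1 and should behave identically.
$patterns = [
    '#([a-z])[a-z]+\1#',
    '#([a-z])[a-z]+\g1#',
    '#([a-z])[a-z]+\g{1}#',
];

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_replace($pattern, '!', $str);
}
print_r($results); // each element is '! ! csgd'

// Braces keep the reference separate from a following literal digit:
// pocket 1, then the character "1". Writing \11 would not be read this way.
$res = preg_replace('#([a-z])\g{1}1#', '!', 'aa1 bb2');
var_dump($res); // '! bb2'
```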
Example
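You can specify negative numbers in curly braces. In this case, the pockets are counted backward from the position of the reference: \g{-1} is the nearest pocket before it, \g{-2} is the one before that, and so on: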
<?php
$res = preg_replace('#([a-z])([a-z])\g{-2}#', '!', $str);
?>
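A negative number is resolved relative to the place where the reference is written: \g{-1} means the closest pocket to its left, \g{-2} the one before that. The sketch below uses a made-up string, 'aba cdc abb xyz', to show which pocket each relative reference picks.

```php
<?php
// Hypothetical input chosen so that each pattern matches different words.
$str = 'aba cdc abb xyz';

// \g{-2} is the second pocket to the left of the reference, i.e. pocket 1:
// letter, any letter, then the first letter again.
$a = preg_replace('#([a-z])([a-z])\g{-2}#', '!', $str);

// The same pattern written with an absolute number.
$b = preg_replace('#([a-z])([a-z])\1#', '!', $str);

// \g{-1} is the nearest pocket to the left, i.e. pocket 2:
// letter, any letter, then the second letter again.
$c = preg_replace('#([a-z])([a-z])\g{-1}#', '!', $str);

var_dump($a); // '! ! abb xyz'
var_dump($b); // '! ! abb xyz'
var_dump($c); // 'aba cdc ! xyz'
```

Relative references are convenient when a fragment of a pattern is reused inside a larger one, since the absolute pocket numbers may shift while the relative distance stays the same.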
Practical Tasks
Given a string:
<?php
$str = 'aaa bbb ccc xyz';
?>
Find all substrings that contain three identical letters in a row.
Given a string:
<?php
$str = 'a aa aaa abab bbbb';
?>
Find all substrings that contain two or more identical letters in a row.
Given a string:
<?php
$str = 'aaa aaa bbb bbb ccc ddd';
?>
Find all substrings that contain two identical words in a row.