Two-Step Block Parsing with Regex in PHP
When working with regular expressions, do not try to solve a complex task with a single pattern. It is usually easier to apply several simpler patterns in sequence.
Let's look at an example. Suppose we have the following HTML:
<p>
---
</p>
<main class="header">
<p>
+++
</p>
<p>
+++
</p>
</main>
Suppose we need to parse all paragraphs from the main tag. Let's do this in two stages: first, get the content of the main tag, and then search for paragraphs inside this content.
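So, the first stage. Let the text of the entire page be stored in the variable $str1. Let's get the content of the main tag. The s modifier lets the dot match line breaks, which is needed here because the tag content spans several lines: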
<?php
preg_match('#<main[^>]*>(.+?)</main>#su', $str1, $match1);
?>
Let's check that we captured the correct text:
<?php
$str2 = $match1[1];
var_dump($str2);
?>
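Before moving on to the second stage, it can be worth confirming that the first pattern matched at all: preg_match returns 1 on a match, 0 when nothing matched, and false on error. Below is a minimal sketch of such a check. The shortened sample page in $str1 and the empty-string fallback are assumptions made for illustration:

```php
<?php
// Shortened sample page (an assumption for this sketch, not the full page).
$str1 = "<p>\n---\n</p>\n<main class=\"header\">\n<p>\n+++\n</p>\n</main>";

// preg_match returns 1 on a match, 0 on no match, false on error.
$found = preg_match('#<main[^>]*>(.+?)</main>#su', $str1, $match1);

if ($found === 1) {
    // Content between the opening and closing main tags.
    $str2 = $match1[1];
} else {
    // No main tag (or a regex error): nothing to search in the second stage.
    $str2 = '';
}

var_dump($str2);
```

With this guard, the second stage simply finds no paragraphs when the page has no main tag, instead of reading an undefined index of $match1.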
Now, in the obtained text, let's find all paragraphs:
<?php
preg_match_all('#<p[^>]*>(.+?)</p>#su', $str2, $match2, PREG_PATTERN_ORDER);
?>
Let's check that we found the texts of our paragraphs:
<?php
var_dump($match2[1]);
?>
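The two stages above can also be combined into one self-contained script. The following is only a sketch: the helper name parseParagraphsInMain and the variable $page are hypothetical, and trimming the captured texts is an added convenience, since each capture keeps the line breaks around the paragraph text:

```php
<?php
// Hypothetical helper (not part of the lesson itself): runs both stages
// and returns the trimmed text of each paragraph inside the main tag.
function parseParagraphsInMain(string $html): array
{
    // Stage 1: isolate the content of the main tag.
    if (preg_match('#<main[^>]*>(.+?)</main>#su', $html, $match1) !== 1) {
        return []; // no main tag found
    }

    // Stage 2: search for paragraphs only inside that content.
    preg_match_all('#<p[^>]*>(.+?)</p>#su', $match1[1], $match2);

    // The captured texts keep the surrounding line breaks, so trim them.
    return array_map('trim', $match2[1]);
}

// The sample page from the start of the lesson.
$page = <<<HTML
<p>
---
</p>
<main class="header">
<p>
+++
</p>
<p>
+++
</p>
</main>
HTML;

$paragraphs = parseParagraphsInMain($page);
var_dump($paragraphs);
```

Here the paragraph containing --- is skipped because it lies outside the main tag, so only the two +++ paragraphs are returned.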
Practice task: given the following HTML, parse the text of all h2 tags from the aside tag:
<main>
<h2>---</h2>
</main>
<aside>
<h2>+++</h2>
<p>
text
</p>
<h2>+++</h2>
<p>
text
</p>
<h2>+++</h2>
<p>
text
</p>
</aside>