
PHP Limitations in Parsing

PHP is not the best language for parsing. The reason is a fundamental feature of PHP scripts: they are designed to start, execute quickly, and then terminate.

Typically, PHP scripts run for no more than a couple of seconds. PHP even has a setting, max_execution_time, that forcibly terminates a script once it runs longer than the configured limit, usually 30-60 seconds. Parsing, however, usually takes much longer: from several minutes to hours or even days.
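
To see what limit applies on your own server, you can read the setting from inside a script. The following is only a minimal sketch: the usleep call is a stand-in for real work, and the variable names are chosen for illustration.

```php
<?php
// Read the configured execution time limit in seconds (0 means no limit).
$limit = (int) ini_get('max_execution_time');
echo "max_execution_time: {$limit}\n";

// Remember the start time so the script can report how long it has run.
$start = microtime(true);

// Stand-in for real work; a parser would download and process pages here.
usleep(100000); // pause for 0.1 s

$elapsed = microtime(true) - $start;
printf("elapsed: %.2f s\n", $elapsed);
```

Note that when PHP is started from the command line, this limit defaults to 0, so the value you see depends on how the script is launched.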

Furthermore, the amount of RAM a script may occupy is capped by the memory_limit setting, typically at tens or hundreds of megabytes. If a script attempts to use more, it is forcibly terminated. A parser can easily exceed this limit, for example by building an array that holds the text of hundreds of HTML pages.
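
A rough way to feel this limit is to measure how much memory such an array takes. In the sketch below the pages are not downloaded: they are simulated with strings of about 100 KB each, which is an assumption made purely for illustration.

```php
<?php
// Show the memory limit configured for this script.
echo 'memory_limit: ', ini_get('memory_limit'), "\n";

$before = memory_get_usage();

// Simulated pages: 100 strings of roughly 100 KB each.
$pages = [];
for ($i = 0; $i < 100; $i++) {
    $pages[] = str_repeat('x', 100 * 1024) . $i;
}

$after = memory_get_usage();

// Report how much memory the array of pages added.
printf(
    "array of %d pages uses about %.1f MB\n",
    count($pages),
    ($after - $before) / 1048576
);
```

Real HTML pages are often larger than this, so an array with hundreds of them can get close to the limit quickly.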

Long-running PHP scripts are also prone to memory leaks: the script gradually occupies more and more RAM over time. Eventually the usage grows so large that the script is forcibly terminated.
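
One common cause of such growth is data that accumulates somewhere and is never released. The sketch below imitates this with a hypothetical processPage function that stores every page in a static array, and prints the memory usage every 10 pages so the growth becomes visible.

```php
<?php
// Hypothetical page handler that "leaks": it keeps every page
// in a static array that is never cleared.
function processPage(string $html): int
{
    static $seen = [];
    $seen[] = $html;
    return count($seen);
}

$samples = [];
for ($i = 1; $i <= 50; $i++) {
    // Simulated page of roughly 50 KB instead of a real download.
    processPage(str_repeat('x', 50 * 1024) . $i);

    // Take a memory reading after every 10 pages.
    if ($i % 10 === 0) {
        $samples[] = memory_get_usage();
        printf("after %d pages: %.1f MB\n", $i, end($samples) / 1048576);
    }
}
```

In a script that runs for seconds this growth goes unnoticed, but over hours of parsing it adds up.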

Due to these limitations, programmers have to resort to various tricks to make PHP scripts work in a way they were not designed for.

Unlike PHP scripts, programs written in Python or Node.js are commonly run as long-lived processes that stay loaded in memory instead of terminating after each task.

Why do people choose PHP for parsing then? The fact is that most websites run on PHP, so the parser added to such a site is usually written in PHP as well. In addition, the programmer often already knows PHP and does not want to learn another language just for the sake of a parser.

In the following lessons, we will look at approaches that let us work around these limitations.
