Book on Parsing in PHP
Basics
Introduction to Parsing
PHP Limitations
Execution Time Limit
Memory Limit
Ignoring Browser Abort
Parser Placement
Preparatory Manipulations
Regex
Introduction
Parsing Strings with Line Breaks
Parsing Russian Text
Parsing Simple Tags
Parsing Tags with Attributes
Parsing Repeating Tags
Parsing Tag Blocks
Two-Stage Block Parsing
The Problem of Attribute Spaces
The Problem of Attribute Quotes
The Problem of Tag Names
Pre-Cleaning Text
Data Cleaning During Parsing
Problems with Regex Parsing
Regex Parsing Practice
Libraries
DiDom Library
Installation
Parsing Text from a Variable
Parsing Text from a URL
Text of the First Element
HTML Element Code
Inner HTML Code of an Element
CSS Selectors
Tag Attributes
Search Inside Elements
Array of Elements
Attributes for an Array of Elements
Documentation
Practice
Paths
Normalization of Absolute Paths
Normalization of Relative Paths
Normalization of Shifted Paths
Universal Path Normalization
Links to External Sites
Encodings
Methods
Function for Getting a Page
Function for Getting Links
Parsing by Links
Step-by-Step Parsing Method
Spider Method on an Array
Spider Method on a Database
Spider Method with Filtering
Parsing Based on sitemap.xml
Files
Parsing Files
Parsing Images
Parsing CSS Files
Parsing JavaScript Files
Parsing Audio Files
Parsing Video Files
Forms
Submitting Forms via GET Method
Submitting Forms via POST Method
Pitfalls When Submitting Forms
Automatic Login
Captcha
Automation
Logs During Parsing
Cache During Parsing
Saving on Abort
Scheduling in the Browser
Scheduling on Hosting
Bypassing Protection
Bypassing Anti-Scraping Protection
Delays During Parsing
Changing IP at Home
Changing IP During Parsing
Cookies During Parsing
HTTP Headers During Parsing
User-Agent During Parsing
Mobile Version of the Site
Using an API