Positive and Negative Lookahead in Python Regular Expressions
Sometimes you need to solve a problem like this: find the string 'aaa' and replace it with '!', but only if 'aaa' is followed by 'x', and the 'x' itself must not be replaced. If we try to solve the problem head-on, we will fail:
import re
txt = 'aaax baaa'
res = re.sub('aaax', '!', txt)
print(res) # '! baaa', but we wanted '!x baaa'
Lookahead
To solve the problem, we need a way to say that 'x' must be present but should not be replaced. This is done with the special construct (?= ), which checks the text that follows without including it in the match.
This construct is called a positive lookahead. It is positive because 'x' (in our case) must be present; only then will the substitution occur.
Let's apply it to solve our problem:
txt = 'aaax baaa'
res = re.sub('aaa(?=x)', '!', txt)
print(res) # '!x baaa'
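To see that the lookahead really does not take 'x' with it, it can help to inspect the match itself. The sketch below is only an illustration: it prints what the pattern matched and where, and then shows one possible alternative without lookahead (capturing 'x' in a group and putting it back), which is not the approach this lesson uses.

```python
import re

txt = 'aaax baaa'

# The lookahead checks for 'x' but leaves it out of the match.
m = re.search(r'aaa(?=x)', txt)
print(m.group())  # 'aaa'
print(m.span())   # (0, 3)

# Alternative without lookahead: capture 'x' and put it back.
res = re.sub(r'aaa(x)', r'!\1', txt)
print(res)  # '!x baaa'
```

The lookahead version keeps the replacement string simple, since nothing has to be restored after the match.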
There is also a negative lookahead, (?! ). It, on the contrary, requires that something is not present. In the following example, the replacement occurs only if 'aaa' is NOT followed by 'x':
txt = 'aaax aaab'
res = re.sub('aaa(?!x)', '!', txt)
print(res) # 'aaax !b'
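Two edge cases of the negative lookahead are worth checking by hand. The sketch below uses strings made up for illustration: one where 'aaa' stands at the very end of the string, and one with a longer run of 'a' characters.

```python
import re

# End of string: nothing follows 'aaa', so "not followed by 'x'" holds.
res_end = re.sub(r'aaa(?!x)', '!', 'aaax aaa')
print(res_end)  # 'aaax !'

# Pitfall: in 'aaaax' the first three 'a's are followed by 'a', not 'x',
# so the pattern matches there even though the run ends with 'x'.
res_run = re.sub(r'aaa(?!x)', '!', 'aaaax')
print(res_run)  # '!ax'
```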
Lookbehind
Similarly, there is a positive lookbehind, (?<= ). In the following example, the substitution occurs only if 'aaa' is preceded by 'x':
txt = 'xaaa'
res = re.sub('(?<=x)aaa', '!', txt)
print(res) # 'x!'
There is also a negative lookbehind, (?<! ). In the following example, the substitution occurs only if 'aaa' is not preceded by 'x':
txt = 'baaa'
res = re.sub('(?<!x)aaa', '!', txt)
print(res) # 'b!'
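Lookbehind and lookahead can be combined in one pattern. The sketch below, with a test string invented for illustration, replaces 'aaa' only when it is preceded by 'x' and followed by 'y'. It also shows a restriction of Python's re module: the pattern inside a lookbehind must have a fixed width, so a pattern such as (?<=x+) is rejected when it is compiled.

```python
import re

# Both conditions at once: preceded by 'x' AND followed by 'y'.
res = re.sub(r'(?<=x)aaa(?=y)', '!', 'xaaay xaaa aaay')
print(res)  # 'x!y xaaa aaay'

# A variable-width lookbehind is rejected by the re module.
try:
    re.compile(r'(?<=x+)aaa')
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False
```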
Practical tasks
Given a string containing function names:
txt = 'func1() func2() func3()'
Get a list of the function names from this string.
Given a string with a tag:
txt = '<a href="" class="eee" id="zzz">'
Get a list of the attribute names of this tag.
Given a string with variables:
txt = '$aaa $bbb $ccc xxxx'
Get the substrings that are preceded by a dollar sign.