
Positive and Negative Lookahead in Python Regular Expressions

Sometimes you need to solve a problem like this: find the string 'aaa' and replace it with '!', but only if 'aaa' is followed by 'x', and without replacing the 'x' itself. A direct attempt fails:

import re

txt = 'aaax baaa'
res = re.sub('aaax', '!', txt)
print(res) # '! baaa', but we wanted '!x baaa'

Lookahead

To solve the problem, we need a way to say that 'x' should not be replaced. This is done with the special brackets (?= ), which check what comes next without including it in the match.

These brackets are called positive lookahead. It is positive because 'x' (in our case) must be present; only then does the substitution occur.

Let's apply these brackets to solve our problem:

txt = 'aaax baaa'
res = re.sub('aaa(?=x)', '!', txt)
print(res) # '!x baaa'
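To see that the lookahead only checks and does not consume, it can help to inspect the match itself. The following is a minimal sketch using re.search with the same pattern; the variable names are illustrative, and the capture-group variant at the end is shown only for comparison, as one possible alternative.

```python
import re

txt = 'aaax baaa'

# The lookahead requires 'x' after 'aaa' but leaves it out of the match.
match = re.search(r'aaa(?=x)', txt)
print(match.group())  # 'aaa'
print(match.span())   # (0, 3)

# Alternative without lookahead: capture 'x' and put it back.
res = re.sub(r'aaa(x)', r'!\1', txt)
print(res)  # '!x baaa'
```

Both approaches give the same result here, but the lookahead keeps the replacement string simpler because nothing has to be restored.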

There is also negative lookahead, (?! ), which requires the opposite: that something does not follow. In the following example, the replacement occurs only if 'aaa' is NOT followed by 'x':

txt = 'aaax aaab'
res = re.sub('aaa(?!x)', '!', txt)
print(res) # 'aaax !b'
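Two edge cases of negative lookahead are easy to overlook, so here is a short sketch of them. The input strings are made up for illustration. In the first, 'aaa' stands at the end of the string, where nothing follows it, so the condition "not followed by 'x'" holds. In the second, a longer run of 'a' lets the pattern match earlier than one might expect.

```python
import re

# 'aaa' at the end of the string: nothing follows, so (?!x) succeeds.
res_end = re.sub(r'aaa(?!x)', '!', 'aaax baaa')
print(res_end)  # 'aaax b!'

# Four 'a' in a row: the first three are followed by 'a', not 'x',
# so they match even though the run as a whole ends in 'x'.
res_run = re.sub(r'aaa(?!x)', '!', 'aaaax')
print(res_run)  # '!ax'
```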

Lookbehind

Similarly, there is positive lookbehind - (?<= ). In the following example, the substitution will only occur if 'aaa' is preceded by 'x':

txt = 'xaaa'
res = re.sub('(?<=x)aaa', '!', txt)
print(res) # 'x!'

And there is also negative lookbehind - (?<! ). In the following example, the substitution will only occur if 'aaa' is not preceded by 'x':

txt = 'baaa'
res = re.sub('(?<!x)aaa', '!', txt)
print(res) # 'b!'
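The two lookbehind variants are easier to compare on a single string that contains both cases. The sketch below uses an illustrative input and also shows one restriction worth knowing: Python's re module requires a lookbehind pattern to have a fixed width, so a quantifier such as + inside it is rejected when the pattern is compiled.

```python
import re

txt = 'xaaa baaa'

# Positive lookbehind: only the 'aaa' preceded by 'x' is replaced.
pos = re.sub(r'(?<=x)aaa', '!', txt)
print(pos)  # 'x! baaa'

# Negative lookbehind: only the 'aaa' NOT preceded by 'x' is replaced.
neg = re.sub(r'(?<!x)aaa', '!', txt)
print(neg)  # 'xaaa b!'

# A variable-width lookbehind is not accepted by the re module.
rejected = False
try:
    re.compile(r'(?<=x+)aaa')
except re.error as exc:
    rejected = True
    print('error:', exc)
```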

Practical tasks

Given a string containing function names:

txt = 'func1() func2() func3()'

Get a list of the function names from the string.

Given a string with a tag:

txt = '<a href="" class="eee" id="zzz">'

Get a list of the attribute names of this tag.

Given a string with variables:

txt = '$aaa $bbb $ccc xxxx'

Get substrings that are preceded by a dollar sign.
