Flags for Python Regular Expression Strings
To specify flags as one of the optional parameters for methods that work with regular expressions, use the following syntax:
flags=re.FLAG_NAME
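As a minimal sketch of this syntax, the snippet below passes a flag through the flags keyword to three standard functions of the re module: re.search, re.findall and re.compile. The sample text and the variable names are illustrative assumptions, not part of the original examples.

```python
import re

text = 'Hello WORLD'  # illustrative sample text

# Module-level functions accept flags as a keyword argument
found = re.search('world', text, flags=re.IGNORECASE)
words = re.findall('[a-z]+', text, flags=re.IGNORECASE)

# re.compile accepts the same flags; the compiled pattern remembers them
pattern = re.compile('hello', flags=re.IGNORECASE)
first = pattern.match(text)

print(found.group())  # WORLD
print(words)          # ['Hello', 'WORLD']
print(first.group())  # Hello
```

Passing flags by keyword avoids confusing it with the positional count parameter that some methods, such as re.sub, place before it.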
Basic flags for working with regular expressions
Flag | Purpose |
---|---|
re.IGNORECASE | Ignores the case of characters when matching. |
re.DOTALL | Makes a period match any character, including a line break. |
re.I | Short alias for re.IGNORECASE: makes the search case-insensitive. |
re.L | Makes matching depend on the current locale. This affects the character classes \w and \W as well as the word boundaries \b and \B. In Python 3 it can be used only with bytes patterns. |
re.M | Makes $ match at the end of every line of text (not just the end of the text) and ^ match at the beginning of every line of text (not just the beginning of the text). |
re.S | Short alias for re.DOTALL: makes the period (.) match any character, including a newline. |
re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 it is set by default for string patterns. |
re.X | Allows multi-line regular expression syntax. It ignores whitespace within a pattern (except whitespace inside a [] set or escaped with a backslash) and treats an unescaped '#' as the start of a comment. |
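The table lists some flags under both a long and a short name, and describes re.X only in words. The sketch below checks that the short names are aliases of the long ones and shows a commented, multi-line pattern compiled with re.X. The date pattern and the sample string are assumptions made up for illustration.

```python
import re

# Short names are aliases of the long names
assert re.I == re.IGNORECASE
assert re.S == re.DOTALL
assert re.M == re.MULTILINE
assert re.X == re.VERBOSE

# With re.X, whitespace in the pattern is ignored and '#' starts a comment
date_pattern = re.compile(r'''
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
''', flags=re.X)

match = date_pattern.search('released on 2024-01-15')
date_parts = match.groups()
print(date_parts)  # ('2024', '01', '15')
```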
Example
The re.IGNORECASE flag makes matching case-insensitive. Let's see how this works. In this example, without the flag, the regular expression matches only lowercase letters:
import re
txt = 'aaa bbb CCC DDD'
res = re.sub('[a-z]+', '!', txt)
print(res)
Result of code execution:
'! ! CCC DDD'
Example
Now let's pass the re.IGNORECASE flag through the flags parameter of the method, and the regular expression will match letters in either case:
txt = 'aaa AAA bbb BBB'
res = re.sub('[a-z]+', '!', txt, flags=re.IGNORECASE)
print(res)
Result of code execution:
'! ! ! !'
Example
Let's replace all line breaks using a regular expression:
txt = '''aaa
bbb'''
res = re.sub('\n', '!', txt)
print(res)
The result of the executed code:
'aaa!bbb'
Example
However, if you need to replace every character, keep in mind that by default a dot in the regular expression does not match line breaks:
txt = '''aaa
bbb'''
res = re.sub('.', '!', txt)
print(res)
The result of the executed code:
'!!!
!!!'
Example
To make the dot match line breaks as well, use the re.DOTALL flag:
res = re.sub('.', '!', txt, flags=re.DOTALL)
print(res)
The result of the executed code:
'!!!!!!!'
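The same difference shows up when a pattern has to span several lines. This is a minimal sketch with an illustrative three-line string (an assumption, not taken from the examples above): without the flag the search fails, with re.DOTALL it matches the whole text.

```python
import re

txt = 'start\nmiddle\nend'  # illustrative multi-line text

# Without the flag, '.*' stops at the first line break, so 'end' is never reached
without_flag = re.search('start.*end', txt)

# With re.DOTALL, '.' also matches '\n', so the match spans all three lines
with_flag = re.search('start.*end', txt, flags=re.DOTALL)

print(without_flag)               # None
print(with_flag.group() == txt)   # True
```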
Example
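You can also pass several flags in the parameter by combining them. The usual operator for this is | (the operator + used in the example further on gives the same result as long as each flag appears only once). Let's first replace 'aaa' at the end of the text: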
txt = '''
aaa
AAA
aaa'''
res = re.sub('aaa$', '!', txt)
print(res)
The result of the executed code:
'''
aaa
AAA
!'''
Example
Now let's set the re.M flag, so that $ matches at the end of every line:
res = re.sub('aaa$', '!', txt, flags=re.M)
print(res)
The result of the executed code:
'''
!
AAA
!'''
Example
Let's now also apply the flag to ignore case:
res = re.sub('aaa$', '!', txt, flags=re.M+re.IGNORECASE)
print(res)
The result of the executed code:
'''
!
!
!'''
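Flags in the re module are bit flags, so the bitwise OR operator | is the usual way to combine them. The sketch below repeats the last example with | and shows one case where + gives a different flag than intended. The variable names are illustrative.

```python
import re

txt = '''
aaa
AAA
aaa'''

# Combine flags with bitwise OR
combined = re.M | re.IGNORECASE
res = re.sub('aaa$', '!', txt, flags=combined)
print(repr(res))  # '\n!\n!\n!'

# '|' is safe when a flag is repeated, '+' is not:
# re.I | re.I is still re.I, but re.I + re.I equals the value of re.L
same_flag = (re.I | re.I) == re.I
wrong_flag = (re.I + re.I) == re.L
print(same_flag, wrong_flag)  # True True
```

With distinct flags both operators give the same value, which is why the + form in the example above works.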