Character Groups in Python Regular Expressions

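Regular expressions provide special commands that match whole groups of characters at once. The \d command matches a digit from 0 to 9. The \w command matches a digit, a letter, or an underscore. The \s command matches a whitespace character: a space, a line feed, a tab. Writing the command with a capital letter inverts its meaning: for example, \d matches a digit, and \D matches any character that is not a digit. Note that for string patterns in Python 3 these commands are Unicode-aware by default, so \d and \w also match digits and letters outside the ASCII range; the re.ASCII flag restricts them to ASCII characters.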
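As a quick illustration of the commands described above, here is a minimal sketch. The sample string and the variable names are made up for this illustration, and re.findall is used only to list what each command matches:

```python
import re

# Sample string invented for this illustration:
# it contains letters, digits, an underscore, a space and a tab
txt = 'a1 _b\t2'

digits = re.findall(r'\d', txt)      # digits only
word_chars = re.findall(r'\w', txt)  # digits, letters, underscore
spaces = re.findall(r'\s', txt)      # whitespace characters
non_digits = re.findall(r'\D', txt)  # everything that is not a digit

print(digits)
print(word_chars)
print(spaces)
print(non_digits)
```

Each character of the string is matched either by \d or by \D, never by both, which is what the inversion by a capital letter means.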

Example

Let's replace every digit with '!':

import re

txt = '1 12 123'
# A raw string (r'...') keeps the backslash intact for the regex engine
res = re.sub(r'\d', '!', txt)

print(res)

Result of code execution:

'! !! !!!'

Example

Repetition operators treat a character-group command as a single unit, so no grouping parentheses are needed. In the following example, the pattern matches a digit from 0 to 9 repeated one or more times:

import re

txt = '1 12 123 abc @@@'
res = re.sub(r'\d+', '!', txt)

print(res)

Result of code execution:

'! ! ! abc @@@'
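To check the claim that grouping parentheses are not needed, the sketch below compares the pattern with and without parentheses around \d. The variable names are chosen for this illustration only, and re.findall is used just to list the matched substrings:

```python
import re

txt = '1 12 123 abc @@@'

# The + applies to \d directly...
without_group = re.sub(r'\d+', '!', txt)
# ...so wrapping \d in parentheses changes nothing here
with_group = re.sub(r'(\d)+', '!', txt)

print(without_group)
print(with_group)

# Each run of digits is found as one match
runs = re.findall(r'\d+', txt)
print(runs)
```

Both substitutions give the same string, and each run of digits is replaced as a whole rather than digit by digit.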

Example

In the following example, the pattern matches any character that is not a digit, repeated one or more times:

import re

txt = '123abc3@@'
res = re.sub(r'\D+', '!', txt)

print(res)

Result of code execution:

'123!3!'

Example

In this example, the pattern matches a single whitespace character:

import re

txt = '1 12 123 abc @@@'
res = re.sub(r'\s', '!', txt)

print(res)

Result of code execution:

'1!12!123!abc!@@@'

Example

In this example, the pattern matches a non-whitespace character repeated one or more times. Every space-separated substring is replaced with '!':

import re

txt = '1 12 123 abc @@@'
res = re.sub(r'\S+', '!', txt)

print(res)

Result of code execution:

'! ! ! ! !'
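To see which substrings the \S+ pattern actually picks up, they can be listed instead of replaced. This is only a sketch: re.findall and the comparison with str.split are additions made for illustration and are not part of the lesson's own examples:

```python
import re

txt = '1 12 123 abc @@@'

# Every run of non-whitespace characters is one match
chunks = re.findall(r'\S+', txt)
print(chunks)

# For this string the result coincides with splitting on whitespace
same_as_split = chunks == txt.split()
print(same_as_split)
```

There are five matches, which is why the result above contains five '!' characters.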

Example

In this example, the pattern matches a digit, a letter, or an underscore repeated one or more times. Every substring made up of such characters is replaced with '!':

import re

txt = '1 12 123a Abc @@@'
res = re.sub(r'\w+', '!', txt)

print(res)

Result of code execution:

'! ! ! ! @@@'
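One detail worth checking is exactly which characters \w covers. The sketch below assumes Python 3 string patterns, where \w also matches the underscore and, by default, letters outside the Latin alphabet; the re.ASCII flag narrows it to ASCII characters. The sample string and variable names are invented for this illustration:

```python
import re

# Invented sample: an identifier with an underscore and a Cyrillic word
txt = 'user_name42 дом @@@'

# Default (Unicode-aware) matching: the Cyrillic word counts as \w too
res_default = re.sub(r'\w+', '!', txt)
# With re.ASCII only ASCII letters, digits and the underscore match
res_ascii = re.sub(r'\w+', '!', txt, flags=re.ASCII)

print(res_default)
print(res_ascii)
```

The underscore does not split 'user_name42' into parts, and the Cyrillic word is replaced only in the default mode.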

Example

In this example, the pattern matches a character that is not a digit, a letter, or an underscore, repeated one or more times. In our case, this covers '@@@' and all the spaces (they are not digits or letters either). Note the single '!' at the end: the substring ' @@@', including the leading space, was replaced by it:

import re

txt = '1 12 123 Abc @@@'
res = re.sub(r'\W+', '!', txt)

print(res)

Result of code execution:

'1!12!123!Abc!'
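The single '!' at the end comes from the + operator, which takes the whole run ' @@@' as one match. As a sketch for comparison (the variable names are chosen for this illustration), here is the same replacement with and without the repetition operator:

```python
import re

txt = '1 12 123 Abc @@@'

# With +, a run of non-word characters is replaced as a whole
res_runs = re.sub(r'\W+', '!', txt)
# Without +, every non-word character is replaced separately
res_each = re.sub(r'\W', '!', txt)

print(res_runs)
print(res_each)
```

Without the +, the trailing space and the three '@' characters each produce their own '!', so the string ends with four of them.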

Practical tasks

Given a string:

txt = 'a1a a2a a3a a4a a5a aba aca'

Write a regular expression that finds the substrings that have the letter 'a' at both ends and exactly one digit between them.

Given a string:

txt = 'a1a a22a a333a a4444a a55555a aba aca'

Write a regular expression that finds the substrings that have the letter 'a' at both ends and one or more digits between them.

Given a string:

txt = 'aa a1a a22a a333a a4444a a55555a aba aca'

Write a regular expression that finds the substrings that have the letter 'a' at both ends and any number of digits between them (including zero digits, that is, the substring 'aa').

Given a string:

txt = 'avb a1b a2b a3b a4b a5b abb acb'

Write a regular expression that finds substrings of the following form: the letters 'a' and 'b' at the ends, and between them one character that is neither a digit nor a whitespace character.

Given a string:

txt = 'ave a#b a2b a$b a4b a5b a-b acb'

Write a regular expression that finds substrings of the following form: the letters 'a' and 'b' at the ends, and between them one character that is neither a letter, nor a digit, nor a whitespace character.

Given a string:

txt = 'ave a#a a2a a$a a4a a5a a-a aca'

Write a regular expression that will replace all spaces with '!'.