Hyphen inside sets in Python regular expressions
The hyphen is also a special character inside [ ] (but not outside). If you need the hyphen itself as a literal character, put it where it will not be taken as a range separator.
Why this matters: you can create a range of characters without realizing it. For example, with '[:-@]' you may think you are selecting a colon, a hyphen, and an at sign, but you are actually selecting the range of characters from : to @. The characters in this range are: : ; < = > ? @
Where did they come from? From the ASCII table: the colon has a lower code than the at sign, so the two form a range. In other words, every range is built from character codes in the table (you can use this deliberately if you want).
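To see exactly which characters a range covers, you can enumerate the codes between its two endpoints. The following is a minimal sketch; the variable names and the sample string are illustrative only:

```python
import re

# Every character from ':' (code 58) to '@' (code 64), inclusive
in_range = [chr(code) for code in range(ord(':'), ord('@') + 1)]
print(in_range)  # [':', ';', '<', '=', '>', '?', '@']

# The set [:-@] matches exactly these characters, and not the hyphen
sample = 'a:b;c<d=e>f?g@h-i'
found = re.findall('[:-@]', sample)
print(found)  # [':', ';', '<', '=', '>', '?', '@']
```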
How to deal with this: place the hyphen where it cannot be read as a range separator, such as at the beginning or end of the set (i.e. right after [ or right before ]).
You can also escape the hyphen; it then denotes itself regardless of its position. For example, instead of [:-@] write [:\-@]: there is no longer a range, just three characters: a colon, a hyphen, and an at sign @.
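The difference between the two sets can be checked directly. This is a small sketch with an illustrative sample string; the patterns are written as raw strings (r'...') so that the backslash reaches the regex engine unchanged:

```python
import re

sample = 'a:b;c-d@e'

# Unescaped: a range from ':' to '@', so ';' matches and '-' does not
as_range = re.findall(r'[:-@]', sample)
print(as_range)  # [':', ';', '@']

# Escaped: three literal characters ':', '-' and '@'
as_literals = re.findall(r'[:\-@]', sample)
print(as_literals)  # [':', '-', '@']
```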
Example
In the following example, the search pattern is: digit 1, then a letter from 'a' to 'z', then a digit 2:
import re

txt = '1a2 1-2 1c2 1z2'
res = re.sub('1[a-z]2', '!', txt)
print(res)
Result of code execution:
'! 1-2 ! !'
Example
Let's now escape the hyphen. The resulting search pattern is: the digit 1, then the letter 'a', or a hyphen, or the letter 'z', then the digit 2:
txt = '1a2 1-2 1c2 1z2'
# Raw string, so that \- is passed to the regex engine as written
res = re.sub(r'1[a\-z]2', '!', txt)
print(res)
Result of code execution:
'! ! 1c2 !'
Example
Instead of escaping the hyphen, you can simply move it to the end of the set:
txt = '1a2 1-2 1c2 1z2'
res = re.sub('1[az-]2', '!', txt)
print(res)
Result of code execution:
'! ! 1c2 !'
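The same holds when the hyphen is the first character of the set. A short sketch on the same input string:

```python
import re

txt = '1a2 1-2 1c2 1z2'

# The hyphen comes right after '[', so it is a literal, not a range separator
res = re.sub('1[-az]2', '!', txt)
print(res)  # ! ! 1c2 !
```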
Example
In the following example, the search pattern is:
the first character is a lowercase letter or a hyphen '-', then two letters 'x':
txt = 'axx Axx -xx @xx'
res = re.sub('[a-z-]xx', '!', txt)
print(res)
Result of code execution:
'! Axx ! @xx'
Example
In the following example, the search pattern is:
the first character is a lowercase letter, an uppercase letter, or a hyphen '-', then two letters 'x':
txt = 'axx Axx -xx @xx'
res = re.sub('[a-zA-Z-]xx', '!', txt)
print(res)
Result of code execution:
'! ! ! @xx'
Example
You can place a hyphen between two ranges. Python's re treats it as a literal hyphen there, because the range before it is already complete:
txt = 'axx 9xx -xx @xx'
res = re.sub('[a-z-0-9]xx', '!', txt)
print(res)
Result of code execution:
'! ! ! @xx'
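If a hyphen in the middle of the set looks ambiguous to a reader, the same set can be written with the hyphen at the end. The sketch below compares both spellings on the same input; the variable names are illustrative:

```python
import re

txt = 'axx 9xx -xx @xx'

# Hyphen between two ranges
between = re.sub('[a-z-0-9]xx', '!', txt)
# Hyphen moved to the end of the set
at_end = re.sub('[a-z0-9-]xx', '!', txt)

print(between)  # ! ! ! @xx
print(at_end)   # ! ! ! @xx
```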
Practical tasks
Given a string:
txt = 'xaz xBz xcz x-z x@z'
Find all substrings that match the following pattern: the letter 'x', then an uppercase or lowercase letter or a hyphen, then the letter 'z'.
Given a string:
txt = 'xaz x$z x-z xcz x+z x%z x*z'
Find all substrings that match the following pattern: the letter 'x', then either a dollar sign, a hyphen, or a plus sign, then the letter 'z'.