Handling Cyrillic in Python Regex
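A major advantage of regex in Python is that Cyrillic letters
are included in the \w class. For string patterns, \w matches any
Unicode word character, which covers Latin and Cyrillic letters as well
as digits and the underscore. Let's replace every substring that starts
and ends with the letter 'x' and contains only word characters
in the following string: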
import re

txt = 'x232x 456 xax xтекстx'
# Raw string, so that \w reaches the regex engine unchanged
res = re.sub(r'x\w*x', '!', txt)
print(res)
Code execution result:
'! 456 ! !'
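This behaviour comes from the re module treating str patterns as Unicode by default. The sketch below compares it with the re.ASCII flag, which restricts \w to [a-zA-Z0-9_]. The variable names unicode_res and ascii_res are just for illustration.

```python
import re

txt = 'x232x 456 xax xтекстx'

# Default for str patterns: \w is Unicode-aware, so Latin letters,
# Cyrillic letters, digits and the underscore all count as word characters.
unicode_res = re.sub(r'x\w*x', '!', txt)

# With re.ASCII, \w shrinks to [a-zA-Z0-9_], so the Cyrillic
# substring 'xтекстx' is no longer matched and stays as it is.
ascii_res = re.sub(r'x\w*x', '!', txt, flags=re.ASCII)

print(unicode_res)  # ! 456 ! !
print(ascii_res)    # ! 456 ! xтекстx
```

Only the Cyrillic substring changes between the two calls, which isolates the effect of the flag.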
Lowercase Cyrillic letters can also be matched with a character set
in square brackets: [а-я]. However, this range has a pitfall: it does
not include the letter 'ё'. The range а-я covers the code points
U+0430 to U+044F, while 'ё' is U+0451. To include it, add this letter
to the set explicitly:
res = re.sub('x[а-яё]*x', '!', txt)
print(res)
Code execution result:
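'x232x 456 xax !'
Only 'xтекстx' is replaced: 'x232x' and 'xax' contain digits and Latin
letters, which the set [а-яё] does not cover.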
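The sketch below makes the 'ё' pitfall visible and also shows that [а-я] covers lowercase letters only. Uppercase letters need their own range, and 'Ё' (U+0401) likewise lies outside А-Я (U+0410 to U+042F). The sample string and variable names are illustrative assumptions, not part of the original lesson.

```python
import re

txt = 'ёж ель ЁЖ Ель'

# [а-я] spans U+0430..U+044F, so 'ё' (U+0451) and all uppercase letters are missed
lower_only = re.findall(r'[а-я]+', txt)

# Adding ё fixes the lowercase words
with_yo = re.findall(r'[а-яё]+', txt)

# Uppercase needs А-Я plus Ё, which is also outside that range
both_cases = re.findall(r'[а-яёА-ЯЁ]+', txt)

print(lower_only)  # ['ж', 'ель', 'ль']
print(with_yo)     # ['ёж', 'ель', 'ль']
print(both_cases)  # ['ёж', 'ель', 'ЁЖ', 'Ель']
```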
Given a string:
txt = 'wйw wяw wёw wqw'
Write a regex that finds substrings matching this pattern:
the letter 'w' at both ends and a single Cyrillic letter between them.
Given a string:
txt = 'ааа ббб ёёё ззз ййй ААА БББ ЁЁЁ ЗЗЗ ЙЙЙ'
Write a regex that finds all words made up of Cyrillic letters, uppercase or lowercase, repeated any number of times.