Regular Expressions

Regular expressions provide a powerful method of pattern matching in text documents. They enable you to:

  • find text that matches the given pattern
  • replace text that matches the given pattern
  • validate whether text matches a given pattern
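
As a rough sketch of these three operations, the snippet below uses Python's standard re module. The sample sentence and the variable names are made up for illustration.

```python
import re

text = "I like orange juice and orange peel."

# Find: search() returns a match object for the first occurrence, or None.
match = re.search("orange", text)
found_at = match.start() if match else -1
print(found_at)

# Replace: sub() returns a new string with every match replaced.
replaced = re.sub("orange", "apple", text)
print(replaced)

# Validate: fullmatch() succeeds only if the whole string fits the pattern.
is_valid = re.fullmatch("colou?r", "color") is not None
print(is_valid)
```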

You are probably thinking to yourself that Python already provides functions to do this, and so it does:

  • find()
  • replace()

There are other functions as well. Unfortunately, these are quite limited: find() is case-sensitive, and replace() replaces every occurrence of a string, even when it appears as part of another word. The screenshot below demonstrates this:

[Screenshot: Showing how find() and replace() work in Python]
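
The same behaviour can be reproduced with a short snippet. This is a minimal sketch using an invented sentence, not the code from the original screenshot.

```python
text = "The cat chased another cat into the cathedral."

# find() is case-sensitive: "Cat" is not found, so it returns -1.
upper_pos = text.find("Cat")
lower_pos = text.find("cat")
print(upper_pos, lower_pos)

# replace() changes every occurrence, including the one inside "cathedral".
replaced = text.replace("cat", "dog")
print(replaced)
```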

Regular expressions provide a way to escape from these limitations, but this freedom comes at a cost: you have to learn regular expression syntax. This can be quite daunting, but thankfully, once you have some understanding, you can apply your knowledge everywhere, as the syntax for regular expressions is fairly standard across programming languages.
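
To illustrate, here is one possible way to get around both limitations with Python's re module. It assumes an invented sample sentence and uses two features not covered in the basic syntax table in this section: the re.IGNORECASE flag and the \b word-boundary marker.

```python
import re

text = "The cat chased another cat into the cathedral."

# re.IGNORECASE makes the search ignore letter case.
match = re.search("CAT", text, flags=re.IGNORECASE)
position = match.start() if match else -1
print(position)

# \b marks a word boundary, so the "cat" inside "cathedral" is left alone.
whole_words = re.sub(r"\bcat\b", "dog", text)
print(whole_words)
```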

However, unless you actually need regular expressions, you should stick to the built-in string functions. They are easier to understand than regular expressions. How do you know when it is time for a regular expression? Well, if you have calls to several string functions and a bunch of if statements to deal with specific cases, then it is probably time. A regular expression is going to be much more readable than that jumble of code.

Basic syntax

The basic syntax of regular expressions is pretty straightforward:

Regular Expression  Meaning                                                               Example Match
a                   matches only the given character (the most basic regular expression)  a
orange              matches the word orange                                               orange
d.g                 a dot (period) represents any single character                        dog, dig, dag, dxg
harbou?r            a question mark represents zero or one of the preceding element       harbour, harbor
a*                  an asterisk represents zero or more of the preceding element          (empty string), a, aa, aaa, aaaa
a+                  a plus represents one or more of the preceding element                a, aa, aaa, aaaa
uk|us               a pipe represents one element OR the other                            uk, us
ja(b|m)             brackets are used to group elements together                          jab, jam
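
The table can be checked in Python with a short script. This is a sketch that assumes re.fullmatch() is the right test, since it requires the entire string to fit the pattern. The dictionary names are illustrative.

```python
import re

# Each pattern from the table, paired with strings it should match in full.
examples = {
    "a": ["a"],
    "orange": ["orange"],
    "d.g": ["dog", "dig", "dag", "dxg"],
    "harbou?r": ["harbour", "harbor"],
    "a*": ["", "a", "aa", "aaa", "aaaa"],
    "a+": ["a", "aa", "aaa", "aaaa"],
    "uk|us": ["uk", "us"],
    "ja(b|m)": ["jab", "jam"],
}

for pattern, strings in examples.items():
    for s in strings:
        matched = re.fullmatch(pattern, s) is not None
        print(f"{pattern!r} matches {s!r}: {matched}")

# A few strings that should not match their pattern.
non_examples = {"a+": "", "d.g": "dg", "harbou?r": "harbouur"}

for pattern, s in non_examples.items():
    matched = re.fullmatch(pattern, s) is not None
    print(f"{pattern!r} matches {s!r}: {matched}")
```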

We can get a long way with just these simple concepts in regular expressions as we can use brackets and pipes to help construct more complex statements.

Task 1

Use the above table to help you explain the following regular expressions:

  1. colou?r
  2. the(ir|re|y're)

If you have managed to explain these successfully then you can get more practice with simple regular expressions by downloading this worksheet.