Further Regular Expression Syntax

Whilst the syntax on the previous page enables us to create regular expressions for some quite complex situations, there are other symbols which make regular expressions simpler to write in many cases. The table below introduces some of these:

| Regular Expression | Meaning | Example Match |
| --- | --- | --- |
| bre[ae]d | characters in square brackets are another way of representing OR | bread, breed |
| [a-z] | matches any character in the range a to z (lower-case only) | a, z, q, r |
| [a-zA-Z] | matches any character in the range a to z (lower and upper-case) | a, A, q, Q |
| [a-zA-Z0-9] | matches any character in the range a to z (lower and upper-case) or any of the digits 0-9 | a, A, q, Q, 1, 5 |
| ^hello | a caret means that the element after it must be found at the start of the string | hello, hello world |
| world$ | a dollar sign means that the preceding element must be found at the end of the string | world, hello world |
| a{2} | matches precisely two instances of the character a | aa |
| a{1,3} | matches a minimum of 1 and a maximum of 3 instances of the character a | a, aa, aaa |
| \.com | a backslash is an escape character, so the dot/period is not taken to be a special character | .com |
| \d | represents a single digit, i.e. 0-9 | 0, 5, 7 |
| \s | represents a single whitespace character, such as a space | a space |
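If you would like to try the entries in the table for yourself, the short sketch below uses Python's built-in re module. The helper name is_match and the sample strings are only illustrative, and note that the digit and whitespace shorthands are written with a leading backslash (\d and \s).

```python
import re

# Each tuple: (pattern, text to test, whether we expect a match).
# The sample strings are made up for illustration.
examples = [
    (r"bre[ae]d", "breed", True),
    (r"bre[ae]d", "brid", False),
    (r"^hello", "hello world", True),
    (r"^hello", "say hello", False),
    (r"world$", "hello world", True),
    (r"a{2}", "aa", True),
    (r"\.com", "example.com", True),
    (r"\.com", "examplecom", False),
    (r"\d", "7", True),
    (r"\s", " ", True),
]


def is_match(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in examples:
    print(f"{pattern!r:14} {text!r:16} {is_match(pattern, text)}")
```

The patterns are written as raw strings (the r prefix) so that Python passes the backslashes through to the regular expression unchanged.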

There are many more symbols which have meaning in regular expressions. If you are interested in discovering more about them, a simple Google search will bring up many web pages discussing regular expression syntax, or you could try the excellent regular expression section of Dive into Python 3.

The table above and the one on the previous page introduce you to most of the syntax you will need to construct regular expressions at A-Level. Attempt the tasks below:


Task 2

Use the above table to help you explain the following regular expression:

  1. [+-]?\d+(\.\d+)?
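Once you have written your explanation, one way to check it is to try the expression against a few strings. The sketch below assumes the expression is read with the digit shorthand \d and an escaped dot \. and uses re.fullmatch so that the whole string must match; the sample strings are made up.

```python
import re

# Assumed reading of the Task 2 expression:
# \d stands for a digit and \. for a literal dot.
PATTERN = r"[+-]?\d+(\.\d+)?"

# Illustrative strings; predict each result before running the code.
samples = ["42", "-7", "+3.14", "3.", ".5", "abc", "1.2.3"]

# fullmatch only succeeds if the entire string fits the pattern.
results = {s: re.fullmatch(PATTERN, s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r:10} {'match' if ok else 'no match'}")
```

If any result surprises you, go back to the table and work out which part of the expression is responsible.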

Task 3 - Post codes

There are six valid post code formats in the U.K. (A represents any letter A-Z and 9 represents any number 0-9):

  1. AA9A 9AA
  2. A9A 9AA
  3. A9 9AA
  4. A99 9AA
  5. AA9 9AA
  6. AA99 9AA

Write regular expressions for each of these formats and then attempt to combine them into a single regular expression which will validate all formats. Try your best to avoid writing (x|y|z) where x, y and z are regular expressions for a particular format.
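As a starting point, here is a minimal sketch for format 3 only (A9 9AA), leaving the other formats and the combined expression to you. It assumes upper-case letters and exactly one space; the function name is_format_3 is just for illustration.

```python
import re

# Format 3 only: one letter, one digit, a space, one digit, two letters.
# Assumes upper-case letters and a single space between the two halves.
FORMAT_3 = r"[A-Z]\d \d[A-Z]{2}"


def is_format_3(post_code):
    """Return True if the whole string fits the A9 9AA format."""
    return re.fullmatch(FORMAT_3, post_code) is not None


print(is_format_3("M1 1AA"))
print(is_format_3("M11 1AA"))
```

Notice that every format ends in the same 9AA block, which is a useful hint when you come to combine them.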


Task 4 - Telephone numbers

There are four valid telephone number formats in the U.K. for domestic landlines (x represents any number 0-9):

  1. (01xxx) xxxxxx
  2. (01x1) xxx xxxx
  3. (011x) xxx xxxx
  4. (02x) xxxx xxxx

Write regular expressions for each of these formats and then attempt to combine them into a single regular expression which will validate all formats. Try your best to avoid writing (x|y|z) where x, y and z are regular expressions for a particular format.
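The brackets in these formats need some care: parentheses are special characters in regular expressions, so they must be escaped with a backslash in the same way as the dot in \.com. The sketch below covers format 1 only, (01xxx) xxxxxx, and the function name is_format_1 is just for illustration.

```python
import re

# Format 1 only: (01xxx) xxxxxx
# \( and \) match literal brackets rather than starting a group.
FORMAT_1 = r"\(01\d{3}\) \d{6}"


def is_format_1(number):
    """Return True if the whole string fits the (01xxx) xxxxxx format."""
    return re.fullmatch(FORMAT_1, number) is not None


print(is_format_1("(01234) 567890"))
print(is_format_1("01234 567890"))
```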


Task 5 - Car registrations

Eight valid car registration formats in the U.K. are (A represents any letter A-Z and 1 represents any digit 0-9):

  1. AA11AAA
  2. AAA111A
  3. AAA11A
  4. AAA1A
  5. A111AAA
  6. A11AAA
  7. A1AAA
  8. AAA111

Write regular expressions for each of these formats and then attempt to combine them into a single regular expression which will validate all formats. Try your best to avoid writing (x|y|z) where x, y and z are regular expressions for a particular format.
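To test a combined expression, it helps to have something to compare it against. The sketch below builds one pattern per format directly from the list above, assuming A stands for any upper-case letter and 1 for any digit 0-9. It deliberately uses the long-winded "try every format" approach the task asks you to avoid, so treat it as a checker rather than a solution; the names template_to_regex and matches_any are hypothetical.

```python
import re

# The eight format templates from the list above.
FORMATS = [
    "AA11AAA", "AAA111A", "AAA11A", "AAA1A",
    "A111AAA", "A11AAA", "A1AAA", "AAA111",
]


def template_to_regex(template):
    """Translate a template such as AA11AAA into a regular expression."""
    parts = []
    for ch in template:
        if ch == "A":
            parts.append("[A-Z]")        # assumed: any upper-case letter
        elif ch == "1":
            parts.append(r"\d")          # assumed: any digit 0-9
        else:
            parts.append(re.escape(ch))  # anything else is taken literally
    return "".join(parts)


patterns = [template_to_regex(t) for t in FORMATS]


def matches_any(registration):
    """Return True if the string fits at least one of the eight formats."""
    return any(re.fullmatch(p, registration) for p in patterns)


print(matches_any("AB12CDE"))
print(matches_any("1234567"))
```

Your combined expression should give the same answer as matches_any for every string you try.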

You can find the solutions for the post code, telephone numbers and car registration tasks on our GitHub page.