
Regular expressions are a text pattern matching and manipulation technology that can be used for phone number matching and translation. This topic provides a brief overview of regular expression syntax.

For more information on Regular Expressions, see Wikipedia or http://www.regular-expressions.info/.

Regular Expression Syntax

In regular expressions, all characters match themselves except for the following special characters:

. [ { ( ) * + ? | ^ $ \

Wildcard .

The single character '.' when used outside of a character set will match any single character.

Pattern    Description
.          Matches any single character
\.         Matches the literal period . character
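
The difference between the wildcard and the escaped period can be sketched as follows. This is a minimal illustration that assumes Python's re module, which follows the same syntax for these patterns; the matches helper is a hypothetical name used only for this example, and the engine used by your system may differ in details.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # fullmatch() succeeds only if the whole input matches the pattern.
    return re.fullmatch(pattern, text) is not None

print(matches(r"5.5", "555"))    # True: '.' stands in for the middle '5'
print(matches(r"5.5", "5-5"))    # True: '.' also matches '-'
print(matches(r"5\.5", "5.5"))   # True: '\.' matches a literal period
print(matches(r"5\.5", "555"))   # False: a literal period is required
```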

Anchors ^,$

The anchor characters are used to match the beginning or end of a line.

Pattern    Description
^          Matches the position at the start of a line, without consuming the first character of the line
$          Matches the position at the end of a line, without consuming the last character of the line
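
The effect of the anchors can be sketched as follows, assuming Python's re module. The number value and the found helper are hypothetical and exist only for this illustration.

```python
import re

number = "0015551234"  # hypothetical dialed number

def found(pattern: str, text: str) -> bool:
    # search() looks for the pattern anywhere in the input.
    return re.search(pattern, text) is not None

print(found(r"555", number))     # True: "555" occurs in the middle
print(found(r"^555", number))    # False: the input does not start with "555"
print(found(r"^001", number))    # True: the input starts with "001"
print(found(r"1234$", number))   # True: the input ends with "1234"
print(found(r"^1234$", number))  # False: the input is not exactly "1234"
```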

Marked Group (expr)

A section beginning with an open parenthesis ( and ending with a close parenthesis ) acts as a Marked Group. The string that matches the group pattern is preserved for later use. Marked Groups can also be repeated, or referred to by a Back-Reference.

Pattern    Description
( )        Used to group expressions and to capture a set of characters for use in a back-reference
\(         Matches the open parenthesis ( character
\)         Matches the close parenthesis ) character
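
As a rough sketch of how Marked Groups preserve matched text, the following example assumes Python's re module and a hypothetical number format (an area code in literal parentheses followed by a seven-digit local number). The \d shorthand used here matches a single digit.

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) pairs are Marked Groups.
pattern = r"\((\d\d\d)\)(\d\d\d\d\d\d\d)"

m = re.fullmatch(pattern, "(555)1234567")
area_code = m.group(1)  # text captured by the first Marked Group
local = m.group(2)      # text captured by the second Marked Group
print(area_code, local)  # 555 1234567
```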

Non-Marking Grouping (?: )

A Marked Group is useful for lexically grouping part of a regular expression, but has the side effect of splitting out an extra field in the result. As an alternative, you can lexically group part of a regular expression without generating a marked group by using (?: and ). For example, (?:ab)+ repeats the "ab" match phrase without splitting out a separate marked group.

Pattern    Description
(?: )      Used to group expressions without capturing them for a back-reference
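
The difference in captured fields can be seen in a short sketch, assuming Python's re module. Both patterns accept the same input; only the marked version produces a captured field.

```python
import re

text = "ababab"

marked = re.fullmatch(r"(ab)+", text)      # Marked Group: captures a field
unmarked = re.fullmatch(r"(?:ab)+", text)  # Non-Marking Group: captures nothing

print(marked.groups())    # ('ab',)
print(unmarked.groups())  # ()
```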

Shorthand Character Classes \d, \w, \s

These expressions provide a shorthand way to describe a class of characters. For example, \d matches any numeric digit. The capitalized version of each shorthand matches the complement of that class. For example, \D matches any non-digit character.

Pattern    Description
\d         Matches a numeric digit (0 to 9)
\w         Matches a word character (letters, digits, underscores)
\s         Matches a whitespace character (space, tab, line breaks)
\D         Matches a non-numeric character (not a digit 0 to 9)
\W         Matches a non-word character (not a letter, digit or underscore)
\S         Matches a non-whitespace character (not a space, tab or line break)

The \d shorthand is commonly used with the curly bracket repeater expression to match a specific number of digits, for example: \d{4} to match four consecutive digits.
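
A minimal sketch of the shorthand classes, assuming Python's re module. The re.ASCII flag is passed so that \d means exactly 0 to 9, as described above; without it, Python also accepts digits from other scripts. The matches helper is a hypothetical name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # re.ASCII limits \d, \w and \s to ASCII, so \d means exactly 0 to 9.
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None

print(matches(r"\d{4}", "1234"))             # True: four consecutive digits
print(matches(r"\d{4}", "12a4"))             # False: 'a' is not a digit
print(matches(r"\d{3}\s\d{4}", "555 1234"))  # True: \s matches the space
print(matches(r"\w\w\w", "a_1"))             # True: letter, underscore, digit
print(matches(r"\D\D\D", "ext"))             # True: three non-digits
```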

Repeaters *,+,?,{}

The repeater characters ( *, +, ?, and {} ) enable matching of a character, expression or character class that is repeated.

Pattern    Description
*          Match the preceding character or expression zero to unlimited times
+          Match the preceding character or expression one to unlimited times
{n}        Match the preceding character or expression exactly n times
{n,m}      Match the preceding character or expression at least n times and at most m times
{n,}       Match the preceding character or expression at least n times, with no upper limit
?          Optionally match the preceding character or expression

* Examples

The * operator matches the preceding atom zero or more times. For example, the expression a*b produces the following results:

Pattern    Input        Match?
a*b        b            Yes
a*b        ab           Yes
a*b        aaaaaaaab    Yes
a*b        acb          No
a*b        aaaaaaacb    No

+ Examples

The + operator matches the preceding atom one or more times. For example, the expression a+b produces the following results:

Pattern    Input        Match?
a+b        b            No
a+b        ab           Yes
a+b        aaaaaaaab    Yes
a+b        acb          No
a+b        aaaaaaacb    No

? Examples

The ? operator matches the preceding atom zero or one times. For example, the expression ca?b produces the following results:

Pattern    Input    Match?
ca?b       b        No
ca?b       ab       No
ca?b       cb       Yes
ca?b       cab      Yes
ca?b       caab     No

{ } Examples

The curly bracket repeaters allow matching of a character or expression a specific number of times.

Pattern    Input    Match?
h{4}       hhhh     Yes
h{4}       hh       No
h{4}       hhhhh    No

Pattern    Input     Match?
h{2,5}     hhhh      Yes
h{2,5}     hh        Yes
h{2,5}     hhhhh     Yes
h{2,5}     h         No
h{2,5}     hhhhhh    No

Pattern    Input         Match?
h{3,}      hh            No
h{3,}      hhh           Yes
h{3,}      hhhhhh        Yes
h{3,}      hhhhhhhhhh    Yes
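
The repeater examples can be checked with a short script. This sketch assumes Python's re module and uses fullmatch, because the results in the example tables hold when the whole input must match the pattern; a search for the pattern anywhere in the input would, for instance, find hhhh inside hhhhh. The cases list and matches helper are hypothetical names.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # fullmatch() succeeds only if the whole input matches the pattern.
    return re.fullmatch(pattern, text) is not None

# (pattern, input, expected result) taken from the example tables
cases = [
    (r"a*b", "b", True),
    (r"a*b", "aaaaaaaab", True),
    (r"a*b", "acb", False),
    (r"a+b", "b", False),
    (r"a+b", "ab", True),
    (r"ca?b", "cb", True),
    (r"ca?b", "caab", False),
    (r"h{4}", "hhhhh", False),
    (r"h{2,5}", "hh", True),
    (r"h{2,5}", "hhhhhh", False),
    (r"h{3,}", "hhhhhhhhhh", True),
]

# Collect any case where the engine disagrees with the expected result.
mismatches = [(p, t) for p, t, expected in cases if matches(p, t) != expected]
print(mismatches)  # []
```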

Non-Greedy Repeats

The normal repeat operators try to match as much input as possible, and so are described as "greedy" expressions. Adding a question mark ? after a repeater symbol alters this matching behavior and makes the expression match as little input as possible while still producing a match. A regular expression altered in this way is sometimes referred to as a "lazy" expression.

Pattern    Description
*?         Matches the previous character or expression zero or more times, while consuming as little input as possible
+?         Matches the previous character or expression one or more times, while consuming as little input as possible
??         Matches the previous character or expression zero or one times, while consuming as little input as possible
{n,}?      Matches the previous character or expression n or more times, while consuming as little input as possible
{n,m}?     Matches the previous character or expression at least n and at most m times, while consuming as little input as possible
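
The difference between greedy and lazy repeats is easiest to see by comparing what each one consumes. The following sketch assumes Python's re module; the input strings are made up for illustration.

```python
import re

text = "<555><1234>"

# Greedy: .+ consumes as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group(0)
# Lazy: .+? consumes as little as possible, so the match stops at the first '>'.
lazy = re.search(r"<.+?>", text).group(0)
print(greedy)  # <555><1234>
print(lazy)    # <555>

digits = "12345"
greedy_digits = re.search(r"\d{2,4}", digits).group(0)
lazy_digits = re.search(r"\d{2,4}?", digits).group(0)
print(greedy_digits)  # 1234
print(lazy_digits)    # 12
```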

Back References \1, \2, \n

An escape character followed by a digit n, where n is in the range 1-9, matches the same string that was matched by the nth Marked Group. Marked Groups are created with an open and close parenthesis pair ( ).

Pattern    Description
\1         Outputs the content of the first capturing marked group
\2         Outputs the content of the second capturing marked group
\n         Outputs the content of the nth capturing marked group (n must be a number, not the character n)

Examples

Pattern          Input       Match?
^(a*)[^a]*\1$    aaabbaaa    Yes
^(a*)[^a]*\1$    aaabba      No
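
Back-references can be used both inside a pattern and in a replacement string. The sketch below assumes Python's re module; the translation rule (replace a leading 0 with +44) is a hypothetical dial plan chosen only to illustrate the mechanism.

```python
import re

# Inside a pattern, \1 must repeat exactly what the first Marked Group matched.
doubled = re.compile(r"(\d)\1")
has_double = doubled.search("5512") is not None  # "55" repeats the captured 5
no_double = doubled.search("5152") is not None   # no digit is followed by itself
print(has_double, no_double)  # True False

# In a replacement string, \1 inserts the text captured by the first group.
# Hypothetical rule: strip a leading 0 and prefix +44.
translated = re.sub(r"^0(\d+)$", r"+44\1", "02079460000")
print(translated)  # +442079460000
```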

Alternation |

The | operator matches either the expression to its left or the expression to its right.

Pattern    Description
|          Match the character or expression on either side of the | character

Pattern    Input    Match?
abc|def    abc      Yes
abc|def    def      Yes

The alternation operator is typically used with parentheses to separate the alternate expressions from the rest of the matching expression.

Pattern    Description
( | )      Match the character or expression on either side of the | character, contained within the parentheses ( )

Pattern      Input     Match?
(abc|def)    abc       Yes
(abc|def)    def       Yes
(abc|def)    ab        No
(abc|def)    ef        No
a(bc|de)f    abcf      Yes
a(bc|de)f    adef      Yes
a(bc|de)f    abcdef    No
a(bc|de)f    af        No
a(bc|de)f    acdf      No
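
The effect of parentheses on alternation can be sketched as follows, assuming Python's re module and whole-input matching. Without parentheses, | splits the entire expression; with parentheses, it splits only the grouped part. The matches helper is a hypothetical name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # fullmatch() succeeds only if the whole input matches the pattern.
    return re.fullmatch(pattern, text) is not None

print(matches(r"abc|def", "def"))       # True: the right-hand alternative
print(matches(r"a(bc|de)f", "adef"))    # True: 'a', then "de", then 'f'
print(matches(r"a(bc|de)f", "abcdef"))  # False: only one alternative is allowed
print(matches(r"ab(c|d)ef", "abcef"))   # True: | applies only to 'c' or 'd'
print(matches(r"abc|def", "abcef"))     # False: input is neither "abc" nor "def"
```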

Character Sets [a-z], [A-Z], [0-9]

A character set is a bracket expression starting with [ and ending with ]. It defines a set of characters and matches any single character that is a member of that set.

Pattern        Description
[a-z]          Matches any lowercase letter in the range 'a' to 'z'
[A-Z]          Matches any UPPERCASE letter in the range 'A' to 'Z'
[a-c]          Matches any character in the range 'a' to 'c'
[abc]          Matches any of the characters 'a', 'b', or 'c'
[0-9]          Matches any digit in the range '0' to '9'
[5-8]          Matches any digit in the range '5' to '8'
[1234]         Matches any of the digits '1', '2', '3' or '4'
[0-9a-zA-Z]    Matches any character in the ranges '0' to '9', 'a' to 'z' or 'A' to 'Z'
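
A character set always matches exactly one character, so matching several characters takes several sets or a repeater. A minimal sketch, assuming Python's re module and whole-input matching (the matches helper is a hypothetical name):

```python
import re

def matches(pattern: str, text: str) -> bool:
    # fullmatch() succeeds only if the whole input matches the pattern.
    return re.fullmatch(pattern, text) is not None

print(matches(r"[a-c]", "b"))              # True: 'b' is in the range a to c
print(matches(r"[abc]", "d"))              # False: 'd' is not in the set
print(matches(r"[5-8]", "4"))              # False: '4' is outside 5 to 8
print(matches(r"[0-9a-zA-Z]", "Q"))        # True: 'Q' is in the A to Z range
print(matches(r"[0-9]", "55"))             # False: one set matches one character
print(matches(r"[2-9][0-9][0-9]", "555"))  # True: three sets, three characters
```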

Negation ^

When placed immediately after the opening square bracket, the caret ^ negates the set, so the expression matches any character that is not in the range or set of characters. For example, the regular expression [^a-c] matches any character that is not in the range a-c.

Pattern    Description
[^]        Matches any character that is not listed after the caret ^ character

Pattern    Input    Match?
[^a-z]     1        Yes
[^a-z]     9        Yes
[^a-z]     B        Yes (since 'B' is not lowercase)
[^1-9]     4        No
[^1-9]     0        Yes (since '0' is not in the range 1 through 9)
[^@]       @        No
[^<]       <        No
[^>]       >        No
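
One common use of a negated set is removing everything except the characters of interest. The sketch below assumes Python's re module; the raw value is a made-up example of a number as a user might enter it.

```python
import re

raw = "(555) 123-4567"  # hypothetical user-entered number

# Replace every character that is not a digit with nothing.
digits_only = re.sub(r"[^0-9]", "", raw)
print(digits_only)  # 5551234567

# A negated set still matches exactly one character.
zero_ok = re.fullmatch(r"[^1-9]", "0") is not None   # '0' is outside 1 to 9
four_ok = re.fullmatch(r"[^1-9]", "4") is not None   # '4' is inside 1 to 9
print(zero_ok, four_ok)  # True False
```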

Escapes \

Any special character preceded by an escape character '\' shall match itself. The special characters are as follows:

. [ { ( ) * + ? | ^ $ \

Pattern    Description
\          If the following character is a special character, ignore its special meaning and match the literal character
\.         Match the '.' character literally
\[         Match the '[' character literally
\{         Match the '{' character literally
\(         Match the '(' character literally
\)         Match the ')' character literally
\*         Match the '*' character literally
\+         Match the '+' character literally
\?         Match the '?' character literally
\|         Match the '|' character literally
\^         Match the '^' character literally
\$         Match the '$' character literally
\\         Match the '\' character literally
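
Escaping matters whenever the text to be matched contains special characters. The following sketch assumes Python's re module; re.escape is a Python helper that adds the escape characters automatically, and other engines may not offer an equivalent. The literal value is a made-up example.

```python
import re

literal = "+1 (555) 123-4567"  # hypothetical number containing special characters

# Escaped by hand: '+', '(' and ')' each get a preceding backslash.
hand_escaped = r"\+1 \(555\) 123-4567"
hand_ok = re.fullmatch(hand_escaped, literal) is not None
print(hand_ok)  # True

# Escaped automatically by Python's re.escape helper.
auto_ok = re.fullmatch(re.escape(literal), literal) is not None
print(auto_ok)  # True

# Unescaped parentheses form a group instead of matching '(' and ')'.
group_ok = re.fullmatch(r"1 (555)", "1 555") is not None
literal_ok = re.fullmatch(r"1 (555)", "1 (555)") is not None
print(group_ok, literal_ok)  # True False
```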