Regular Expressions

regular expression - a pattern written in a special syntax, used to match text

  • regex - shorthand for "regular expression"


/ab+c/i - regex literal

new RegExp(string, flag?) - regex constructor

  • string parameter is useful for building a regex dynamically from variables, e.g. new RegExp(`${foo} and ${bar}`, 'g')
  • a regex literal can be passed instead of a string
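A minimal sketch of both ways to create a regex. The variables foo and bar and their values are hypothetical, chosen only for illustration.

```javascript
// Hypothetical values used to build a pattern at runtime.
const foo = 'salt';
const bar = 'pepper';

const literal = /ab+c/i;
const dynamic = new RegExp(`${foo} and ${bar}`, 'g');
// When a regex is passed with a flags argument, the new flags replace the old ones.
const fromLiteral = new RegExp(literal, 'g');

const literalHit = literal.test('xABBBc');

console.log(literalHit);        // true
console.log(dynamic.source);    // salt and pepper
console.log(fromLiteral.flags); // g
```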

Methods

test(string) - looks for a match and returns boolean

exec(string) - looks for a match and returns an array describing the first match

  • returns null if there is no match
  • array structure:
      • [0] - matched substring
      • [1] - first capturing group
      • [n] - nth capturing group
      • groups - object with named capturing groups (undefined if the regex has none)
      • index - index of the first char of the match
      • input - original string
      • indices - start and end positions of the match and of each capturing group. only present when the "d" flag is set
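The array structure above can be inspected like this. The sample string and the group name "year" are made up for the example, and the "d" flag needs a reasonably recent runtime.

```javascript
const re = /(?<year>\d{4})-(\d{2})/d;
const result = re.exec('released 2024-05');

console.log(result[0]);          // 2024-05
console.log(result[1]);          // 2024
console.log(result[2]);          // 05
console.log(result.groups.year); // 2024
console.log(result.index);       // 9
console.log(result.input);       // released 2024-05
console.log(result.indices[0]);  // [ 9, 16 ]

const noMatch = re.exec('no date here');
console.log(noMatch);            // null
```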


source - property containing the regex pattern as a string, without the wrapping slashes and flags

String Instance Methods

match(regex)

  • for non-global regex: returns an exec-style array
  • for global regex: returns an array of all matched substrings, without capturing groups
  • returns null if no match
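A short sketch of the three outcomes of match, using an arbitrary sample string:

```javascript
const text = 'cat bat rat';

const first = text.match(/(\w)at/);  // non-global: exec-style array
const all = text.match(/(\w)at/g);   // global: matched substrings only
const none = text.match(/dog/g);     // no match

console.log(first[0], first[1], first.index); // cat c 0
console.log(all);                             // [ 'cat', 'bat', 'rat' ]
console.log(none);                            // null
```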


matchAll(regex) - returns an iterator that yields an exec-style array for each match, one after another

  • the regex must have the "g" flag, otherwise a TypeError is thrown
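Unlike a global match, matchAll keeps capturing groups and indexes for every match. A small sketch with an arbitrary sample string:

```javascript
const text = 'cat bat rat';

// Spread the iterator into an array of exec-style results.
const matches = [...text.matchAll(/(\w)at/g)];
const summary = matches.map((m) => `${m[0]}:${m[1]}@${m.index}`);

console.log(summary); // [ 'cat:c@0', 'bat:b@4', 'rat:r@8' ]
```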

search(regex) - returns index of the first match

  • -1 - if no matches


split(regex) - splits the string at each match and returns an array of the pieces
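search and split in one small sketch. The sample string is arbitrary.

```javascript
const csv = 'a, b ,c';

const commaAt = csv.search(/,/);     // index of the first comma
const missing = csv.search(/;/);     // no semicolon in the string
const parts = csv.split(/\s*,\s*/);  // split on commas, dropping surrounding whitespace

console.log(commaAt); // 1
console.log(missing); // -1
console.log(parts);   // [ 'a', 'b', 'c' ]
```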

replace(regex, newSubstring / callback)

  • newSubstring - substring which replaces the matched part(s). only the first match is replaced unless the regex is global
  • newSubstring may contain special replacement patterns:
      • $& - inserts the matched substring
      • $` - inserts the portion of the string before the matched substring
      • $' - inserts the portion of the string after the matched substring
      • $n - inserts the nth capturing group (1-indexed)
      • $<name> - inserts the named capturing group
  • callback - returns the new substring. for a global regex it is invoked once per match
  • callback parameters:
      • match
      • group1 - if present
      • groupN
      • offset - index of the first matched char. e.g. if the whole string is "bar" and the matched substring is "ar", offset is 1
      • string - the whole string being examined
      • groups - object with named capturing groups. only passed if the regex has named groups
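A sketch of replacement patterns and of the callback form. The names and strings are invented for the example.

```javascript
const name = 'John Smith';

// $n and $<name> refer to capturing groups.
const swapped = name.replace(/(\w+) (\w+)/, '$2, $1');
const named = name.replace(/(?<first>\w+) (?<last>\w+)/, '$<last> $<first>');
// $& is the matched substring itself.
const wrapped = 'bar'.replace(/ar/, '[$&]');

// With no capturing groups the callback receives (match, offset, string).
const offsets = [];
const upper = 'bar car'.replace(/ar/g, (match, offset) => {
  offsets.push(offset);
  return match.toUpperCase();
});

console.log(swapped); // Smith, John
console.log(named);   // Smith John
console.log(wrapped); // b[ar]
console.log(upper);   // bAR cAR
console.log(offsets); // [ 1, 5 ]
```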

Syntax

[xz]

one of

e.g. [aeiou] - any one vowel

[x-z]

a range

e.g. [a-zA-Z] - any ASCII letter (unlike \w, this excludes digits and underscore)

[^x-z]

any character except those listed

it also matches "\n"

 

|

logical OR

e.g. foo|bar - matches either "foo" or "bar"

it can also be combined with grouping

.

any character

except line terminators such as "\n" (unless the "s" flag is set)

\

escaping

e.g. \. - treat dot as dot
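The basic syntax elements above, checked against a few arbitrary sample strings:

```javascript
const checks = {
  setMiss: /[aeiou]/.test('sky'),          // no vowel in "sky"
  negatedNewline: /[^a-z]/.test('abc\n'),  // a negated set matches "\n"
  alternation: /^(foo|bar)$/.test('bar'),  // OR combined with grouping
  dotNewline: /a.c/.test('a\nc'),          // "." does not match "\n" by default
  escapedDotMiss: /a\.c/.test('abc'),      // "\." matches only a literal dot
  escapedDotHit: /a\.c/.test('a.c'),
};

console.log(checks);
// { setMiss: false, negatedNewline: true, alternation: true,
//   dotNewline: false, escapedDotMiss: false, escapedDotHit: true }
```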

Quantifiers

x{n}

repeat exactly "n" times

e.g.

.{4} - any 4 characters

a.{4}e - match "a" + any 4 characters + "e"

x{n,m}

repeat from "n" to "m" times

e.g. 0{2,4} - repeat "0" 2 to 4 times

x{n,}

repeat "n" or more times

 

x{n,m}?

non-greedy (lazy) repetition. it matches as few characters as possible

e.g. a.{2,4}?e - tries 2 chars between "a" and "e" first, then 3, then 4

by default all repetitions are greedy

note: "?" affects where a match ends, not where it starts, because the engine still takes the leftmost possible start. e.g. \..+?$ against "example.gov.ua" matches ".gov.ua", not ".ua"
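Greedy versus lazy repetition, sketched on made-up inputs. The last line illustrates the note about the start of the match.

```javascript
const text = 'a12e3e';

const greedy = text.match(/a.{2,4}e/)[0]; // takes as many chars as possible
const lazy = text.match(/a.{2,4}?e/)[0];  // stops at the first "e" it can

// The lazy quantifier cannot move the start of the match to a later dot.
const suffix = 'example.gov.ua'.match(/\..+?$/)[0];

console.log(greedy); // a12e3e
console.log(lazy);   // a12e
console.log(suffix); // .gov.ua
```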

 

?

optional char. equivalent to {0,1}

+

one or more repetitions. equivalent to {1,}

*

any number of repetitions, including none. equivalent to {0,}

e.g. a.*e - matches from "a" through the last "e" on the same line
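The shorthand quantifiers in action, on arbitrary sample strings:

```javascript
const optional = /colou?r/.test('color');     // "u" may be absent
const oneOrMore = 'caaat'.match(/a+/)[0];     // longest run of "a"
const zeroOrMore = 'ct'.match(/ca*t/)[0];     // "a*" may match nothing
const bounded = '0000'.match(/0{2,3}/)[0];    // greedy, capped at 3
const span = 'an apple pie'.match(/a.*e/)[0]; // first "a" through the last "e"

console.log(optional);   // true
console.log(oneOrMore);  // aaa
console.log(zeroOrMore); // ct
console.log(bounded);    // 000
console.log(span);       // an apple pie
```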

Grouping

()

capturing group. groups part of the pattern and remembers what it matched

(?<x>)

named capturing group. "x" is the group name

 

(?:)

non-capturing group. groups part of the pattern without capturing it. useful when the matched value is not needed

e.g. recogni(?:s|z)e - pure logical OR use case

lookaheads and lookbehinds do not create capturing groups either
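A sketch contrasting capturing, named, and non-capturing groups. The group names "year" and "month" are chosen for the example.

```javascript
const capturing = 'recognize'.match(/recogni(s|z)e/);
const nonCapturing = 'recognize'.match(/recogni(?:s|z)e/);
const date = '2024-05'.match(/(?<year>\d{4})-(?<month>\d{2})/);

console.log(capturing.length);    // 2 (full match + one group)
console.log(capturing[1]);        // z
console.log(nonCapturing.length); // 1 (full match only)
console.log(date.groups.year);    // 2024
console.log(date.groups.month);   // 05
console.log(date[1]);             // 2024 (named groups are still numbered)
```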

 

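\n

backreference. matches the same text that capturing group number "n" matched

"n" is the group number, starting at 1 (\1, \2, ...). not to be confused with the newline escape "\n"

e.g. ^(.*)\1+$ - match a string made up entirely of one substring repeated two or more times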

\k<x>

named backreference. matches the same text that the named capturing group "x" matched
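Numbered and named backreferences, sketched on made-up inputs. The group name "q" is hypothetical.

```javascript
const repeated = /^(.*)\1+$/;

const twice = repeated.test('abcabc');  // "abc" repeated twice
const broken = repeated.test('abcab');  // not a whole number of repeats

// The closing quote must be the same character as the opening one.
const quoted = /(?<q>['"]).*?\k<q>/;
const phrase = 'say "hi" now'.match(quoted)[0];
const mismatched = quoted.test('say "hi\' now');

console.log(twice);      // true
console.log(broken);     // false
console.log(phrase);     // "hi"
console.log(mismatched); // false
```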

Lookaheads & Lookbehinds

x(?=y)

positive lookahead. matches "x" only if followed by "y"

e.g. Steve(?= Jobs) - matches "Steve" followed by " Jobs". when succeeded, " Jobs" is not a part of matched result

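(?!.*foo) - putting ".*" inside a lookahead makes it inspect the rest of the line. this negative form succeeds only if "foo" does not occur anywhere after the current position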

 

several positive lookaheads can be chained at the same position to enforce multiple requirements on one string

e.g. ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ - matches password of minimum 8 characters with at least one letter and one number

x(?!y)

negative lookahead. matches "x" only if not followed by "y"

e.g. Michael(?! Scott) - matches "Michael" that is not followed by " Scott". when succeeded, " Scott" is not a part of matched result
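Positive and negative lookaheads from the examples above, checked against a few made-up strings:

```javascript
// Positive lookahead: " Jobs" is required but not part of the match.
const steve = 'Steve Jobs'.match(/Steve(?= Jobs)/)[0];
const otherSteve = /Steve(?= Jobs)/.test('Steve Wozniak');

// Negative lookahead.
const scott = /Michael(?! Scott)/.test('Michael Scott');
const jordan = /Michael(?! Scott)/.test('Michael Jordan');
const noFoo = /^(?!.*foo)/.test('a bar b');
const hasFoo = /^(?!.*foo)/.test('a foo b');

// Chained lookaheads: at least one letter and one digit, 8+ chars.
const password = /^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$/;
const valid = password.test('abcdefg1');
const noDigit = password.test('abcdefgh');
const tooShort = password.test('abc1');

console.log(steve, otherSteve);        // Steve false
console.log(scott, jordan);            // false true
console.log(noFoo, hasFoo);            // true false
console.log(valid, noDigit, tooShort); // true false false
```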

 

(?<=y)x

positive lookbehind. matches "x" only if preceded by "y"

e.g. (?<=Hello )\w+ - match a word preceded by "Hello" + space

(?<!y)x

negative lookbehind. matches "x" only if not preceded by "y"

negative lookbehind can be useful to match only the first occurrence of a substring in a complex regex. e.g. (?<! yes .*) yes
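A sketch of both lookbehind forms. The sample strings are invented, and lookbehind support is assumed in the runtime.

```javascript
// Positive lookbehind: the word after "Hello ".
const word = 'Hello world'.match(/(?<=Hello )\w+/)[0];

// Negative lookbehind: numbers that are not directly preceded by "$".
const plain = '$10 and 20'.match(/(?<!\$)\b\d+/g);

// Only the first " yes" matches, even with the global flag,
// because later ones are preceded by " yes " somewhere earlier.
const firstOnly = [...'a yes b yes c'.matchAll(/(?<! yes .*) yes/g)];

console.log(word);               // world
console.log(plain);              // [ '20' ]
console.log(firstOnly.length);   // 1
console.log(firstOnly[0].index); // 1
```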

Anchors

\b

word boundary

\B

not word boundary

^

beginning of the string

or beginning of the line for multiline regex

$

end of the string

or end of the line for multiline regex
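Anchors on arbitrary sample strings, including the effect of the "m" flag on "^" and "$":

```javascript
const text = 'cat concat';
const wholeWord = text.match(/\bcat\b/g); // "cat" as a separate word
const inside = text.match(/\Bcat/g);      // "cat" not at the start of a word

const lines = 'one\ntwo';
const singleLine = lines.match(/^\w+$/g); // whole string must be one word
const multiLine = lines.match(/^\w+$/gm); // each line is checked separately

console.log(wholeWord);  // [ 'cat' ]
console.log(inside);     // [ 'cat' ]
console.log(singleLine); // null
console.log(multiLine);  // [ 'one', 'two' ]
```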

Shorthand Character Classes

\d

any digit, i.e. [0-9]

\D

not a digit, i.e. [^0-9]

 

\w

any word character

i.e. alphanumeric and underscore, [A-Za-z0-9_]

\W

not a word character

 

\s

any whitespace character

i.e. space, tab, line break

\S

not a whitespace character
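The shorthand classes applied to small made-up strings:

```javascript
const sample = 'a1_ b2';

const digits = sample.match(/\d/g);     // every digit
const words = sample.match(/\w+/g);     // runs of word characters
const nonDigit = sample.match(/\D/)[0]; // first non-digit
const nonWord = sample.match(/\W/)[0];  // first non-word character (the space)
const pieces = 'a \t\nb'.split(/\s+/);  // split on any whitespace run

console.log(digits);   // [ '1', '2' ]
console.log(words);    // [ 'a1_', 'b2' ]
console.log(nonDigit); // a
console.log(pieces);   // [ 'a', 'b' ]
```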

Flags

i - case insensitive
g - global. finds all matches instead of stopping after the first
m - multiline. "^" and "$" also match at the start and end of each line
s - dotall ("single line"). allows the dot "." to match newline characters such as "\n"
u - unicode. treats the pattern as a sequence of code points and enables \u{...} and \p{...} escapes
y - sticky. matches only at the exact position stored in "lastIndex"
d - indices. the "exec" result will contain the "indices" property. see Methods
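A sketch of the less obvious flags ("s", "y", "u"). The sample strings are arbitrary.

```javascript
// s: the dot also matches "\n".
const withoutS = /a.c/.test('a\nc');
const withS = /a.c/s.test('a\nc');

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test('barfoo');
sticky.lastIndex = 0;
const atZero = sticky.test('barfoo');

// u: "." consumes a whole code point instead of half a surrogate pair.
const emoji = '\u{1F600}';
const withoutU = /^.$/.test(emoji);
const withU = /^.$/u.test(emoji);

console.log(withoutS, withS); // false true
console.log(atThree, atZero); // true false
console.log(withoutU, withU); // false true
```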

Examples

[aeiou]{2,} - match two or more successive vowels

^[\w.]+(\+\w+)?@[\w.]+$ - simplified email check. requires "@" with word characters on both sides of it; dots are allowed on both sides, and the local part may end with an optional "+tag"
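Both example patterns tried on a few invented inputs. The email pattern is only the simplified check described above, not a full validator.

```javascript
const vowels = /[aeiou]{2,}/;
const run = 'queue'.match(vowels)[0];
const noRun = vowels.test('banana');

const email = /^[\w.]+(\+\w+)?@[\w.]+$/;
const tagged = email.test('john.doe+news@example.com');
const noDomain = email.test('john@');
const withSpace = email.test('john doe@example.com');

console.log(run);       // ueue
console.log(noRun);     // false
console.log(tagged);    // true
console.log(noDomain);  // false
console.log(withSpace); // false
```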