Regular Expressions

regular expression - a string written in a special syntax that is used to match patterns in text

  • regex - shorthand for "regular expression"


/ab+c/i - regex literal

new RegExp(string, flags?) - regex constructor

  • string parameter is useful for creating a dynamic regex from variable(s). e.g. new RegExp(`${foo} and ${bar}`, 'g')
  • it can also accept a regex literal instead of a string
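
A minimal sketch of both constructor forms. The escapeRegExp helper and the term value are hypothetical names introduced for illustration; escaping matters because special characters inside a variable would otherwise be read as regex syntax.

```javascript
// Hypothetical helper (an assumption, not from the notes): escapes characters
// that have a special meaning in a regex so the input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const term = 'a.b'; // assumed user input
const dynamic = new RegExp(escapeRegExp(term), 'gi');
const found = 'A.B axb a.b'.match(dynamic);

console.log(dynamic.source); // a\.b
console.log(found);          // [ 'A.B', 'a.b' ]

// A regex literal can be passed instead of a string; flags can be replaced.
const copy = new RegExp(/ab+c/, 'i');
const copyMatches = copy.test('xABBCx');
console.log(copyMatches); // true
```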

Methods

test(string) - looks for a match and returns a boolean

exec(string) - looks for a match and returns an array with details of the first match

  • returns null if there is no match
  • array structure:
    • [0] - matched substring
    • [1] - first capturing group
    • [n] - nth capturing group
    • groups - object with named capturing groups (undefined if the regex has none)
    • index - index of the first char of the match
    • input - original string
    • indices - start and end positions of the match and of each capturing group. the property is only present when the "d" flag is set
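
The array structure can be inspected with a short sketch like the one below. The dateRe pattern and the sample string are assumptions chosen for illustration.

```javascript
// The "d" flag adds the "indices" property to the result.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/d;
const result = dateRe.exec('due 2024-05 ok');

console.log(result[0]);                   // 2024-05
console.log(result[1], result[2]);        // 2024 05
console.log(result.groups.year);          // 2024
console.log(result.index);                // 4
console.log(result.input);                // due 2024-05 ok
console.log(result.indices[0]);           // [ 4, 11 ]
console.log(result.indices.groups.month); // [ 9, 11 ]

const noMatch = dateRe.exec('no date here');
console.log(noMatch); // null
```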

String Instance Methods

match(regex)

  • for non-global regex: returns exec-type array
  • for global regex: returns an array of all matches, but without capturing groups
  • returns null if no match


matchAll(regex) - returns an iterator that yields an exec-type array (with capturing groups) for every match. the regex must have the "g" flag, otherwise a TypeError is thrown
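
A small sketch comparing match and matchAll on the same input; the sample text and variable names are made up for illustration.

```javascript
const text = 'x=1, y=22';

// Non-global: same shape as an exec() result, including capturing groups.
const first = text.match(/(\w)=(\d+)/);
console.log(first[0], first[1], first[2], first.index); // x=1 x 1 0

// Global: only the full matches, no groups.
const all = text.match(/(\w)=(\d+)/g);
console.log(all); // [ 'x=1', 'y=22' ]

// matchAll: every match together with its groups and index.
const pairs = [...text.matchAll(/(\w)=(\d+)/g)].map((m) => [m[1], m[2], m.index]);
console.log(pairs); // [ [ 'x', '1', 0 ], [ 'y', '22', 5 ] ]

const none = text.match(/z/);
console.log(none); // null
```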

search(regex) - returns index of the first match

  • -1 - if no matches


split(regex) - splits the string at every match and returns an array of the parts
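
A brief sketch of search and split with a regex argument; the sample strings are assumptions for illustration.

```javascript
const position = 'foo 42 bar'.search(/\d/);
const missing = 'foo bar'.search(/\d/);
console.log(position, missing); // 4 -1

// Split on commas or semicolons with optional whitespace around them.
const parts = 'a, b;c ,d'.split(/\s*[,;]\s*/);
console.log(parts); // [ 'a', 'b', 'c', 'd' ]

// If the separator regex has a capturing group, the captured text is kept.
const tokens = '1+2-3'.split(/([+-])/);
console.log(tokens); // [ '1', '+', '2', '-', '3' ]
```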

replace(regex, newSubstring / callback)

  • with a non-global regex only the first match is replaced
  • newSubstring - substring which replaces the matched part(s)
  • newSubstring may contain special replacement patterns:
    • $& - inserts the matched substring
    • $` - inserts the portion of the string that goes before the matched substring
    • $' - inserts the portion of the string that goes after the matched substring
    • $n - inserts the nth group (1-indexed)
    • $<name> - inserts named capturing group
  • callback - returns the new substring. with a global regex it is invoked once for each match
  • callback parameters:
    • match
    • group1 - if present
    • groupN
    • offset - index of first matched char. e.g. if the whole string is "bar", and the matched substring is "ar", then the value of offset is 1
    • string - the whole string being examined
    • groups - object with named capturing groups. passed only if the regex has named groups
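
The replacement patterns and the callback form can be tried out with a sketch like this; the sample strings and variable names are assumptions.

```javascript
const fullName = 'John Smith';

// $n and $<name> refer to capturing groups.
const byNumber = fullName.replace(/(\w+) (\w+)/, '$2, $1');
const byName = fullName.replace(/(?<first>\w+) (?<last>\w+)/, '$<last>, $<first>');
console.log(byNumber, '|', byName); // Smith, John | Smith, John

// $& is the match itself; $` and $' are the text before and after it.
const wrapped = 'a-b'.replace(/-/, '[$&]');
const before = 'a-b'.replace(/-/, "$`");
const after = 'a-b'.replace(/-/, "$'");
console.log(wrapped, before, after); // a[-]b aab abb

// Callback: invoked once per match for a global regex.
// With no capturing groups the parameters are (match, offset, string).
const doubled = '1 and 20'.replace(/\d+/g, (match, offset) => `${match * 2}@${offset}`);
console.log(doubled); // 2@0 and 40@6
```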

Flags

i - case insensitive
g - global
m - multiline. makes "^" and "$" match at the start and end of each line
s - dotall ("single line"). allows a dot "." to match line breaks such as "\n"
u - unicode. enables full unicode support, e.g. \u{...} escapes, \p{...} classes, characters outside the BMP treated as one symbol
y - sticky. matches only at the exact position given by the regex's lastIndex property
d - when present, "exec" call result will contain "indices" property. see Methods
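
A minimal sketch of how several of the flags change matching. The sample strings and variable names are assumptions for illustration.

```javascript
const lines = 'one\nTwo';

// i: case-insensitive
const insensitive = /two/i.test(lines); // true

// m: "^" and "$" also match at line breaks
const withoutM = /^Two$/.test(lines); // false
const withM = /^Two$/m.test(lines);   // true

// s: "." also matches line breaks
const withoutS = /one.Two/.test(lines); // false
const withS = /one.Two/s.test(lines);   // true

// u: a symbol outside the BMP counts as one character
const emoji = '\u{1F600}';
const withoutU = /^.$/.test(emoji); // false
const withU = /^.$/u.test(emoji);   // true

// y: matches only at lastIndex, without scanning ahead
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test('barfoo'); // true
const nextIndex = sticky.lastIndex;      // 6
sticky.lastIndex = 0;
const stickyAt0 = sticky.test('barfoo'); // false

console.log(insensitive, withoutM, withM, withoutS, withS);
console.log(withoutU, withU, stickyAt3, nextIndex, stickyAt0);
```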

Syntax

[xz] - any one of the listed characters

  • e.g. [aeiou] - any vowel

[x-z] - a range

  • e.g. [a-zA-Z] - any ASCII letter

[^x-z] - any character except the listed ones

  • it also matches "\n"


| - logical OR. can be combined with grouping
. - any character except line terminators ("\n", "\r", "\u2028", "\u2029")
\ - escaping

  • e.g. \. - treat dot as dot
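
The character classes, alternation, dot and escaping described above can be checked with a sketch like this; all sample strings are assumptions.

```javascript
const vowels = 'education'.match(/[aeiou]/g);
console.log(vowels); // [ 'e', 'u', 'a', 'i', 'o' ]

const letters = 'a1_B'.match(/[a-zA-Z]/g);
console.log(letters); // [ 'a', 'B' ]

// A negated class matches a line break, while "." does not.
const negatedMatchesNewline = /[^a-z]/.test('\n');
const dotMatchesNewline = /./.test('\n');
console.log(negatedMatchesNewline, dotMatchesNewline); // true false

// "|" inside a group limits the alternatives to that group.
const grey = /gr(a|e)y/.test('grey');
console.log(grey); // true

// An escaped dot matches only a literal dot.
const literalDot = /a\.b/.test('a.b');
const anyChar = /a\.b/.test('axb');
console.log(literalDot, anyChar); // true false
```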


x{n} - repeat exactly n times

  • e.g.
    • .{4} - any 4 characters
    • a.{4}e - "a" + any 4 characters + "e"

x{min,max}

  • e.g. x{2,4} - repeat "x" 2 to 4 times

x{min,}

  • e.g. x{2,} - repeat "x" 2 or more times

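x{,max} - not supported in JavaScript. use x{0,max} instead

  • e.g. x{0,4} - repeat "x" 4 times or fewer
  • without the "u" flag x{,4} matches the literal text "x{,4}". with the "u" flag it is a syntax error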

x{min,max}? - non-greedy (lazy) repetition. it matches as few characters as possible

  • e.g. a.{2,4}?e - matches the fewest chars (from 2 to 4) between "a" and "e" that still let the match succeed
  • by default all repetitions are greedy
  • note: "?" can change where a match ends, but not where it starts. e.g. in \..+?$ the match still starts at the first dot, so for "site.gov.ua" it is ".gov.ua", not ".ua"
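
A short sketch of greedy versus lazy repetition, including the point that laziness does not move the start of a match. The sample strings are assumptions; the [^.]+ variant is one possible way to get only the last part, not something taken from the notes.

```javascript
const sample = 'a12e4e';

// Greedy: takes as many characters as it can.
const greedy = sample.match(/a.{2,4}e/)[0];
// Lazy: stops at the first place where the rest of the pattern fits.
const lazy = sample.match(/a.{2,4}?e/)[0];
console.log(greedy, lazy); // a12e4e a12e

// Laziness does not change where the match starts.
const domain = 'site.gov.ua';
const fromFirstDot = domain.match(/\..+?$/)[0];
// Excluding dots from the repeated part gives only the last segment.
const lastPart = domain.match(/\.[^.]+$/)[0];
console.log(fromFirstDot, lastPart); // .gov.ua .ua
```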


? - optional (zero or one). equivalent to {0,1}
+ - one or more repetitions. equivalent to {1,}
* - zero or more repetitions. equivalent to {0,}

  • e.g. a.*e - "a" followed, anywhere later on the same line, by "e"


\b - word boundary
\B - not word boundary
^ - beginning of the string. or beginning of the line for multiline regex
$ - end of the string. or end of the line for multiline regex
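
A small sketch of word boundaries and anchors; the sample strings are assumptions for illustration.

```javascript
// \b: "cat" only as a whole word
const wholeWord = 'cat concat cats'.match(/\bcat\b/g);
console.log(wholeWord); // [ 'cat' ]

// \B: "cat" only when it does not start a word
const insideWord = [...'cat concat'.matchAll(/\Bcat/g)].map((m) => m.index);
console.log(insideWord); // [ 7 ]

// ^ and $ match per line only with the "m" flag
const perLine = 'a\nb'.match(/^\w$/gm);
const wholeString = /^\w$/.test('a\nb');
console.log(perLine, wholeString); // [ 'a', 'b' ] false
```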

() - capturing group. groups a part of the pattern and remembers what it matched

  • (?<name>foo) - set a name for regex group

(?:) - non-capturing group. groups without remembering the match

  • is helpful when grouping is used just for logical OR purposes. e.g. foo(?:x|y)bar
  • positive lookahead & negative lookahead do not create groups either
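
The difference between capturing, non-capturing and named groups shows up in the result array, as in this sketch (sample strings are assumptions):

```javascript
// Capturing group: the result has an extra entry for the group.
const capturing = 'fooxbar'.match(/foo(x|y)bar/);
console.log(capturing.length, capturing[1]); // 2 x

// Non-capturing group: same match, no extra entry.
const nonCapturing = 'fooxbar'.match(/foo(?:x|y)bar/);
console.log(nonCapturing.length); // 1

// Named groups are available on the "groups" property.
const time = '10:45'.match(/(?<hours>\d+):(?<minutes>\d+)/);
console.log(time.groups.hours, time.groups.minutes); // 10 45
```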


(?=x) - positive lookahead

  • e.g. Steve(?= Jobs) - matches "Steve" only when followed by " Jobs". on success, only "Steve" is matched
  • the lookahead string is not a part of the matched substring
  • positive lookahead can be used to check a string against several requirements
    • e.g. ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ - matches a password of at least 8 letters and digits with at least one letter and one digit
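
Both lookahead examples above can be exercised with a sketch like this; the test inputs are assumptions.

```javascript
const password = /^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$/;
const checks = [
  password.test('abcdefg1'), // true
  password.test('abcdefgh'), // false: no digit
  password.test('12345678'), // false: no letter
  password.test('abc1'),     // false: too short
];
console.log(checks); // [ true, false, false, false ]

// Only the "Steve" followed by " Jobs" matches, and " Jobs" is not consumed.
const steve = [...'Steve Wozniak, Steve Jobs'.matchAll(/Steve(?= Jobs)/g)];
console.log(steve.length, steve[0][0], steve[0].index); // 1 Steve 15
```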


(?!y) - negative lookahead

  • e.g. Michael(?! Scott) - matches "Michael" that is not followed by " Scott". on success, only "Michael" is matched


(?<=x) - positive lookbehind

  • e.g. (?<=Hello )\w+ - match a word preceded by "Hello" + space


(?<!y) - negative lookbehind

  • negative lookbehind can be useful to match only the first occurrence of a substring in a complex regex. e.g. (?<! yes .*) yes - matches " yes" only when no " yes " occurs earlier in the line
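
A minimal sketch of negative lookahead and both lookbehinds, using the patterns from the notes; the sample strings are assumptions.

```javascript
// Negative lookahead: skips the "Michael" that is followed by " Scott".
const replaced = 'Michael Scott, Michael Jordan'.replace(/Michael(?! Scott)/, 'MJ');
console.log(replaced); // Michael Scott, MJ Jordan

// Positive lookbehind: the word after "Hello ".
const afterHello = 'Hello world'.match(/(?<=Hello )\w+/)[0];
console.log(afterHello); // world

// Negative lookbehind: only the first " yes" passes the check.
const answers = ' yes no yes maybe yes';
const firstOnly = [...answers.matchAll(/(?<! yes .*) yes/g)].map((m) => m.index);
console.log(firstOnly); // [ 0 ]
```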


\n - backreference. matches the same text that a specific capturing group matched. n is the group number (\1, \2, ...), not the newline escape "\n"

  • e.g. ^(.*)\1+$ - matches a string that consists of one substring repeated two or more times

\k<name> - named backreference. same, but references named capturing group
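
A short sketch of numbered and named backreferences. The doubled-word and quote patterns are assumptions added for illustration; only ^(.*)\1+$ comes from the notes.

```javascript
const repeated = /^(.*)\1+$/;
const isRepeated = repeated.test('abcabc');
const notRepeated = repeated.test('abcab');
console.log(isRepeated, notRepeated); // true false

// \1 must match the same text as group 1: finds a doubled word.
const doubledWord = 'it is is fine'.match(/\b(\w+) \1\b/)[0];
console.log(doubledWord); // is is

// \k<quote> must match the same quote character that opened the string.
const quoted = 'say "hi" now'.match(/(?<quote>['"]).*?\k<quote>/)[0];
console.log(quoted); // "hi"
```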

Shorthand Classes

\d - any digit, i.e. [0-9]
\D - not a digit, i.e. [^0-9]

\w - any word char, i.e. [A-Za-z0-9_]
\W - not a word char, i.e. [^A-Za-z0-9_]

\s - any whitespace char, e.g. space, tab, line break
\S - not a whitespace char
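
A brief sketch of these classes on one sample string (the string itself is an assumption):

```javascript
const input = 'a1_ b-2';

const digits = input.match(/\d/g);
const words = input.match(/\w+/g);
const nonWord = input.match(/\W/g);
console.log(digits);  // [ '1', '2' ]
console.log(words);   // [ 'a1_', 'b', '2' ]
console.log(nonWord); // [ ' ', '-' ]

// \s covers spaces, tabs and line breaks.
const pieces = 'a \t\nb'.split(/\s+/);
console.log(pieces); // [ 'a', 'b' ]
```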

Examples

[aeiou]{2,} - match two or more successive vowels

^[\w.]+(\+\w+)?@[\w.]+$ - match a simplified email. it must have "@" with word characters on both sides of it; dots are allowed on both sides, and the part before "@" can end with an optional "+tag"
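
Both examples can be run as a quick sketch. The sample inputs are assumptions; the last check shows that the email pattern is deliberately loose and accepts an address without a top-level domain.

```javascript
const vowelRuns = 'queue and beautiful'.match(/[aeiou]{2,}/g);
console.log(vowelRuns); // [ 'ueue', 'eau' ]

const email = /^[\w.]+(\+\w+)?@[\w.]+$/;
const emailChecks = [
  email.test('john.doe+news@example.com'), // true
  email.test('no-at-sign.com'),            // false
  email.test('a@b'),                       // true: no domain check at all
];
console.log(emailChecks); // [ true, false, true ]
```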
