Regular Expressions

regular expression - a string written in a special syntax that is used to match patterns in text

regex - shorthand for "regular expression"

/ab+c/i - regex literal

new RegExp(string, flags?) - regex constructor

the string parameter is useful for building a dynamic regex from variable(s), e.g. new RegExp(`${foo} and ${bar}`, 'g')
it can also accept a regex literal instead of a string
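
a minimal sketch comparing the two ways of creating a regex. the values of foo and bar are placeholders chosen for illustration only:

```javascript
// A literal is written directly in the source code.
const literal = /ab+c/i;

// The constructor builds the pattern at runtime from a string.
// `foo` and `bar` are placeholder values for this example.
const foo = "salt";
const bar = "pepper";
const dynamic = new RegExp(`${foo} and ${bar}`, "g");

// Passing a regex copies its pattern; the second argument replaces its flags.
const copy = new RegExp(literal, "g");

console.log(literal.test("xABBBc")); // true
console.log(dynamic.source);         // "salt and pepper"
console.log(copy.source, copy.flags); // "ab+c" "g"
```

note that special characters inside the interpolated variables are not escaped automatically, so a value such as "a.b" would be read as a pattern.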

Methods

test(string) - looks for a match and returns a boolean

exec(string) - looks for a match and returns an array describing the first match (with the "g" or "y" flag, the search starts at lastIndex)

returns null if no match
array structure:
[0] - matched substring
[1] - first capturing group
[n] - nth capturing group
groups - object with named capturing groups; undefined if the regex has no named groups
index - index of first char in match
input - original string
indices - start and end positions of the match and of each capturing group. the property is only present when the "d" flag is set
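
the array structure above can be inspected with a short sketch. the pattern and the input string are illustrative, and the "d" flag needs a reasonably recent runtime (for example Node.js 16 or newer):

```javascript
// One named group (year) and one unnamed group (month).
const dateRe = /(?<year>\d{4})-(\d{2})/d;
const result = dateRe.exec("on 2024-05-17");

console.log(result[0]);          // "2024-05"
console.log(result[1]);          // "2024" (a named group is also numbered)
console.log(result[2]);          // "05"
console.log(result.groups.year); // "2024"
console.log(result.index);       // 3
console.log(result.input);       // "on 2024-05-17"
console.log(result.indices[0]);  // [3, 10]

console.log(dateRe.exec("no date here")); // null
```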

source - property containing the pattern text, without the wrapping slashes and flags

String Instance Methods

match(regex)

for non-global regex: returns exec-type array
for global regex: returns an array with all matches, but without capturing groups
returns null if no match
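
a small sketch of how the "g" flag changes the result of match. the sample text is made up for the example:

```javascript
const text = "cat bat rat";

// Without "g": same shape as exec(), including capturing groups.
const first = text.match(/(\w)at/);
console.log(first[0], first[1], first.index); // "cat" "c" 0

// With "g": every full match, but no capturing groups.
const all = text.match(/(\w)at/g);
console.log(all); // ["cat", "bat", "rat"]

// No match gives null in both modes.
console.log(text.match(/dog/));  // null
console.log(text.match(/dog/g)); // null
```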

matchAll(regex) - returns an iterator over all matches; each item is an exec-type array with its capturing groups. the regex must have the "g" flag
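
a minimal sketch of iterating over matchAll results. the input string and variable names are illustrative:

```javascript
const pairs = "a=1, b=2";
const pairRe = /(\w)=(\d)/g; // matchAll throws a TypeError without "g"

// Each item has the same shape as an exec() result.
for (const m of pairs.matchAll(pairRe)) {
  console.log(m[1], m[2], m.index);
}
// a 1 0
// b 2 5

// The iterator can also be collected into an array.
const keys = Array.from(pairs.matchAll(pairRe), (m) => m[1]);
console.log(keys); // ["a", "b"]
```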

search(regex) - returns index of the first match

-1 - if no matches

split(regex) - splits the string at every match and returns an array of the pieces
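
a short sketch of search and split with a regex. the sample string is invented for the example:

```javascript
const csv = "a, b;c";

// search returns the index of the first match, or -1.
console.log(csv.search(/;/)); // 4
console.log(csv.search(/x/)); // -1

// split cuts the string at every match of the separator pattern.
console.log(csv.split(/[,;]\s*/)); // ["a", "b", "c"]

// If the separator contains a capturing group, the captured text is kept.
console.log(csv.split(/([,;])\s*/)); // ["a", ",", "b", ";", "c"]
```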

replace(regex, newSubstring / callback)

newSubstring - substring which replaces matched part(s)
newSubstring might contain special replacement patterns. they are:
$& - inserts the matched substring
$` - inserts the portion of the string that goes before the matched substring
$' - inserts the portion of the string that goes after the matched substring
$n - inserts the nth capturing group (1-indexed)
$<name> - inserts the named capturing group
callback - returns the new substring. for a global regex it is invoked once for each match
callback parameters:
match
group1 - if present
groupN
offset - index of first matched char. e.g. if the whole string is "bar", and the matched substring is "ar", then the value of offset is 1
string - the whole string being examined
groups - object with named capturing groups. passed only if the regex has named groups
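
the replacement patterns and the callback form can be tried with a sketch like this one. all sample strings are made up for illustration:

```javascript
const fullName = "John Smith";

// $n and $<name> insert capturing groups.
console.log(fullName.replace(/(\w+) (\w+)/, "$2, $1")); // "Smith, John"
console.log(
  fullName.replace(/(?<first>\w+) (?<last>\w+)/, "$<last> $<first>")
); // "Smith John"

// $& is the match, $` the text before it, $' the text after it.
console.log("bar".replace(/ar/, "[$&]"));   // "b[ar]"
console.log("bar".replace(/a/, "[$`|$']")); // "b[b|r]r"

// Callback without groups: (match, offset, string).
const doubled = "1 and 20".replace(
  /\d+/g,
  (match, offset) => `${Number(match) * 2}@${offset}`
);
console.log(doubled); // "2@0 and 40@6"

// Callback with groups: (match, group1, group2, offset, string).
const swapped = "x=1".replace(/(\w)=(\d)/, (match, key, value) => `${value}=${key}`);
console.log(swapped); // "1=x"
```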

Syntax

[xz]	one of the listed characters, e.g. [aeiou] - any vowel
[x-z]	a range, e.g. [a-zA-Z] - any ASCII letter
[^x-z]	any character except those in the range. it also matches "\n"

|	logical OR, e.g. foo|bar - matches either "foo" or "bar". it can also be combined with grouping
.	any character except a line break such as "\n"
\	escaping, e.g. \. - treats the dot as a literal dot
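
a brief sketch of the basic syntax above. the test strings are illustrative:

```javascript
// Character class: "sky" has none of the listed vowels.
console.log(/[aeiou]/.test("sky")); // false

// A negated class also matches the line break.
console.log(/[^a-z]/.test("abc\n")); // true

// The dot does not match "\n" (without the "s" flag).
console.log(/a.c/.test("a\nc")); // false

// Alternation combined with a group.
console.log(/^(?:foo|bar)$/.test("bar")); // true

// Unescaped dot matches any character; escaped dot matches only ".".
console.log(/^a.c$/.test("abc"), /^a\.c$/.test("abc")); // true false
```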

Quantifiers

x{n}	repeat exactly "n" times, e.g. .{4} - any 4 characters; a.{4}e - "a" + any 4 characters + "e"
x{n,m}	repeat from "n" to "m" times, e.g. 0{2,4} - "0" repeated 2 to 4 times
x{n,}	repeat "n" or more times

x{n,m}?	non-greedy (lazy) repetition. matches as few characters as possible, e.g. a.{2,4}?e - prefers 2 characters between "a" and "e" and takes 3 or 4 only if needed. by default all quantifiers are greedy. note: "?" affects where the match ends, not where it starts, e.g. in \..+?$ the "?" does not prevent matching the whole of ".gov.ua"

?	optional (zero or one). equivalent to {0,1}
+	one or more repetitions. equivalent to {1,}
*	zero or more repetitions. equivalent to {0,}, e.g. a.*e - "a" followed, at any distance, by "e"
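
greedy and lazy quantifiers are easiest to compare side by side. a minimal sketch with made-up input strings:

```javascript
const sample = "axxexe";

// Greedy: takes as many characters as it can (4 here).
console.log(sample.match(/a.{2,4}e/)[0]); // "axxexe"

// Lazy: stops at the first "e" that lets the match succeed.
console.log(sample.match(/a.{2,4}?e/)[0]); // "axxe"

// Lazy affects the end of the match, not its start:
// the match still begins at the first dot.
console.log("site.gov.ua".match(/\..+?$/)[0]); // ".gov.ua"

console.log("100000".match(/0{2,4}/)[0]); // "0000"
console.log(/^colou?r$/.test("color"), /^colou?r$/.test("colour")); // true true
```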

Grouping

()	capturing group. groups part of the pattern and captures what it matched
(?<x>)	named capturing group. gives the group the name "x"

(?:)	non-capturing group. useful when a group is needed only for structure, e.g. recogni(?:s|z)e - pure logical OR use case. positive and negative lookaheads do not create capturing groups either

\n	backreference. matches the same text as capturing group number "n", where "n" is a positive integer, e.g. ^(.*)\1+$ - matches a string that consists of one substring repeated two or more times
\k<x>	named backreference. refers to the named capturing group "x"
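
a short sketch of the grouping constructs. the sample strings and group names are illustrative:

```javascript
// The non-capturing group does not appear in the result array.
const grouped = "2024-05".match(/(?<year>\d{4})-(?:\d{2})/);
console.log(grouped.length);      // 2 (full match + one capturing group)
console.log(grouped.groups.year); // "2024"

console.log("recognize recognise".match(/recogni(?:s|z)e/g));
// ["recognize", "recognise"]

// Numbered backreference: the whole string is one repeated substring.
console.log(/^(.*)\1+$/.test("abcabc")); // true
console.log(/^(.*)\1+$/.test("abcab"));  // false

// Named backreference: the closing quote must equal the opening one.
const quoted = /(?<q>["']).*?\k<q>/.exec(`say "hi" now`);
console.log(quoted[0]); // "\"hi\""
```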

Lookaheads & Lookbehinds

x(?=y)	positive lookahead. matches "x" only if followed by "y", e.g. Steve(?= Jobs) - matches "Steve" followed by " Jobs". when it succeeds, " Jobs" is not part of the matched result. putting .* inside a lookahead, as in (?!.*foo), makes it look ahead to the end of the line. positive lookaheads can be combined to match a string with several requirements, e.g. ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ - matches a password of at least 8 letters and digits with at least one letter and one digit
x(?!y)	negative lookahead. matches "x" only if not followed by "y", e.g. Michael(?! Scott) - matches "Michael" that is not followed by " Scott". the text after "Michael" is not part of the matched result

(?<=y)x	positive lookbehind. matches "x" only if preceded by "y", e.g. (?<=Hello )\w+ - matches a word preceded by "Hello" + space
(?<!y)x	negative lookbehind. matches "x" only if not preceded by "y". it can be useful for matching only the first occurrence of a substring in a complex regex, e.g. (?<! yes .*) yes
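
a sketch of the four lookaround forms. the sample strings are made up, the password pattern is written with .* inside each lookahead, and lookbehind needs an ES2018-capable engine:

```javascript
// Lookahead: the looked-at text is not part of the match.
console.log("Steve Jobs".match(/Steve(?= Jobs)/)[0]);    // "Steve"
console.log(/Michael(?! Scott)/.test("Michael Scott"));  // false
console.log(/Michael(?! Scott)/.test("Michael Jordan")); // true

// Lookbehind.
console.log("Hello World".match(/(?<=Hello )\w+/)[0]); // "World"

// Several lookaheads checked from the same starting position.
const passwordRe = /^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$/;
console.log(passwordRe.test("abcdefg1")); // true
console.log(passwordRe.test("abcdefgh")); // false (no digit)
console.log(passwordRe.test("abc1"));     // false (too short)

// Negative lookbehind that lets only the first " yes " through.
const answers = " no yes maybe yes ";
console.log(answers.match(/(?<! yes .*) yes /g).length); // 1
console.log(answers.search(/(?<! yes .*) yes /));        // 3
```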

Anchors

\b	word boundary
\B	not word boundary
^	beginning of the string or beginning of the line for multiline regex
$	end of the string or end of the line for multiline regex
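
a minimal sketch of the anchors, including the effect of the "m" flag. input strings are illustrative:

```javascript
const words = "cat concat";

// \b: "cat" as a whole word only.
console.log(words.match(/\bcat\b/g)); // ["cat"]

// \B: "cat" that does not start at a word boundary (inside "concat").
const inner = words.match(/\Bcat/g);
console.log(inner.length); // 1

// ^ and $ match at line breaks only with the "m" flag.
const lines = "one\ntwo";
console.log(lines.match(/^\w+$/g));  // null
console.log(lines.match(/^\w+$/gm)); // ["one", "two"]
```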

Shorthand Character Classes

\d	any digit, i.e. [0-9]
\D	not a digit, i.e. [^0-9]

\w	any word character, i.e. [A-Za-z0-9_]
\W	not a word character, i.e. [^A-Za-z0-9_]

\s	any whitespace character i.e. space, tab, line break
\S	not a whitespace character
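
a quick sketch of the shorthand classes on a made-up string:

```javascript
const mixed = "a1_b-2";

console.log(mixed.match(/\d/g));  // ["1", "2"]
console.log(mixed.match(/\D+/g)); // ["a", "_b-"]
console.log(mixed.match(/\w+/g)); // ["a1_b", "2"]
console.log(mixed.match(/\W/g));  // ["-"]

// \s covers space, tab and line break.
console.log("a \t\nb".split(/\s+/)); // ["a", "b"]

// \w is ASCII-only: accented letters are not word characters.
console.log(/\w/.test("é")); // false
```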

Flags

i - case insensitive
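g - global. finds all matches instead of stopping after the first one
m - multiline. changes behaviour of "^" and "$"
s - dotall ("single line"). allows a dot "." to match the newline character "\n"
u - unicode. treats the pattern as a sequence of Unicode code points and enables \u{...} and \p{...} escapes
y - sticky. matches only at the exact position given by the regex's lastIndex property
d - when present, the "exec" call result will contain the "indices" property. see Methods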
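
a sketch of the less obvious flags. the strings are illustrative, and the emoji is written as an escape to keep the example ASCII-only:

```javascript
console.log(/abc/i.test("ABC")); // true

// s: the dot also matches "\n".
console.log(/a.b/.test("a\nb"), /a.b/s.test("a\nb")); // false true

// u: the dot matches a whole code point instead of half a surrogate pair.
const emoji = "\u{1F600}";
console.log(/^.$/.test(emoji), /^.$/u.test(emoji)); // false true

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true
console.log(sticky.lastIndex);      // 6
sticky.lastIndex = 2;
console.log(sticky.test("barfoo")); // false

// g: test() and exec() remember their position between calls.
const globalRe = /a/g;
console.log(globalRe.test("aa"), globalRe.lastIndex); // true 1
console.log(globalRe.test("aa"), globalRe.lastIndex); // true 2
console.log(globalRe.test("aa"), globalRe.lastIndex); // false 0
```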

Examples

[aeiou]{2,} - matches two or more consecutive vowels

^[\w.]+(\+\w+)?@[\w.]+$ - matches a simple email address. it must have "@" with word characters on both sides of it; dots are allowed, and the part before "@" can end with an optional "+tag"
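
both examples can be checked with a short sketch. the sample inputs are invented, and the last two lines show that the email pattern is a rough filter rather than a full validator:

```javascript
const vowelRuns = /[aeiou]{2,}/g;
console.log("beautiful queue".match(vowelRuns)); // ["eau", "ueue"]

const emailRe = /^[\w.]+(\+\w+)?@[\w.]+$/;
console.log(emailRe.test("john.doe+news@example.com")); // true
console.log(emailRe.test("john@"));                     // false

// Limits of this simple pattern:
console.log(emailRe.test("a-b@example.com")); // false (hyphen is not allowed)
console.log(emailRe.test("..@.."));           // true  (dots alone pass)
```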