Regular Expressions
regular expression - a pattern written in a special syntax, used to find and match character combinations in text
- "regex" is a common shorthand for it
/ab+c/i - regex literal
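new RegExp(pattern, flags?) - regex constructor
- a string pattern is useful for creating a regex dynamically from variable(s), e.g. new RegExp(`${foo} and ${bar}`, 'g')
- backslashes in a string pattern must be doubled ("\\d"), and special chars inside the variables (".", "+", "?" etc.) must be escaped if they should match literally
- it also accepts a RegExp object (e.g. a regex literal) instead of a string; a flags argument, if given, replaces the original flags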
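A minimal sketch of both ways to create a regex, runnable in Node.js or a browser console. `escapeRegExp` is a hypothetical helper defined here for illustration, not a built-in; the sample strings are made up.

```javascript
// Two equivalent ways to build the same case-insensitive pattern.
const literal = /ab+c/i;
const constructed = new RegExp("ab+c", "i");
console.log(literal.test("xABBCx"), constructed.test("xABBCx")); // true true

// In a string pattern, backslashes must be doubled: "\\d+" becomes \d+.
const digits = new RegExp("\\d+", "g");
console.log("a1b22".match(digits)); // [ '1', '22' ]

// Hypothetical helper (not built in): escapes chars that have a special
// meaning in regex syntax, so variable text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const foo = "1.5";
const bar = "2+2";
const dynamic = new RegExp(`${escapeRegExp(foo)} and ${escapeRegExp(bar)}`);
console.log(dynamic.source);              // 1\.5 and 2\+2
console.log(dynamic.test("1.5 and 2+2")); // true
console.log(dynamic.test("105 and 22"));  // false ("." and "+" are literal now)

// Passing a RegExp copies it; a second argument replaces its flags.
const copy = new RegExp(literal, "g");
console.log(copy.source, copy.flags); // ab+c g
```

Without escaping, the "." in `foo` would match any char and the "+" in `bar` would act as a quantifier.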
RegExp Instance Methods
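test(string) - looks for a match and returns a boolean
exec(string) - looks for a match and returns an array with details about the first match
- returns null if no match
- with the "g" or "y" flag, searching starts at the regex's lastIndex, which is then updated (test behaves the same way), so repeated calls walk through successive matches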
- array structure:
- [0] - matched substring
- [1] - first capturing group
- [n] - nth capturing group
- groups - object with the named capturing groups (undefined if the regex has none)
- index - index of the first matched char
- input - original string
- indices - array of [start, end] pairs (end exclusive) for the whole match and for each capturing group. property is only present when the "d" flag is set
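A short sketch of test() and exec(), assuming a modern engine such as a recent Node.js (the "d" flag is an ES2022 addition). The date pattern and the sample text are made up for illustration.

```javascript
// Named groups + the "d" flag, so the result also carries "indices".
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/d;
const text = "Released 2023-07, patched 2024-01";

console.log(dateRe.test(text));           // true
console.log(dateRe.test("no date here")); // false

const result = dateRe.exec(text);   // describes the FIRST match only
console.log(result[0]);             // 2023-07
console.log(result[1], result[2]);  // 2023 07
console.log(result.groups.year);    // 2023
console.log(result.index);          // 9
console.log(result.input === text); // true
console.log(result.indices[0]);     // [ 9, 16 ] whole match, end exclusive
console.log(result.indices[1]);     // [ 9, 13 ] first capturing group

console.log(dateRe.exec("no date here")); // null
```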
String Instance Methods
match(regex)
- for non-global regex: returns the same array as exec()
- for global regex: returns an array with all matched substrings, but without capturing groups or index info
- returns null if no match
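A small sketch contrasting match() with and without the "g" flag; the sample string is made up.

```javascript
const animals = "cat bat rat";

// Non-global: same shape as exec(), including groups and index.
const first = animals.match(/(\w)at/);
console.log(first[0], first[1], first.index); // cat c 0

// Global: every full match, but no capturing groups and no index.
const all = animals.match(/(\w)at/g);
console.log(all); // [ 'cat', 'bat', 'rat' ]

console.log(animals.match(/dog/g)); // null
```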
matchAll(regex) - returns an iterator of exec-style arrays (with capturing groups, groups and index), one per match
- the regex must have the "g" flag, otherwise a TypeError is thrown
search(regex) - returns the index of the first match
- returns -1 if there is no match
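A sketch of matchAll() and search(); the price string and the group names are made up for illustration.

```javascript
const prices = "apple $3, pear $12";
// matchAll requires the "g" flag; named groups are available on each result.
const priceRe = /(?<fruit>\w+) \$(?<amount>\d+)/g;

for (const m of prices.matchAll(priceRe)) {
  console.log(m.groups.fruit, m.groups.amount, m.index);
}
// apple 3 0
// pear 12 10

// Without "g", matchAll throws a TypeError.
try {
  prices.matchAll(/\d+/);
} catch (err) {
  console.log(err instanceof TypeError); // true
}

console.log(prices.search(/\$/));   // 6
console.log(prices.search(/plum/)); // -1
```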
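split(regex) - splits the string at each match and returns an array of the pieces; text captured by capturing groups in the regex is included in the array too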
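A sketch of split() with a regex separator; the input strings are made up.

```javascript
// Split on "," or ";" with optional whitespace around the separator.
const parts = "a, b;c ,d".split(/\s*[,;]\s*/);
console.log(parts); // [ 'a', 'b', 'c', 'd' ]

// A capturing group in the separator keeps the separators in the result.
const tokens = "1+2-3".split(/([+-])/);
console.log(tokens); // [ '1', '+', '2', '-', '3' ]
```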
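replace(regex, newSubstring / callback) - returns a new string with the matched part(s) replaced. without the "g" flag only the first match is replaced
- newSubstring - substring which replaces the matched part(s)
- newSubstring may contain special replacement patterns: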
- $& - inserts the matched substring
- $` - inserts the portion of the string that goes before the matched substring
- $' - inserts the portion of the string that goes after the matched substring
- $n - inserts the nth group (1-indexed)
- $<name> - inserts named capturing group
- callback - returns the new substring. with a global regex it is invoked once for each match
- callback parameters:
- match
- group1 - if present
- groupN
- offset - index of first matched char. e.g. if the whole string is "bar", and the matched substring is "ar", then the value of offset is 1
- string - the whole string being examined
- groups - object with the named capturing groups (only passed if the regex contains named groups)
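A sketch covering the replacement patterns and the callback form of replace(). The sample strings and the "value@offset" output format are made up for illustration.

```javascript
const fullName = "Doe, John";

// Replacement patterns in the new substring.
console.log(fullName.replace(/(\w+), (\w+)/, "$2 $1"));                    // John Doe
console.log(fullName.replace(/(?<last>\w+), (?<first>\w+)/, "$<first>")); // John
console.log("price: 5".replace(/\d+/, "[$&]"));                           // price: [5]
console.log("abc".replace(/b/, "[$`|$']"));                               // a[a|c]c

// Without "g" only the first match is replaced.
console.log("a-b-c".replace(/-/, "+"));  // a+b-c
console.log("a-b-c".replace(/-/g, "+")); // a+b+c

// Callback: invoked once per match as (match, p1, ..., pN, offset, string, groups).
const doubled = "3 apples, 10 pears".replace(/(\d+)/g, (match, p1, offset) => {
  return `${Number(p1) * 2}@${offset}`;
});
console.log(doubled); // 6@0 apples, 20@10 pears
```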
Flags
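i - case-insensitive
g - global. finds all matches instead of stopping after the first one
m - multiline. makes "^" and "$" match at the start and end of each line, not only of the whole string
s - dotAll (also called single line). allows a dot "." to match line terminators such as "\n"
u - unicode. treats the pattern as a sequence of Unicode code points (a surrogate pair counts as one char) and enables \u{...} and \p{...}
y - sticky. matches only at the exact position given by lastIndex, without searching ahead
d - when present, the "exec" result contains the "indices" property. see Methods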
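A sketch showing the observable effect of each flag, assuming a modern engine (Node.js 16+ or a current browser) since "d" is an ES2022 addition. All sample strings are made up.

```javascript
// i: case-insensitive
console.log(/hello/i.test("HeLLo")); // true

// g: exec()/test() resume from lastIndex, so repeated calls walk the matches
const oRe = /o/g;
let r = oRe.exec("foo");
console.log(r.index, oRe.lastIndex); // 1 2
r = oRe.exec("foo");
console.log(r.index, oRe.lastIndex); // 2 3
console.log(oRe.exec("foo"), oRe.lastIndex); // null 0 (reset after a failed search)

// m: ^ and $ match at each line
console.log("one\ntwo".match(/^\w+$/gm)); // [ 'one', 'two' ]
console.log("one\ntwo".match(/^\w+$/g));  // null

// s: "." also matches "\n"
console.log(/a.b/.test("a\nb"), /a.b/s.test("a\nb")); // false true

// u: a char outside the Basic Multilingual Plane is one char only with "u"
const smile = "\u{1F600}";
console.log(/^.$/u.test(smile), /^.$/.test(smile)); // true false

// y: the match must start exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true
sticky.lastIndex = 1;
console.log(sticky.test("barfoo")); // false (no scanning ahead)

// d: exec() results get an "indices" property
console.log(/b/d.exec("abc").indices[0]); // [ 1, 2 ]
```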
Syntax
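[xz] - character set: matches any one of the listed chars
- e.g. [aeiou] - any vowel
[x-z] - a range
- e.g. [a-zA-Z] - any Latin letter (unlike \w, no digits or underscore)
[^x-z] - negated set: any char except those listed
- it also matches "\n"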
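| - alternation (logical OR). usually combined with a group to limit its scope, e.g. gr(a|e)y
. - any char except line terminators ("\n", "\r", "\u2028", "\u2029"); with the "s" flag it matches those too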
\ - escapes a special char so it is matched literally
- e.g. \. - matches a literal dot
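A quick sketch of sets, ranges, alternation, the dot and escaping; the test strings are made up.

```javascript
// [xz]: one of the listed chars
console.log("education".match(/[aeiou]/g).length); // 5

// [x-z]: ranges; letters only, so "_" and digits fail
console.log(/^[a-zA-Z]+$/.test("Hello"), /^[a-zA-Z]+$/.test("Hello_1")); // true false

// [^x-z]: everything except the set, including "\n"
console.log("a1\nb2".match(/[^a-z]/g)); // [ '1', '\n', '2' ]

// |: alternation, scoped by a group
console.log("gray grey groy".match(/gr(a|e)y/g)); // [ 'gray', 'grey' ]

// .: any char except line terminators
console.log(/a.c/.test("abc"), /a.c/.test("a\nc")); // true false

// \: escape a special char to match it literally
console.log("1.5 105".match(/1\.5/g)); // [ '1.5' ]
console.log("1.5 105".match(/1.5/g));  // [ '1.5', '105' ]
```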
x{n} - repeat exactly n times
- e.g.
- .{4} - any 4 chars (not necessarily the same char)
- a.{4}e - match "a" + any 4 chars + "e"
x{min,max}
- e.g. x{2,4} - repeat "x" 2 to 4 times
x{min,}
- e.g. x{2,} - repeat "x" 2 or more times
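x{0,max}
- e.g. x{0,4} - repeat "x" 4 or fewer times
- note: JavaScript does not support the x{,max} shorthand. without the "u" flag it is matched as literal text, with the "u" flag it is a SyntaxError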
x{min,max}? - non-greedy (lazy) repetition. it matches as few chars as possible
- e.g. a.{2,4}?e - tries 2 chars between "a" and "e" first, then 3, then 4
- by default all repetitions are greedy; the other quantifiers have lazy forms too: +?, *?, ??
- note: laziness only affects where a match ends, not where it starts, because the engine always returns the leftmost match. e.g. on "example.gov.ua", \..+?$ still matches ".gov.ua", not ".ua"
? - optional (zero or one occurrence). equivalent to {0,1}
+ - one or more repetitions. equivalent to {1,}
* - zero or more repetitions. equivalent to {0,}
- e.g. a.*e - "a", then any chars, up to the last "e" on the line
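A sketch of the quantifiers, greedy vs lazy matching, and the leftmost-match rule from the note above; the sample strings are made up.

```javascript
const s = "a12345e";
console.log(s.match(/\d{4}/)[0]);    // 1234  (exactly 4)
console.log(s.match(/\d{2,3}/)[0]);  // 123   (greedy: takes the max)
console.log(s.match(/\d{2,3}?/)[0]); // 12    (lazy: takes the min)
console.log(s.match(/\d{2,}/)[0]);   // 12345
// Use {0,max} for "at most max"
console.log(/^x{0,4}$/.test("xxx"), /^x{0,4}$/.test("xxxxx")); // true false

// Greedy vs lazy
const html = "<b>bold</b>";
console.log(html.match(/<.+>/)[0]);  // <b>bold</b>
console.log(html.match(/<.+?>/)[0]); // <b>

// Laziness does not move the start of a match: the leftmost match wins.
console.log("example.gov.ua".match(/\..+?$/)[0]); // .gov.ua

// ?, + and *
console.log("color colour".match(/colou?r/g)); // [ 'color', 'colour' ]
console.log("ac abc abbc".match(/ab+c/g));     // [ 'abc', 'abbc' ]
console.log("ac abc abbc".match(/ab*c/g));     // [ 'ac', 'abc', 'abbc' ]
```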
\b - word boundary: a position where a word char (\w) is next to a non-word char or to the start/end of the string
\B - not a word boundary
^ - beginning of the string, or beginning of each line with the "m" flag
$ - end of the string, or end of each line with the "m" flag
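A sketch of word boundaries and anchors; the sample strings are made up.

```javascript
const sentence = "cat concat cats";
console.log(sentence.match(/\bcat\b/g)); // [ 'cat' ] (only the standalone word)
console.log(sentence.search(/\Bcat/));   // 7 (the "cat" inside "concat")

const lines = "first line\nsecond line";
console.log(/^second/.test(lines));  // false: ^ is the start of the whole string
console.log(/^second/m.test(lines)); // true: with "m", ^ also matches after "\n"
console.log(/line$/.test(lines));    // true: the string ends with "line"
```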
(x) - capturing group: groups part of the pattern and remembers the matched text
- (?<name>x) - named capturing group
(?:x) - non-capturing group
- is helpful when grouping is needed only to scope a logical OR, e.g. foo(?:x|y)bar
- lookaheads and lookbehinds do not create capturing groups either
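A sketch comparing capturing, named and non-capturing groups; the date string and group name are made up for illustration.

```javascript
// (x) capturing, (?<name>x) named, (?:x) non-capturing
const m = "2024-05-17".match(/(\d{4})-(?<month>\d{2})-(?:\d{2})/);
console.log(m.length);       // 3 (full match + 2 capturing groups)
console.log(m[1], m[2]);     // 2024 05 (named groups are numbered too)
console.log(m.groups.month); // 05

// Non-capturing group that only scopes the alternation
console.log("fooxbar fooybar foozbar".match(/foo(?:x|y)bar/g)); // [ 'fooxbar', 'fooybar' ]

// Lookarounds add no groups either
console.log(/a(?=b)/.exec("ab").length); // 1
```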
(?=x) - positive lookahead
- e.g. Steve(?= Jobs) - matches "Steve" followed by " Jobs". when succeeded, only "Steve" is matched
- lookahead string is not a part of matched substring
- several positive lookaheads can be combined to require multiple conditions at once
- e.g. ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ - matches password of minimum 8 characters with at least one letter and one number
(?!y) - negative lookahead
- e.g. Michael(?! Scott) - matches "Michael" that is not followed by " Scott". when succeeded, only "Michael" is matched
(?<=x) - positive lookbehind
- e.g. (?<=Hello )\w+ - match a word preceded by "Hello" + space
(?<!y) - negative lookbehind. matches only if the current position is not preceded by y
- negative lookbehind can be useful to match only the first occurrence of a substring in a complex regex, e.g. (?<! yes .*) yes
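A sketch of all four lookarounds, reusing the examples from the notes above; the surrounding sample strings and the `(?<!\$)\b\d+` pattern are made up for illustration. JavaScript lookbehinds may contain variable-length patterns such as `.*`, which the last example relies on.

```javascript
console.log("Steve Jobs, Steve Wozniak".match(/Steve(?= Jobs)/g)); // [ 'Steve' ]
console.log("Michael Scott, Michael Jordan".replace(/Michael(?! Scott)/g, "M.")); // Michael Scott, M. Jordan
console.log("Hello World".match(/(?<=Hello )\w+/)[0]); // World
console.log("$10 and 20".match(/(?<!\$)\b\d+/g));      // [ '20' ]

// Several lookaheads = several requirements (the password rule above)
const password = /^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$/;
console.log(password.test("abc12345"), password.test("abcdefgh"), password.test("abc123")); // true false false

// Negative lookbehind keeps only the first " yes", even with the "g" flag
console.log("say yes and yes again".replace(/(?<! yes .*) yes/g, " YES")); // say YES and yes again
```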
\1, \2, ... - backreference: matches the same text that the capturing group with that number matched
- e.g. ^(.*)\1+$ - matches a string made entirely of one repeated substring, such as "abcabc"
\k<name> - named backreference. same, but references named capturing group
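A sketch of numbered and named backreferences; the sample strings are made up.

```javascript
const repeated = /^(.*)\1+$/;
console.log(repeated.test("abcabc"), repeated.test("abcab")); // true false

// Doubled words, with a numbered and a named backreference
console.log("this is is a test".match(/\b(\w+) \1\b/)[0]);        // is is
console.log("the the end".match(/\b(?<word>\w+) \k<word>\b/)[0]); // the the
```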
Shorthand Character Classes
\d - any digit, i.e. [0-9]
\D - not a digit, i.e. [^0-9]
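\w - any word char, i.e. [A-Za-z0-9_]. only ASCII letters count, so e.g. "é" is not a word char
\W - not a word char, i.e. [^A-Za-z0-9_]
\s - any whitespace char, e.g. space, tab, line break, plus other Unicode spaces
\S - not a whitespace char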
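A sketch of the shorthand classes on a made-up sample string.

```javascript
const sample = "id_7: a b\tc";
console.log(sample.match(/\d/g));        // [ '7' ]
console.log(sample.match(/\D/g).length); // 10
console.log(sample.match(/\w+/g));       // [ 'id_7', 'a', 'b', 'c' ]
console.log(sample.replace(/\W/g, ""));  // id_7abc
console.log(sample.match(/\s/g).length); // 3 (two spaces and a tab)
console.log(/\S/.test("   "));           // false

// \w is ASCII-only: "\u00e9" is "é"
console.log(/\w/.test("\u00e9")); // false
```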
Examples
[aeiou]{2,} - match successive vowels
^[\w.]+(\+\w+)?@[\w.]+$ - a loose email check: requires "@" with word chars or dots on both sides of it, plus an optional "+tag" right before "@". it does not validate the domain (e.g. "a@b" passes)
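A sketch running both example patterns against a few made-up inputs, including cases that show how loose the email check is.

```javascript
const vowels = /[aeiou]{2,}/g;
console.log("queue beautiful".match(vowels)); // [ 'ueue', 'eau' ]

const email = /^[\w.]+(\+\w+)?@[\w.]+$/;
console.log(email.test("john.doe@example.com"));       // true
console.log(email.test("john+news@mail.example.org")); // true
console.log(email.test("john@"));                      // false
console.log(email.test("john doe@example.com"));       // false
// Loose: these pass although they are not useful addresses.
console.log(email.test("a@b"), email.test("..@..")); // true true
```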