Regular Expressions
regular expression - a pattern written in a special syntax and used to match text
- regex - shorthand for "regular expression"
/ab+c/i - regex literal
new RegExp(string, flags?) - regex constructor
- string parameter is useful for creating a dynamic regex from variable(s), e.g. new RegExp(`${foo} and ${bar}`, 'g')
- it also accepts a regex literal instead of a string
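a minimal sketch of the constructor notes above. escapeRegExp is a hypothetical helper (not built into the language here), and foo / bar hold assumed sample values. it shows why variables usually need escaping before they are interpolated into a pattern.

```javascript
// Hypothetical helper: escapes characters that are special in a regex,
// so the interpolated value is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const foo = "a.b"; // assumed sample values
const bar = "c";

// Without escaping, the "." in foo would match any character.
const dynamic = new RegExp(`${escapeRegExp(foo)} and ${bar}`, "g");

const dynamicSource = dynamic.source;                 // "a\\.b and c"
const hits = "a.b and c, axb and c".match(dynamic);   // ["a.b and c"]

// A regex literal is accepted too; the second argument replaces its flags.
const fromLiteral = new RegExp(/ab+c/i, "g");
const literalFlags = fromLiteral.flags;               // "g"
```

if the variables are trusted pattern fragments rather than plain text, the escaping step can be skipped.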
Methods
test(string) - looks for a match and returns boolean
exec(string) - looks for a match and returns an array with values related to the first match
- returns null if no match
- array structure:
- [0] - matched substring
- [1] - first capturing group
- [n] - nth capturing group
- groups - object with named capturing groups. undefined if the regex has no named groups
- index - index of first char in match
- input - original string
- indices - stores start and end positions of the match and of each capturing group. the property is only present when the "d" flag is set
source - property containing regex string without wrapping slashes and flags
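the result structure described above can be inspected like this. the pattern and the sample text are assumptions chosen for illustration. the "d" flag needs a reasonably recent engine.

```javascript
// The "d" flag adds the "indices" property to the exec result.
const dateRe = /(?<year>\d{4})-(\d{2})/d;
const sample = "released 2024-05";

const found = dateRe.test(sample);         // true
const result = dateRe.exec(sample);

const whole = result[0];                   // "2024-05"
const firstGroup = result[1];              // "2024"
const secondGroup = result[2];             // "05"
const year = result.groups.year;           // "2024"
const matchIndex = result.index;           // 9
const original = result.input;             // "released 2024-05"
const wholeRange = result.indices[0];      // [9, 16]

const noMatch = dateRe.exec("no date");    // null
const patternText = dateRe.source;         // "(?<year>\\d{4})-(\\d{2})"
```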
String Instance Methods
match(regex)
- for non-global regex: returns exec-type array
- for global regex: returns an array with all full matches, without capturing groups
- returns null if no match
matchAll(regex) - returns an iterator that yields an exec-type array for every match, one after another. the regex must have the "g" flag, otherwise a TypeError is thrown
search(regex) - returns index of the first match
- -1 - if no matches
split(regex) - splits the string at every match and returns an array of the parts
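a short sketch comparing match, matchAll, search and split on one assumed sample string. variable names are illustrative only.

```javascript
const line = "x=1, y=22, z=333";

// Non-global: exec-type array for the first match only.
const single = line.match(/(\w)=(\d+)/);
const singleSummary = [single[0], single[1], single[2], single.index];
// ["x=1", "x", "1", 0]

// Global: every full match, capturing groups are dropped.
const all = line.match(/(\w)=(\d+)/g);          // ["x=1", "y=22", "z=333"]

// matchAll keeps the groups of every match; the regex must be global.
const pairs = [...line.matchAll(/(\w)=(\d+)/g)].map((m) => [m[1], m[2]]);
// [["x", "1"], ["y", "22"], ["z", "333"]]

const firstDigit = line.search(/\d/);           // 2
const missing = line.search(/q/);               // -1

// split cuts the string at every match of the separator regex.
const parts = line.split(/,\s*/);               // ["x=1", "y=22", "z=333"]
```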
replace(regex, newSubstring / callback)
- newSubstring - substring which replaces matched part(s)
- newSubstring might contain special replacement patterns. they are:
- $& - inserts the matched substring
- $` - inserts the portion of the string that goes before the matched substring
- $' - inserts the portion of the string that goes after the matched substring
- $n - inserts the nth group (1-indexed)
- $<name> - inserts named capturing group
- callback - function whose return value is used as the replacement. for a global regex it is invoked once per match
- callback parameters:
- match
- group1 - if present
- groupN
- offset - index of first matched char. e.g. if the whole string is "bar", and the matched substring is "ar", then the value of offset is 1
- string - the whole string being examined
- groups - object with named capturing groups. passed only if the regex has named groups
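the replacement patterns and callback parameters listed above, sketched on assumed sample strings. the last example reuses the "bar" / "ar" case from the notes.

```javascript
const fullName = "John Smith";

// $n and $<name> insert capturing groups.
const swapped = fullName.replace(/(\w+) (\w+)/, "$2, $1");   // "Smith, John"
const named = fullName.replace(
  /(?<first>\w+) (?<last>\w+)/,
  "$<last> $<first>"
);                                                           // "Smith John"

// $& is the match, $` the text before it, $' the text after it.
const wrapped = "a-b".replace(/-/, "[$&]");                  // "a[-]b"
const before = "a-b".replace(/-/, "$`");                     // "aab"
const after = "a-b".replace(/-/, "$'");                      // "abb"

// Callback without groups receives (match, offset, string).
// With a global regex it runs once per match.
const offsets = [];
const doubled = "1 and 22".replace(/\d+/g, (match, offset) => {
  offsets.push(offset);
  return String(Number(match) * 2);
});
// doubled: "2 and 44", offsets: [0, 6]

// Callback with one group receives (match, group1, offset, string).
const marked = "bar".replace(/a(r)/, (match, group1, offset) => {
  return `${group1.toUpperCase()}@${offset}`;
});                                                          // "bR@1"
```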
Syntax
[xz] | one of the listed characters e.g. [aeiou] - any vowel |
[x-z] | a range e.g. [a-zA-Z] - any ASCII letter |
[^x-z] | any character except the listed ones. it also matches "\n" |
| |
| | logical OR e.g. foo|bar - matches either "foo" or "bar". it can also be combined with grouping |
. | any character except line terminators such as "\n" |
\ | escaping e.g. \. - treat the dot as a literal dot |
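a quick check of the syntax rows above. all inputs are assumed sample strings.

```javascript
const vowels = "regex".match(/[aeiou]/g);          // ["e", "e"]
const letters = "a1_B".match(/[a-zA-Z]/g);         // ["a", "B"]

// A negated class matches "\n", the dot does not.
const negatedMatchesNewline = /[^x-z]/.test("\n"); // true
const dotMatchesNewline = /./.test("\n");          // false

// Alternation, here combined with grouping and anchors.
const either = /^(foo|bar)$/.test("bar");          // true
const neither = /^(foo|bar)$/.test("baz");         // false

// An unescaped dot matches any character, an escaped one only ".".
const unescapedDot = /^1.5$/.test("1x5");          // true
const escapedDot = /^1\.5$/.test("1x5");           // false
const literalDot = /^1\.5$/.test("1.5");           // true
```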
Quantifiers
x{n} | repeat exactly "n" times e.g. .{4} - any 4 characters; a.{4}e - "a" + any 4 characters + "e" |
x{n,m} | repeat from "n" to "m" times e.g. 0{2,4} - repeat "0" 2 to 4 times |
x{n,} | repeat "n" or more times |
| |
x{n,m}? | non-greedy (lazy) repetition. it matches as few characters as possible e.g. a.{2,4}?e - prefers 2 characters between "a" and "e" and takes 3 or 4 only if needed. by default all repetitions are greedy. note: "?" affects where the match ends, not where it starts, e.g. in \..+?$ the "?" will not prevent matching ".gov.ua" |
| |
? | optional item. equivalent of {0,1} |
+ | one or more repetitions. equivalent of {1,} |
* | any number of repetitions, including none. equivalent of {0,} e.g. a.*e - "a" followed later on the same line by "e" |
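greedy versus lazy repetition is easier to see side by side. a small sketch on assumed inputs ("example.gov.ua" is a made-up host name for the ".gov.ua" note):

```javascript
const exact = /^a.{4}e$/.test("a1234e");             // true
const ranged = "1000001".match(/0{2,4}/)[0];         // "0000" (greedy: takes 4)

// Greedy runs to the last "e", lazy stops at the first one.
const greedy = "axxexxe".match(/a.*e/)[0];           // "axxexxe"
const lazy = "axxexxe".match(/a.*?e/)[0];            // "axxe"
const lazyRange = "axxexxe".match(/a.{2,4}?e/)[0];   // "axxe"

// Laziness does not move the start: the match still begins at the first dot.
const suffix = "example.gov.ua".match(/\..+?$/)[0];  // ".gov.ua"

const optional = /^colou?r$/.test("color");          // true
const plusNeedsOne = /^ab+c$/.test("ac");            // false
const starAllowsNone = /^ab*c$/.test("ac");          // true
```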
Grouping
() | capturing group. groups a part of the pattern and remembers what it matched |
(?<x>) | named capturing group. "x" is the group name |
| |
(?:) | non-capturing group. groups without capturing; useful when the matched value is not needed e.g. recogni(?:s|z)e - pure logical OR use case. lookaheads and lookbehinds do not create capturing groups either |
| |
\n | backreference. matches the same text as capturing group "n" ("n" is a digit) e.g. ^(.*)\1+$ - matches a string that consists of one substring repeated two or more times (it also matches the empty string; use (.+) to exclude it) |
\k<x> | named backreference. refers to the named capturing group "x" |
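the grouping rows above, sketched with assumed sample inputs. the quote pattern is an illustrative use of \k, not taken from the notes.

```javascript
// Named and numbered access to the same groups.
const dateMatch = "2024-05".match(/(?<year>\d{4})-(?<month>\d{2})/);
const yearGroup = dateMatch.groups.year;          // "2024"
const monthByIndex = dateMatch[2];                // "05"

// A non-capturing group adds no entry to the result array.
const spelling = "recognize".match(/recogni(?:s|z)e/);
const spellingLength = spelling.length;           // 1

// Backreference: the string must be one substring repeated.
const repeated = /^(.*)\1+$/.test("abcabc");      // true
const notRepeated = /^(.*)\1+$/.test("abcabd");   // false

// Named backreference: the closing quote must equal the opening one.
const quotedText = /(?<quote>['"]).*?\k<quote>/.exec(`say "hi" now`)[0];
// "\"hi\""
```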
Lookaheads & Lookbehinds
x(?=y) | positive lookahead. matches "x" only if followed by "y" e.g. Steve(?= Jobs) - matches "Steve" followed by " Jobs". on success, " Jobs" is not a part of the matched result. a lookahead may start with .* to scan to the end of the line, e.g. the negative form (?!.*foo) succeeds only if "foo" does not occur later on the line.
several positive lookaheads can be chained to check a string against multiple requirements e.g. ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ - matches a password of at least 8 letters and digits with at least one letter and one digit |
x(?!y) | negative lookahead. matches "x" only if not followed by "y" e.g. Michael(?! Scott) - matches "Michael" that is not followed by " Scott". the lookahead itself never becomes a part of the matched result |
| |
(?<=y)x | positive lookbehind. matches "x" only if preceded by "y" e.g. (?<=Hello )\w+ - match a word preceded by "Hello" + space |
(?<!y)x | negative lookbehind. matches "x" only if not preceded by "y". it can be used to match only the first occurrence of a substring in a complex regex, e.g. (?<! yes .*) yes |
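a sketch of the four lookaround forms using the examples from the table. the test strings are assumptions, and the "first occurrence" pattern is a simplified variant without the surrounding spaces.

```javascript
// Positive lookahead: " Jobs" is required but not included in the match.
const steve = "Steve Jobs".match(/Steve(?= Jobs)/)[0];       // "Steve"
const noSteve = "Steve Wozniak".match(/Steve(?= Jobs)/);     // null

// Negative lookahead.
const jordan = /Michael(?! Scott)/.test("Michael Jordan");   // true
const scott = /Michael(?! Scott)/.test("Michael Scott");     // false

// Chained lookaheads: several requirements on one string.
const password = /^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$/;
const okPassword = password.test("abcdefg1");                // true
const noDigit = password.test("abcdefgh");                   // false
const tooShort = password.test("abc1");                      // false

// Positive lookbehind.
const word = "Hello World".match(/(?<=Hello )\w+/)[0];       // "World"

// Negative lookbehind: keep only the first "yes".
const yesPositions = [..."yes no yes".matchAll(/(?<!yes.*)yes/g)].map(
  (m) => m.index
);                                                           // [0]
```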
Anchors
\b | word boundary |
\B | not word boundary |
^ | beginning of the string or beginning of the line for multiline regex |
$ | end of the string or end of the line for multiline regex |
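the anchors above in a short sketch. sample strings are assumptions.

```javascript
// \b requires a word boundary, \B requires its absence.
const wholeWord = /\bcat\b/.test("a cat sat");       // true
const insideWord = /\bcat\b/.test("concatenate");    // false
const nonBoundary = /\Bcat\B/.test("concatenate");   // true

// Without "m", ^ and $ refer to the whole string only.
const twoLines = "one\ntwo";
const withoutM = /^two$/.test(twoLines);             // false
const withM = /^two$/m.test(twoLines);               // true
const lineStarts = twoLines.match(/^\w/gm);          // ["o", "t"]
```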
Shortenings
\d | any digit, i.e. [0-9] |
\D | not a digit, i.e. [^0-9] |
| |
\w | any word character, i.e. [A-Za-z0-9_] |
\W | not a word character |
| |
\s | any whitespace character, e.g. space, tab, line break |
\S | not a whitespace character |
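each shorthand class and its negation, tried on assumed sample strings. note that without extra flags \w covers ASCII only, so "\u00e9" (é) counts as a non-word character.

```javascript
const digits = "a1 b_2".match(/\d/g);            // ["1", "2"]
const nonDigits = "a1".match(/\D/g);             // ["a"]

const wordChars = "a_1-\u00e9".match(/\w/g);     // ["a", "_", "1"]
const nonWord = "a_1-\u00e9".match(/\W/g);       // ["-", "\u00e9"]

const fields = "a b\tc\nd".split(/\s/);          // ["a", "b", "c", "d"]
const tokens = " a  b ".match(/\S+/g);           // ["a", "b"]
```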
Flags
i - case insensitive
g - global
m - multiline. changes behaviour of "^" and "$"
s - single line (dotall). allows a dot "." to match newline character "\n"
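u - unicode. treats the pattern and the text as Unicode code points instead of UTF-16 code units; enables \u{...} and \p{...} escapes
y - sticky. matches only at the exact position stored in the regex "lastIndex" property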
d - when present, "exec" call result will contain "indices" property. see Methods
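a sketch of how the flags change matching. inputs are assumed samples. the last part shows a side effect worth knowing: a regex with "g" (or "y") keeps state in lastIndex between calls.

```javascript
const insensitive = /abc/i.test("ABC");            // true
const globalCount = "a-a-a".match(/a/g).length;    // 3

// s: the dot also matches "\n".
const withoutS = /a.b/.test("a\nb");               // false
const withS = /a.b/s.test("a\nb");                 // true

// u: a symbol outside the BMP counts as one character.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);                // false
const withU = /^.$/u.test(emoji);                  // true

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyHit = sticky.test("barfoo");           // true
sticky.lastIndex = 1;
const stickyMiss = sticky.test("barfoo");          // false

// g: lastIndex is remembered between calls on the same regex object.
const stateful = /a/g;
const firstCall = stateful.test("a");              // true, lastIndex is now 1
const secondCall = stateful.test("a");             // false, lastIndex reset to 0
```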
Examples
[aeiou]{2,} - match successive vowels
^[\w.]+(\+\w+)?@[\w.]+$ - simplified email match. it must have "@" with word characters and/or dots on both sides; an optional "+tag" may precede "@"
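both examples tried on assumed inputs. the last check shows that the email pattern is only a rough filter, not a full validator.

```javascript
const vowelRuns = "beautiful queue".match(/[aeiou]{2,}/g);  // ["eau", "ueue"]

const email = /^[\w.]+(\+\w+)?@[\w.]+$/;
const plain = email.test("john@example.com");               // true
const tagged = email.exec("john.doe+news@example.com")[1];  // "+news"
const noDomain = email.test("john@");                       // false
const noUser = email.test("@example.com");                  // false

// Dots alone satisfy [\w.]+, so this nonsense address passes too.
const dotsOnly = email.test("..@..");                       // true
```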