Regular expressions for JS developers
JS developers need to know regular expressions
I went a long time without knowing regular expressions. They're not on any list of things I'm required to know, but without them I always felt something was missing. So many times, when I came across a hellishly long regular expression in a code base, my instinct was to ignore it. But I know that good FE developers should know this trick, so in this blog post you and I will walk through regular expressions and master them together.
Here are some of the main uses of regular expressions:
Manipulating strings of HTML nodes
Locating partial selectors within a CSS selector expression
Determining whether an element has a specific class name
Input validation
And more
Why
Let's say we want to validate that a string, perhaps entered into a form by a user, follows the format of a nine-digit US postal code. The rule is:
99999-9999
Each 9 represents a decimal digit, and the format is 5 decimal digits, followed by a hyphen, followed by 4 decimal digits. Without regular expressions, the check looks like this:
function isThatAZipCode(candidate) {
  // Must be a string of exactly 10 characters: 5 digits, a hyphen, 4 digits
  if (typeof candidate !== "string" || candidate.length !== 10) {
    return false;
  }
  for (let n = 0; n < candidate.length; n++) {
    const c = candidate[n];
    switch (n) {
      case 0: case 1: case 2: case 3: case 4:
      case 6: case 7: case 8: case 9:
        if (c < '0' || c > '9') return false;
        break;
      case 5:
        if (c !== "-") return false;
        break;
    }
  }
  return true;
}
// The same check written with a regular expression
function isThisAZipCode(candidate) {
  return /^\d{5}-\d{4}$/.test(candidate);
}
Regular expressions in JS
A regular expression is a type of object. It can be either written as a literal value or constructed with the RegExp constructor.
Via a regular expression literal: /test/ (the pattern goes between forward slashes)
💡 /something/ is the syntax for creating a regular expression
By constructing an instance of RegExp: new RegExp("test");
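A small sketch of the two forms side by side. The constructor is mostly useful when the pattern is only known at runtime. The containsWord helper and its parameters are made up for illustration and are not part of any library.

```javascript
// Literal form: the pattern is fixed when the script is parsed.
const literal = /test/;

// Constructor form: the pattern is built from a string at runtime.
const constructed = new RegExp("test");

// Hypothetical helper: build a case-insensitive, whole-word regex from a runtime value.
// Assumption: `word` contains no regex special characters, since it is inserted as-is.
function containsWord(text, word) {
  return new RegExp("\\b" + word + "\\b", "i").test(text);
}

console.log(literal.test("a test string"));        // true
console.log(constructed.test("a test string"));    // true
console.log(containsWord("Hello World", "world")); // true
console.log(containsWord("Helloworld", "world"));  // false
```

Note the doubled backslashes in the constructor call: the pattern is an ordinary string there, so each backslash has to be escaped once for the string itself.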
Regular expression's methods
Exec
exec and match are similar, but match is a method of strings while exec is a method of regular expression objects
Exec in simple form
let match = /\d+/.exec("one two 100 200");
console.log(match);
// ["100"]
console.log(match.index);
// 8
Exec with group parentheses
let quotedText = /'([^']*)'/;
console.log(quotedText.exec("she said 'Hello'"));
// → ["'Hello'", "Hello"]
let badly = /bad(ly)?/;
console.log(badly.exec("bad"));
// → ["bad", undefined]
Groups are useful for extracting parts of a string. With the g flag, exec can also be called in a loop to walk through every match:
let input = "A string with 3 numbers in it... 42 and 88.";
let number = /\b\d+\b/g;
let match;
while ( match = number.exec(input)) {
console.log("Found", match[0] , "at" , match.index);
}
// → Found 3 at 14
// Found 42 at 33
// Found 88 at 40
Terms and operators
Exact matching
Any character that's not a special character or operator must appear literally in the string. For example, our /test/ regex matches only the exact character sequence test.
A single match from a class of characters
A finite set of characters (one of)
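[abc] means we want to match a single character that is either a, b or c
A finite set of characters (one of) plus quantifiers
Adding a quantifier lets the set repeat. For example, [ab]+ matches one or more characters, each being a or b. Examples of valid matches:
a
b
ab
ba
aaa
bbb
abababab
etc.
Let's try [ab]{2}, which asks for exactly two characters. There are 4 possible strings that match [ab]{2}:
aa
ab
ba
bb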
Anything but a finite set of characters
[^abc] means we want to match anything but either a, b or c
From a range [a-m]
Instead of writing a long [abcdefghijklm], we can write [a-m]
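The character classes above can be checked quickly in a console. This is only a sketch; the ^ and $ anchors (covered later in this post) are added so that the whole string has to match, and the twoOfAB name is made up for the example.

```javascript
// One of a, b or c
console.log(/[abc]/.test("b"));   // true
console.log(/[abc]/.test("d"));   // false

// Anything but a, b or c
console.log(/[^abc]/.test("d"));  // true
console.log(/[^abc]/.test("a"));  // false

// A range: same as [abcdefghijklm]
console.log(/^[a-m]$/.test("k")); // true
console.log(/^[a-m]$/.test("z")); // false

// Exactly two characters, each being a or b
const twoOfAB = /^[ab]{2}$/;
const candidates = ["aa", "ab", "ba", "bb", "abc", "a"];
const accepted = candidates.filter((s) => twoOfAB.test(s));
console.log(accepted);
// ["aa", "ab", "ba", "bb"]
```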
But what if we want to match a whole word, such as cat or dog?
A character class won't work here, because it only ever matches one character. How do we express "match either cat or dog"?
Parentheses for grouping
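One way to express "cat or dog" is the alternation operator | inside a group. The sketch below assumes we want the whole string to be one of the two words; the variable names are illustrative only.

```javascript
// | separates alternatives; the parentheses limit how far each alternative extends.
const catOrDog = /^(cat|dog)$/;
console.log(catOrDog.test("cat"));     // true
console.log(catOrDog.test("dog"));     // true
console.log(catOrDog.test("catalog")); // false

// Without grouping, the alternatives are "^cat" and "dog$",
// so any string that starts with cat or ends with dog passes.
const ungrouped = /^cat|dog$/;
console.log(ungrouped.test("catalog")); // true

// Grouping also lets a quantifier apply to a whole sequence.
const laugh = /^(ha)+$/;
console.log(laugh.test("hahaha")); // true
console.log(laugh.test("hah"));    // false
```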
Escaping (backslash)
Sometimes we need to search for characters such as $, \, ^, [ and ]. But these characters have special meanings in a regex, so how can we say "Hey, I want to match these exact characters"? Putting a backslash in front of a special character makes it a literal match. A double backslash (\\) matches a single backslash.
Some of the special characters we use in everyday writing are: asterisk (*), ampersand (&), braces ({}), comma (,), brackets ([]), hyphen (-), equals sign (=), parentheses (()), semicolon (;), slash (/), etc. Only some of them, such as *, {}, [], () and /, have a special meaning in a regex and need escaping.
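A few escaping cases, sketched with made-up sample strings:

```javascript
// $ normally means "end of input"; \$ matches a literal dollar sign.
const price = /\$\d+/;
console.log(price.test("Price: $42")); // true

// [ and ] normally delimit a character class; escaped, they match themselves.
const indexPart = /\[\d+\]/.exec("item[3]")[0];
console.log(indexPart); // "[3]"

// \\ in a regex literal matches one backslash.
// The string literal "C:\\temp" contains a single backslash character.
const hasBackslash = /\\/.test("C:\\temp");
console.log(hasBackslash); // true

// In the RegExp constructor the pattern is a string, so each backslash is doubled again.
const priceFromString = new RegExp("\\$\\d+");
console.log(priceFromString.test("Price: $42")); // true
```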
Begins and Ends
By default, /test/ matches anywhere in the string, so it also matches i'matest, where the match sits at the end of the string. What if we want the pattern to match the complete string, not just a substring?
/^test$/
Using both ^ and $ indicates that the specified pattern must encompass the entire candidate string
Applying a strict comparison:
const number = "1234-5678-123456";
/\d{4}-\d{4}-\d{5}/.test(number); // true
// Apply strict comparison with ^ and $
/^\d{4}-\d{4}-\d{5}$/.test(number); // false
// \b marks a word boundary, so the match must be a whole word
console.log(/cat/.test("concatenate"));
// → true
console.log(/\bcat\b/.test("concatenate"));
// → false
Quantifiers
Compare a greedy quantifier (+), which matches as many characters as it can, with its non-greedy counterpart (+?), which matches as few as it can.
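The difference is easiest to see on a string with several possible end points. A minimal sketch, reusing an HTML snippet similar to the ones used elsewhere in this post:

```javascript
const markup = "<b>Hello</b> <i>world!</i>";

// Greedy: .+ grabs as much as it can, so the match runs to the last ">".
const greedy = markup.match(/<.+>/)[0];
console.log(greedy);
// "<b>Hello</b> <i>world!</i>"

// Non-greedy: .+? grabs as little as it can, so the match stops at the first ">".
const lazy = markup.match(/<.+?>/)[0];
console.log(lazy);
// "<b>"

// Other common quantifiers: ? (zero or one), * (zero or more), {n,m} (between n and m).
console.log(/^colou?r$/.test("color")); // true
console.log(/^\d{2,4}$/.test("12345")); // false
```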
Predefined character classes
Regular expressions provide predefined terms for sets of characters that we often want to match
Predefined | Matches |
\t | Horizontal tab |
[\b] | Backspace (only inside a character class; elsewhere \b is a word boundary) |
\n | Newline |
. | Any character except line terminators such as \n; unlike \w, it also matches characters like * & ^ % and spaces |
\d | Any decimal digit; equivalent to [0-9] |
\D | Any character but a decimal digit; equivalent to [^0-9] |
\w | Any alphanumeric character, including underscore; equivalent to [A-Za-z0-9_]. Note: whitespace is not an alphanumeric character |
\W | Any character but alphanumeric and underscore characters; equivalent to [^A-Za-z0-9_] |
\s | Any whitespace character (space, tab, form feed, and so on), including [ \t\n\r\f\v] |
\S | Any character but a whitespace character |
[\s\S] | Any character at all, including newlines (more powerful than .) |
\b | A word boundary |
\B | A non-word boundary |
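To get a feel for these classes, here is a short sketch that runs a few of them against a made-up sample string:

```javascript
const sample = "Order #42: 3 apples,\t7 pears";

// \d+ : runs of digits
const digits = sample.match(/\d+/g);
console.log(digits);
// ["42", "3", "7"]

// \w+ : runs of letters, digits and underscores
const words = sample.match(/\w+/g);
console.log(words);
// ["Order", "42", "3", "apples", "7", "pears"]

// \s+ : runs of whitespace (the tab counts too)
const parts = sample.split(/\s+/);
console.log(parts);
// ["Order", "#42:", "3", "apples,", "7", "pears"]

// \b is a position between a word character and a non-word character
const replaced = "cat concat cat".replace(/\bcat\b/g, "dog");
console.log(replaced);
// "dog concat dog"

// . does not match a newline, [\s\S] does
console.log(/^.$/.test("\n"));      // false
console.log(/^[\s\S]$/.test("\n")); // true
```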
Visual Studio Code
In the Visual Studio Code search box, \s doesn't match \n. Details are here:
https://github.com/microsoft/vscode/issues/108368
\s is a broad concept, but in VS Code search it only matches whitespace within a line and does not include the newline (\n). So if you need a match to span several lines, you have to tell VS Code explicitly, for example with [\s\S]:
<Form[\s\S]*
Capturing matching segments
Performing a single capture
Say we want to extract a value that's embedded in a complex string. A good example of such a string is the value of the CSS transform property, through which we can modify the visual position of an HTML element.
<html>
<body>
<div id="square" style="transform: translateX(15px)">
This is a simple HTML + CSS template!
</div>
<div id="test1"></div>
<div id="test2"></div>
<div id="test3"></div>
</body>
<script>
// Look up the element whose inline style we want to inspect
const styleElement = document.getElementById("square");
const transformValue = styleElement.style.transform;
if (transformValue) {
  const match = transformValue.match(/translateX\(([^)]+)\)/);
  // match[0] === "translateX(15px)" (the entire match)
  // match[1] === "15px" (the first capture)
}
</script>
</html>
Match segment
Using a local regular expression (one without the g flag), the match method of the String object returns an array containing:
the entire matched string
the matches captured in the regular expression, but for the first match only
const html = "<div class='test'><b>Hello</b> <i>world!</i></div>";
const result = html.match(/<(\/?)(\w+)([^>]+?)>/);
// result: ["<div class='test'>", "", "div", " class='test'"]
Captures are not returned when the regex is global
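A sketch of what happens once the g flag is added. The regex is a slightly relaxed variant of the one above ([^>]*? instead of [^>]+?, an assumption made here so that attribute-less tags such as <b> also match). String.prototype.matchAll is available in ES2020 and later.

```javascript
const source = "<div class='test'><b>Hello</b> <i>world!</i></div>";
const tag = /<(\/?)(\w+)([^>]*?)>/g;

// With the g flag, match returns every full match but no captures.
const fullMatches = source.match(tag);
console.log(fullMatches);
// ["<div class='test'>", "<b>", "</b>", "<i>", "</i>", "</div>"]

// matchAll yields one match array per hit, captures included.
// Here m[1] is the optional slash and m[2] is the tag name.
const names = [...source.matchAll(tag)].map((m) => m[1] + m[2]);
console.log(names);
// ["div", "b", "/b", "i", "/i", "/div"]
```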
Using a match with JS destructuring
JS has a standard class for representing dates
console.log(new Date());
//Wed Dec 27 2023 11:09:17 GMT+0700 (Indochina Time)
console.log(new Date(2009, 11, 9));
// → Wed Dec 09 2009 00:00:00 GMT+0100 (CET)
console.log(new Date(2009, 11, 9, 12, 59, 59, 999));
// → Wed Dec 09 2009 12:59:59 GMT+0100 (CET)
Referencing captures
We can do better by building the Date from the parts that a regex captures out of a string:
function getDate(string) {
const [_,month,day,year]= /(\d{1,2})-(\d{1,2})-(\d{4})/.exec(string);
return new Date(year,month-1,day);
}
console.log(getDate("1-30-2003"));
// → Thu Jan 30 2003 00:00:00 GMT+0100 (CET)
Capture reference within the replace string
In the following code, the value of the first capture (in this case, the capital letter F) is referenced in the replacement string (via $1). This allows us to specify a replacement string without even knowing what its value will be until matching time. That's a powerful ninja-esque weapon to wield.
"fontFamily".replace(/([A-Z])/g,"-$1").toLowerCase();
//font-family
Replace string
The replace method of the String object is powerful and versatile. When a regular expression is provided as the first parameter to replace, the replacement is applied to the match of the pattern (or to all matches if the regex is global) rather than to a fixed string.
Simple form
// Replace with a simple string: only the first occurrence changes
console.log("papa".replace("p", "m"));
// mapa
"ABCDEF".replace(/[A-Z]/, "X");
// XBCDEF
"ABCDEF".replace(/[A-Z]/g, "X");
// XXXXXX
Swap string
"Liskov, Barbara\nMcCarthy, John\nWadler, Philip".replace(/(\w+), (\w+)/g,"$2 $1")
// Barbara Liskov
// John McCarthy
// Philip Wadler
Replacing with a function
"border-bottom-width".replace(/-(\w)/g, (all, letter) => {
  return letter.toUpperCase();
});
// borderBottomWidth
// The function is called twice:
// first with all = "-b", letter = "b"
// then with all = "-w", letter = "w"
We can use replace with a function to iterate over a string as well
// data: foo=1&foo=2&blah=a&blah=b&foo=3
// expected: "foo=1,2,3&blah=a,b"
function compress(source) {
  const keys = {};
  // Called once per key=value pair, e.g. full = "foo=1", key = "foo", value = "1"
  source.replace(/([^=&]+)=([^&]*)/g, function (full, key, value) {
    keys[key] = (keys[key] ? keys[key] + "," : "") + value;
    return "";
  });
  const result = [];
  for (const key in keys) {
    result.push(key + "=" + keys[key]);
  }
  return result.join("&");
}
console.log(compress("foo=1&foo=2&blah=a&blah=b&foo=3"));
// foo=1,2,3&blah=a,b
The most interesting aspect of this example is its use of the string replace method as a means of traversing a string for values rather than as a search-and-replace mechanism. The trick is twofold: passing in a function as the replacement value argument and, instead of using its return value, using the function itself as a means of searching.
With a global regex, match alone gives us only the list of full matches, without the captured key and value.
The search method
The indexOf method on strings cannot be called with a regular expression. But there is another method, search, that does expect a regular expression. Like indexOf, it returns the first index on which the expression was found, or -1 when it wasn’t found.
console.log("  word".search(/\S/));
// → 2
console.log("    ".search(/\S/));
// → -1
However, unlike indexOf, search has no way to indicate where the match should start.
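One possible workaround, sketched here under the assumption that it's acceptable to slice the string first. The searchFrom helper is hypothetical, not a built-in.

```javascript
// Hypothetical helper: like search, but only considers matches at or after `start`.
function searchFrom(text, regex, start) {
  const index = text.slice(start).search(regex);
  // Shift the result back so it refers to the original string.
  return index === -1 ? -1 : index + start;
}

console.log("a1 b2 c3".search(/\d/));         // 1
console.log(searchFrom("a1 b2 c3", /\d/, 2)); // 4
console.log(searchFrom("a1 b2 c3", /\d/, 8)); // -1
```

Keep in mind that slicing changes what ^ and \b see at the cut point, so this sketch only suits patterns that don't depend on the text before the start position.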
Solving common problem with regular expressions
Matching newlines
When performing a search, it's sometimes desirable for the period (.) term, which matches any character except for newline, to also include newline characters. Regular expression implementations in other languages frequently include a flag for making this possible, and for a long time JavaScript's implementation didn't, so the classic workaround is a character class that matches everything.
const html = "<b>Hello</b>\n<i>world!</i>";
/.*/.exec(html)[0] === "<b>Hello</b>";
/[\S\s]*/.exec(html)[0] ===
  "<b>Hello</b>\n<i>world!</i>";
Another approach is alternation (|)
/(?:.|\s)*/.exec(html)[0] === "<b>Hello</b>\n<i>world!</i>";
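Engines that support ES2018 also offer the s (dotAll) flag, which makes the period match newlines directly. A minimal sketch, assuming such an engine (current browsers and Node.js qualify); the variable names are illustrative:

```javascript
const twoLines = "<b>Hello</b>\n<i>world!</i>";

// Without the s flag, . stops at the newline.
const firstLineOnly = /.*/.exec(twoLines)[0];
console.log(firstLineOnly === "<b>Hello</b>"); // true

// With the s flag, . matches newlines too.
const everything = /.*/s.exec(twoLines)[0];
console.log(everything === twoLines); // true

// The flag is reflected on the regex object.
console.log(/.*/s.dotAll); // true
```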
Exercises
In JavaScript, regular expressions can be created with which of the following?
a Regular expression literals
b The built-in RegExp constructor
c The built-in RegularExpression constructor
Which of the following is a regular expression literal?
a /test/
b \test\
c new RegExp("test");
Choose the correct regular expression flags:
a /test/g
b g/test/
c new RegExp("test", "gi");
The regular expression /def/ matches which of the following strings?
a One of the strings d, e, or f
b def
c de
The regular expression /[^abc]/ matches which of the following?
a One of strings a, b, c
b One of strings d, e, f
c Matches the string ab
Which of the following regular expressions matches the string hello?
a /hello/
b /hell?o/
c /hel*o/
d /[hello]/
The regular expression /(cd)+(de)*/ matches which of the following strings?
a cd
b de
c cdde
d cdcd
e ce
f cdcddedede
In regular expressions, we can express alternatives with which of the following?
a #
b &
c |
In the regular expression /([0-9])2/, we can reference the first matched digit with which of the following?
a /0
b /1
c \0
d \1
The regular expression /([0-5])6\1/ will match which of the following?
a 060
b 16
c 261
d 565
The regular expression /(?:ninja)-(trick)?-\1/ will match which of the following?
a ninja-
b ninja-trick-ninja
c ninja-trick-trick
What is the result of executing "012675".replace(/[0-5]/g, "a")?
a aaa67a
b a12675
c a1267a
Practical use-case with Regular expression
Handling self-closing tags
Let's solve a problem with self-closing tags. Markup such as <div/> is only valid for some HTML elements, so we can use a JS regular expression to rewrite the others into a proper opening and closing pair. The HTML elements that have no closing tag include:
area
base
img
link
menuitem
meta
source
etc
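A sketch of how the conversion could look. The tag list is partial and assumed for this example (the names from the list above plus br, hr and input), and expandSelfClosing is a made-up helper name.

```javascript
// Elements that legitimately have no closing tag (a partial list, assumed for this sketch).
const voidTags = /^(area|base|br|hr|img|input|link|menuitem|meta|source)$/i;

// Hypothetical helper: expand <tag/> into <tag></tag> unless the tag is a void element.
// Captures: front = everything before "/>", tag = the element name.
function expandSelfClosing(html) {
  return html.replace(/(<(\w+)[^>]*?)\/>/g, (all, front, tag) =>
    voidTags.test(tag) ? all : front + "></" + tag + ">"
  );
}

console.log(expandSelfClosing("<a/><img/>"));
// "<a></a><img/>"
console.log(expandSelfClosing('<div class="box"/><br/>'));
// '<div class="box"></div><br/>'
```

The replacement function receives the whole match plus both captures, so it can decide per tag whether to leave the markup alone or rebuild it.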
HTML wrapping
According to the semantics of HTML, some HTML elements must be within certain container elements before they can be injected.
Element name | Ancestor element |
<option>,<optgroup> | <select multiple>...</select> |
<legend> | <fieldset>....</fieldset> |
<thead>,<tbody>,<tfoot>,<colgroup>,<caption> | <table>....</table> |
<tr> | <table><thead>...</thead></table>, <table><tbody>...</tbody></table>, ... |
<td>,<th> | <table><tbody><tr>...</tr></tbody></table> |
<col> | <table><tbody></tbody><colgroup>...</colgroup></table> |
<option>Yoshi</option>
<option>Kuma</option>
<!-- We want this to turn into this -->
<select multiple>
<option>Yoshi</option>
<option>Kuma</option>
</select>
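One way this could be approached is to read the first tag name with a regular expression and look up a wrapper for it. The sketch below works on strings only; the wrappers map covers part of the table above, and wrapForInjection is a hypothetical helper name.

```javascript
// Wrappers taken from the table above: tag name → [opening markup, closing markup].
const wrappers = new Map([
  ["option", ["<select multiple>", "</select>"]],
  ["optgroup", ["<select multiple>", "</select>"]],
  ["legend", ["<fieldset>", "</fieldset>"]],
  ["thead", ["<table>", "</table>"]],
  ["tbody", ["<table>", "</table>"]],
  ["tr", ["<table><tbody>", "</tbody></table>"]],
  ["td", ["<table><tbody><tr>", "</tr></tbody></table>"]],
  ["th", ["<table><tbody><tr>", "</tr></tbody></table>"]],
]);

// Hypothetical helper: capture the first tag name and wrap the markup if a wrapper exists.
function wrapForInjection(html) {
  const match = /^\s*<(\w+)/.exec(html);
  const wrapper = match ? wrappers.get(match[1].toLowerCase()) : undefined;
  return wrapper ? wrapper[0] + html + wrapper[1] : html;
}

console.log(wrapForInjection("<option>Yoshi</option><option>Kuma</option>"));
// "<select multiple><option>Yoshi</option><option>Kuma</option></select>"
console.log(wrapForInjection("<p>No wrapper needed</p>"));
// "<p>No wrapper needed</p>"
```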
A simple method for accessing style
Accessing a CSS property from JS is different from writing it in CSS
Consider this example
const fontSize = element.style['font-size'];
The preceding works. But the following doesn't, because JS parses the hyphen as a minus sign:
const fontSize = element.style.font-size;
Rather than forcing every developer to remember the JS naming rule (fontSize), why don't we write a simple abstraction and let everyone use the form they prefer?
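function style(element, name, value) {
  // Convert font-size to fontSize: the capture holds the letter after the hyphen
  name = name.replace(/-([a-z])/ig, (all, letter) => {
    return letter.toUpperCase();
  });
  // Only assign when a value was actually passed in
  if (typeof value !== "undefined") {
    element.style[name] = value;
  }
  return element.style[name];
}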
Fetching computed style
Let's normalize property names for the computed-style API. getPropertyValue expects the CSS form of a property name (font-size), but we'd also like to support passing the JS form, for example fontSize.
function fetchComputedStyle(element,property) {
const computedStyle = getComputedStyle(element);
if(computedStyle) {
property = property.replace(/([A-Z])/g,'-$1').toLowerCase();
return computedStyle.getPropertyValue(property);
}
}
fetchComputedStyle(document.querySelector("div"), "fontSize");