Regular expressions in JavaScript
Regular expressions are an extremely powerful tool, and every major language supports them (each in its own way). As a programmer you probably don't have to know how to write complex regular expressions; you can usually find ones that have already been written (from a reliable source). But you do need to know how to make use of them in your programs.
Regular expressions find patterns in strings. They are used to see if a string matches a pattern, or if a pattern exists inside a string. You define a 'pattern' to search for, and then you apply that pattern to a string to see if there is a match. A common pattern is an email address. An email address starts with a mailbox name, then an 'at' sign (@), followed by a domain name, then a dot, and finally a top-level domain. For example: bobsmith@example.com
Metacharacters
The syntax used in regular expressions is made up of metacharacters, which have special meaning when creating patterns. For example, in the regular expression ^cat, the caret means that a string will match this pattern only if 'cat' appears at the beginning of the string. If the regular expression is cat$, the dollar sign metacharacter means that 'cat' must appear at the end of the string. So, ^cat$ would mean that the string must be equal to 'cat'.
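The anchors described above can be tried out with a short sketch like the one below. It uses the test() method of JavaScript's RegExp object, which is covered later in this article; the variable names are only illustrative.

```javascript
// Compare the three anchored patterns from the paragraph above.
const startsWithCat = /^cat/;  // 'cat' must be at the beginning
const endsWithCat = /cat$/;    // 'cat' must be at the end
const exactlyCat = /^cat$/;    // the whole string must be 'cat'

console.log(startsWithCat.test("catalog")); // true
console.log(startsWithCat.test("bobcat"));  // false
console.log(endsWithCat.test("bobcat"));    // true
console.log(exactlyCat.test("cat"));        // true
console.log(exactlyCat.test("catalog"));    // false
```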
To search for a metacharacter itself in a string, you have to 'escape' it with a backslash. For example, if (for some reason) you wanted to see if a string included 'ca^t', then the regular expression would be ca\^t.
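As a rough illustration of why the backslash matters, the sketch below compares the escaped and unescaped versions of the pattern. The sample strings and variable names are made up for this example.

```javascript
// Escaped: \^ is a literal caret character.
const escapedCaret = /ca\^t/;
console.log(escapedCaret.test("the ca^t sat")); // true
console.log(escapedCaret.test("the cat sat"));  // false

// Unescaped: ^ is still the 'start of string' anchor, and the start
// of the string can never come after 'ca', so nothing matches.
const unescapedCaret = /ca^t/;
console.log(unescapedCaret.test("the ca^t sat")); // false
```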
Here are some of the common metacharacters:
[] a 'chunk' of characters (you usually put a 'class' of characters in the chunk)
use - inside the brackets to specify a range of characters; you can list more than one range
for example [A-Za-z0-9] would match any letter (capital or lowercase) or any digit
? the preceding chunk can occur 0 or 1 time
ex: CAR?T matches CAT and CART
ex: CA[A-Z]?T matches CAT and CANT and CART and...
* the preceding chunk can occur 0 or more times
ex: M[O]*N matches MN, MON, MOON (you could also write it as MO*N)
+ the preceding chunk must occur 1 or more times
ex: C[A-Z]+T matches CAT,CBT,CCT but not CT
{2} the preceding chunk must occur exactly 2 times
{0,} the preceding chunk can occur 0 or any number of times
{1,4} the preceding chunk must occur at least 1 time but not more than 4 times
^ negation, when it is the first character inside []. ex: [^0-9] matches any character that is not a digit
\ to escape a character. ex \+ searches for a + in the string
YOU MUST ESCAPE ALL METACHARACTERS TO BE ABLE TO SEARCH FOR THEM
\Q \E can be used in some flavors (such as Perl, PCRE, and Java) to escape everything that is put
between \Q and \E. JavaScript does not support \Q and \E
() to group expressions chunks, and classes of characters
([0-9]AB)?:([a-z]) any single digit followed by AB can appear 0 or 1 time before a colon
and any lowercase letter
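The quantifier and grouping examples in this list can be checked with a small sketch like the following. The patterns are wrapped in ^ and $ here (an addition for this example) so that the whole string has to match, which makes the 'must not match' cases easier to see.

```javascript
// ? means 0 or 1 time
const optionalR = /^CAR?T$/;
console.log(optionalR.test("CAT"), optionalR.test("CART"), optionalR.test("CARRT")); // true true false

// * means 0 or more times
const zeroOrMoreO = /^MO*N$/;
console.log(zeroOrMoreO.test("MN"), zeroOrMoreO.test("MOON")); // true true

// + means 1 or more times
const oneOrMoreLetters = /^C[A-Z]+T$/;
console.log(oneOrMoreLetters.test("CAT"), oneOrMoreLetters.test("CT")); // true false

// {1,4} means at least 1 time but not more than 4 times
const oneToFourDigits = /^[0-9]{1,4}$/;
console.log(oneToFourDigits.test("123"), oneToFourDigits.test("12345")); // true false

// ^ as the first character inside [] negates the class
const containsNonDigit = /[^0-9]/;
console.log(containsNonDigit.test("123"), containsNonDigit.test("12a")); // false true

// () groups a chunk so that ? applies to all of it
const grouped = /^([0-9]AB)?:([a-z])$/;
console.log(grouped.test(":x"), grouped.test("7AB:x"), grouped.test("AB:x")); // true true false
```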
short-hand characters
\d digit, same as [0-9]
\s whitespace characters (this also includes \t \r \n)
\w a 'word character' can be alphanumeric or underscore. same as [A-Za-z0-9_]
NONPRINTABLE characters that you can search for \t \n \r
NOTE: Windows text files use \r\n to terminate lines; Unix uses \n
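Here is a brief sketch of the shorthand characters in use, along with one possible way to handle both kinds of line endings mentioned in the note. The sample strings and variable names are invented for illustration.

```javascript
// \d, \w and \s in simple patterns
const threeDigits = /^\d{3}$/;
const wordCharsOnly = /^\w+$/;
const hasWhitespace = /\s/;

console.log(threeDigits.test("123"));           // true
console.log(wordCharsOnly.test("user_name42")); // true
console.log(wordCharsOnly.test("user-name"));   // false, a hyphen is not a word character
console.log(hasWhitespace.test("two words"));   // true

// \r?\n matches a Windows line ending (\r\n) or a Unix one (\n).
const mixedText = "first\r\nsecond\nthird";
const lines = mixedText.split(/\r?\n/);
console.log(lines); // [ 'first', 'second', 'third' ]
```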
RegExp objects in JavaScript
Each language has its own APIs for working with regular expressions. In JavaScript there is a RegExp object that can be used to create your patterns. A RegExp object has a test() and an exec() method that you can use.
In this example, the pattern constant is a RegExp object that can be used to see if a string contains 'Monday'. The test() method will return true if the string passed as its parameter contains a match for the pattern defined by the regular expression.
const text = "He said he would be here on Monday, or Tuesday";
const pattern = new RegExp("Monday");
const result = pattern.test(text); // should be true
There is an alternative syntax you can use for creating a RegExp object:
const pattern = /Monday/; // same as: const pattern = new RegExp("Monday");
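One difference between the two syntaxes is worth a quick sketch: when the pattern is written as a string for the RegExp constructor, each backslash has to be doubled, and modifiers are passed as a second argument. The constructor is also handy when part of the pattern comes from a variable. The escapeForRegExp helper below is a hypothetical name used only for this example; it is one common way to escape metacharacters in such text.

```javascript
// A backslash inside a string must be doubled to reach the RegExp constructor.
const fromLiteral = /\d+/;
const fromString = new RegExp("\\d+");
console.log(fromLiteral.source === fromString.source); // true

// Modifiers (flags) go in the second argument of the constructor.
const anyCaseMonday = new RegExp("monday", "i");
console.log(anyCaseMonday.test("See you on Monday")); // true

// Hypothetical helper: put a backslash in front of every metacharacter
// so the text is searched for literally.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const pricePattern = new RegExp(escapeForRegExp("$5.00"));
console.log(pricePattern.test("Total: $5.00")); // true
console.log(pricePattern.test("Total: $5x00")); // false, the dot is literal
```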
You could make your pattern case-insensitive by using the i modifier like this:
const text = "He said he would be here on Monday, or Tuesday";
const pattern = /monday/i;
const result = pattern.test(text);
Here are some examples of regular expression patterns:
/^hello/ hello must be at the start of a string
/hello$/ hello must be at the end of a string
/hel+o/ 1 or more l’s
/hel*o/ 0 or more l’s
/hel?o/ 0 or 1 l
/hello|goodbye/ hello or goodbye
/he..o/ . any character
/\wello/ \w any alphanumeric char or an underscore
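/\bhello/ \b word boundary, hello must be at the start of the string or follow a character that is not a word character (such as a space or punctuation)
/[crh]ope/ character class, any one of c, r, or h
strings that contain 'cope', 'rope', or 'hope' would match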
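To see how some of these patterns behave, here is a small sketch that runs several of them against sample strings. The sample inputs and the cases array are assumptions made up for this example.

```javascript
// Each case pairs a pattern from the list above with a sample input
// and the result we expect from test().
const cases = [
  { pattern: /hel+o/, input: "helo", expected: true },
  { pattern: /hel+o/, input: "heo", expected: false },
  { pattern: /hel*o/, input: "heo", expected: true },
  { pattern: /hel?o/, input: "hello", expected: false }, // two l's is too many
  { pattern: /hello|goodbye/, input: "say goodbye", expected: true },
  { pattern: /he..o/, input: "he--o", expected: true },
  { pattern: /\bhello/, input: "say hello", expected: true },
  { pattern: /\bhello/, input: "othello", expected: false }, // no boundary before 'hello'
  { pattern: /[crh]ope/, input: "I hope so", expected: true },
  { pattern: /[crh]ope/, input: "nope", expected: false },
];

for (const { pattern, input, expected } of cases) {
  console.log(pattern.source, "on", JSON.stringify(input), "->", pattern.test(input), "(expected " + expected + ")");
}
```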
String Methods For Working With Regular Expressions
The String object also has methods that can be used with a RegExp object, such as replace(), search(), and match().
This example uses the g modifier in the regular expression, which will replace all occurrences of 'cats' in the text variable (the 'g' stands for 'global'):
const pattern = /cats/g;
const text = "It's raining cats and dogs, and more cats";
const result = text.replace(pattern, "guinea pigs");
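To make the effect of the g modifier easier to see, this sketch runs the same replacement with and without it. The variable names are illustrative.

```javascript
const sentence = "It's raining cats and dogs, and more cats";

// Without g, only the first occurrence is replaced.
const firstOnly = sentence.replace(/cats/, "guinea pigs");
console.log(firstOnly);
// It's raining guinea pigs and dogs, and more cats

// With g, every occurrence is replaced.
const everyMatch = sentence.replace(/cats/g, "guinea pigs");
console.log(everyMatch);
// It's raining guinea pigs and dogs, and more guinea pigs
```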
See what you get in the console log when you try out this code sample:
const text = "He said he would be here on Monday, or Tuesday";
const pattern = /mon|tues|wed|thurs|fri/ig;
const result = text.match(pattern);
console.log(result);
The regular expression (in the 'pattern' variable) is checking for 'mon', or 'tues', or 'wed', or 'thurs', or 'fri', and is using both the i and g modifiers, which means that it is both case-insensitive and global. The result returned by the match() method should be an array that includes 'Mon' and 'Tues'.
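The match() method behaves differently depending on whether the g modifier is present, and search() returns a position rather than the matched text. The sketch below shows both, using a shortened pattern and variable names chosen just for this example.

```javascript
const schedule = "He said he would be here on Monday, or Tuesday";

// With g, match() returns an array of every match.
const allDays = schedule.match(/mon|tues/ig);
console.log(allDays); // [ 'Mon', 'Tues' ]

// Without g, match() returns only the first match, plus details such as index.
const firstDay = schedule.match(/mon|tues/i);
console.log(firstDay[0], firstDay.index); // Mon 28

// search() returns the position of the first match, or -1 if there is none.
console.log(schedule.search(/tues/i));   // 39
console.log(schedule.search(/friday/i)); // -1

// match() returns null when nothing matches, so check before using the result.
const noDay = schedule.match(/friday/i);
console.log(noDay); // null
```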
Capturing Matches
You can use the exec() method of a RegExp object to 'capture' various parts of a string that match the pattern.
Here's an example of the exec() method of the RegExp object. We are searching for 'books/' followed by any number of digits. Not only can we capture the entire match in the text, we can also use parentheses to capture the digits that appear in the match.
const url = "http://www.some-web-service-for-books.com/books/7/edit";
const pattern = /books\/([0-9]*)/;
const result = pattern.exec(url);
console.log(result);
The result will be an array with some extra properties (such as index, the position where the match starts). Its first element is the text that matched the whole pattern ('books/7'), and its second element is '7', which is what we 'captured' by including the parentheses in our regular expression.
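Here is a sketch of how the pieces of the exec() result might be read in practice. It reuses the URL and pattern from the example; the variable names and the second URL are assumptions added for illustration. Note that exec() returns null when there is no match, and that the captured value is a string.

```javascript
const bookUrl = "http://www.some-web-service-for-books.com/books/7/edit";
const bookPattern = /books\/([0-9]*)/;

const found = bookPattern.exec(bookUrl);
if (found !== null) {
  console.log(found[0]);         // books/7  (the entire match)
  console.log(found[1]);         // 7        (the captured digits, as a string)
  console.log(Number(found[1])); // 7        (converted to a number)
  console.log(found.index);      // position in the URL where the match starts
}

// exec() returns null when the pattern is not found.
const notFound = bookPattern.exec("http://example.com/authors/3");
console.log(notFound); // null

// Because * allows 0 digits, 'books/' with no number still matches,
// and the captured group is an empty string.
const noDigits = bookPattern.exec("/books/new");
console.log(noDigits[0], JSON.stringify(noDigits[1])); // books/ ""
```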
Some useful regular expressions
Here's a little program that uses regular expressions to check for phone numbers, dates, email addresses, and password strength. The code for it is below:
The HTML code:
<input type="text" id="txtPhone" />
<button id="btnPhone">Validate a phone number in xxx-xxx-xxxx format</button>
<br>
<br>
<input type="text" id="txtDate" />
<button id="btnDate">Validate a Date in mm/dd/yyyy format</button>
<br>
<br>
<input type="text" id="txtEmail" />
<button id="btnEmail">Validate email address</button>
<br>
<br>
<input type="text" id="txtPassword" />
<button id="btnPassword">Validate that a password is strong</button>
The JavaScript code:
document.getElementById("btnPhone").onclick = function(){
  // ^ and $ require the whole input to be in xxx-xxx-xxxx format
  var pattern = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;
  var userInput = document.getElementById("txtPhone").value;
  var result = pattern.test(userInput);
  alert(result);
};
document.getElementById("btnDate").onclick = function(){
  // two digits, a separator (. / or -), two digits, a separator, four digits
  var pattern = /^\d{2}[./-]\d{2}[./-]\d{4}$/;
  var userInput = document.getElementById("txtDate").value;
  var result = pattern.test(userInput);
  alert(result);
};
document.getElementById("btnEmail").onclick = function(){
  // mailbox name, @, one or more domain parts ending in a dot, then a top-level domain
  var pattern = /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;
  var userInput = document.getElementById("txtEmail").value;
  var result = pattern.test(userInput);
  alert(result);
};
document.getElementById("btnPassword").onclick = function(){
  // Each (?=...) lookahead checks one rule, starting from the beginning of the input:
  // (?=.{8,})         the length must be at least 8 characters
  // (?=.*[a-zA-Z])    it must include a letter
  // (?=.*\d)          it must include a digit
  // (?=.*[!#$%&? "])  it must include at least one of these characters
  var pattern = /^(?=.{8,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$/;
  var userInput = document.getElementById("txtPassword").value;
  var result = pattern.test(userInput);
  alert(result);
};
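If you want to try these patterns without a web page, one option is to separate the checks from the button handlers. The sketch below is only one possible arrangement: the patterns object and the isValid function are hypothetical names, and the phone pattern is anchored with ^ and $ here so that extra characters around the number are rejected. Keep in mind that these patterns only check the shape of the input; the date pattern, for example, does not know that there is no month 99.

```javascript
// Hypothetical lookup table of the patterns used by the handlers above.
const patterns = {
  phone: /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/,
  date: /^\d{2}[./-]\d{2}[./-]\d{4}$/,
  email: /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/,
  password: /^(?=.{8,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$/,
};

// Hypothetical helper: returns true if the input matches the named pattern.
function isValid(kind, input) {
  return patterns[kind].test(input);
}

console.log(isValid("phone", "555-123-4567"));          // true
console.log(isValid("phone", "5555-123-4567"));         // false, too many digits
console.log(isValid("date", "12/25/2024"));             // true
console.log(isValid("date", "99/99/9999"));             // true, only the shape is checked
console.log(isValid("email", "bobsmith@example.com"));  // true
console.log(isValid("email", "bobsmith@example"));      // false, no top-level domain
console.log(isValid("password", "abc12345!"));          // true
console.log(isValid("password", "abc12345"));           // false, no special character
```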