Simple Regex Implementation In Javascript

“Intellect without implementation is ignorance, not intelligence.”

― Richie Norton

Introduction

In this post, we are going to learn about implementing regex in JavaScript. After all, learning without doing is the same as not learning at all.

If you don't know a thing about regex, I suggest you read this tutorial, then get back to this post later when you have grasped the basics of it.

Done reading the basics? Good.

We are going to use vanilla, pure, unmodified JavaScript (that means no ReactJS, no jQuery, no Vue.js, etc.). Because everything here is plain JavaScript, the same syntax works inside any library or framework built on top of it.

We'll cover the methods which you will use, complete with the examples, and then some practical use cases you can implement yourselves.

Oh, one more thing: the examples shown here exist for learning purposes only. If you would like to get better results in your implementation, I suggest you use the built-in methods of the framework or library you choose, as they are already battle-tested and used by many.

OK, let's get started.

Methods You Should Know

As regex is used for string matching, there are four built-in methods in JavaScript we can use: search(), match(), matchAll() and replace(). We will cover each of the methods with some examples.

Important note: A regular expression literal is not a string, so you don't need to quote it. Write /regexp/ instead of "/regexp/".

Search

Returns the index of the first character of the first match result. If not found, returns -1.

The syntax is simple and straightforward. Just call the search() method on a string variable, as shown in the example below:

const someString = "This is a string";
const index = someString.search(/is/);
// index = 2

As proof that the code works, here is a snippet in which the regex /the/ finds a match and search() returns its index:

The snippet below shows what happens when nothing is found, using the regex /vixen/. Because there is no "vixen" in the string, the result is -1.
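
A minimal sketch of both cases. The sample sentence is an assumption, since the original input string is not shown here:

```javascript
// Assumed sample sentence; the original snippet's input is not shown.
const sentence = "The quick brown fox jumps over the lazy dog";

// /the/ is case-sensitive, so it skips the leading "The" and finds the later "the".
const foundIndex = sentence.search(/the/);

// There is no "vixen" anywhere in the sentence, so search() reports -1.
const missingIndex = sentence.search(/vixen/);

console.log(foundIndex);   // 31
console.log(missingIndex); // -1
```

If you want the leading "The" to count as well, adding the i flag (/the/i) makes the search case-insensitive.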

Match

Returns an array of matches. If no matches are found, it returns null. If the g flag is not specified, it returns only the first match.

Just like search(), you simply call match() on a string variable:

const someString = "This is a string, isn't it?";
const result = someString.match(/is/g);
// result = ["is", "is", "is"]

Here is a working snippet in which matches are found, using the regex /is/g on the string "This is a string, isn't it?":

If there is no g flag, the match() method will only return the first match, as shown in this snippet:

And here is a snippet in which no matches are found. The method will return null as shown below:
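
The three cases can be sketched side by side like this. The /cat/g pattern is just an assumed example of a regex that does not match:

```javascript
const someString = "This is a string, isn't it?";

// With the g flag: every match is returned.
const allMatches = someString.match(/is/g);

// Without the g flag: only the first match, plus details such as its index.
const firstMatch = someString.match(/is/);

// No match at all: the result is null, not an empty array.
const noMatch = someString.match(/cat/g);

console.log(allMatches);       // [ 'is', 'is', 'is' ]
console.log(firstMatch[0]);    // is
console.log(firstMatch.index); // 2
console.log(noMatch);          // null
```

Because the no-match result is null, it is worth checking for it before reading .length or looping over the result.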

Match All

There is another variant of match() called matchAll(). In addition to each matched part of the string, matchAll() also returns its capturing groups. It requires the g flag and returns an iterator; once that iterator is converted to an array, the result is an array of arrays of strings.

To make it clearer, here is a code snippet demonstrating how to use matchAll():
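
A minimal sketch of such a call. The input string "check1 check2" is an assumption, inferred from the matches discussed in this section:

```javascript
// Assumed input, inferred from the "check1" and "check2" matches in this section.
const text = "check1 check2";

// matchAll() needs the g flag and returns an iterator,
// so we spread it into a regular array to index into it.
const result = [...text.matchAll(/c(he)(ck(\d?))/g)];

console.log(result.length); // 2
console.log(result[0][0]);  // check1
console.log(result[0][1]);  // he
console.log(result[0][2]);  // ck1
console.log(result[0][3]);  // 1
console.log(result[1][0]);  // check2
```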

In the regex /c(he)(ck(\d?))/g, notice there are three capturing groups:

  • (he)
  • (ck(\d?))
  • (\d?)

Each capturing group's result is also placed in the array, right after the full match. The illustration below shows the array hierarchy once the result has been converted to an array:

  • result => Array(2)
    • result[0]
      • result[0][0]: "check1"
      • result[0][1]: "he"
      • result[0][2]: "ck1"
      • result[0][3]: "1"
    • result[1]
      • result[1][0]: "check2"
      • result[1][1]: "he"
      • result[1][2]: "ck2"
      • result[1][3]: "2"

If there are no matches, matchAll() returns an iterator that yields nothing, so converting it to an array gives an empty array. The example is shown in the snippet below:
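
A small sketch of the no-match case, using a hypothetical input string that the regex cannot match:

```javascript
// Hypothetical input that contains nothing the regex can match.
const noCheckText = "nothing to see here";

// The iterator yields no items, so spreading it produces an empty array.
const emptyResult = [...noCheckText.matchAll(/c(he)(ck(\d?))/g)];

console.log(Array.isArray(emptyResult)); // true
console.log(emptyResult.length);         // 0
```

Unlike match(), there is no null to guard against here: a loop over the result simply runs zero times.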

Replace

Searches for part of a string using a regular string or a regex, and then replaces the matches with a specified value. Returns a new string with the replacements applied; the original string is left unchanged. The specified value can be a string or a function.

I'm sure you noticed that in some of the code snippets I shared in the search() and match() sections before, I already used the replace() method.

It follows the same pattern as search() and match(): call the replace() method on a string variable:

const someString = "You are a dog.";
const result = someString.replace(/dog/, 'cat');
//result = "You are a cat."

Here is a snippet showing replace() matching the regex /sad/g in a string and replacing each match with the string happy:

And here's what happens if no match is found, which results in no string replacements:
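
Both cases can be sketched as follows. The sample sentence and the /angry/g pattern are assumptions, since the original inputs are not shown. The last call also illustrates passing a function as the replacement:

```javascript
// Assumed sample sentence; the original snippet's input is not shown.
const mood = "I am sad, and you are sad too.";

// The g flag replaces every occurrence, not just the first one.
const replaced = mood.replace(/sad/g, "happy");

// There is no "angry" in the string, so the text comes back unchanged.
const unchanged = mood.replace(/angry/g, "happy");

// The replacement can also be a function that receives each match.
const shouted = mood.replace(/sad/g, (match) => match.toUpperCase());

console.log(replaced);           // I am happy, and you are happy too.
console.log(unchanged === mood); // true
console.log(shouted);            // I am SAD, and you are SAD too.
```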

That wasn't so hard, was it? By using these four methods, you can achieve more in your code. The four methods are summed up below:

search() - Returns index of first match, -1 if no match.
match() - Returns array of matches, null if no match.
matchAll() - Returns an iterator of matches including capturing groups, an empty iterator if no match.
replace() - Returns string replaced with new value if match.

Practical Use Cases

We have covered the basic methods we can use. Now, we will learn and practice real-life use cases. If you are learning to program, particularly for the web, sooner or later you will have to solve these problems:

Use Case 1: Check Email Format

I am very sure that you have already registered an account at least once with your personal email, whether for an online shop, an online course, or probably Netflix.

An email has its own format:

<local-part>@<host>.<domain>

And if you mistype your email while registering for an account, say by entering your name instead, you will be shown an error pretty much like this:

Please enter a valid email (e.g. someone@example.com)

Classic.

Now that you know your regex and its methods, you can also create a simple email checker. Here is a snippet for testing whether an entered email is in the correct format or not:

The regex used is this:

/[a-zA-Z0-9.!#$%&'+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*/

It might look quite complex and long, but it isn't. The regex can be split into parts to make it simpler to read:

Test local-part: [a-zA-Z0-9.!#$%&'+/=?^_`{|}~-]+
This means that there must be at least one character, each being a lower case letter, an upper case letter, a digit, or one of the symbols .!#$%&'+/=?^_`{|}~-. It matches the part before @ in an email: sample in sample@example.com

Test @ exists: @
This regex requires that an @ symbol must exist after the previous part. It matches the @ in sample@example.com

Test if host exists: [a-zA-Z0-9-]+
This part tests that there must be at least one character, each being a lower case letter, an upper case letter, a digit, or the - symbol. It matches, for example, example in sample@example.com

Test if domain exists (optional): (?:\.[a-zA-Z0-9-]+)*
It tests for a . followed by at least one lower case letter, upper case letter, digit, or - symbol. The * then means zero or more repetitions of the whole group. It matches, for example, .com or .co.id
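
Putting the parts together, a minimal checker could look like the sketch below. The function name isValidEmail is hypothetical, and the ^ and $ anchors are my own addition (an assumption, not part of the regex above) so that the whole input has to match instead of just a part of it:

```javascript
// Based on the regex above, written with a plain ' apostrophe.
// The ^ and $ anchors are an addition (an assumption, not part of the original
// regex) so that the whole input must match instead of just a part of it.
const emailPattern = /^[a-zA-Z0-9.!#$%&'+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Hypothetical helper name.
function isValidEmail(email) {
  return emailPattern.test(email);
}

console.log(isValidEmail("someone@example.com"));  // true
console.log(isValidEmail("sample@example.co.id")); // true
console.log(isValidEmail("John Doe"));             // false
console.log(isValidEmail("someone@"));             // false
```

Since the domain part is optional in this regex, an address such as someone@localhost also passes.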

Use Case 2: Check Password Format

What comes after email?

Password.

Email and password are the two must-haves (besides username and phone number) when we are talking about creating an account. A password must be strong enough that it is hard to crack.

There are some criteria which must be fulfilled in creating today's password:

  • Must be at least n characters long.
  • Must have one upper case letter (A-Z)
  • Must have one lower case letter (a-z)
  • Must have a number (0-9)
  • Must have a symbol (!@#$%^&*()...)

All these criteria can be easily checked using a regex. And here's a snippet to check whether a password fulfills the above criteria or not:

As you can see, this is the regex used:

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

Let's break down the expressions, shall we? You should notice there are (?=...) expressions. Each one is called a positive lookahead. Basically, it checks whether the pattern inside it can be matched from the current position, without including that pattern in the match result. Yeah, it's confusing. Just read more about it here.

OK, let's put the positive lookahead syntax aside for now and store it in your mind drawer. We'll get to it a bit later.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

becomes

/.{8,}/

Nice, it's become quite readable now.

/.{8,}/ consists of . and {8,}, which means: match any character except a newline, 8 or more times. Thus, it serves as the checker for the length criterion, with n = 8 as the minimum length. Simple.

Now, take the positive lookahead expressions from your mind drawer. There are 4 of them:

(?=.*[a-z]): Matches any character zero or more times, followed by any lower case letter. This satisfies "Must have at least one lower case letter".

(?=.*[A-Z]): Matches any character zero or more times, followed by any upper case letter, for checking "Must have at least one upper case letter".

(?=.*[0-9]): Matches any character zero or more times, followed by any digit, to satisfy "Must have at least one number".

(?=.*\W): Matches any character zero or more times, followed by any non-word character, for checking "Must have at least one symbol".

If all conditions are fulfilled, it matches. Then you will have a good password checker, to make sure the password is valid.
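
As a minimal sketch, the regex can be wrapped in a small helper. The name isStrongPassword and the sample passwords are assumptions for illustration:

```javascript
// The regex from this section, unchanged.
const passwordPattern = /(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/;

// Hypothetical helper name.
function isStrongPassword(password) {
  return passwordPattern.test(password);
}

console.log(isStrongPassword("Passw0rd!"));  // true
console.log(isStrongPassword("password"));   // false (no upper case, digit or symbol)
console.log(isStrongPassword("Pw0!"));       // false (shorter than 8 characters)
console.log(isStrongPassword("PASSWORD1!")); // false (no lower case letter)
console.log(isStrongPassword("Password_1")); // false (_ is a word character, so \W finds no symbol)
```

The last case is worth remembering: \W treats the underscore as a word character, so a password whose only symbol is _ does not pass.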

Use Case 3: Escaping User Input

Another use case we will cover is about escaping user input. What does it mean?

Some characters such as ' " / \ ? & < > have a special meaning in many programming languages. If they are not escaped, user input containing them can end up being interpreted as part of the markup or script, for example:

If you test the script, there are three cases here:

  1. The first one only inputs Test, so it correctly returns Test.
  2. The second case, </p><a href="https://google.com">Google</a>, results in a link to Google.
  3. The third case is sort of cool though: you can create a new unusable input form below the original form.

As you can see here, in the second and third cases the string is transformed into an HTML element. And you don't want that.

In this simple example, it's just HTML, so it causes no harm and even looks like a rather cool feature. What about an injected SQL script or injected JavaScript then? It would wreak havoc in the system you maintain. It could cause database corruption, or worse, cause data to be stolen.

But no fear, regex can come to the rescue! To make things simple, we are going to escape the input so case 2 and case 3 do not happen:

This is the regex used: /[<>]/g
It is a character class that matches either < or > anywhere in the string. Each time it matches, replace() calls this function with the matched character:

function (tag) {
    switch (tag) {
      case '<': return '&lt;';
      case '>': return '&gt;';
      default: return tag;
    }
}

If the match is <, it will be replaced with &lt;. If it's >, it will be replaced with &gt;. Any other match is returned as is; with this regex that cannot happen, so the default branch is just a safeguard.

As a result, HTML tags entered by the user are no longer turned into HTML elements. We are safe from cases 2 and 3.
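
Wired into replace(), the regex and the function above can be sketched like this. The wrapper name escapeTags is hypothetical:

```javascript
// Hypothetical helper name; the regex and the callback are the ones described above.
function escapeTags(input) {
  return input.replace(/[<>]/g, function (tag) {
    switch (tag) {
      case '<': return '&lt;';
      case '>': return '&gt;';
      default: return tag;
    }
  });
}

console.log(escapeTags('Test'));
// Test
console.log(escapeTags('</p><a href="https://google.com">Google</a>'));
// &lt;/p&gt;&lt;a href="https://google.com"&gt;Google&lt;/a&gt;
```

Note that this sketch only touches < and >. Characters such as & and quotes are left alone, so a fuller escaper would need more cases in the switch.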

Use Case 4: Parse Configuration Files

The last use case in this post. And I can feel your enthusiasm because of the word "last". It's about parsing configuration files.

Configuration files are one of the most important concepts you must know and can't live without. Basically, you can change some variables without changing the code, so it can be deployed easily to any environment.

There are many kinds of configuration files, and the choice depends on the programmer. Personally, I like the env-style of configuration. Here is an example:

HOST=http://any-web-you-want.com
PORT=3000
TIMEOUT=30000
SERVICE_ENABLED=true
HOST_2=http://another-host.com
...

The pattern is simple. First comes the name of the environment variable, followed by =, and then the value. Therefore, we can use this regex:

/^(\w+)=(.+)$/gm

And here is an example code snippet to parse the configuration text:

In the code, we use matchAll() to get the capturing groups in each match. Once the iterator returned by matchAll() is converted to an array, every item holds the full match followed by its capturing groups. Therefore, we will have this pattern:

result[0][0] = "HOST=http://any-web-you-want.com"
result[0][1] = "HOST"
result[0][2] = "http://any-web-you-want.com"
result[1][0] = "PORT=3000"
result[1][1] = "PORT"
result[1][2] = "3000"
...

To get the key and value, we only need result[i][1] and result[i][2] from each inner array. Iterate over them and you will have the keys and values from the environment variables. Done.
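
A minimal sketch of that iteration, using the sample configuration from this section. The variable names are assumptions:

```javascript
// The sample configuration text from this section (without the trailing "...").
const configText = `HOST=http://any-web-you-want.com
PORT=3000
TIMEOUT=30000
SERVICE_ENABLED=true
HOST_2=http://another-host.com`;

// Collect each KEY=value line into a plain object.
const config = {};
for (const match of configText.matchAll(/^(\w+)=(.+)$/gm)) {
  // match[1] is the key, match[2] is the value.
  config[match[1]] = match[2];
}

console.log(config.HOST);                // http://any-web-you-want.com
console.log(config.PORT);                // 3000
console.log(Object.keys(config).length); // 5
```

Every value comes back as a string, so something like PORT or SERVICE_ENABLED still needs converting (for example with Number()) before you use it as a number or boolean.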

Conclusion

Finally, you reached the end of this post! That's amazing!

Regex is not limited to JavaScript. Most programming languages now support regex. Once you have learned the logic, you won't have any trouble using it in whichever programming language you choose.

There are still so many cases you can solve using regex. Just keep learning and practicing, you'll get much better in no time!

If I made any mistakes in this post, please don't hesitate to reach out. I will fix them and be grateful to you, and it will also help others who read the post to understand it better.

I hope you learned a lot! Always remember to live your code and code your life!

Implementation is the sincerest form of flattery.

- L Peter Deutsch

By Ericko Yap

Just a guy who is obsessed to improve himself. Working as a programmer in a digital banking company. Currently programming himself in calisthenics, reading books, and maintaining a blog.
