Regex: Lookahead

It is always wise to look ahead, but difficult to look further than you can see.

– Winston Churchill

Intro

I’m currently writing a tutorial about using regex in JavaScript. When I reached the part where I write code snippets and explain how they work in practical use cases, I came across lookahead in regex.

After spending some time explaining this topic, I realised the post was getting too long to read and almost off topic, because there was a sub-tutorial inside a tutorial. That is why I decided to split it out and write this post.

Not bad for productivity.

What is a Lookahead?

Look at this regex:

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

You should notice the (?=<expr>) expressions. Each one is called a positive lookahead. Basically, it checks whether <expr> can match at the current position, but it does not consume any characters, so the rest of the regex carries on as if the lookahead did not exist.

There is also the inverse, negative lookahead, with this syntax: (?!<expr>). It succeeds only when <expr> cannot match at the current position.

Yeah, it’s confusing at first. It took me some time to understand this too.

As you continue reading, you will understand what I mean.
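
To make the "acts as if it does not exist" part a bit more concrete, here is a small sketch you can paste into a browser console or Node. The variable names are mine, just for illustration. It runs a lookahead completely on its own, so you can see that it finds a position but matches zero characters.

```javascript
// A lookahead on its own matches a position, not characters.
const positive = /(?=n)/.exec("man");
console.log(positive.index);     // 2 (the position right before "n")
console.log(positive[0].length); // 0 (nothing was consumed)

// The negative form succeeds where the expression does NOT match next.
const negative = /(?!n)/.exec("man");
console.log(negative.index);     // 0 (the next character is "m", not "n")
console.log(negative[0].length); // 0
```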

How does it work?

Positive Lookahead

The positive lookahead syntax is (?=<expr>).

Suppose I have this regex: /ma(?=n)/. It will match the ma in man, mandarin, and man-made.

As you can see, the (?=n) acts like a ghost. It affects the match, but it does not appear in it. It checks whether there’s an n after ma. The regex matches an ma that has an n after it, but returns only ma.
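
If you want to see the ghost in action, this is roughly how I would try it out. The names maBeforeN and positiveMatches are made up for this sketch.

```javascript
const maBeforeN = /ma(?=n)/;

// Collect what the regex actually returns for each word.
const positiveMatches = ["man", "mandarin", "man-made"].map(
  (word) => word.match(maBeforeN)[0]
);
console.log(positiveMatches); // [ 'ma', 'ma', 'ma' ] (never "man")

// "mad" has no "ma" followed by "n", so there is no match at all.
console.log(maBeforeN.test("mad")); // false

// For comparison: without the lookahead, the n becomes part of the match.
console.log("man".match(/man/)[0]); // "man"
```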

Negative Lookahead

The syntax of a negative lookahead is (?!<expr>).

For example, take /ma(?!n)/. It means that man will not satisfy the regex, but mad, max, mat, ma, and master will.

It’s just the inverse of positive lookahead. At any given position, what succeeds as a positive lookahead will fail as a negative lookahead, and vice versa.
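
Here is a quick sketch of the negative version, again with names I picked only for the example. The last part is worth a look: the check happens per position, so a longer string such as man-made can satisfy both regexes, just at different places.

```javascript
const maNotBeforeN = /ma(?!n)/;

console.log(maNotBeforeN.test("man")); // false

// All five words from the example above pass.
const accepted = ["mad", "max", "mat", "ma", "master"].filter((word) =>
  maNotBeforeN.test(word)
);
console.log(accepted.length); // 5

// "man-made" contains two "ma": one followed by n, one followed by d.
console.log(/ma(?=n)/.exec("man-made").index);    // 0 (the "ma" in "man")
console.log(maNotBeforeN.exec("man-made").index); // 4 (the "ma" in "made")
```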

Multiple Cases

Multiple positive lookaheads in one regex are quite difficult to follow with our brain compiler if you are new to regex.

For example, we have this string: P@ssw0rd. We would like the string to satisfy all conditions in the regex: at least 8 characters, with at least 1 number, 1 upper case letter, 1 lower case letter, and 1 symbol.

To understand this, I will explain with these expressions:

  • /(?=.*[a-z]).{8,}/
  • /(?=.*[A-Z]).{8,}/
  • /(?=.*[0-9]).{8,}/
  • /(?=.*\W).{8,}/

Let’s trace it step by step:

Step 1: /(?=.*[a-z]).{8,}/

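The lookahead (?=.*[a-z]) is checked against P@ssw0rd first. It is satisfied, because .* can be satisfied by P@ and [a-z] by s. (Strictly speaking, .* is greedy, so the engine settles on the last lower case letter, d, but the outcome is the same.) Then the whole of P@ssw0rd is matched by .{8,}, which accepts 8 or more of any character.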

The question is, should it not fail? If P@s is used up by .*[a-z], then only sw0rd is left for .{8,}, and that is just 5 characters. You would be right if the regex were /(.*[a-z]).{8,}/.

After reporting whether its expression matches, the positive lookahead “resets” the cursor back to where it was before the lookahead, as if the lookahead did not exist.

Back to when (?=.*[a-z]) is satisfied by P@ssw0rd: the cursor is now at s, after P@s satisfies .*[a-z]. Then, because it’s a positive lookahead, the cursor resets to before P, which is the void at the start of the string. And from the void before P, P@ssw0rd tries to satisfy .{8,}, which it does.
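
The cursor reset is easier to believe when you see it, so here is a small sketch under my own naming. It compares the lookahead version with the plain group version, and then uses the regex’s lastIndex as a rough stand-in for the cursor.

```javascript
const password = "P@ssw0rd";

// With a lookahead, .{8,} still starts from the very beginning.
const withLookahead = /(?=.*[a-z]).{8,}/.exec(password);
console.log(withLookahead[0]); // "P@ssw0rd"

// With a plain group, .*[a-z] consumes characters,
// so fewer than 8 are left over for .{8,}.
const withGroup = /(.*[a-z]).{8,}/.exec(password);
console.log(withGroup); // null

// The lookahead alone: it succeeds with an empty match at index 0,
// and lastIndex stays at 0.
const probe = /(?=.*[a-z])/g;
const probeMatch = probe.exec(password);
console.log(probeMatch[0] === "", probeMatch.index, probe.lastIndex); // true 0 0
```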

OK, the long explanation is out of the way. The next steps will be simpler to understand now.

Step 2: /(?=.*[A-Z]).{8,}/

.* is satisfied by the void before P (it matches nothing), and [A-Z] is satisfied by P. The cursor is now at P, but it resets back to the void before P. Then P@ssw0rd is checked against .{8,}, which accepts 8 or more of any character.

Step 3: /(?=.*[0-9]).{8,}/

.* is satisfied by P@ssw, and [0-9] is satisfied by 0. The cursor is now at 0, but it resets back to the void before P. Then P@ssw0rd is checked against .{8,}, which accepts 8 or more of any character.

Step 4: /(?=.*\W).{8,}/

.* is satisfied by P, and \W is satisfied by @. The cursor is now at @, but it resets back to the void before P. Then P@ssw0rd is checked against .{8,}, which accepts 8 or more of any character.
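
The four steps can also be run one by one. This is only a sketch: the checks object and the failedChecks helper are names I made up for it. Each test string after the first is P@ssw0rd with exactly one ingredient taken away, so you can see which expression complains.

```javascript
const checks = {
  lowercase: /(?=.*[a-z]).{8,}/,
  uppercase: /(?=.*[A-Z]).{8,}/,
  digit: /(?=.*[0-9]).{8,}/,
  symbol: /(?=.*\W).{8,}/,
};

// Hypothetical helper: returns the names of the expressions that do NOT match.
function failedChecks(input) {
  return Object.keys(checks).filter((name) => !checks[name].test(input));
}

console.log(failedChecks("P@ssw0rd")); // []
console.log(failedChecks("P@SSW0RD")); // [ 'lowercase' ]
console.log(failedChecks("p@ssw0rd")); // [ 'uppercase' ]
console.log(failedChecks("P@ssword")); // [ 'digit' ]
console.log(failedChecks("Passw0rd")); // [ 'symbol' ]
```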

The Real Deal

Suppose we have this expression, from the example up above.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

The 4 steps above were for explaining how a positive lookahead works. Now that we understand it, here is what actually happens:

P@ssw0rd satisfies (?=.*[a-z]), because .* is satisfied by P@, and [a-z] is satisfied by s. The cursor resets back to the void before P.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

Next, P@ssw0rd satisfies (?=.*[A-Z]), because .* is satisfied by the void before P, and [A-Z] is satisfied by P. The cursor resets back to the void before P.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

And then, P@ssw0rd satisfies (?=.*[0-9]), because .* is satisfied by P@ssw, and [0-9] is satisfied by 0. The cursor resets back to the void before P.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

In the next step, P@ssw0rd satisfies (?=.*\W), because .* is satisfied by P, and \W is satisfied by @. The cursor resets back to the void before P.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

Finally, P@ssw0rd satisfies .{8,}, because it is at least 8 characters long. The cursor does not reset this time, because .{8,} is not a lookahead.

/(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/

And voilà, it is done. P@ssw0rd satisfies all conditions.
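
Here is the whole expression in a runnable sketch (strongPassword is just the name I gave it). Notice that the returned match is the full string: the four lookaheads only gave their approval, and .{8,} did all the consuming. One thing to watch out for: \W does not count an underscore as a symbol, because _ is a word character.

```javascript
const strongPassword = /(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/;

const result = strongPassword.exec("P@ssw0rd");
console.log(result[0]);    // "P@ssw0rd"
console.log(result.index); // 0

console.log(strongPassword.test("P@ssw0r"));   // false: only 7 characters
console.log(strongPassword.test("password"));  // false: no upper case, digit, or symbol
console.log(strongPassword.test("Passw0rd_")); // false: "_" is matched by \w, not \W
```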

Conclusion

The great magic of regex never ends. But it’s not magic once you understand it; it becomes knowledge and logic. Though it will still look like magic to people who don’t know it.

Lookahead is a great help in creating regex. It allows you to create regular expressions that would be impossible, or very long-winded, without it.
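
To give a feel for the long-winded alternative, this is one way I might write the same password check without any lookahead. The function name and the sample inputs are my own, and I have only compared the two versions on these samples, not on every possible string (multi-line input, for instance).

```javascript
const withLookaheads = /(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}/;

// Hypothetical lookahead-free version: one regex per condition.
function isStrongWithoutLookahead(input) {
  return (
    /[a-z]/.test(input) &&
    /[A-Z]/.test(input) &&
    /[0-9]/.test(input) &&
    /\W/.test(input) &&
    /.{8,}/.test(input)
  );
}

const samples = ["P@ssw0rd", "password", "P@ssw0r", "Passw0rd"];
const agreement = samples.map(
  (s) => withLookaheads.test(s) === isStrongWithoutLookahead(s)
);
console.log(agreement); // [ true, true, true, true ]
```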

If you would like to read more about lookahead, I suggest reading on this site. They explain it in more technical terms. There’s also a related construct called lookbehind.

I hope this post improves your understanding of regex, and that you use it wherever you see fit. Now, go get some practice and always remember to live your code and code your life!

Now is the best time you’ll ever have in life to get ahead!

– Zig Ziglar

By Ericko Yap

Just a guy who is obsessed to improve himself. Working as a programmer in a digital banking company. Currently programming himself in calisthenics, reading books, and maintaining a blog.
