Regular expressions and JavaScript

About six or seven months ago I googled for client-side JavaScript form validation code and quickly found what I was looking for. It worked perfectly. When I read through the code, I found expressions like /^[A-Za-z0-9 ]{3,20}$/. I had no idea what they meant, and since the code worked well, I didn't take the time to find out.

Two or three months later I came across an article in Digit magazine titled "Regular expressions". Reading it, I realized that it explained the strange expression I had found in the JS validation code. In this post I would like to share what I learned from that article.


What are regular expressions?


Regular expressions (regex) are a way of defining a pattern to be extracted, replaced, or otherwise processed in a body of text. The exact regex syntax differs from one program to another.

Basics


Since we need to match a pattern rather than a fixed sequence of text, regular expressions give special meanings to certain standard characters so that a single expression can match many different sequences.

For example, to match any two-digit number, we use the regular expression \d\d. Here \d stands for any numeric digit.

What if we want to match any two-digit even number?
The regex for that is \d[02468].
This means:
\d means the first digit can be any numeric digit.
[02468] means the second digit must be 0, 2, 4, 6 or 8.
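
As a quick check in JavaScript, the pattern can be written as a regex literal and tried with test(). This is only a sketch: the variable names are mine, and the ^ and $ anchors (start and end of the string, as in the validation pattern from the introduction) are my addition so that longer strings are rejected.

```javascript
// Unanchored: succeeds if a digit followed by an even digit occurs anywhere.
const evenPair = /\d[02468]/;
// Anchored (my addition): the whole string must be exactly two digits.
const evenTwoDigit = /^\d[02468]$/;

console.log(evenPair.test("42"));      // true
console.log(evenPair.test("43"));      // false
console.log(evenPair.test("x128y"));   // true, "12" is found inside the string
console.log(evenTwoDigit.test("128")); // false, three characters
```

Without the anchors, test() only asks whether the pattern occurs somewhere in the text, which is usually not what form validation needs.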

We can also write [0123456789] instead of \d, but there is an even better way using ranges: [0-9]. As another example, [1-5][6-9] indicates that the first digit can be 1, 2, 3, 4 or 5 and the second digit can be 6, 7, 8 or 9.
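
A small sketch of ranges in JavaScript, assuming we want the whole string to match (hence the ^ and $ anchors, which are my addition). The range [0-9] is the short form of [0123456789].

```javascript
const singleDigit = /^[0-9]$/;     // behaves like /^\d$/ in JavaScript
const rangePair = /^[1-5][6-9]$/;  // first digit 1-5, second digit 6-9

console.log(singleDigit.test("7")); // true
console.log(rangePair.test("38"));  // true
console.log(rangePair.test("83"));  // false, 8 is outside 1-5
console.log(rangePair.test("35"));  // false, 5 is outside 6-9
```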

It is also possible to negate a character set. For example, [^1-5] matches any single character other than 1, 2, 3, 4 or 5 (any character at all, not just the other digits).
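
The following sketch shows what a negated set accepts in JavaScript. Note that [^1-5] is not limited to digits; if only the remaining digits are wanted, a positive set such as [06-9] (my own suggestion) is one way to say so.

```javascript
const notOneToFive = /[^1-5]/;

console.log(notOneToFive.test("3"));   // false
console.log(notOneToFive.test("7"));   // true
console.log(notOneToFive.test("a"));   // true, a letter is also "not 1-5"
console.log(notOneToFive.test("123")); // false, every character is in 1-5

// Only the digits outside 1-5: 0, 6, 7, 8, 9.
const otherDigit = /^[06-9]$/;
console.log(otherDigit.test("0"));     // true
console.log(otherDigit.test("a"));     // false
```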

The pipe character "|" provides alternatives. For example, Ju(ne|ly) can be used to look for occurrences of both "June" and "July".
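
To try the alternation in JavaScript, one option is String.prototype.match with the g (global) flag, which returns every match instead of only the first. The sample sentence is made up for illustration.

```javascript
const text = "Exams are in June, results in July, holidays in January.";

// With the g flag, match() returns an array of all matched substrings.
const months = text.match(/Ju(ne|ly)/g);

console.log(months); // [ 'June', 'July' ]
```

"January" is not picked up because it does not start with "Ju".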

Repetitions


So far we have matched two-digit numbers. To look for any number, i.e. any uninterrupted sequence of digits, we can use
\d+

The \d matches any digit, and the plus sign (+) signifies that the preceding expression must occur one or more times.
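
A minimal sketch of pulling every number out of a piece of text in JavaScript. The sample strings are invented; the g flag asks match() for all matches.

```javascript
const sentence = "Order 66 shipped 3 boxes on day 2024";
const numbers = sentence.match(/\d+/g);
console.log(numbers); // [ '66', '3', '2024' ]

// match() returns null, not an empty array, when nothing matches.
const none = "no digits here".match(/\d+/g);
console.log(none); // null
```

The matches are strings; Number() or parseInt() can convert them if arithmetic is needed.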

So the regex to match a decimal number is:
\d*\.\d+

  • \d* - The * matches zero or more occurrences of the preceding expression, i.e. \d, any numeric digit.

  • \. - We need to match a dot between the integer part and the fractional part. We put a backslash '\' before the dot because '.' has a special meaning in regex. The backslash indicates that the dot is not regex syntax but a literal character to be matched.

  • \d+ - Matches one or more occurrences of a digit.
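
The three parts can be checked one by one in JavaScript. This is a sketch with anchors added (my assumption) so that the whole string has to be a decimal number; the last line shows what goes wrong when the backslash is left out.

```javascript
const decimal = /^\d*\.\d+$/;

console.log(decimal.test("3.14")); // true
console.log(decimal.test(".5"));   // true, \d* allows an empty integer part
console.log(decimal.test("42"));   // false, the dot is required
console.log(decimal.test("7."));   // false, \d+ needs a digit after the dot

// Unescaped, "." matches any character, so "x" is accepted in its place.
console.log(/^\d*.\d+$/.test("3x14")); // true
```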


Now suppose we have to match a phone number ("+91xxxxxxxxxx") in a document. The regex for this is:
\+91\d{10}
Here \d{10} matches exactly ten digits. Again, we put a backslash before '+' to indicate that it is a literal character to be matched and not regex syntax, since '+' already has a special meaning in regex.
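
Here is how the phone pattern might be used to pick a number out of a longer string in JavaScript. The phone number and the sentence are invented for the example.

```javascript
const phonePattern = /\+91\d{10}/;
const note = "Call me at +919876543210 or leave a message.";

// Without the g flag, match() returns details of the first match;
// index 0 holds the matched text.
const found = note.match(phonePattern);
console.log(found[0]); // +919876543210

console.log(phonePattern.test("+9198765"));     // false, only five digits
console.log(phonePattern.test("919876543210")); // false, the literal + is missing
```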

The "?" character specifies that the preceding character may or may not be present.
For example, colou?r matches both "color" and "colour", because the 'u' is optional.
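
Since regex is also used for replacing text, here is a small sketch that finds both spellings and then normalizes them with String.prototype.replace. The sentence is made up.

```javascript
const spelling = /colou?r/g;
const line = "The color red and the colour blue";

console.log(line.match(spelling)); // [ 'color', 'colour' ]

// Replace every occurrence of either spelling with one form.
const normalized = line.replace(spelling, "color");
console.log(normalized); // The color red and the color blue
```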

Groups


To apply "*", "+", "?" etc. to a group of characters, we can use parentheses to group them together. For example, to match both "child" and "children" we could use child(ren)?.
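
The difference the parentheses make can be seen by comparing the grouped pattern with an ungrouped one in JavaScript. The anchors are my addition so that only whole words are accepted.

```javascript
const grouped = /^child(ren)?$/;  // "ren" as a whole is optional
const ungrouped = /^children?$/;  // only the final "n" is optional

console.log(grouped.test("child"));      // true
console.log(grouped.test("children"));   // true
console.log(grouped.test("childr"));     // false

console.log(ungrouped.test("child"));    // false
console.log(ungrouped.test("childre"));  // true
console.log(ungrouped.test("children")); // true
```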

We now know enough to match a URL such as:
http://www\.example\.com/\d{2}/\d{2}/\d{4}/article\.html
The day and month are two-digit numbers and the year is a four-digit number. The dots are escaped so that they match only a literal '.' and not any character.

If we have to save the day, month and year for further use, we can do so by putting each of them in its own pair of parentheses, as shown below:
http://www\.example\.com/(\d{2})/(\d{2})/(\d{4})/article\.html

Now when the regex engine matches the whole text, it will also store the three sub-matches separately so that we can use them later on.
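
In JavaScript the stored groups come back as extra entries in the array returned by match(). The sketch below assumes a sample URL of my own; inside a regex literal each "/" has to be written as "\/", and the dots are escaped so they match only a literal dot.

```javascript
const url = "http://www.example.com/25/12/2010/article.html";
const pattern =
  /^http:\/\/www\.example\.com\/(\d{2})\/(\d{2})\/(\d{4})\/article\.html$/;

const m = url.match(pattern);

// m[0] is the whole match; m[1], m[2], m[3] are the three groups in order.
const parts = m === null ? null : { day: m[1], month: m[2], year: m[3] };

console.log(parts); // { day: '25', month: '12', year: '2010' }
```

The check for null matters because match() returns null when the URL does not fit the pattern.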

Character   Function
.           Match any character except newline
[abc]       Match any one of the characters inside the brackets
[^abc]      Match any character except those inside the brackets
[n-q]       Match any character in the range
\d          Match any digit
\D          Match any non-digit
\n          Match a newline character
\s          Match any whitespace character
\S          Match any non-whitespace character
\t          Match a tab character
\w          Match any word character [A-Za-z0-9_]
\W          Match any non-word character
\b          Match a word boundary, i.e. the position between a word character and a non-word character (or the start or end of the text)
\B          Match a non-word boundary, e.g. a position inside a word
\           Escape a special character: \\, \+, \(, etc.
*           Match the preceding expression zero or more times
+           Match the preceding expression one or more times
?           The preceding expression is optional, i.e. match it zero or one time
{n}         Match the preceding expression exactly n times
{n,}        Match the preceding expression n or more times
{n,m}       Match the preceding expression between n and m times
()          Group a sub-expression and capture the substring it matches
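
With these pieces, the expression from the validation code in the introduction can finally be read: ^ and $ mark the start and end of the input, [A-Za-z0-9 ] allows a letter, a digit or a space, and {3,20} requires between 3 and 20 such characters. A minimal sketch follows; the function name isValidName and the idea that it checks a name field are my assumptions, not part of the original script.

```javascript
// Hypothetical helper: accepts 3 to 20 letters, digits or spaces.
function isValidName(value) {
  return /^[A-Za-z0-9 ]{3,20}$/.test(value);
}

console.log(isValidName("John Smith 42")); // true
console.log(isValidName("ab"));            // false, shorter than 3 characters
console.log(isValidName("john_smith"));    // false, "_" is not in the set
console.log(isValidName("a".repeat(21)));  // false, longer than 20 characters
```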

