Chaos Pavanred's Blog ….. Exponential Error

20 May 2008

Regular Expressions

Regular expressions are a simple and flexible method of identifying strings, characters, symbols, or patterns made of any combination of these. Regular expressions (abbr. REGEX) are instrumental in searching for particular strings and patterns in text; they are also used for simple validations.

For instance, client-side validation greatly reduces the overhead of the server-side script, which would otherwise have to handle the various types and forms of invalid input. These expressions can be used in Perl, PHP, Java, the .NET languages, and a multitude of other languages.

Let’s consider a common example -
There is a textbox which has to accept a positive integer from the user. Essentially, the input provided by the user (accepted in the textbox) has to be validated. So the code has to accommodate these conditions -
check if the input is an integer, then check if that integer is greater than zero; if either check fails, the user has to be prompted to provide a valid input. Done on the server, this would mean accepting the value from the user, posting it to the server, executing the server-end code, and then reporting the bad input back to the user.

Instead, regular expressions can check the validity of the input at the client end and post the input to the server for further execution only if the input is valid.

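Now, returning to the simple example we chose, the REGEX to check for a positive integer is ^\d+$. Here ^ indicates the starting position, \d+ matches one or more digits, and $ indicates the end. Note that this also accepts 0 and values such as 007; to insist on a value greater than zero with no leading zeros, use ^[1-9]\d*$ instead.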
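To make this concrete, here is a minimal sketch in Python (my choice of language for the examples, nothing in the REGEX itself depends on it). The names DIGITS_ONLY, POSITIVE_INT and is_positive_integer are made up for illustration, and the stricter ^[1-9]\d*$ pattern is a variant that rules out zero rather than the expression quoted above.

```python
import re

# ^\d+$ accepts one or more digits, so "0" and "007" pass as well.
# re.ASCII restricts \d to the characters 0-9.
DIGITS_ONLY = re.compile(r"^\d+$", re.ASCII)

# Stricter variant (hypothetical, for illustration): the first digit must be
# 1-9, which excludes zero and leading zeros.
POSITIVE_INT = re.compile(r"^[1-9]\d*$", re.ASCII)


def is_positive_integer(text: str) -> bool:
    """Return True when text is a whole number greater than zero."""
    # fullmatch requires the entire string to match, trailing newline included.
    return POSITIVE_INT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["42", "0", "-5", "4.2", "abc"]:
        print(repr(sample), is_positive_integer(sample))
```

I used fullmatch rather than match here because in Python $ also matches just before a trailing newline, which is usually not what you want in a validation.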

Let’s consider the name of a person. The REGEX for this is ^[a-zA-Z''-'\s]{1,20}$.
A valid input to this would be 'Pavan Kumar' or 'Tim O'Reilly'. As you can see, this expression allows between 1 and 20 characters, accepts both lower and upper case letters as well as whitespace, and accommodates one special character, the apostrophe, as shown in the second sample name. (The ''-' part is a range from ' to ', so it adds nothing beyond the apostrophe itself; a hyphen is not accepted.)
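A small Python sketch of the name check, using the pattern exactly as quoted above. The names NAME and is_valid_name are only illustrative.

```python
import re

# Letters, apostrophe and whitespace, 1 to 20 characters in total.
# Inside the brackets, ''-' is an apostrophe followed by the range '-' (from
# apostrophe to apostrophe), so the hyphen itself is NOT an allowed character.
NAME = re.compile(r"^[a-zA-Z''-'\s]{1,20}$")


def is_valid_name(text: str) -> bool:
    """Return True when text fits the name pattern quoted in the post."""
    return NAME.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["Pavan Kumar", "Tim O'Reilly", "Anne-Marie", "R2D2", ""]:
        print(repr(sample), is_valid_name(sample))
```

If hyphenated names have to pass, the character class would need a literal hyphen added, for example at the very end of the brackets.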

Now let’s validate a password: (?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{8,10})$
This validates a strong password. It must be between 8 and 10 characters, contain at least one digit and one alphabetic character, and must not contain special characters.
The length limit and the ban on special characters come from ^([a-zA-Z0-9]{8,10})$. The '*' matches the preceding element zero or more times, and (?! ... ) is a negative lookahead, which fails whenever the pattern inside it matches. So "(?!^[0-9]*$)" rejects a password made up only of digits, which guarantees at least one letter, and "(?!^[a-zA-Z]*$)" rejects a password made up only of letters, which guarantees at least one digit.
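Here is a rough Python sketch of the password check, with the expression split into its three parts so each one can carry a comment. PASSWORD and is_valid_password are names I made up for the example.

```python
import re

# Adjacent string literals are joined into one pattern by Python.
PASSWORD = re.compile(
    r"(?!^[0-9]*$)"            # negative lookahead: not digits only
    r"(?!^[a-zA-Z]*$)"         # negative lookahead: not letters only
    r"^([a-zA-Z0-9]{8,10})$"   # 8 to 10 letters or digits, nothing else
)


def is_valid_password(text: str) -> bool:
    """Return True when text satisfies the password pattern from the post."""
    return PASSWORD.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["abc12345", "12345678", "abcdefgh", "abc123", "abc1234!"]:
        print(repr(sample), is_valid_password(sample))
```

Since the last part only lets letters and digits through, ruling out "digits only" and "letters only" is enough to force a mix of both.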

Another simple example is ^\d+(\.\d\d)?$. This validates a non-negative currency amount. If there is a decimal point, it must be followed by exactly 2 digits. For example, 7.65 is valid but 3.1 is not.
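The currency check could be sketched in Python like this. CURRENCY and is_valid_amount are illustrative names, not anything standard.

```python
import re

# One or more digits, optionally followed by a dot and exactly two digits.
# re.ASCII restricts \d to the characters 0-9.
CURRENCY = re.compile(r"^\d+(\.\d\d)?$", re.ASCII)


def is_valid_amount(text: str) -> bool:
    """Return True for amounts such as '12' or '7.65'."""
    return CURRENCY.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["7.65", "3.1", "12", ".50", "7.650", "-7.65"]:
        print(repr(sample), is_valid_amount(sample))
```

Note that the pattern wants at least one digit before the decimal point, so .50 is rejected while 0.50 is fine.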

Other very commonly used expressions are those for email addresses and URLs -
^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$
This validates an email address. I guess it can now be easily understood by examining the REGEX.
Similarly, ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$ validates a URL. With a little bit of time invested it is possible to formulate such REGEX yourself, but in general practice nobody needs to invest that time for such common cases, as Google gives scores of such expressions if searched for.
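Both expressions can be tried out with a short Python sketch like the one below. EMAIL, URL, is_email and is_url are names I picked for the example, and the port part of the URL pattern is written with a [0-9] character class, which is my assumption about what the expression intends.

```python
import re

# Email pattern as quoted above, split over two lines for readability.
EMAIL = re.compile(
    r"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*"
    r"@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$"
)

URL = re.compile(
    r"^(ht|f)tp(s?)\:\/\/"                     # http, https, ftp or ftps
    r"[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*"        # host name
    r"(:[0-9]*)*"                              # port digits (assumed intent)
    r"(\/?)"                                   # optional slash after the host
    r"([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$"  # remaining path characters
)


def is_email(text: str) -> bool:
    """Return True when text matches the email pattern."""
    return EMAIL.fullmatch(text) is not None


def is_url(text: str) -> bool:
    """Return True when text matches the URL pattern."""
    return URL.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_email("user@example.com"), is_email("user@example"))
    print(is_url("https://example.com:8080/docs/index.html"), is_url("example.com"))
```

One thing this shows quickly: the last character class of the URL pattern has no '=', so an address like http://a.io/?q=1 is rejected. Ready-made expressions are worth testing against your own inputs before you rely on them.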
Only for a specific purpose would you need to formulate a REGEX yourself: purposes like searching for a string or a pattern in a text, matching against it, or even replacing it.
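As a quick illustration of those three purposes, here is a Python sketch that searches for, collects, and replaces dates in a piece of text. The sample text and the DATE pattern are invented for the example.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Order 66 shipped on 2008-05-20; order 67 ships on 2008-05-27."

# Four digits, a dash, two digits, a dash, two digits.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

first = DATE.search(text)          # searching: the first occurrence, or None
all_dates = DATE.findall(text)     # matching: every occurrence as a list
masked = DATE.sub("<date>", text)  # replacing: swap each occurrence

if __name__ == "__main__":
    print(first.group() if first else None)
    print(all_dates)
    print(masked)
```

Unlike the validation expressions, this pattern has no ^ and $, because here the goal is to find the pattern anywhere inside a longer text.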

There are some websites which claim to be REGEX generators; this is one of the most popular (or so they claim), but I don’t find any of them even close to what I would want from an actual REGEX generator.

All in all, REGEX are a very simple and flexible way of reducing a great amount of overhead in actual code.
