Regular Expressions

Introduction

Regexps, or regular expressions, are used to find and pull out text that matches a pattern of characters. They are a bit like split: easier in some ways and more complicated in others. A regexp accepts only the kind of data you allow it to, so it can be used for email validation and many other tasks. The cheat sheet below lists what each special character does.

Basic Syntax

/regexp here/

The first slash marks the beginning and the last slash marks the end. You can add letters, known as modifiers (or flags), after the last slash. The two most common are:

i - match any case (case-insensitive)
g - global search, meaning it doesn't stop after the first match

Sometimes you must escape a character. In the cheat sheet you may notice that the question mark has a special meaning. To match a literal question mark you would write \?. A backslash removes any character's special meaning (except in sequences where the backslash itself creates the meaning, like \w). To match a forward slash, write \/; to match a backslash, write \\.
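A minimal sketch of the modifiers and escapes described above, using made-up sample text. It runs in Node or a browser console; console.log stands in for document.write so it works outside a page.

```javascript
// Sketch: the i and g modifiers, plus escaping. The sample text is made up.
const text = "Is it done? Yes, it IS done.";

// Without i, case must match exactly; with i, case is ignored.
const caseSensitive = /is/.test("IS");    // false
const caseInsensitive = /is/i.test("IS"); // true

// With g, match() returns every match instead of stopping after the first.
const allDone = text.match(/done/g); // ["done", "done"]
const allIs = text.match(/is/gi);    // ["Is", "IS"]

// Unescaped, ? means "zero or one of the previous character".
const optionalU = /colou?r/.test("color") && /colou?r/.test("colour"); // true

// Escaped, \? matches a literal question mark.
const hasQuestionMark = /done\?/.test(text); // true

// \/ matches a forward slash and \\ matches a backslash.
// (In the JS string "a/b\\c", the \\ is a single backslash character.)
const slashes = /a\/b\\c/.test("a/b\\c"); // true

console.log(caseSensitive, caseInsensitive, allDone, allIs, optionalU, hasQuestionMark, slashes);
```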

New RegExp

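This is the syntax for building a regular expression from a string, which is handy when the pattern is stored in a variable: var regname = new RegExp("regular expression", "modifiers"); The modifiers are the letters listed above, for example i, g, or gi. Because the pattern is written as a string, any backslash in it must be doubled: /\d+/ becomes new RegExp("\\d+").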
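A small sketch of new RegExp in use. The variable searchWord is a hypothetical value (say, something a user typed); the point is that the pattern only exists at run time, so a /literal/ can't be used.

```javascript
// Sketch: building a regexp from a string at run time.
// searchWord is a hypothetical value, e.g. something typed by a user.
const searchWord = "cat";
const wordRe = new RegExp(searchWord, "gi");
const found = "Cat, cat, CAT".match(wordRe); // ["Cat", "cat", "CAT"]

// Inside a string, a backslash must be doubled, so /\d+/g becomes:
const digitsRe = new RegExp("\\d+", "g");
const numbers = "A: 56, B: 7".match(digitsRe); // ["56", "7"]

// The literal and string forms describe the same pattern.
console.log(digitsRe.source === (/\d+/g).source); // true
console.log(wordRe.source, wordRe.flags);         // cat gi
```

If the string comes from users, characters such as ., ? or + keep their special meaning unless you escape them first.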

Matches

You can use parentheses to capture parts of a match and pull out certain values. Let's say you have this somewhere on the page:

A: 56

You'd then use this regexp to match it, where \d+ captures one or more digits:

/A: (\d+)/

Then you could output or use the captured value:

document.write(RegExp.$1);
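A sketch of the same capture done with the array that match() returns. getAValue is a hypothetical helper name and the input strings are made up. RegExp.$1 still works in current engines, but MDN lists it as deprecated, so reading the group from the match array is the sturdier habit.

```javascript
// Sketch: pulling a captured value out of a match.
// getAValue is a hypothetical helper name; the input text is made up.
function getAValue(text) {
  const match = text.match(/A: (\d+)/);
  if (!match) return null; // no "A: <digits>" anywhere in the text
  // match[0] is the whole match ("A: 56"); match[1] is the first group ("56").
  return Number(match[1]);
}

console.log(getAValue("Name: Bob A: 56 B: 12")); // 56
console.log(getAValue("no numbers here"));       // null

// RegExp.$1 holds the first group of the most recent successful match,
// but it is a deprecated legacy feature, so match[1] is the safer choice.
"A: 56".match(/A: (\d+)/);
console.log(RegExp.$1); // 56
```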

Greedy | Non-Greedy

This is a very common idea. A greedy quantifier takes as much as it can while still letting the whole pattern match, whereas a non-greedy one takes as little as it can. A common example is UBBC or HTML tags. Let's say you want to match the image URL from an image tag.

The post/comment:

<img src="http://blah.com/asdf.gif" /> asdf <img src="http://blah.com/jkl.gif" />

With a pattern like /src="(.+)"/, greedy would capture:

http://blah.com/asdf.gif" /> asdf <img src="http://blah.com/jkl.gif

With /src="(.+?)"/, non-greedy would capture:

http://blah.com/asdf.gif

So how do you set the difference between greedy and non-greedy?

Quantifiers such as + and * are greedy by default, so you just leave them as they are. To make one non-greedy, add a ? right after it: +? or *?.
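A minimal sketch that reproduces the image-tag results, assuming the patterns /src="(.+)"/ and /src="(.+?)"/ (the original post only shows the results). The post string is the example above with straight quotes.

```javascript
// Sketch: greedy vs. non-greedy capture of an image URL.
// The src="..." patterns are an assumption used for illustration.
const post = '<img src="http://blah.com/asdf.gif" /> asdf <img src="http://blah.com/jkl.gif" />';

// Greedy: .+ runs to the LAST quote that still lets the pattern match.
const greedy = post.match(/src="(.+)"/)[1];
// Non-greedy: .+? stops at the FIRST closing quote.
const lazy = post.match(/src="(.+?)"/)[1];

console.log(greedy); // http://blah.com/asdf.gif" /> asdf <img src="http://blah.com/jkl.gif
console.log(lazy);   // http://blah.com/asdf.gif

// Non-greedy plus the g modifier collects every URL in the post.
const urls = [...post.matchAll(/src="(.+?)"/g)].map((m) => m[1]);
console.log(urls); // ["http://blah.com/asdf.gif", "http://blah.com/jkl.gif"]
```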

Greedily match paragraph tags:

/(<p>|<P>)(.+)(<\/p>)/i

Non-greedily match:

/(<p>|<P>)(.+?)(<\/p>)/i
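A quick sketch running the two paragraph patterns on a made-up snippet with two paragraphs. Group 2 holds the text between the tags, so the difference between greedy and non-greedy shows up directly.

```javascript
// Sketch: the greedy and non-greedy paragraph patterns on made-up HTML.
const html = "<p>First</p><p>Second</p>";

// Group 2 holds the text between the opening and closing tags.
const greedyP = html.match(/(<p>|<P>)(.+)(<\/p>)/i)[2]; // "First</p><p>Second"
const lazyP = html.match(/(<p>|<P>)(.+?)(<\/p>)/i)[2];  // "First"

// With g, collecting group 1 from each match gets every paragraph.
const allP = [...html.matchAll(/<p>(.+?)<\/p>/gi)].map((m) => m[1]); // ["First", "Second"]

console.log(greedyP);
console.log(lazyP);
console.log(allP);
```

Because . does not match a newline, a paragraph that spans several lines won't match; the s modifier (or [\s\S] in place of .) changes that.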

One last example

/^[\w.-]+@[A-Z0-9.-]+$/i

I'm just going to quickly break this down:

/ - beginning of the regex
^ - start at the beginning of the string
[\w.-] - a period, a dash, or any letter, number, or underscore (the dash goes last so it isn't read as a range)
+ - one or more of the previous
@ - the at sign
[A-Z0-9.-] - any letter, number, period, or dash
+ - one or more of the previous
$ - end of the string
/i - end of the regex, ignoring case

If you can't already tell, this little regular expression does a rough check that a string looks like an email address. It is deliberately loose: it accepts strings such as a@b, and it can't tell whether an address actually exists.
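A sketch wrapping the email pattern in a helper. looksLikeEmail is a hypothetical name and the test addresses are made up; the last one shows how loose the check is.

```javascript
// Sketch: the email pattern above wrapped in a helper (hypothetical name).
// No g modifier here: with g, test() remembers lastIndex between calls
// and can give different answers for the same input.
const emailPattern = /^[\w.-]+@[A-Z0-9.-]+$/i;

function looksLikeEmail(input) {
  return emailPattern.test(input);
}

console.log(looksLikeEmail("john.doe@example.com")); // true
console.log(looksLikeEmail("john doe@example.com")); // false (space not allowed)
console.log(looksLikeEmail("@example.com"));         // false (nothing before @)
console.log(looksLikeEmail("a@b"));                  // true (the pattern is loose)
```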

Regexp Cheat Sheet

Code/Char - Definition/Match
^ - Start of string
$ - End of string
* - zero or more times
+ - one or more times
? - zero or one time (right after * or +, makes them non-greedy)
. - any character except newline
\b - word boundary
\B - non-word boundary
\d - a digit
\D - anything besides a digit
\f - form feed
\n - new line
\r - carriage return
\s - any whitespace character (space, tab, newline, etc.)
\S - anything besides a whitespace character
\t - tab
\v - vertical tab
\w - any letter, number, or underscore
\W - anything besides a letter, number, or underscore
[abcde] - any one of those characters
[^abcde] - any one character except those
[a-e] - any one character in that range
{x} - exactly x occurrences of the previous item
{x,y} - x to y occurrences of the previous item
() - a grouping (also captures what it matched)
x|y - x or y
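A few of the cheat-sheet entries combined in one sketch, using made-up strings. Each line names the entries it exercises.

```javascript
// Sketch: several cheat-sheet entries on made-up text.
const sample = "Call 555-1234 or 555-98765 today";

// \b, \d and {x}: 3 digits, a dash, then exactly 4 digits, as a whole word.
const phones = sample.match(/\b\d{3}-\d{4}\b/g); // ["555-1234"]

// [a-e]-style range, {x,y} and the i modifier: whole words of 4 or 5 letters.
const words = sample.match(/\b[a-z]{4,5}\b/gi); // ["Call", "today"]

// \s and +: collapse runs of whitespace; ^, $ and | trim the ends.
const tidy = "  too   many   spaces  ".replace(/\s+/g, " ").replace(/^ | $/g, ""); // "too many spaces"

// \D: remove everything that is not a digit.
const digits = "(555) 123-4567".replace(/\D/g, ""); // "5551234567"

// () and x|y: "cat" or "dog", optionally followed by s.
const pet = /^(cat|dog)s?$/.test("dogs"); // true

console.log(phones, words, tidy, digits, pet);
```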