UPDATED 26-Jan-2012 !

A brief note today on adding some more specific validation to your models. As we all know, “rubbish in == rubbish out”, so let’s get that data nice and clean, right up front!

public function rules()
{
	// NOTE: you should only define rules for those attributes that
	// will receive user inputs.
	return array(
		// Letters (any script), digits, underscore, apostrophe, space, comma, full stop, hyphen.
		array('firstname, surname, address', 'match', 'pattern'=>'/^[\w\p{L}\' ,.\-]+$/u'),
		// CDateTimeParser patterns: 'M' is the month; lowercase 'm' would mean minutes.
		array('startdate, enddate', 'date', 'format'=>'d/M/yyyy', 'allowEmpty'=>false),
	);
}

The regex pattern for names and addresses starts with the shorthand \w, which covers ASCII letters, digits (0-9) and the underscore (_). On top of that it allows some basic punctuation: hyphen (-), single quote ('), comma (,), full stop/period (.) and space. On its own \w does not reliably match letters with accents and other diacritics, so if there is any chance that you will see multi-byte characters (UTF-8, for example), add the Unicode property \p{L}, which matches a letter in any script. Note the /u at the end of the pattern, which tells the regex engine to treat both pattern and input as UTF-8. This combination should allow most European and American names and addresses, like

O’hara
Smith-Klein
Cajó
King’s Cross St. Pancras
Béziers
and even åæßέ

It may well allow other multi-byte scripts too, but I haven’t tested it on anything non-European yet!
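If you want to try the pattern outside of a model first, a small standalone script is a quick way to do it. The following is only a sketch: the helper name isValidName() and the two "bad" inputs are made up for illustration, the pattern is a tidied variant that includes the full stop described above, and it assumes the file and the inputs are UTF-8 encoded with a plain straight apostrophe (').

```php
<?php
// Hypothetical standalone check, not part of the framework.
// Assumes this file is saved as UTF-8 and inputs use a straight apostrophe.
$pattern = '/^[\w\p{L}\' ,.\-]+$/u';

function isValidName($value, $pattern)
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    // (for example when the input is not valid UTF-8 and /u is set).
    return preg_match($pattern, $value) === 1;
}

$samples = array(
    "O'hara",
    "Smith-Klein",
    "Cajó",
    "King's Cross St. Pancras",
    "Béziers",
    "åæßέ",
    "<script>",           // made-up bad input: angle brackets are not allowed
    "Smith; DROP TABLE",  // made-up bad input: semicolon is not allowed
);

foreach ($samples as $sample) {
    echo $sample, ' => ', (isValidName($sample, $pattern) ? 'ok' : 'rejected'), "\n";
}
```

With this variant the six names from the list should come out as ok and the last two inputs as rejected. Typographic quotes (’) are not in the character class, so add them explicitly if your users are likely to paste them in.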

 

Please do add ideas for more validation rules below …

 

Further reading: Unicode Regex Patterns