UPDATED 26-Jan-2012 !
A brief note today to add some more specific validation to your models. As we all know – “rubbish in == rubbish out”, so let’s get that data nice and clean, right up front!
    public function rules()
    {
        // NOTE: you should only define rules for those attributes that
        // will receive user inputs.
        return array(
            // Letters (any script), digits, space, hyphen, underscore, apostrophe, comma and full stop.
            array('firstname, surname, address', 'match', 'pattern'=>'/^[\w\-\_\'\ \,\.0-9\p{L}]+$/u'),
            // In Yii's date patterns M is the month; lowercase m means minutes.
            array('startdate, enddate', 'date', 'format'=>'d/M/yyyy', 'allowEmpty'=>false),
        );
    }
The address regex pattern starts with the shorthand \w, which matches letters, digits and the underscore; whether it also catches letters with diacritics (accents etc.) depends on how PCRE is set up, so don't rely on it alone. To that I've added some basic punctuation: hyphen (-), underscore (_), single quote ('), comma (,), full stop/period (.) and space, plus 0-9. Lastly, if there is any chance that you will receive multi-byte characters such as UTF-8, the Unicode property \p{L} matches any letter in any script. Note the /u at the end of the pattern, which tells the regex engine to treat both the pattern and the input as UTF-8. This combination should allow most European and American names and addresses like
O'Hara
Smith-Klein
Cajó
King's Cross St. Pancras
Béziers
and even åæßέ

It should also allow other multi-byte combinations, but I haven't tested it on anything non-European yet!
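If you want to try the pattern outside of Yii before wiring it into a model, here is a minimal sketch in plain PHP. It assumes the input is UTF-8 and that the full stop is part of the character class; the helper name isCleanName() is made up for this example and is not a Yii function.

```php
<?php
// Same character class as the 'match' rule, with the full stop included.
// Assumption: every value passed in is a UTF-8 string, as /u requires.
$pattern = '/^[\w\-\_\'\ \,\.0-9\p{L}]+$/u';

// Hypothetical helper (not part of Yii): true when the whole value
// consists only of characters allowed by the pattern.
function isCleanName($value, $pattern)
{
    return preg_match($pattern, $value) === 1;
}

$samples = array(
    "O'Hara",
    'Smith-Klein',
    'Cajó',
    "King's Cross St. Pancras",
    'Béziers',
    'åæßέ',
    '<script>',      // angle brackets are not in the character class
    'Flat 2; DROP',  // neither is the semicolon
);

foreach ($samples as $sample) {
    echo $sample, ' => ', isCleanName($sample, $pattern) ? 'ok' : 'rejected', PHP_EOL;
}
```

The first six samples should pass and the last two should be rejected. One thing to watch: the class only contains the plain ASCII apostrophe ('), so a typographic apostrophe (’) pasted in from a word processor would fail validation unless you add it to the pattern.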
Please do add ideas for more validation rules below …
further reading: Unicode Regex Patterns