Why mysql_real_escape_string is important and some gotchas
Alexander Andonov wrote an article for a Web site called WebAppSec, which understandably deals with Web application security. He talks about the ways of circumventing standard data validation routines in PHP via tricks like Unicode representations of characters, and gives general hints on better application security, which seems to boil down to using mysql_real_escape_string() over anything else, but there are some caveats there as well. Here’s the summary:
- Write properly quoted SQL:
- Single quotes around values (string literals and numbers)
- Backtick quotes around identifiers (databases, tables, columns, aliases)
- Properly escape the strings and numbers:
- mysql_real_escape_string() for all values (string literals and numbers)
- intval() for all number values and the numeric parameters of LIMIT
- Escape wildcard/regexp metacharacters (addcslashes(’%_’) for LIKE, and you better avoid REGEXP/RLIKE)
- If identifiers (columns, tables or databases) or keywords (such as ASC and DESC) are referenced in the script parameters, make sure (and force) their values are chosen only as one of an explicit set of options
- No matter what validation steps you take when processing the user input in your scripts, always do the escaping steps before issuing the query. Validation is not a substitute for escaping!
Pretty frequently in PHP you would see addslashes() used to supposedly clean the user’s input. It’s helpful to know the difference between addslashes() and mysql_real_escape_string(), and Chris Shiflett has written an article to explain exactly that. There’s also a table describing the main differences between the two. Shiflett is the author of O’Reilly’s Essential PHP Security, and has two free chapters from the book available on the site.