http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet
http://www.phpwact.org/php/i18n/charsets#checking_utf-8_for_well_formedness

also:

A. GOOD LINKS TO UNDERSTANDING UNICODE

1. a short, friendly intro to unicode, the minimum every programmer needs to know:
http://www.joelonsoftware.com/articles/Unicode.html

2. a good explanation of how the windows-1252 character set differs from iso-8859-1 (aka 'latin-1'; plain us-ascii is just its 7-bit subset), and the problems this creates. in short, windows-1252 puts printable characters such as curly quotes and the euro sign in the 0x80-0x9F range, which iso-8859-1 reserves for control codes:
http://en.wikipedia.org/wiki/Windows-1252

3. a quick cheat-sheet (only for someone who is technically interested) on how to programmatically convert characters to utf-8, character by character. we don't need to know this as php programmers; fyi only:
http://intertwingly.net/stories/2004/04/14/i18n.html#utf8

B. CONVERTING FROM ONE CHARSET TO ANOTHER IN PHP

1. there are several ways to do it. one is with php's iconv() function:

$utf8_string = iconv("windows-1252", "utf-8", $windows1252_string);

2. however, conversions like the one above should rarely be necessary if all incoming and outgoing web pages are in utf-8. this includes html pages containing forms: if the page containing the form is served as utf-8, then even when people cut and paste text from windows-1252 applications on their windows machines, the browser will convert the form's text and submit it as utf-8.

C. RECOMMENDED POLICY

thus, all our web pages should be in utf-8, and the data stored in the database should be in utf-8. xml also defaults to utf-8. so everything coming and going will be in the same universally recognized character set.

D. CONFIGURING THE WEB SERVER

1) apache's default character set: if your web server's httpd.conf file still has the line:

AddDefaultCharset ISO-8859-1

then change it to:

AddDefaultCharset UTF-8

and restart apache:

/sbin/service httpd restart

NOTE: I've already done this for web1 and our two xml servers (apache on fedora core 4 ships with UTF-8 as its default character set anyway).
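2) <meta> tags in the <head> section: the <meta http-equiv="Content-Type"> tag embedded in an .html file itself is not so important. as far as i can tell, both ie and firefox ignore it whenever the http Content-Type header already specifies a charset: the header takes precedence, and the meta tag is only used as a fallback when the header has none (for example, when the file is opened straight from disk). in any case, to be careful, you should either delete it completely, or change it from:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

to:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

3) header() php commands: if you have any

header("Content-type: text/html; charset=iso-8859-1");

commands in your .php scripts, then change them to:

header("Content-type: text/html; charset=utf-8");

or just delete them completely (since the web server now defaults to utf-8 because of the change made in step 1).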
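items 2) and 3) can also be handled together inside one php page: send the utf-8 header first, then write a meta tag that names the same charset, so the two can never disagree. this is a minimal sketch, assuming php 7 or later; render_page() is a hypothetical helper made up for illustration, not part of any library.

```php
<?php
// minimal sketch: declare utf-8 in both the http header (item 3) and the
// <meta> tag (item 2) so the two can never disagree.
// render_page() is a hypothetical helper, for illustration only.

function render_page(string $title, string $body): string
{
    // telling htmlspecialchars() the input is UTF-8 makes it escape only
    // the html special characters (<, >, &, quotes) and leave multi-byte
    // characters such as "é" or "€" intact.
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $b = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
        . "<title>$t</title>\n"
        . "</head>\n<body>\n<p>$b</p>\n</body>\n</html>\n";
}

// header() only works before any page output has been sent.
if (!headers_sent()) {
    header('Content-type: text/html; charset=utf-8');
}

echo render_page("Caf\u{00E9}", "Price: 5 \u{20AC} & up");
```

one thing to watch: with ENT_QUOTES and 'UTF-8' (and no ENT_SUBSTITUTE flag), htmlspecialchars() returns an empty string if its input is not well-formed utf-8, so legacy windows-1252 text should be converted as in section B before it is printed.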
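section B.1 shows the iconv() call on its own, and the phpwact link at the top is about checking utf-8 for well-formedness. the two can be combined for cleaning up legacy data: check whether a string is already valid utf-8, and only run it through iconv() if it isn't. this is a minimal sketch, assuming php 7 or later with the iconv and pcre extensions (both normally enabled); is_valid_utf8() and to_utf8() are hypothetical helper names, not library functions. the check relies on preg_match() returning false (rather than 0) when a /u pattern is applied to malformed utf-8.

```php
<?php
// minimal sketch: convert windows-1252 text to UTF-8 only when it is not
// already well-formed UTF-8. is_valid_utf8() and to_utf8() are
// hypothetical helper names, for illustration only.

// returns true if $s is well-formed UTF-8. preg_match() with the /u
// modifier returns false (not 0) when the subject is invalid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// converts windows-1252 input to UTF-8; input that already passes the
// UTF-8 check is returned unchanged.
function to_utf8(string $s): string
{
    if (is_valid_utf8($s)) {
        return $s;
    }
    $converted = iconv('windows-1252', 'utf-8', $s);
    if ($converted === false) {
        // e.g. bytes that are undefined in windows-1252, such as 0x81
        throw new RuntimeException('iconv() could not convert the string');
    }
    return $converted;
}

// "\x93" and "\x94" are windows-1252 curly double quotes; on their own
// they are not valid UTF-8.
$windows1252_string = "\x93quoted\x94";
$utf8_string = to_utf8($windows1252_string);

var_dump(is_valid_utf8($windows1252_string));          // bool(false)
var_dump($utf8_string === "\u{201C}quoted\u{201D}");   // bool(true)
```

this is only a heuristic for repairing old data: a windows-1252 string whose bytes happen to form valid utf-8 (for example "Ã©", which is the byte pair 0xC3 0xA9) passes the check and is left alone. it does not replace the policy in section C of keeping everything in utf-8 from the start.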