According to Google, UTF-8 is now the most popular character encoding on the web!
UTF-8 is a variable-length character encoding for Unicode. This text is encoded in UTF-8, so the characters you are reading can consist of 1 to 4 bytes. As long as the characters fit within the ASCII range (0-127), there is exactly 1 byte used per character.
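The variable length is easy to check from PHP, because strlen() counts bytes rather than characters. The following is only a small sketch; the sample characters and array keys are chosen for illustration, and the strings are written as byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Sample characters written as explicit UTF-8 byte sequences.
$samples = [
    'letter A'  => "A",                 // U+0041, inside the ASCII range
    'cent sign' => "\xC2\xA2",          // U+00A2
    'euro sign' => "\xE2\x82\xAC",      // U+20AC
    'emoji'     => "\xF0\x9F\x98\x80",  // U+1F600
];

foreach ($samples as $name => $char) {
    // strlen() returns the number of bytes in the string.
    printf("%s: %d byte(s)\n", $name, strlen($char));
}
```

The four samples take 1, 2, 3 and 4 bytes respectively.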
But if I want to express a character outside the ASCII range, such as '¢', I need more bytes. The character '¢', for example, consists of two bytes: 0xC2 and 0xA2. The first byte, 0xC2, indicates that '¢' is a 2-byte character. This is easy to understand if you look at the binary representation of 0xC2:
11000010
As you can see, the bit sequence begins with '110', which as per the
UTF-8 specification means: "2 byte character ahead!". Another character
such as '€' (0xE2, 0x82, 0xAC) would work the same way. The first byte,
0xE2, looks like this in binary:
11100010
The prefix '1110' specifies that there are 3 bytes forming the
current character. More exotic characters may even start with '11110',
which indicates a 4-byte character.
STEP 1: Set up your text editor / IDE to talk in UTF8
This step is optional, but you should take it into consideration.
STEP 2: Declaring character encodings in HTML
Always declare the encoding of your document using a meta element with a charset attribute, or using the http-equiv and content attributes (called a pragma directive). The declaration should fit completely within the first 1024 bytes at the start of the file, so it's best to put it immediately after the opening head tag.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
...
or
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
...
It doesn't matter which you use, but it's easier to type the first one. It also doesn't matter whether you type UTF-8 or utf-8.
Working with XML formats
XHTML5: An XHTML5 document is served as XML and has XML syntax. XML parsers do not recognise the encoding declarations in meta elements; they only recognise the XML declaration. Here is an example:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html ....
The XML declaration is only required if the page is not being served as UTF-8 (or UTF-16), but it can be useful to include it so that developers, testers, or translation production managers can visually check the encoding of a document by looking at the source.
The accept-charset="UTF-8" attribute on a form is only a guideline for browsers to follow; they are not forced to submit the data in that encoding. Poorly written form submission bots are a good example of clients that ignore it.
<form accept-charset="UTF-8">
See step 5 for server-side handling of input.
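Because the attribute is only a hint, it can help to verify on the server that submitted values really are well-formed UTF-8 before using them. The sketch below is one possible approach, not a complete input filter: the helper names isValidUtf8() and readUtf8Field() are hypothetical, and the check relies on PCRE's /u modifier (mb_check_encoding() from mbstring is an alternative when that extension is available).

```php
<?php
// Hypothetical helper: true when $value is well-formed UTF-8.
function isValidUtf8(string $value): bool
{
    // With the /u modifier the subject is validated as UTF-8 first;
    // preg_match() returns false for malformed input and 1 when the
    // empty pattern matches.
    return preg_match('//u', $value) === 1;
}

// Hypothetical helper: returns the field value, or null when it is
// missing, not a string, or not valid UTF-8.
function readUtf8Field(array $post, string $field): ?string
{
    $value = $post[$field] ?? null;
    if (!is_string($value) || !isValidUtf8($value)) {
        return null;
    }
    return $value;
}

// "caf\xC3\xA9" is "café" in UTF-8; "caf\xE9" is the Latin-1 form.
var_dump(readUtf8Field(['name' => "caf\xC3\xA9"], 'name'));
var_dump(readUtf8Field(['name' => "caf\xE9"], 'name'));
```

In a real script the first argument would typically be $_POST. The Latin-1 value is rejected because 0xE9 on its own is not a complete UTF-8 sequence.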
STEP 3: Configuring PHP to handle UTF8
PHP was not originally designed to deal with UTF-8; for this purpose you have to use PHP's 'Multibyte String' (mbstring) extension.
mbstring provides multibyte specific string functions that help you deal with multibyte encodings in PHP. In addition to that, mbstring handles character encoding conversion between the possible encoding pairs. mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2 and many single-byte encodings for convenience.
mbstring is a non-default extension. This means it is not enabled by default. You must explicitly enable the module with the configure option.
The following configure options are related to the mbstring module.
--enable-mbstring : Enable mbstring functions. This option is required to use mbstring functions.
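Then configure mbstring in php.ini:
;; Set default internal encoding
;; Note: Make sure to use a character encoding that works with PHP
;; Note: mbstring.internal_encoding and mbstring.http_output are deprecated
;; as of PHP 5.6.0; when they are left unset, default_charset is used instead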
mbstring.internal_encoding = UTF-8 ; Set internal encoding to UTF-8
;; HTTP input encoding translation is enabled.
mbstring.encoding_translation = On
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
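To illustrate what the extension changes in practice, here is a short sketch comparing byte-oriented and multibyte-aware functions. It assumes the mbstring extension is loaded, and it sets the internal encoding at runtime with mb_internal_encoding() instead of relying on php.ini. The sample string is made up for illustration.

```php
<?php
// Requires the mbstring extension.
mb_internal_encoding('UTF-8');

$price = "5 \xE2\x82\xAC"; // "5 €": three characters, five bytes

echo strlen($price), "\n";    // 5 (bytes)
echo mb_strlen($price), "\n"; // 3 (characters)

// mb_substr() cuts on character boundaries, so the euro sign stays intact.
$last = mb_substr($price, -1);
echo bin2hex($last), "\n";    // e282ac

// mb_strtoupper() also handles non-ASCII letters: "é" becomes "É".
$upper = mb_strtoupper("\xC3\xA9");
echo bin2hex($upper), "\n";   // c389
```

The plain strlen() result shows why byte-oriented functions give misleading lengths, and can cut characters in half, once text leaves the ASCII range.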
Although popular browsers are capable of making a reasonably accurate guess at the character encoding of a given HTML document, it is better to set the charset parameter in the Content-Type HTTP header to the appropriate value, either with header() or with the default_charset ini setting.
default_charset = "utf-8"
in php.ini, or put this at the top of your application's PHP script:
header('Content-Type: text/html; charset=UTF-8');
The advantage of using the HTTP header is that user agents can find the character encoding information sooner when it is sent in the HTTP header.
STEP 4: Configuring Apache for UTF8
Apache configuration (in httpd.conf or .htaccess):
AddDefaultCharset utf-8
STEP 5: Setup your database to store UTF-8
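An example query for creating a UTF-8 enabled database on MySQL:
CREATE DATABASE <DBNAME> CHARACTER SET utf8 COLLATE utf8_general_ci;
Note that MySQL's utf8 character set stores at most three bytes per character, so it cannot hold the 4-byte characters mentioned earlier. MySQL 5.5.3 and later provide utf8mb4 for the full range.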
We also need to tell our database server that we want to talk to it in UTF-8. The safest way to ensure your scripts are sending and receiving UTF-8 from MySQL is to set the character set of the connection _after_ you connect to the server, by sending these queries:
SET NAMES utf8;
SET CHARACTER SET utf8;
STEP 6: Code PHP to handle UTF8
Here is a PDO example that establishes a connection with MySQL and ensures that you are communicating in Unicode. To get the UTF-8 charset you can specify it in the DSN:
// for PHP ≥ 5.3.6,
$link = new PDO("mysql:host=localhost;dbname=DB;charset=UTF8");
// for PHP < 5.3.6,
$db = new PDO('mysql:host=myhost;dbname=mydb', 'login', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES \'UTF8\''));
For mysqli, you can call set_charset():
$mysqli->set_charset('utf8mb4'); // object oriented style
mysqli_set_charset($link, 'utf8mb4'); // procedural style
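To keep the charset from being forgotten in individual scripts, the DSN can be built in one place. This is only a sketch under stated assumptions: buildUtf8Dsn() and connectUtf8() are hypothetical helper names, the host, database name and credentials are placeholders, and utf8mb4 is used as the default so that 4-byte characters are covered.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that carries the charset.
function buildUtf8Dsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection and makes PDO throw
// exceptions on errors. Needs the pdo_mysql driver and a reachable server.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildUtf8Dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Only the DSN is printed here, so the snippet runs without a database.
echo buildUtf8Dsn('localhost', 'mydb'), "\n";
// mysql:host=localhost;dbname=mydb;charset=utf8mb4
```

A script would then call connectUtf8() with its own credentials instead of repeating the DSN string.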