What is meta charset in HTML?

15

Can someone explain this HTML5 code to me?

<meta charset="utf-8">

What is this standard used for and why?

3 answers

13

According to the HTML specification, the <meta> element, which should always be in the <head>, "represents various kinds of metadata that cannot be expressed using the title, base, link, style, and script elements". Examples of such metadata include a summary of the content, keywords, and instructions for search engine robots.

It may have a content attribute, and it must have a name, http-equiv, or charset attribute (never more than one of these three). When charset is present, it indicates the character encoding used in the document.

Apart from the 128 basic ASCII characters, the same graphic symbol can be encoded internally in different ways. This applies, for example, to all accented Portuguese characters. If an HTML file is saved with Latin 1 encoding (ISO 8859-1, or Windows 1252, which is similar), the character ã takes only one byte to store. In UTF-8, the same character takes 2 bytes, with values different from the byte used in Latin 1. So if the browser displays the document using one encoding while the document is actually served in another, the special characters break.
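The byte-level difference described above can be checked with a short Python sketch. It is only an illustration: the variable names are made up for this example, and Python's "latin-1" codec stands in for ISO 8859-1.

```python
# The same character, stored under two different encodings.
ch = "ã"

latin1_bytes = ch.encode("latin-1")  # one byte
utf8_bytes = ch.encode("utf-8")      # two bytes

print(latin1_bytes)  # b'\xe3'
print(utf8_bytes)    # b'\xc3\xa3'

# Reading UTF-8 bytes as if they were Latin 1 yields two wrong characters.
misread = utf8_bytes.decode("latin-1")
print(misread)  # Ã£

# Reading Latin 1 bytes as UTF-8 fails; with errors="replace" the invalid
# byte becomes the replacement character U+FFFD instead of raising.
broken = latin1_bytes.decode("utf-8", errors="replace")
print(broken == "\ufffd")  # True
```

This is the typical "Ã£ instead of ã" breakage seen when the declared and actual encodings disagree.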

It is important to remember that the <meta> element should not be the main method of telling the browser which charset is used. The preferred method is for the server to send an HTTP header with this information. The <meta> element is a second line of defence against encoding problems. It is highly recommended, so do not leave your HTML without it (see, for example, the case mentioned by Miguel Angelo: an HTML file can be opened directly instead of being sent by a server, and in that case no header indicates the charset, so this HTML element is the only indication).
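As a rough illustration of that order of preference, the Python sketch below picks the charset from a Content-Type header when one is present and falls back to the value declared in <meta> otherwise. The function names are hypothetical, and the sketch is deliberately simplified: real browsers also consider other signals, such as a byte order mark.

```python
from email.message import Message
from typing import Optional


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset in lowercase, or None if the parameter is absent.
    return msg.get_content_charset()


def effective_charset(content_type: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Prefer the HTTP header; fall back to the <meta> declaration."""
    from_header = charset_from_header(content_type)
    if from_header:
        return from_header
    return meta_charset.lower() if meta_charset else None


# Served over HTTP with a charset in the header: the header is used.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# File opened directly from disk (no header): <meta> is all there is.
print(effective_charset(None, "utf-8"))
```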

  • +1 for pointing out that this should not be the main way to indicate the encoding, although I think it should always be present so that the encoding is preserved when the file is saved, for example.

  • I included another sentence at the end reinforcing that the use of meta charset is also recommended.

  • @bfavaretto, how can I view the HTTP header sent by the server to my browser?

  • 1

    @Tiago: You can use Fiddler, a program for inspecting the requests made, which lets you view the headers.

  • 1

    @Tiago Depends on the browser. In Chrome, in the developer tools, "Network" tab, you can see the list of all requests, and for each one you can see the request and response headers.

6

This indicates the encoding of the HTML file being served.

In this case it indicates UTF-8, an encoding defined by the Unicode standard:

Unicode UTF-8

UTF-8 is a variable-width encoding: each character takes from 1 to 4 bytes. The most common characters (the ASCII range) are mapped to 1-byte codes, while less common ones, such as most accented characters, take 2 bytes.
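A quick way to see the variable width is to encode a few characters and count the bytes. The sample characters below were picked for this sketch only; they are not from the answer.

```python
# Number of bytes each character occupies when encoded as UTF-8.
samples = ["a", "ã", "€", "😀"]

widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in widths.items():
    print(f"{ch!r} takes {n} byte(s) in UTF-8")
# 'a' -> 1, 'ã' -> 2, '€' -> 3, '😀' -> 4
```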

The meta tag

The HTML meta tag lets you provide meta-information, i.e. information about the document itself.

How does meta charset work?

Used with charset, this tag indicates the encoding of the document itself. There seems to be a contradiction in that statement, which raises the question: how can the encoding of the document be read from inside the document itself? After all, knowing the encoding beforehand is what allows you to read the file in the first place.

Answer: most encodings agree on the most common characters, so in most cases part of the file can be read by treating it as ASCII. This is how the browser can read the file until it finds the tag <meta charset="utf-8">. When that happens, it restarts reading the file, this time using the encoding it just found inside the HTML itself.
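The idea can be sketched in Python as follows. This is a simplified illustration, not how any browser is actually implemented: the function names, the regular expression, and the windows-1252 fallback are assumptions made for this example. The 1024-byte limit mirrors the HTML specification's requirement that the declaration appear within the first 1024 bytes of the document.

```python
import re

# Look for charset=... inside a <meta ...> tag, working on raw bytes.
# This works because the tag itself uses only ASCII characters.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, default: str = "windows-1252") -> str:
    """Return the charset declared in a <meta> tag near the start of raw."""
    head = raw[:1024]  # only scan the beginning of the file
    match = _META_CHARSET.search(head)
    if match is None:
        return default  # hypothetical fallback chosen for this sketch
    return match.group(1).decode("ascii").lower()


def decode_html(raw: bytes) -> str:
    """First pass: find the charset. Second pass: decode the whole file."""
    return raw.decode(sniff_meta_charset(raw), errors="replace")


page = (
    '<!doctype html><html><head><meta charset="utf-8">'
    "<title>Olá</title></head><body>ã</body></html>"
).encode("utf-8")

print(sniff_meta_charset(page))  # utf-8
print("Olá" in decode_html(page))  # True
```

The same pattern also matches the longer form, <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, since it only looks for charset=... inside a meta tag.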

Two ways to indicate charset in metadata form

There are two ways to specify the charset of an HTML document using meta-information.

The form recommended by the HTML5 standard:

<meta charset="utf-8">

or the older form:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

These two forms are equivalent. In both cases the tag should appear inside the head element of the HTML document, and the sooner the better, because once the tag is found the reading of the file has to be restarted.

Why does it exist?

Why is there a way to specify the encoding inside the HTML document at all? Doesn't the HTTP protocol already provide a way to indicate the charset using a response header?

It serves as an additional safeguard. An HTML document served over HTTP can be saved by the recipient, and the saved file does not keep the headers it was served with.

HTML places no restriction on which protocol is used to serve the document, so for me the meta tag with charset is practically mandatory: it lets the document remain consistent even if the transport or storage protocol has no way of indicating the file's charset.

  • If I’m not mistaken, in the two ways you indicated, the first is the one recommended in HTML5.

  • @bfavaretto: Thank you, I will make that clear in my answer so as not to encourage bad practices.

0

The tag <meta charset="utf-8"> tells the browser which character encoding your website uses.
