What is the Unicode (BOM) signature?

Viewed 1,842 times

16

I've noticed that Dreamweaver sometimes adds a Unicode signature (BOM) to some of my PHP pages. Most of the time I have to remove it so that no extra spacing shows up on the rendered page. I'm not sure what the Unicode signature (BOM) is for. Could you explain it to me?

2 answers

16


The BOM is a marker that indicates the byte order used in a text file: the order of the bytes within each 2-byte unit in UTF-16, and within each 4-byte unit in UTF-32.
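
To make "byte order" concrete, here is a minimal sketch in Python (my choice for illustration only; the question itself is about PHP pages). It encodes the BOM character U+FEFF with each explicit byte order and prints the resulting bytes.

```python
import codecs

# U+FEFF encoded with an explicit byte order.
# The "-le"/"-be" codecs do not add a BOM of their own.
bom_char = "\ufeff"
utf16_le = bom_char.encode("utf-16-le")  # least significant byte first
utf16_be = bom_char.encode("utf-16-be")  # most significant byte first
utf32_le = bom_char.encode("utf-32-le")
utf32_be = bom_char.encode("utf-32-be")

for name, data in [
    ("UTF-16 LE", utf16_le),
    ("UTF-16 BE", utf16_be),
    ("UTF-32 LE", utf32_le),
    ("UTF-32 BE", utf32_be),
]:
    # Show each byte as two hex digits.
    print(name, " ".join(f"{b:02X}" for b in data))
```

A reader that sees `FF FE` at the start of a UTF-16 file therefore knows the file is little-endian, and `FE FF` means big-endian.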

This marker appears only at the very beginning of the file.

BOM stands for "Byte Order Mark", which in Portuguese would be something like "byte ordering mark".

EDIT

There is also a BOM for UTF-8, but its use is not recommended, since UTF-8 is a sequence of single bytes and so has only one possible byte order.
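
As an illustration of what the UTF-8 BOM looks like in practice, here is a small Python sketch (Python is only my choice for the example, and the sample text is made up). It shows the three BOM bytes, and the difference between Python's standard `utf-8` and `utf-8-sig` codecs when decoding them.

```python
import codecs

# Hypothetical file content, used only for illustration.
text = "<?php echo 'ok';"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# The plain codec keeps the BOM as an invisible first character (U+FEFF).
plain = with_bom.decode("utf-8")

# The "-sig" codec drops a leading BOM if one is present.
stripped = with_bom.decode("utf-8-sig")

print(" ".join(f"{b:02X}" for b in with_bom[:3]))
print(repr(plain[0]), stripped == text)
```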

EDIT Screenshot from Wikipedia showing the markers (since it is not possible to build tables here on SO)

BOM table

Origin: http://en.wikipedia.org/wiki/Byte_order_mark

  • 3

    Byte Order Mark, byte order tag, not Binary Order Mark.

  • 3

    There you go...: https://www.youtube.com/watch?v=Sv8XehRa-cg (Sorry, I couldn’t resist).

  • @Bacco Huahuhua! "Boa"... that is, the feminine of "BOM". (just to get into the spirit of the thing =)

  • This BOM is mostly used on Windows systems, right?

6

You can fix this issue as demonstrated below:

  1. http://www.melhorweb.com.br/artigo/5-Problemas-de-espacamento-e-acentuacao-no-Dreamweaver.htm
  2. http://forum.imasters.com.br/topic/511360-desativar-assinatura-unicode-bom/
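
If you would rather remove the signature yourself than change an editor setting, the idea can be sketched as follows. This is only an assumption-laden example in Python (not PHP, and not what the links above describe): `strip_utf8_bom` is a hypothetical helper name. The spacing most likely appears because the three BOM bytes sit before the opening `<?php` tag and so are sent to the browser as output.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical page content saved with a Unicode signature.
page = codecs.BOM_UTF8 + b"<?php echo 'ok';"
print(strip_utf8_bom(page))
```

To apply it to a real file, read the file in binary mode, pass the bytes through the helper, and write the result back.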

WHAT IS THE BOM?

The byte order mark (BOM) is a Unicode character, with code point U+FEFF, used to indicate the endianness (byte order) of a text file or data stream. Its use is optional and, if used, it should appear at the beginning of the text stream.

In addition to its traditional use, this character can also indicate which of the different Unicode representations the text is encoded in. Since Unicode can be encoded in 16-bit or 32-bit units, a Unicode text reader needs to know which format the text it is reading uses.

Source: http://en.wikipedia.org/wiki/Byte_order_mark
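
A minimal sketch of how a reader might use the BOM to guess the encoding, written in Python for illustration. The function name `detect_bom` and the returned labels are my own assumptions, not part of any standard API. The longer UTF-32 marks are checked first because the UTF-32 LE BOM (`FF FE 00 00`) begins with the UTF-16 LE BOM (`FF FE`).

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(detect_bom(b"\xef\xbb\xbfabc"))
print(detect_bom(b"\xff\xfea\x00"))
print(detect_bom(b"abc"))
```

Note that this is a heuristic: a UTF-16 LE file whose first character after the BOM is U+0000 also starts with `FF FE 00 00`, and this sketch would report it as UTF-32 LE.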

  • 1

    "this character can also indicate in which of the different Unicode representations the text is encoded" I don't think so... if it finds FF FE 00 00, how could it tell UTF-32 apart from UTF-16 where the first character is null?

  • 1

    @mgibsonbr, you're right about this ambiguity. In most cases the BOM is sufficient to determine the encoding used, and most tools interpret FEFF0000 as UTF-32 (which gives a huge headache for the :)
