How does a machine identify the type of data?


A while ago I started wondering how a machine defines/identifies a data type. I mean, in a high-level application we have definitions saying that a given piece of data can be an integer, a string, etc. But how does this look "under the hood"?

I imagine each language structures its data differently and stores a different number of bits/bytes for each type. But how is this done in general?

  • I believe the machine does not identify anything; it is the language itself that does it, in the case of compiled languages by generating bytecode understandable at the lower levels.

  • Each language implementation is free to do as it wishes. A simple possibility is for the object to keep a field that is a reference to the class from which it was created, whether that reference is a memory address or a qualified name (e.g. com.meudominio.minhaaplication.meupacote.Minhaclasse). In the case of primitive types, it is enough to check only at compile time, since there are no complications such as polymorphism.

  • "[...] generating an understandable bytecode at lower levels." - Still lower levels of the language itself, you meant, @diegofm?

  • Summarizing what was said here: the machine only understands the following: "there is information in a memory region that goes from address x to address y". It doesn't know more than that. The machine (I understand you're talking about hardware) doesn't know "oh, that's an integer, that's a string". In C you can see this when you start playing with pointers.

  • Exactly what I imagined, as I mentioned, @Felipediniz.

1 answer



The machine does not identify anything. Typing is an abstract concept that exists in high-level languages. They choose the types and the rules for them. For the computer there is only a bunch of bits; it expects some sets of them to form bytes or words (see below).

Concept of language

Of course, nobody will invent something unintuitive, something that strays from the standard and makes the data hard for the machine to manipulate.

The language's specification determines this. When someone creates a programming language, they think about which types can be considered primitive or scalar, which derived types form a homogeneous or heterogeneous set of other types, and even whether the user of the language can create their own types.

The first languages were very simple and did not even have to worry about this. More complex problems and larger codebases demanded typing.

What exists are bits; the rest is abstraction, although that term is tricky to use, since there is always some level of abstraction; even the bit is one of them.

Common types

Primitive types try to take advantage of the processor. So it is common for an integer to be the size of a processor register, so the processor can operate on it atomically and quickly. That is, it is commonly a word (a technical term).

Some processors have specific instructions for certain data formats, and their operation varies. This can help define a type abstractly. These instructions expect a specific format. You can build the data bit by bit, but in the end, to work with these instructions, it must be in the expected format; if it is not, the result will not be what you expect, and in some cases (though not most) it may even raise a fault.
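
For instance, here is a minimal C sketch of this idea. It assumes an IEEE-754 float and a 32-bit unsigned int (common, but not guaranteed by the language): the very same 32 bits mean entirely different things depending on which format the instructions treat them as.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* The same 32 bits, interpreted two ways. */
        unsigned int bits = 0x40490FDB;   /* the bit pattern of ~3.14159f */
        float f;
        memcpy(&f, &bits, sizeof f);      /* reinterpret the bits, no conversion */
        printf("as integer: %u\n", bits); /* 1078530011 */
        printf("as float:   %f\n", f);    /* ~3.141593 */
        return 0;
    }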

I will use common type names, but that can vary from language to language.

Integer

It is a sequence of bits (32 is the most common) in which one bit may indicate the sign and the others represent the number in powers of 2. The natural expectation is that the bits on the right are the least significant and the ones on the left are the most significant, as we do with decimal numbers. In memory, this is how big-endian architectures store the bytes. But architectures like Intel's use little-endian: the most significant byte comes last, which is counterintuitive to us but advantageous to the machinery. This is known as endianness.

[Diagrams: little-endian vs. big-endian byte order]

See the details on Wikipedia.
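
A small C sketch that makes this visible by looking at the raw bytes of an integer (it assumes a 4-byte unsigned int, which is typical but not guaranteed):

    #include <stdio.h>

    int main(void) {
        unsigned int x = 0x01020304;
        unsigned char *p = (unsigned char *)&x; /* inspect the raw bytes */
        if (p[0] == 0x04)
            printf("little-endian: least significant byte stored first\n");
        else if (p[0] == 0x01)
            printf("big-endian: most significant byte stored first\n");
        return 0;
    }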

Float

Modern processors have special registers for certain operations, such as binary floating-point calculation. This makes for a good type, but even where such hardware does not exist the type is useful, because the way its bits are arranged has characteristics of its own.

[Diagram: floating-point bit layout (sign, exponent, mantissa)]

See the details on Wikipedia.
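
As an illustration, a minimal C sketch (again assuming IEEE-754 single precision) that pulls the sign, exponent, and mantissa fields out of a float's bits:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = -6.25f;               /* -1.5625 x 2^2 */
        unsigned int bits;
        memcpy(&bits, &f, sizeof bits);
        unsigned int sign     = bits >> 31;          /* 1 bit */
        unsigned int exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
        unsigned int mantissa = bits & 0x7FFFFF;     /* 23 bits */
        printf("sign=%u exponent=%u mantissa=0x%06X\n", sign, exponent, mantissa);
        return 0;   /* prints sign=1 exponent=129 mantissa=0x480000 */
    }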

Vector

There are also special registers and instructions for data vectorization. Few applications make use of them, and it used to be common for languages not to provide direct support; increasingly, though, support appears in libraries or through compiler optimizations.

In theory a language could have no types at all and leave everything to libraries, but doing so may cost performance.
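
For example, the plain C loop below declares no vector type at all, yet an optimizing compiler (gcc or clang with -O3, say) may compile it to SIMD instructions that add several elements at once; whether that happens depends on the compiler and target:

    #include <stdio.h>

    /* Nothing here is marked as "vectorized"; auto-vectorization
       is purely a compiler optimization. */
    static void add_arrays(float *a, const float *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += b[i];
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40};
        add_arrays(a, b, 4);
        printf("%.0f %.0f %.0f %.0f\n", a[0], a[1], a[2], a[3]);
        return 0;
    }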

Byte

The byte exists because it is the smallest possible unit of information that is stored directly in memory (8 bits). It is often confused with the character, but that hasn’t been an absolute truth for a long time.

Bit

A Boolean is usually a byte. But if you have a sequence of them, you can treat each one as a bit and store up to 8 per byte.
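
A short C sketch of that packing, using shifts and masks to set, clear, and test individual flags inside a single byte:

    #include <stdio.h>

    int main(void) {
        unsigned char flags = 0;   /* room for 8 booleans */
        flags |= 1u << 3;          /* set flag 3 */
        flags |= 1u << 7;          /* set flag 7 */
        flags &= ~(1u << 3);       /* clear flag 3 */
        printf("flag 3 = %d\n", (flags >> 3) & 1); /* 0 */
        printf("flag 7 = %d\n", (flags >> 7) & 1); /* 1 */
        return 0;
    }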

Shorts

Other integer types are created to save memory, occupying no more bits than necessary for the maximum value they must hold.

Long

There are types that need to go beyond the register size; it is not ideal, but there are semantic reasons to do it. On some architectures larger types still fit in a register.
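
To make the size differences in the last two sections concrete, a small C sketch printing sizeof for a few integer types; the exact numbers depend on the platform and compiler, and the commented values are merely typical of 64-bit x86 systems:

    #include <stdio.h>

    int main(void) {
        printf("short:     %zu bytes\n", sizeof(short));     /* 2 */
        printf("int:       %zu bytes\n", sizeof(int));       /* 4 */
        printf("long long: %zu bytes\n", sizeof(long long)); /* 8 */
        return 0;
    }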

Pointer

Another common type is the pointer; after all, holding the address of where information lives is one of the most important things there is. It is still just an integer; there is nothing special about it except what you expect it to do.

See the details on Wikipedia.
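
A minimal C sketch of that point: the pointer's bits can be read as a plain integer (uintptr_t is the standard integer type intended to hold a pointer value):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int x = 42;
        int *p = &x;                 /* a pointer: just an address */
        uintptr_t n = (uintptr_t)p;  /* the same bits, as an integer */
        printf("address as integer: %llu\n", (unsigned long long)n);
        printf("value at address:   %d\n", *p);
        return 0;
    }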

Compound primitives

Many types are formed by sets of bits, where each set has a meaning and carries a specific piece of information.

A date, for example, is just a specific way of using an integer type, or a set of integers, or even sets of bits each holding one piece of the information; the exact format varies.

Decimal or monetary types are the same idea. They are quite different from floating-point types, and each language has its own way of assembling them. Some have decimal types of several sizes.
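
As a purely hypothetical illustration of such packing, a C sketch that stores a date's year, month, and day in separate bit fields of one integer (many real systems instead just count days or seconds since an epoch):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical layout: year in the high bits, then 4 bits
           for the month, then 5 bits for the day. */
        unsigned int year = 2017, month = 3, day = 25;
        unsigned int date = (year << 9) | (month << 5) | day;
        printf("packed: %u\n", date);
        printf("year:   %u\n", date >> 9);
        printf("month:  %u\n", (date >> 5) & 0xF);
        printf("day:    %u\n", date & 0x1F);
        return 0;
    }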

Composite types

Non-scalar types, called composite types, are formed from primitive types: either in sequence (the array, for example - see Wikipedia), in combination with other types (the structure or the class, for example), or as Abstract Data Types. That includes the string.

Each language or library can have its own way of handling a string, but the basics are a pointer (an address) and a sequence of bytes (the characters).
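
A small C sketch of both layouts: C's own convention (a pointer to bytes ended by a zero byte) and a hypothetical my_string struct that carries an explicit length instead:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical layout: explicit length + pointer to the bytes.
       C itself uses a bare char array terminated by '\0' instead. */
    struct my_string {
        size_t len;
        const char *data;
    };

    int main(void) {
        const char *c_style = "hello";   /* pointer to bytes + '\0' */
        struct my_string s = { strlen(c_style), c_style };
        printf("%zu bytes at %p: %s\n", s.len, (const void *)s.data, s.data);
        return 0;
    }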

The language, and mainly its standard library, will create new types according to the needs or conveniences it wants to offer the programmer.

Then you’ll have much more complex things like Stream, HashTable, Window, Cliente and any other class.

Type system

In dynamically typed languages there are usually still several types for the data; it is only the variable that has no fixed type, so the language needs a structure to keep track of the type, because in memory the stored value has a specific bit format.
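
A sketch in C of how such a control structure might look (a hypothetical value type, in the spirit of many dynamic-language runtimes rather than any specific one): a tag recording which interpretation of the bits is currently valid, stored next to the bits themselves.

    #include <stdio.h>

    enum tag { TAG_INT, TAG_FLOAT };

    struct value {
        enum tag tag;   /* which interpretation is valid right now */
        union {
            int   i;
            float f;
        } as;
    };

    static void print_value(struct value v) {
        switch (v.tag) {
        case TAG_INT:   printf("int: %d\n", v.as.i);   break;
        case TAG_FLOAT: printf("float: %f\n", v.as.f); break;
        }
    }

    int main(void) {
        struct value v = { TAG_INT, { .i = 42 } };
        print_value(v);
        v.tag = TAG_FLOAT;  /* same storage, different interpretation */
        v.as.f = 3.14f;
        print_value(v);
        return 0;
    }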

In strongly typed languages a bit sequence can be interpreted as only one type; if you want it to be another type, you have to convert it, possibly saving the result elsewhere. In weakly typed languages you can treat those bits as one thing or another as needed. In some languages it is common to read a pointer as an integer, to stay with the basic example.

There are languages with a single type, or even no types at all, where everything is accessed in raw form. Of course, they are not very successful.

Addendum

There is a question on SO specifically about Java.

  • Understood, bigown. Although there is a basic "rule" for defining types, each language is in charge of how that definition is made. Correct?

  • Yes, it can vary, but in practice it does not happen with the most basic types.
