What is metadata?

I was researching how to "sanitize" data and found this answer by @Maniero, which says:

"Remove snippets of text in a data entry that have the characteristics of metadata and that may cause a security problem."

I’d like to understand what metadata is and what its application or usefulness is, if any.

  • Jeez, why the downvotes? d:

1 answer


Meta, from the Greek μετά, means "behind" or "beyond". Metadata is information about a piece of data.

Think of a photo taken with a camera:

The data itself is the image. It’s what you see when you look at the photo. Metadata could be:

  • What camera took this picture?
  • On what paper was it developed?
  • Where was this photo taken?
  • Who took this picture?

It is information about the data. It is the "meta": what lies behind and beyond the image itself.

Note that this is not exclusively a technology term; it applies to any kind of data, object, photograph, software or anything else you can imagine. The term has been on the rise since the 1990s, with the popularization of computers and the internet.

Contextualizing

Disk files

To put this in the context of information technology, let’s look at the metadata of a file:

  • File size
  • File extension
  • File name
  • Size on disk

Every file in the operating system has metadata. It is stored on disk and takes up space. Therefore, a 1 KB file actually occupies 1 KB plus the size of its metadata.
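As a small illustration of the list above, here is a Python sketch that asks the operating system for a file's metadata through os.stat. The file name example.txt and the helper describe_file are made up for this example; the file is created in a temporary folder so nothing is left behind.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of metadata the operating system keeps for a file."""
    info = os.stat(path)
    name = os.path.basename(path)
    return {
        "name": name,
        "extension": os.path.splitext(name)[1],
        "size_bytes": info.st_size,  # size of the content itself
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway 1 KB file (hypothetical name) and inspect it.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    with open(path, "wb") as handle:
        handle.write(b"x" * 1024)
    metadata = describe_file(path)

print(metadata)
```

None of the values returned here are part of the 1024 bytes written to the file; the file system keeps them alongside the content.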

JSON APIs

One of the specifications for building web APIs is JSON:API, which reserves a meta member for non-standard information about a response. As a simplified illustration in that spirit, suppose you have a paged product listing API:

GET /api/produtos

{
  "produtos": [ ... ],
  "meta": {
    "paginaAtual": 1,
    "proximaPagina": "api/produtos?pagina=2",
    "paginaAnterior": null
  }
}

Note the meta node: it says nothing about any particular product, which is what this endpoint serves. It is about paging. It is information beyond the data and about it.
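To show where such a meta node could come from, here is a minimal Python sketch of a server-side helper that wraps one page of products with its paging metadata. The function build_page and the sample catalog are hypothetical; only the field names are taken from the response above.

```python
import json


def build_page(items, page, per_page):
    """Wrap one page of items together with paging metadata."""
    start = (page - 1) * per_page
    has_next = start + per_page < len(items)
    return {
        "produtos": items[start:start + per_page],
        "meta": {
            "paginaAtual": page,
            "proximaPagina": f"api/produtos?pagina={page + 1}" if has_next else None,
            "paginaAnterior": f"api/produtos?pagina={page - 1}" if page > 1 else None,
        },
    }


catalog = [{"id": n, "nome": f"Produto {n}"} for n in range(1, 6)]  # made-up data
first_page = build_page(catalog, page=1, per_page=2)
print(json.dumps(first_page["meta"], indent=2))
```

The products go in one key and everything about the listing itself goes in another, so a client can page through results without inspecting the products.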

HTML

Web pages are no different. A page like your Facebook profile contains your name, your photo and the like. Besides all this data, there is some more, dedicated to metadata.

That’s what the HTML meta tag is for. It is meant to be read by machines, not by end users. It is information about the page.

<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="John Doe">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>

Here, in this W3Schools example, the page metadata includes a description, the keywords, the page author and so on.
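To illustrate the "read by machines" part, here is a sketch of how a program might pull the meta tags out of a page using Python's standard html.parser module. The class name MetaCollector and the shortened sample page are assumptions made for this example; real crawlers are considerably more elaborate.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "charset" in attributes:
            self.meta["charset"] = attributes["charset"]
        elif "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")


page = """
<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="author" content="John Doe">
</head>
<body><p>Visible content for the reader.</p></body>
"""

collector = MetaCollector()
collector.feed(page)
print(collector.meta)
```

The paragraph in the body never shows up in the result: the parser only keeps what the page says about itself.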

Databases

Relational DBMSs also have metadata. A people table may contain records with a name, a date of birth and a CPF (the Brazilian taxpayer ID). The records are the content. But if I ask "what are the columns of the people table?", that is a query about the table's metadata.

Some examples of metadata from a database table:

  • The name of the table
  • The size of the table
  • The number of rows in the table
  • The table's columns and their types

This data is, in fact, stored somewhere. In SQL Server you can query it through the sys.columns catalog view. An example:

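-- List the columns of the Pessoa table (assuming it lives in the dbo schema).
-- OBJECT_ID expects a schema-qualified object name, not table.column.
SELECT name, column_id
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.Pessoa');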

If you want to read more about it, see Querying the SQL Server Catalog.
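If you don't have SQL Server at hand, the same idea can be tried with SQLite through Python's built-in sqlite3 module. This is only a sketch under that assumption: SQLite has no sys.columns and exposes column metadata through PRAGMA table_info instead, and the Pessoa table and its row are invented for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Pessoa (Nome TEXT NOT NULL, DataNascimento TEXT, Cpf TEXT PRIMARY KEY)"
)
connection.execute("INSERT INTO Pessoa VALUES ('Ana', '1990-01-01', '00000000000')")

# Data: the rows stored in the table.
rows = connection.execute("SELECT Nome FROM Pessoa").fetchall()

# Metadata: PRAGMA table_info describes the columns, not the rows.
# Each result row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, declared_type)
    for _cid, name, declared_type, _notnull, _default, _pk in connection.execute(
        "PRAGMA table_info(Pessoa)"
    )
]
connection.close()

print(rows)
print(columns)
```

The first query returns what is in the table; the second returns what the table is.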

Programming languages

In programming languages, this is the territory of metaprogramming. Maniero answered a question of mine about the difference between metaprogramming and reflection, which is worth reading. There he briefly defined metaprogramming as:

a paradigm that allows the code to be manipulated in a more general way: you program how the code should be programmed.

An example of code inspecting itself in Ruby:

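class Developer
  # Class-level (singleton) method
  def self.backend
    "I am backend developer"
  end

  # Instance method
  def frontend
    "I am frontend developer"
  end
end

# The program inspecting its own structure:
p Developer.class                   # => Class
p Class.superclass                  # => Module
p Module.superclass                 # => Object
p Object.superclass                 # => BasicObject
p Developer.singleton_methods       # => [:backend]
p Developer.instance_methods(false) # => [:frontend]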
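For comparison, here is a rough Python counterpart of the Ruby snippet above. It is a sketch rather than an exact translation: Python has no Module level in its hierarchy, so the chain is shorter, and the Ruby class-level method is approximated with a classmethod.

```python
class Developer:
    @classmethod
    def backend(cls):
        return "I am backend developer"

    def frontend(self):
        return "I am frontend developer"


# The class of a class is `type`, Python's counterpart of Ruby's `Class`.
print(type(Developer))

# The method resolution order lists the ancestors of the class.
ancestors = [cls.__name__ for cls in Developer.__mro__]
print(ancestors)

# Names defined directly on the class, excluding dunder attributes.
own_methods = sorted(name for name in vars(Developer) if not name.startswith("__"))
print(own_methods)
```

In both languages the program is reading information about its own code, which is metadata in the same sense as the column list of a table.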

BMP image format

An interesting example is the .bmp (bitmap image file) format. If you look at the raw contents of such a file, you will see that it follows a fixed layout: a file header and an image information header come first, followed by the pixel array.

You can see that much of this information is not the image matrix itself but metadata about the image. As you mentioned, in many cases there is no problem in sanitizing part of this information.
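To make the BMP layout concrete, here is a Python sketch that builds a tiny 2x2 bitmap in memory and then reads back only its header fields with the struct module. It assumes the common 40-byte BITMAPINFOHEADER variant; the helper names make_tiny_bmp and read_bmp_metadata are invented for the example, and other header variants exist that this does not handle.

```python
import struct

FILE_HEADER = struct.Struct("<2sIHHI")       # signature, file size, 2 reserved, pixel offset
INFO_HEADER = struct.Struct("<IiiHHIIiiII")  # BITMAPINFOHEADER (40 bytes)


def make_tiny_bmp():
    """Build a 2x2, 24-bit BMP in memory so the example needs no file."""
    row = b"\x00\x00\xff" * 2 + b"\x00\x00"  # two pixels (BGR) padded to 4 bytes
    pixels = row * 2
    offset = FILE_HEADER.size + INFO_HEADER.size
    info = INFO_HEADER.pack(INFO_HEADER.size, 2, 2, 1, 24, 0, len(pixels), 2835, 2835, 0, 0)
    head = FILE_HEADER.pack(b"BM", offset + len(pixels), 0, 0, offset)
    return head + info + pixels


def read_bmp_metadata(data):
    """Decode the header fields that come before the pixel array."""
    signature, file_size, _r1, _r2, pixel_offset = FILE_HEADER.unpack_from(data, 0)
    if signature != b"BM":
        raise ValueError("not a BMP file")
    _size, width, height, _planes, bits_per_pixel = struct.unpack_from(
        "<IiiHH", data, FILE_HEADER.size
    )
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }


bmp_metadata = read_bmp_metadata(make_tiny_bmp())
print(bmp_metadata)
```

In this tiny file, 54 of the 70 bytes are headers and only 16 are pixel data, which shows how much of a file can be metadata rather than content.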

  • Very good explanation. I already knew what it meant in other contexts, but in programming I was unsure.

  • Excellent. Thank you!

  • I get it. These are things "beyond" what the eyes can see. Information that is not explicit but exists.

  • I like to define it as information about data.

  • Metadata is metainformation about a piece of data. But then I would have to ask another question: "What is metainformation?" haha

  • On metadata, if you want to go deeper (if you are feeling lazy I can post an answer with other complementary examples, but I see no need): a database lets you find out, at query level, which tables exist, which columns each table has, what type each column is, which columns a foreign key refers to in the primary table and how it maps in the table that points to it; if you have this access during a query, you are retrieving data about the database's data. The Semantic Web also tried to use a lot of metadata for service discovery, and so did SOAP.

  • @Jeffersonquesado exactly!! I will add that, yes. Thank you
