Why is using "SELECT * FROM table" bad?

Asked

Viewed 16,426 times

60

It is often said to take all columns of a table through the command SELECT * FROM tabela is a bad practice. Any recommendation without explanation is not helpful. So...

If it is bad practice, there must be a way to use it properly. There must be a situation where it does not cause problems. If it were always bad, there would probably not even be the possibility of using.

  • What problems are caused by its use?

  • Do these problems always occur? Is there a way around? Or are there situations that make no difference?

  • Recommendations are valid for any database?

  • 4

    Briefly it is for you not to get more data than you need, if you want only the value id, no need to bring together another 50 columns.

8 answers

66


Not bad/bad practice to use wildcards (*). Depends on your goal.

I want to be a good programmer (probably your case)

A good programmer knows that or he will need all columns of a row (for example, in a "show all data" style listing) and will use * or you’ll use one or only a few (in a subquerie, for example) and will select ONLY those you need.

As a good programmer, you know which columns will be added to your tables, and you know the only way to make your system scalable - running without changes - is to use a wildcard (*). A framework In general, it uses BD metadata to find out which columns exist, or even has it in its settings. But this information is cached by someone, whether on a host running the applications, such as PHP, or the BD host. That task always stays with someone.

But in joins, where we join different tables, columns may appear with equal names (now and in the future), and the best way to do this is by specifying column-by-column, as we will use different "nicknames" for columns with the same name.

Short money

Well, you’re either someone who’s responsible for the project budget, or you’re aware that you should minimize money spent on the comic. There are servers that limit traffic in terms of numbers of darlings, and others limiting by amount of data trafficked. In general, you will only pay attention to * when you can’t traffic too many bits, or when the requirements survey step (pre-programming steps) established that there will be too many BD records or there will be intense network use.

Ah, if there is heavy network traffic, the number of darlings becomes a problem, because you save a lot on traffic data, but you don’t have much to do with multiple/concurrent access (hence you already enter into theories CHAP/ACID, but from there already escapes from the scope of basic programming and in general do not use Sgbds and use Nosql).

The price of the Internet is getting lower, and prices are falling - from BD servers; that way, you will less and less worry about how your system will access the BD and more on how it presents itself to the user.

Well, I want my client (or boss/my company) in my hand (you won’t be able to).

Well, in the past, some programmers had the bad faith to ignore the principles of programming and wanted to "keep the job/client". They thought that hindering changes in the system would be the way, and so they never/little used wildcard (*). Because this has changed a lot: what matters is the data that is in the comic book, and not the system that uses it.

If you study a little, you’ll see that it’s easy to find out everything about a comic - which you didn’t do - and that way you’ll know how the system should work. This is called reverse engineering. And yet, seeing the system working, there is reengineering, which is to make one system emulating another.

I mean, there’s no longer any dependence on the guy who created the query without wildcard, and you will be fired or will tarnish your reputation in the market with other customers.

In short

There is no foundation in this note currently (before, in the era of chipped bit, yes), and you will probably always use wildcard, except in joins, where columns with equal names can appear from different tables and treat this is more difficult.


I think the people who "negatively" didn’t understand. It all depends on the goal, and it can change. Today, I delivered a system to the customer and he is 100% suited to what he asked. But tomorrow it may not be (customer requests maintenance). If we always think of "a system prepared for the future", we will fill the solution with theorems, creating a heavy, large and complex software, which will give a negative note to you today. A robust software should be used for problems that require robustness, but not for cases where most likely the system will not have greater needs.

  • 3

    I think the people who "negatively" didn’t get it. It all depends on the goal, and it can change. Today, I delivered to the client and he is 100% suited to what he asked. But tomorrow may not be. If we always think of "a system prepared for the future" we will fill the solution with theorems, creating a heavy, large and complex software, which will give a negative note to you today. A robust software should be used for problems that require robustness, but not for cases where most likely the system will not have greater needs.

31

SELECT * FROM Tabela is considered a bad practice because the fields that should be designed for the return of the query are not specified.

Every time you use *, it is necessary that the database understands which are the fields existing in the table to assemble the correct query (a query in the table metadata). This begets a certain overhead to the database.

If you need to query values, theoretically you should know which values you want to return, this optimizes the query. The cost of returning 1, 2, or 15 fields are different, so it is good practice and, gain performance when informed the return fields of the query.

Example:

Suppose you have a table with 10 rows and 10 columns. You need to make one select in this table. Suppose:

SELECT * FROM Tabela10por10;

If each field in the table is 32 bytes in size, its return will be the size: 32 * 10 * 10 = 3200 bytes.

If what you need is just one field, then the best would be:

SELECT campo1 FROM Tabela10por10;

The result for this search is: 32 * 1 * 10 = 320 bytes. Only 10% of what you returned in the not optimized query.

To understand more:

To understand the optimizations of this type, it is necessary to understand the Relational algebra for database.

Edited

... outside the overhead database avoidance (query in metadata tables).

I had previously assumed that informing the fields you want to project in the query would avoid the query to metadata. Trinidad corrected that mistake. It is then stated, according to @Trinidad, that informing the fields you want to select/design does not prevent the DBMS query in metadata.

This begets a certain overhead to the database.

The one overhead always occurs, since the DBMS metadata is always consulted. According to @Trinidad.

Accretion

The semantics

As a complement to the answer, I believe that one of the biggest problem in the select * from Tabela is the semantic meaning of the query. When writing:

SELECT 
  Nome 
FROM 
  Cliente;

It is clear that what you want is the customer. Even in more complicated queries it is possible to understand the purpose of the query, such as:

SELECT 
  SUM(Total) AS TotalDaNota
FROM 
  ItemNotaFiscal;

The SELECT * does not inform us anything about the purpose of the consultation (the semantics - the meaning), which makes it very difficult, mainly, in the maintainability of the system.

  • 3

    You said it’s not good because the DBMS needs to query the metadata, but I’m sure the DBMS at all times consult the metadata, this explanation there sincerely is bullshit.

  • 1

    Yes, you’re right @Trinidad. The metadata query is always done by DBMS. I will correct my answer on this point, but everything else is correct. Thank you for the correction.

  • I think the main point of the answer is the addition. Performance is not the best point of this answer because it does not substantiate how and when relevant performance loss occurs. And it is not in all cases.

  • Maybe I’m confused, but why send one select * if you only want the campo1? I mean, if you send a select *, will return 10 fields.

  • There are few situations where you select only one field; for example, in Queries. Use only one field if even adding OTHER COLUMNS does not change the meaning. For example, if you are going to split a name field into two (first and last names), you will have to change the querie (in addition to the system) but if you have used wildcard(*) you only need to change it in the system (and depending on how the system was done, you do not need to). But if the other table fields generate large data traffic (see my answer), you will go back to field-to-field.

26

"What problems are caused by its use?"

The whole problem is that you’ll be (lazily) not specifying which columns you want from its clause SELECT:

  • Inefficiency by quantity: You’ll be manipulating a volume of data you don’t need. Think about the network transmission from the database to the application, or the data set generated in the application itself; you may be consuming more memory and more network bandwidth without realizing.
  • Inefficiency by indexing: Maybe you just need to use columns indexed by the bank; in this case if only these columns had been selected the operation would be less costly.
  • Confusion of column names: In his SELECT may be involved columns with the same name (very common in joins extended). This can cause problems not only for some human being reading the result of your query as well as some system interpreting this result.
  • The meaning of the query is not explained: It becomes clearer to someone else (or even to yourself later) what that query is doing. A simple example is: imagine that a column is to be removed from the database. The usual way is to pass query by query seeing which ones use such column; in the case of these queries with asterisk not only check the query, but it becomes necessary to check all manipulation done with the selected result (checking whether the same column is used).

"Or there are situations that make no difference?"

The only situation I’ve ever heard of that makes no difference is in the case of:

SELECT COUNT(*) FROM ...

But I have also heard the opposite (which makes a difference). Maybe this behavior is different from SGDB to SGDB (maybe even varies between versions of it).

I wouldn’t worry about small tests; in production I always used SELECT COUNT(0) FROM ... as a precaution.

"Recommendations are valid for any database?"

When we speak of the main Sgdbs with certainty yes; it is rare to run away from it.

But nowadays there are a multitude of database systems; it is convenient confer when this type of doubt arises which exact behavior of the database you are using.

  • 4

    Just to be clear, COUNT(*) is something totally different to use *.

13

Many have already given the most important reasons:

  • impairs performance because it returns more data than necessary, including those that are not indexable.
  • is less readable, it is not known which data are really relevant in return.
  • does not explain which column name, which is important when making joins. Dbms usually know how to handle this, but it is less clear.

There is, however, another very important reason:

  • The code that uses the search result may, in some scenarios, be coupled to the order of the columns in the database.

If you have the table below...

CREATE TABLE pessoa (
    id       INTEGER PRIMARY KEY,
    nome     VARCHAR(100),
    endereco VARCHAR(200)
);

...and you use an API that returns the result sets as a numerical array, as below...

ResultSet results = query.execute('SELECT * FROM pessoa');

for (Result result : results) {
    String name = result.get(1);
    String endereco = result.get(2);
    // ... mais código aqui
}

...your code will depend on whether the first column of the table is id, the second is name and the third is address. This can, however, change; for example, so;

ALTER TABLE pessoa ADD COLUMN sobrenome AFTER nome;

Ready: your code above no longer works.

This might not be as important if the results work as dictionaries/maps - String nome = result.get('nome'); - but still the above scenario is important.

(Note that almost everything here is pseudocode, I do not guarantee that the code runs :) )

10

What problems are caused by its use?

From the technical point of view there is loss of performance because:

  • may require reading more information on mass storage that is slower
  • may require that a query that would just pick up index data has to do lookup in the main table and read this
  • can increase memory consumption and decrease chances of caching other things (yes, this can affect the performance of other darlings)
  • The increased memory consumption may require that the selection of results be played on the mass storage temporarily, which is much more expensive. And it can inhibit the use of certain more performatic algorithms when there is a risk of not being all in memory
  • the transmission of so much data is more time consuming and can cost more if the network has measurement (go, cloud! :) )

But there are cases, rare, that can occur the opposite and a query that could be reused is not because it takes fewer columns. So it’s not an absolute truth.

In terms of performance what is the difference between using the * and list all the columns? I doubt it’s anything other than zero, and if it does, it’s derisory, it’s just because of the metadata query form.

Real problem

But actually the biggest problem is not technical, it’s conceptual: it’s don’t think about what you’re doing. It’s bad to use * because people do it out of laziness. If she knows what she’s doing, what the implications are, if it’s what she wants, it’s okay for someone to use it.

And the bigger problem is that people don’t know the full implications of its use. In addition to performance, she does not understand that this makes her application dependent on the database structure. If it changes the order of the columns in the table or if the database does it differently for some reason, it may not work anymore (It is true that most applications are very bureaucratic and do not even use sophisticated mechanisms that would be affected by this). Of course it depends on the application. When you misconception something can go wrong in the future.

You may look like an "asshole," but you want to all columns database is semantically very different from wanting each of the columns, without exception. Use what you want. Of course, if a person fails to interpret text, they can’t decide what to do properly. If a person is not detail-oriented, he makes mistakes he doesn’t even realize.

Of course there are contexts that doing one way or another will change nothing. Like every recommendation that seems "good practice" needs context.

What I understand is that if the person does not know what they are doing, by default should not use the *, because it tends to do more damage than use. But the people who most need to follow the safest standard by default are just the ones who will least opt for it.

So using the column names is as bad as asterisk if you don’t think about what you’re doing.

Do these problems always occur? Is there a way around? Or are there situations that make no difference?

The technical question even occurs virtually always, but in low volumes it is tiny and even if it is proportionally large, the whole is low and does not matter much.

The clear concept that always occurs, just does not mean that it will bring real problems always. Some people are lucky. Or else they work with other problems so much more serious that this becomes irrelevant.

Recommendations are valid for any database?

The conceptual ones, yes, of course.

The technique does not necessarily, although it is highly probable.

I answered because I think something like this was missing and deep down I wanted to know. Today I have more confidence that this is the right thing to do.

6

It really is better to use the names instead of *, but this does not mean that the command cannot be used or that it is bad. The * is very useful when you make a query just to query directly in SGBD (Database Manager System).

Now if you are going to work with this data with a programming language it is much better to put the fields for performance and organization issues. (This doesn’t mean you can’t use one * (all) ), because in very small systems it can be much easier to put the * than all the fields you want (Not that it’s the best solution).

Now if you are in a company, or if you have done a serious project it would be impractical to put all the fields since you will not use all.

The * is used to make direct queries on SGBD, facilitates the time you type and is very useful.

inserir a descrição da imagem aqui

Now from the time you make queries on a system is much better a query that returns data faster.

 select Host, User, Password from user

This way the queries will return much faster in addition to the fields being easier to visualize (that this second option is more a personal freshness, does not interfere in the performance in the system, only in the organization of the code). The only relevant reason for not using the * is that not using the data returns much faster, improving the system performance.

5

When you use a clause SELECT, you can specify the list of columns or expressions:

select Coluna, Coluna, ..., Coluna from Tabela

Or * (asterisk) as the selection list:

select * from Tabela

Avoid the use of an asterisk (*) in instructions SELECT, mainly due to reasons for performance, as the search in all fields will be costly as the table size increases.

Use this during development/testing, to view part of the table content for developing queries, using for example the TOP to display only the desired amount of records:

SELECT TOP (100) * FROM tabela
  • 1

    TOP controls the amount of data vertically, while avoiding the use of * horizontally. The idea of the table growing is more significant for conditions without WHERE and with TOP than in relation to the *

  • A homemade analogy, select * would be to bring the soup, the ladle of the beans, the glass of wine, etc for breakfast, if not use do not bring ...

0

They say they use SELECT * FROM tabela is bad because normally you do not need all columns and not all rows of the table.

If the table has few rows, using query like this has no burden, but see this example: The guy has a table of over 10,000 records, with 25 columns, makes SELECT * FROM tabela to click on a Javascript component, which displays only 4 columns, and display on the screen. This is an error.

Why?

  • Why a person doesn’t view 10,000 records at the same time.
  • Because the user browser will crash.
  • Because the cost of bringing 10,000 server records to the person’s machine will make the execution slow.
  • Why bring all the table fields, when normally you want only 3 or 4, will be even slower.

How to get out of this impasse?

  • Include in place of the asterisk the names of the fields you want.
  • Use the clause WHERE to filter and bring only the records that matter.
  • Create indexes for columns used in clauses WHERE.
  • Paginate the data using LIMIT, passing the start point and the amount of records that should be brought in the query.
  • In fact to be able to have burdens always, even if it is of little relevance in small volumes. This occurs mainly when the columns that really matter are in the indexes, thus eliminating the search in the table. It also affects because it can make optimization difficult. See the other answers. The asterisk only affects the columns, not the rows.

  • Yes, I know the asterisks only affect the columns, but as no one said anything about not including any filter in select and this can bring more records than necessary I found interesting to include in the answer. Including a filter with WHERE to reduce the number of rows returned, and LIMIT to page the records, in my view, is more relevant to improving performance than reducing the number of columns. The first time I read the question I did not notice that it was referring exclusively to * for the columns and so my answer became more general for optimizations.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.