How to measure the amount of disk space required for an ALTER TABLE operation?


I need to make some changes to one of my tables to add new columns, but the operation fails with the following error:

PG::Diskfull: ERROR: could not extend file "base/1672994/5557167.4": No space left on device. HINT: Check free disk space.

There is no doubt that I need to increase my disk space, since not even a VACUUM FULL worked; it failed with this same error.

The query I tried to run is this:

ALTER TABLE messages ADD COLUMN readed BOOLEAN DEFAULT FALSE;

It’s a simple field, and a boolean doesn’t take up much space. But reading other forums, I saw that PostgreSQL may need extra space to perform an operation like this. I’m not sure how it works, but the table I’m trying to change has over 6,700,000 rows, so it might make sense that the current free space isn’t enough for the operation.
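For reference, this is how I’ve been checking the table’s current size (a sketch using the standard size functions; from what I’ve read, a full rewrite would need roughly this much free space again):

SELECT pg_size_pretty(pg_table_size('messages'))          AS table_only,
       pg_size_pretty(pg_indexes_size('messages'))        AS indexes_only,
       pg_size_pretty(pg_total_relation_size('messages')) AS table_plus_indexes;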

Anyway, what I want to know is: is there a way to estimate, on average, how much extra space I’ll need to buy for this operation to work? And how does ALTER TABLE behave here? Does it really need that much more room to work?

  • I understand that since PostgreSQL uses MVCC (Multiversion Concurrency Control), it will generate a copy of the entire table when adding the new field, and only remove the old table at the end. Maybe a pg_dumpall (to another storage medium), modifying the table definition in the generated file, then dropping the database and recreating it with psql could be a solution.

1 answer



It is difficult to give a definitive answer here; trial and error turns out to be an effective way to figure out how far you need to go.

It’s actually a little odd that you need so much space, because the way you’re doing it shouldn’t rewrite the whole table. Since version 11, PostgreSQL has a mechanism in which rows are only rewritten as they are updated: while a row hasn’t been written, it is read without the new column and the default value is filled in at query time. It’s pretty smart. But for some reason it doesn’t seem to be working here.
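A quick way to check whether your server even has this mechanism is to look at the pg_attribute catalog; this is a sketch assuming PostgreSQL 11 or newer, where the atthasmissing and attmissingval columns exist:

-- Shows, for each column of the table, whether it was added via the
-- metadata-only path (atthasmissing) and what default is filled in at read
-- time (attmissingval)
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'messages'::regclass
  AND attnum > 0
  AND NOT attisdropped;

If this query errors because atthasmissing does not exist, you are on a pre-11 server, and there the ALTER really does rewrite the whole table, which would explain the space requirement.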

The basic idea is to check the sizes involved. To list the databases by size:

SELECT pg_database.datname,
       pg_database_size(pg_database.datname),
       pg_size_pretty(pg_database_size(pg_database.datname))
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

I’ve put this on GitHub for future reference.
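Once you know which database is the big one, something like this (a sketch against pg_class, using the same size functions) lists the largest tables inside the current database, including their indexes and TOAST data:

-- Ten largest tables in the current database, total on-disk size
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;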

This may not be enough, since the indexes may change as well, or other manipulations may be needed; only an analysis of PostgreSQL’s internals would tell for sure.
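To see how much the indexes themselves weigh, here is a sketch against the statistics view (the table name comes from your question):

-- On-disk size of each index on the table, largest first
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'messages'
ORDER BY pg_relation_size(indexrelid) DESC;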

  • You mentioned that the indexes may change... This table has 15 indexes, on INT, DATE, and TIMESTAMP columns, all of them B-tree. Does that have an influence?

  • I think they won’t even change, because only the primary index would make sense to change, which should be clustered, that is, the table itself. I only mentioned it because there may be something I don’t know, and the space you need may turn out to be more than the table size. You have to test. Start by freeing space equal to the size of the table. Although, as I said, it’s strange that you need this at all.
