What is the best way to save files, Bank or File System?

Asked

Viewed 3,824 times

5

I have a demand to attach Pdfs to my application. The files have varying sizes, from 5 to 200 MB or more.

Here comes my question: what is the best option to store?

  • Direct in the database (Postgresql)
  • In the file system

What are the advantages and disadvantages?

  • I don’t know if this question can be answered objectively or not, but I thought it was a good question anyway! If you can elaborate a little more what do you mean by "best option" (faster? that takes less space? easier to maintain? etc), I think you will have more chances to get a good answer.

  • In the same database as your application start storing files that can reach 200MB, I wouldn’t do that. Maybe in a separate database.

  • A disadvantage of saving on disk is when owning a high traffic service that requires mirrors. Then you would have to mirror these files on disk as well. If they are in a database, it is easier to solve through re-application. In practice it is the same, but the difference is the work on implementation.

2 answers

5


The question seems to be off-topic, because it is too broad and even based on personal opinions, there is nothing "exact". Even so I will share an experience that I had, besides mentioning some things about the approaches.

Regarding your specific question, let’s first talk about persisting in Postgresql, advantages and disadvantages according to them own:

  • perks:

    • security and access control are simplified;
    • version control is easy;
    • ACID
    • backups are simple
  • disadvantages:

    • performance: depends on the performance of your filesystem;
    • greater need for memory for processing;
    • backups will be bigger, of course;
    • access by your application will be higher. Database access clients usually generate temporary files to access and modify the files. (Have you ever thought about doing this for a 200MB file? o_O)

What about using normal file system (solution from Scratch):

  • perks:

    • fast access and less overhead, but highly dependent on the below;
    • easier to manipulate than some DBMS;
    • does not lose performance as it grows (consider creating something that does not persist so many files in the same directory, for example)
  • disadvantages:

    • worry about backups, mirroring;
    • without transactional support, can generate garbage, you will have to control it;
    • in a home solution may have problem in simultaneous access;
    • worry about security issues (storage location of the files, who may or may not access, etc.);
    • worry about not storing too many files;
    • tend to get fragmented, access slower;

This above is always considering that the infrastructure is yours, that is, it is you who controls everything, because some of the problems are treated if you use something in the cloud, as memory and everything else. In a home solution to store in file system you have to worry about a lot, I see no reason to spend time on it nowadays. A reference on the Postgresql wiki is very good, consider reading and seeing the other references there =)

Now, the personal experience that can even solve problems mentioned before: if you can choose this, your environment supports consuming third party services, opt for services like Amazon S3 and Glacier. In my case, I use S3 for newer files and move them to Glacier after a certain "age", because of the same cost. In this case, I have persisted based only on the location where the files are saved, very performative and do not worry about backups, etc.

There are other services on the market, such as Google Blobstore and Google Cloud Storage and in Azure the Blob Storage.

This approach will solve problems mentioned before, such as access security, backups, scalability, etc, beyond what lets you focus on your problem, no need to reinvent the wheel.

If you can’t adopt things like this in your environment, consider adopting a life GED to make your life easier.

Might be of interest:

1

There is no correct answer to your question, there are many things to take into account, for example :

  1. the amount of files to be generated
  2. frequency of use
  3. security guard
  4. others...

However, I would do it this way:

  1. Files saved in directories
  2. Location of files saved in the bank

Tips: Use some encryption algorithm to generate random directory names, making it difficult for users to access other files. Also, do not forget to correctly set the access permissions to these directories.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.