Guaranteeing that a file stays contiguous
nothing guarantees that my file will be written to a single disk block
In general this is not possible, at least not from your own code alone in a simple, universal way.
Most file systems cannot give this guarantee; at least none that I know of can. In some it is possible to access the system at a lower level to check where the sectors are and reserve them for that file. Besides being complicated to do and requiring special privileges, the performance would be terrible: in the worst case you would sweep the entire storage device and perhaps still not find a contiguous block. You could still shuffle data around to try to create a contiguous block, i.e. you would have to build a defragmenter into your application. Madness. Some file systems will not even allow it. There may be some system that gives this guarantee by tracking the size of every free block and exposing that information.
What you can do is reserve the space for the file up front and never grow it afterwards. Depending on the case you have a chance of getting a contiguous block, and since you no longer touch the file's size, it will not fragment later.
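A minimal sketch of that idea on a POSIX system, assuming the file name data.bin and the 1 MiB size are purely illustrative. Preallocating the final size in one call gives the file system a chance to pick a contiguous run of clusters; it is a hint, not a guarantee:

```c
/* Sketch: preallocate a file's final size up front so the file system
 * has a chance to pick a contiguous run of clusters. This is a hint,
 * not a guarantee; the path and size are illustrative assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    /* Reserve 1 MiB at once instead of growing the file piece by piece.
     * posix_fallocate returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, 1024 * 1024);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate failed: %d\n", err);
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```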
Another trick would be to shrink the partition and grow it again, if the system allows it, if you can do this without losing data, and if you can reach the size you want; the newly added area should be contiguous. But it's crazy.
Another solution is not to use a file system at all, or at least to have a partition just for your application. If you can guarantee that, you control the allocation yourself. It is very rare for an application to really need this.
On some devices, having a contiguous block changes nothing in performance. That is the case with the SSD, which is the storage you would pick when you need performance, and performance is the only reason I can imagine for the block needing to be contiguous.
Small blocks
What you may have read is that these functions read or write in one go on each access; you do not need to access each byte individually. You always access a whole cluster at a time, so writing 1 byte or 4096 bytes (the most typical size nowadays) costs the same. Those 4096 bytes (this can vary, but it should not be less than 512, the old sector size, or 4096, the sector size on more modern devices, and it is not usually more than 65536) will always be contiguous. If the file is no larger than the cluster size, then it will be contiguous. If it takes more than one cluster, even if written all at once, nothing guarantees that it will be contiguous.
If you want a guarantee that everything will be written contiguously, make sure it fits within the cluster size of that partition.
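A hedged sketch of how you might discover that size on a POSIX system, using statvfs; the mount point "/" is just an example:

```c
/* Sketch: ask the system for the file system's block size so a write can
 * be kept within a single cluster. f_bsize is what the file system
 * reports; the path "/" is an illustrative mount point. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(void) {
    struct statvfs vfs;
    if (statvfs("/", &vfs) != 0) { perror("statvfs"); return 1; }

    printf("block size: %lu bytes\n", (unsigned long)vfs.f_bsize);
    /* A file no larger than this fits in one cluster and is therefore
     * contiguous by construction. */
    return 0;
}
```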
I found places saying it is possible to accomplish this using read, pread, write and pwrite from POSIX, but nowhere is it said that everything is guaranteed to be read/written in one block
Because it is not guaranteed at all, unless that block fits into one cluster. In that case it is not only possible, you don't even need to do anything special.
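For instance, a sketch under the assumption of a 4096-byte cluster (check it with statvfs as above; the file name is illustrative): a single pwrite of a buffer no larger than the cluster needs nothing special to end up contiguous:

```c
/* Sketch: write one cluster-sized block with a single pwrite call.
 * Assumes a 4096-byte cluster; since the data fits in one cluster,
 * the file system stores it contiguously by construction. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char buf[4096];
    memset(buf, 'x', sizeof buf);

    int fd = open("block.bin", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* One call, one cluster, starting at offset 0. */
    ssize_t n = pwrite(fd, buf, sizeof buf, 0);
    if (n != (ssize_t)sizeof buf) perror("pwrite");

    close(fd);
    return 0;
}
```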
Why does it need to be contiguous?
I can only imagine one reason: performance. If you have another reason, the question does not mention it, but it is almost certainly not an important one.
On disks it really is important that the data be contiguous when there is a pattern of accessing it sequentially. The mechanics of the disk may require moving the read (or write) head to reach the next cluster to be accessed, and may even require another full rotation of the platter. If everything sits together, this is avoided. Note that disk firmware often reorders the request queue to try to avoid waste, but if the data is not contiguous there will always be some inefficiency.
Of course, some access patterns are essentially random, and there being contiguous brings no benefit. That is roughly the case with traditional databases; not entirely, because they still have sequential access patterns in some cases. A complete explanation of how a database works does not fit here, especially since each one works differently while keeping to a general pattern. Each database differs precisely to make the most of what is available.
If you are using an SSD this need is greatly diminished, especially for databases.
Access to data on tapes was sequential. The disk made access partly sequential, partly random. The SSD made access truly random.
There is some gain in being sequential on an SSD, because there is a cost in the access process that is minimized when access is sequential. This cost has been shrinking with better firmware, and it is small.
Databases, since they do essentially random accesses, fit very well with SSDs, and there is little gain in everything being contiguous. In most scenarios a database will spread data everywhere, and trying to keep it contiguous does not help much. I am not saying it helps zero.
And of course I am not talking about logs; they do benefit from being contiguous, but less than you might imagine, since their normal pattern is write-only, and generally in small portions of data, smaller than a cluster. On an SSD the gain is low even in this scenario.
Just remember that a considerable part of a database's work is not access to storage devices at all.
All access patterns benefit from adopting an SSD, some brutally, and mostly it lets people think less about finding the best access strategy, because it is already close to the best possible. And I am not even considering RAM-based or NVRAM-based SSDs, which are still very expensive, but are the solution if you want maximum performance.
I have already spoken on the subject in Speed difference in SATA HD/SSD.
And I have talked about fragmentation in Can hard disk defragmentation help the performance of my server?
Database
They need contiguous allocation less than one might think, because fragmentation is their normal state. Their main data structure is the tree, which exists precisely to cope well with fragmentation.
Some, such as SQL Server or Oracle (as LINQ notes in the comments), use privileged access to the operating system to obtain certain guarantees, but even they do not perform miracles.
In general it does not pay off except in very complex scenarios. So much so that some of the fastest databases do none of this and use the traditional route (some accessing data through a memory-mapped file).
Conclusion
If you need an individual cluster to be written in one piece, the operating system already does it for you. If you need more than one contiguous cluster, there are no guarantees in normal scenarios (using what already exists), but this is unnecessary in most cases, especially on an SSD.
Note that writes are atomic: either everything is written or nothing is, within a single operation requested from the file system. Separate operations are not atomic unless you use mmap, in the right way.
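A minimal sketch of the mmap route, assuming POSIX and a file data.bin that is already at least 4096 bytes long; names and sizes are illustrative:

```c
/* Sketch: map a file into memory and modify it in place. The file name
 * and length are illustrative; the file must already be `len` bytes. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const size_t len = 4096;
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    memcpy(p, "hello", 5);      /* modify the mapped bytes in place */
    msync(p, len, MS_SYNC);     /* flush the page back to the file  */

    munmap(p, len);
    close(fd);
    return 0;
}
```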
What do you call a block? A cluster? We need to get the nomenclature right, because the meaning changes a lot from one concept to another.
– Bacco
Sorry, the only nomenclature I have come across is this one, and I never thought there was anything different from "disk blocks". Here is an explanation: http://stackoverflow.com/questions/12345804/difference-between-blocks-and-sectors
– thomaz andrade
That would be clusters, then. Now the question is: did you actually understand what that link of yours says? To fit a file in one block, it has to be a very small file. The standard block size of most OSes today is 4096 bytes.
– Bacco
Sorry, I'm not following; clusters have nothing to do with blocks, sectors and disks. Here is an explanation: https://pt.wikipedia.org/wiki/Cluster
– thomaz andrade
Regarding the answer's edit: I know this is the default, and I also know that the program can ask the OS for the block size, but if I write a 4096-byte file (minus the part taken by metadata), how do I guarantee that everything will be written in one block?
– thomaz andrade
The metadata is usually not in the block (which is the same as a cluster, in general terms, since you have not defined which file system you are dealing with). If you write 8192 bytes, half goes in the 1st block and half in the 2nd.
– Bacco
I made the first comment precisely to clarify the terminology and understand what exactly you want to know. It seems to me the question started precisely from this confusion of terms, which varies a lot between discussion groups, from one OS to another, one distro to another.
– Bacco
I understand, thanks for the clarification! So, using your example, if I write a file of 8192 bytes it will be written using exactly 2 blocks? It doesn't depend on how I write it (using C's fwrite, POSIX write, etc.)?
– thomaz andrade
Who determines this is the file system (ext3, NTFS, ReiserFS, FAT), not the language or the API. If you want to control what falls on each block, you need to do something lower level. On the most common systems, you just have to keep to multiples of the block size. But between us, first of all, you need to check whether this brings you any gain at all. In general, as for how the file will end up on the disk, it is what @Bigown said. I think to be complete (after the extra hints here) he just needs to mention the multiples.
– Bacco
Only I think you should edit the question and explain it better then, because I could only confirm the doubt after clarifying it with you in the comments, and ideally a question should be self-sufficient. If you think you really need to work with most FSes and disks, use multiples of 512, but even HDs are abandoning this measure and moving to a kind of "big sector" of 4k as well.
– Bacco
Thanks for the reply and the help. I will mark @Bigown's answer as correct even though I don't agree. Sorry for taking your time!
– thomaz andrade
I think it would be really worthwhile for you to read what he explained, mainly about SSDs.
– Bacco
I don't get it, are you assuming I didn't read it?
– thomaz andrade
I imagine you have read it, judging by the comments, but now, with the concepts cleared up, you may see better how it fits into the context that was explained. It was just a suggestion, of course. I don't know how important the contiguous-file side is to what you are doing. As a curiosity, the DBF files of the old dBase and Clipper performed well aligned to 512 bytes, because at the time reading a sector was quite expensive.
– Bacco
Thanks for the help and the patience. With the clarification I managed to understand why the question is not coherent, but I hope you understand that it is hard to write one on a subject you are still trying to learn. I preferred your approach of trying to understand and help me with the question over starting an answer with no background; even rereading @bigown's answer I could not figure out where I would have to look to learn about the subject, but from your comments I know I should study file systems and what they allow APIs to do. Thank you very much!
– thomaz andrade