How to ensure a file is written to a single disk block?


8

After several days of searching I have found nothing that guarantees my file will be written to a single block of the disk (I know that a block is only an abstraction over disk sectors, created by the OS).

From what I gathered, to accomplish this I cannot use the ANSI C file functions. Looking further, I found places saying it is possible using the POSIX read, pread, write and pwrite calls, but nowhere is it said that everything is guaranteed to be read/written in a single block.

Is there any way to do it?

  • 1

    What do you call a block? A cluster? You have to get the nomenclature right, because the meaning changes a lot depending on the concept.

  • Sorry, this is the only nomenclature I came across, and I never thought there was anything different from "disk blocks". The explanation I followed: http://stackoverflow.com/questions/12345804/difference-between-blocks-and-sectors

  • 1

    That would be clusters, then. Now the question is: did you understand what that link you cite says? To fit a file in one block, it has to be a very small file. The default block size on most OSes today is 4096 bytes.

  • Sorry, I’m not following; clusters have nothing to do with blocks, sectors and the disk. See this explanation: https://pt.wikipedia.org/wiki/Cluster

  • Answering the edit: I know that is the default, and I also know the program can ask the OS for the block size. But if I write a 4096-byte file (minus the part taken by metadata), how do I guarantee it is all written in a single block?

  • 1

    The metadata is usually not in the block (which is the same as a cluster, in general terms, since you have not said which filesystem you are dealing with). If you write 8192 bytes, half goes in the 1st block and half in the 2nd.

  • I made the first comment precisely to clarify the terminology and understand exactly what you want to know. It seems to me the question started precisely from this confusion of terms, which varies a lot from one OS to another, one distro to another.

  • I understand, thanks for the clarification! So, using your example, if I write a file of 8192 bytes it will be written using exactly 2 blocks? Regardless of how I write it (C fwrite, POSIX write, etc.)?

  • 1

    Who determines this is the filesystem (ext3, NTFS, ReiserFS, FAT), not the language or the API. If you want control over what lands in each block, you need to go to something lower level. On the most common systems, you just need to keep your sizes in multiples of the block size. But between us: first of all, check whether this brings you any gain at all. In general, how the file ends up on disk is what @Bigown described; to be complete (after the extra hints here), just add the point about multiples.

  • 2

    Only I think you should edit and explain the question better then, because I could only confirm the doubt after clarifying it with you in the comments, and ideally the question should be self-sufficient. If you think you really need to work with most filesystems and disks, use multiples of 512; but even HDs are abandoning that size and moving to a kind of "big sector" of 4 KB as well.

  • Thanks for the reply and the help. I will mark @Bigown’s answer as correct even though I don’t agree. Sorry to have taken your time!

  • 1

    I think it would be worth rereading what he explained, mainly the part about SSDs.

  • I don’t get it, you’re assuming I didn’t read it?

  • I imagine you did read it, judging from the comments, but now, with the concepts cleared up, you may see better how it fits the context that was explained. It was just a suggestion, of course. I don’t know how important having the file contiguous is for what you’re doing. As a curiosity: the DBF files of old dBase and Clipper performed well aligned to 512 bytes, because at the time a sector read was quite "expensive".

  • Thanks for the help and the patience. With the clarification I managed to understand why the question is not coherent, but I hope you understand that it is hard to write one about a subject you are still trying to learn. I preferred your approach of trying to understand the question and help me, rather than starting an answer with no background; even rereading @bigown’s answer I could not tell where I should go to learn about the subject, whereas from your comments I know I should study filesystems and what they allow APIs to do. Thank you very much!


2 answers

7


Guaranteeing that a file stays contiguous

nothing that guarantees that my file will be written on a single disk block

In general it is not possible, at least not from your code alone in a simple and universal way.

Most file systems cannot give this guarantee; at least none that I know of can. On some it is possible to access the system at a lower level to check where the sectors are and reserve them for that file. Besides being complicated to do and requiring special privileges, the performance would be terrible: in the worst case you would sweep the entire storage device and perhaps still not find a contiguous region. You could also manipulate the disk to try to create a contiguous region, i.e. build a defragmenter inside your application. Madness. Some filesystems won’t even allow it. There may be some system that gives this guarantee by tracking the sizes of all free extents and exposing that information.

What you can do is reserve the space for the file up front and never grow it. Depending on the case, you have a chance of getting a contiguous extent, and since you no longer change the file’s size, it will not fragment later.

Another trick would be to shrink the partition and grow it again, if the system allows it, if you can do it without losing data, and if you reach the size you want; the newly added area should be contiguous. But that is crazy.

Another solution is not to use a file system at all, or at least to have a partition dedicated to your application. If you can guarantee that, you control the allocation yourself. Very few applications need any of this.

Depending on the device, having a contiguous extent changes nothing in performance. That is the case of the SSD, which is the storage of choice when performance matters, and performance is the only reason I can imagine for needing the data to be contiguous.

Small blocks

What you may have read is that these functions read or write in a single operation per access; you do not need to access each byte individually. You always access a whole cluster at a time, so writing 1 byte or 4096 bytes (the most typical size nowadays) costs the same. Those 4096 bytes (this can vary, but it should not be less than 512, the old sector size, or 4096, the sector size of more modern devices, and rarely more than 65536) will always be contiguous. If the file is no larger than the cluster size, it will be contiguous. If it spans more than one cluster, even if written in a single call, nothing guarantees contiguity.

If you want a guarantee that everything is recorded contiguously, make sure it fits within the cluster size of that partition.

found places saying it is possible using read, pread, write and pwrite from POSIX, but nowhere has it been said that it is guaranteed that everything will be read/written in a single block

Because it is not guaranteed at all, unless the data fits within one cluster. In that case it is not only possible: you don’t need to do anything.

Why does it need to be contiguous?

I can only imagine one reason: performance. If you have another reason, the question does not say, but it is almost certainly not an important one.

On disks it does matter that the data is contiguous when there is a pattern of sequential access. The mechanics of the disk may require moving the read/write head to reach the next cluster to be accessed, and may even require waiting a full rotation of the platter. If everything is together, this is avoided. Note that disk firmware often reorders the request queue to reduce waste, but if the data is not contiguous there will always be some inefficiency.

Of course, some access patterns are essentially random, and there being contiguous brings no benefit. That is roughly the case of traditional databases; not entirely, because they do have sequential access patterns in some cases. A complete explanation of how a database works does not fit here, especially since each one works in its own way while keeping to a general pattern; each database differs precisely to make the most of what is available.

If you use an SSD this need is greatly diminished, especially for databases.

Access to data on tapes was sequential. The disk made access part sequential, part random. The SSD made access truly random.

There is some gain in being sequential even on an SSD, because there is a per-access cost that is minimized when access is sequential. That cost has been shrinking with better firmware, and it is small.

Databases, doing essentially random accesses, fit very well with SSDs, and there is little gain in everything being contiguous. In most scenarios a database will spread the data everywhere, and trying to keep it contiguous doesn’t help much. I’m not saying it helps zero.

And of course I’m not talking about logs; they do benefit from being contiguous, but less than one might imagine, since their normal pattern is write-only, and generally in small portions of data, smaller than a cluster. On an SSD the gain is low even in this scenario.

Just remember that a considerable part of a database’s work is not access to storage devices at all.

All access patterns benefit from adopting an SSD, some dramatically, and mostly it lets people think less about finding the optimal access strategy, because it already approaches the best possible. And I’m not even considering the RAM-based or NVRAM-based SSDs, which are still very expensive but are the solution if you want maximum performance.

I have already written on the subject in Speed difference in HD SATA/SSD.

And I’ve talked about fragmentation in Can hard disk defragmentation help my server’s performance?

Databases

They need contiguous allocation less than one would think, because fragmentation is their normal state. Their main data structure is the tree, which exists precisely to cope with fragmentation.

Some, such as SQL Server or Oracle (as LINQ comments), make privileged use of the operating system to obtain some guarantees, but even they do not work miracles.

In general it does not pay off except in very complex scenarios. So much so that some of the fastest databases do none of this and access files the traditional way (some via memory-mapped files).

Conclusion

If you need an individual cluster to be recorded as a unit, the operating system already does that for you. If you need more than one contiguous cluster, there are no guarantees in normal scenarios (using what already exists ready-made), but this is not necessary in most cases, especially on an SSD.

Note that writes are atomic in the sense that, for a single operation requested of the file system, either everything is recorded or nothing is. Separate operations are not atomic, unless you use mmap, and in the right way.

  • Got it, but take a program like a DBMS: doesn’t it have this control? If so, how does it do it? It is not a special program with special privileges. What I want is not the address, and not to guarantee sectors, but blocks; to mess with sectors I would have to write a disk driver, and that would not be universal.

  • Not in general. Of course, SQL Server might use something undocumented in Windows. It is possible that some can do what I described and bypass the file system, but it is rare; in general it is not worth the effort. One of the reasons it is said that putting a DB on an SSD is good is precisely that fragmentation stops mattering. Although a DB fragments itself, it rarely needs to care about that (there are a few useful cases).

  • I will not mark the answer as correct because it does not answer the question. I would rather you had not assumed I didn’t want the effort; there is a reason for everything, and at no point did I say what application I wanted this information for. I do not use Windows, much less SQL Server. For anyone with the same doubt: MySQL uses ANSI C, so I will do the same.

  • Do you want someone to explain everything you have to do to achieve it? Then the question is too broad. I don’t know what ANSI C has to do with it.

  • The question is clearly "Is there any way to do it?", not a request for someone to explain everything involved. Don’t answer questions if you don’t know the answer. Putting a DB on an SSD is questionable because databases were created precisely to avoid disk access, since random access can be very costly; on an SSD perhaps the DBMS even gets in the way, since random access is fast and constant. I don’t know of any studies on DBMSs specifically for SSDs. Thanks anyway for your time.

  • 1

    Then what you asked was answered. If you did not like the answer, I can only regret it.

  • 1

    Just complementing: in fact, some DBMSs do this using a "raw" partition with no filesystem (Oracle, that I know of, but not even PostgreSQL does), and here it is well explained why it is not worth it: http://dba.stackexchange.com/questions/80036/is-there-a-way-to-store-a-postgresql-database-directly-on-a-block-device-not-fi/80046#80046?newreg=e669562317fa44ac8bfac6f53441a764


2

"Is there any way to do it?"

No.

Unless you use a raw partition and, in practice, implement your own file system.

On Linux systems, partitions can be accessed as ordinary files under /dev, as long as your program has the privileges to do so.

But even implementing your own file system on a raw partition only takes you down to the abstraction level the operating system itself is exposed to: the disk controller (and possibly other firmware layers) can re-map your partition into other pieces, and you will have no way of knowing.

Note that on several Linux file systems you can use the fallocate system call to change a file’s preallocated size, but even this fairly low-level, non-standard function offers no option to guarantee that the data is contiguous on disk.
