Optimal file buffer size

9 comments, last by b2b3 18 years, 10 months ago
If I want to copy a file (say, about 700 MB in size) to another file using the fread and fwrite functions, what is the optimal buffer size for this operation?
There is nothing that can't be solved. Just people that can't solve it. :)
I'd say, the same as your disk's R/W cache, though a fraction of it would do OK too.

The only real solution is for you to try different buffer sizes, and see what gives the best result. Be curious. Experiment.
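As a rough sketch of that experiment (the file names are placeholders, and the coarse but portable time() clock stands in for a proper high-resolution timer):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double copy_with_buffer(const char *src, const char *dst, size_t bufsize)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    char *buf = malloc(bufsize);
    if (!in || !out || !buf) { perror("setup"); exit(1); }

    time_t start = time(NULL);
    size_t n;
    while ((n = fread(buf, 1, bufsize, in)) > 0)
        fwrite(buf, 1, n, out);
    time_t end = time(NULL);

    free(buf);
    fclose(in);
    fclose(out);
    return difftime(end, start);   /* one-second resolution: fine for a 700 MB copy */
}

int main(void)
{
    /* Try 4 KB up to 4 MB, doubling each time. Repeated runs over the
       same file will be skewed by the OS cache, so treat results with care. */
    for (size_t size = 4096; size <= 4u * 1024 * 1024; size *= 2)
        printf("%8lu bytes: %.0f s\n", (unsigned long)size,
               copy_with_buffer("source.bin", "copy.bin", size));
    return 0;
}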
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
However, be warned that what works well on one machine may not perform well on another due to machine differences such as the amount of RAM and the disk's characteristics.

That being said, if you've identified file copying as the speed bottleneck of your overall application, you might want to look into a memory-mapped file copy if the system file copy or the fread/fwrite approach isn't fast enough for you. The nice thing about the memory-mapped version is that it eliminates the copy to and from the temporary buffer. In some cases that can speed things up a lot.
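For illustration, here is a minimal POSIX sketch of that idea (on Windows the equivalents are CreateFileMapping and MapViewOfFile); file handling and errors are simplified:

/* Map the source read-only and the destination read-write, then memcpy
   between the mappings -- no intermediate buffer. For a 700 MB file on
   a 32-bit system you would map and copy a window at a time rather than
   the whole file. Error handling is abbreviated. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int mmap_copy(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    int dst = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    struct stat st;
    if (src < 0 || dst < 0 || fstat(src, &st) < 0)
        return -1;

    /* The destination must have its final size before it is mapped. */
    if (ftruncate(dst, st.st_size) < 0)
        return -1;

    void *in  = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, src, 0);
    void *out = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, dst, 0);
    if (in == MAP_FAILED || out == MAP_FAILED)
        return -1;

    memcpy(out, in, st.st_size);   /* the copy itself: no temporary buffer */

    munmap(in, st.st_size);
    munmap(out, st.st_size);
    close(src);
    close(dst);
    return 0;
}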

---CyberbrineDreams
Suspected implementation of the Windows idle loop: void idle_loop() { *((char*)rand()) = 0; }
Quote: Original post by Fruny
I'd say, the same as your disk's R/W cache, though a fraction of it would do OK too.


How would you actually determine this number, if you don't already know what exact hard drive is in the machine?
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
You could do something really cool, like trying a range of sizes while copying the file, benchmarking each one, then selecting the best for the rest of the transfer [grin]
Anything posted is personal opinion which does not in anyway reflect or represent my employer. Any code and opinion is expressed “as is” and used at your own risk – it does not constitute a legal relationship of any kind.
That is the question: what size fits most of today's machines and still gives me good results?
There is nothing that can't be solved. Just people that can't solve it. :)
The problem is that I don't think there's one good answer to your question. The "right" value can vary even on the same machine depending on what's running on it at a given time, much less trying to come up with a good value for all machines.

If this is something that's really critical for your application and you don't want to make it adaptive like paulecoyote suggested, you might want to make it a user-configurable parameter. If you're using Windows, use the cluster size of the disk as the default value. Look at GetDiskFreeSpace() and multiply lpSectorsPerCluster by lpBytesPerSector. This will get you the native allocation granularity of the given disk.
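A sketch of that lookup, assuming the file lives on drive C::

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;

    /* GetDiskFreeSpace reports sectors per cluster and bytes per sector;
       their product is the disk's allocation granularity. */
    if (GetDiskFreeSpace("C:\\", &sectorsPerCluster, &bytesPerSector,
                         &freeClusters, &totalClusters))
    {
        DWORD clusterSize = sectorsPerCluster * bytesPerSector;
        printf("Cluster size: %lu bytes\n", (unsigned long)clusterSize);
        /* Use clusterSize (or a multiple of it) as the default buffer size. */
    }
    return 0;
}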

I hope this helps.
---CyberbrineDreams
Suspected implementation of the Windows idle loop: void idle_loop() { *((char*)rand()) = 0; }
ASPI requests are or were limited to 64KiB, so best not to exceed that; 32KiB works well.
If you care about performance, use your platform's AIO mechanism instead. That bypasses any caching and queues commands to the drive so that throughput is maximized.
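One concrete form of this on Windows is overlapped, unbuffered I/O; a minimal read sketch, with a placeholder file name and abbreviated error handling:

/* FILE_FLAG_NO_BUFFERING bypasses the OS cache; FILE_FLAG_OVERLAPPED lets
   the request complete asynchronously. With no buffering, the buffer
   address, read size, and file offset must all be sector-aligned --
   VirtualAlloc returns page-aligned memory, which satisfies that for
   typical sector sizes, and 64 KiB is a multiple of any common sector. */
#include <windows.h>

#define CHUNK (64 * 1024)

int main(void)
{
    HANDLE file = CreateFile("source.bin", GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING,
                             FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                             NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    void *buf = VirtualAlloc(NULL, CHUNK, MEM_COMMIT, PAGE_READWRITE);

    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    ov.Offset = 0;   /* read the first chunk of the file */

    /* Queue the read; ERROR_IO_PENDING means it is in flight, not failed. */
    if (!ReadFile(file, buf, CHUNK, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    /* ... do other work while the drive services the request ... */

    DWORD bytesRead;
    GetOverlappedResult(file, &ov, &bytesRead, TRUE);   /* wait for completion */

    CloseHandle(ov.hEvent);
    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(file);
    return 0;
}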
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40 E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
I'd recommend a meg or so. And you should probably use a double-buffer (or triple-buffer) system with asynchronous access, if possible. That'd allow both reading and writing to occur simultaneously if they're not both on the same drive. A similar solution would be to use two threads: a reader and a writer thread, which alternate buffers. That would allow you to use synchronous access yet still overlap reads and writes.
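A minimal POSIX-threads sketch of the two-thread variant (the same structure works with Win32 threads and events; file names are placeholders and short-read handling is simplified):

/* A reader thread fills two buffers in alternation while a writer thread
   drains them, so a read and a write can be in flight at once. A pair of
   semaphores hands each buffer back and forth. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define BUFSIZE (1024 * 1024)   /* "a meg or so" per buffer */

static char buffers[2][BUFSIZE];
static size_t filled[2];          /* bytes currently in each buffer */
static sem_t full[2], empty[2];   /* per-buffer handoff */
static FILE *in, *out;

static void *reader(void *arg)
{
    for (int i = 0; ; i = !i) {
        sem_wait(&empty[i]);                       /* wait until buffer i is free */
        filled[i] = fread(buffers[i], 1, BUFSIZE, in);
        sem_post(&full[i]);                        /* hand it to the writer */
        if (filled[i] == 0) break;                 /* EOF: writer sees 0 and stops */
    }
    return NULL;
}

static void *writer(void *arg)
{
    for (int i = 0; ; i = !i) {
        sem_wait(&full[i]);
        if (filled[i] == 0) break;
        fwrite(buffers[i], 1, filled[i], out);
        sem_post(&empty[i]);                       /* hand it back to the reader */
    }
    return NULL;
}

int main(void)
{
    in  = fopen("source.bin", "rb");
    out = fopen("copy.bin", "wb");
    if (!in || !out) return 1;

    for (int i = 0; i < 2; i++) {
        sem_init(&full[i], 0, 0);
        sem_init(&empty[i], 0, 1);   /* both buffers start empty */
    }

    pthread_t r, w;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);

    fclose(in);
    fclose(out);
    return 0;
}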
No one has mentioned it yet, but just to state the (hopefully) obvious: the buffer size really should be a power of two!

I'd suggest 64K, i.e. 65536 bytes. Increasing it beyond that most likely isn't going to make things much faster.

Oh, and just remember that optimal can mean different things, so when you ask what is optimal, make sure you state what it should be optimal in terms of, e.g. speed, memory usage, etc... [smile]
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

This topic is closed to new replies.
