blog: optimal I/O buffer size

Post by **nikos** » 2013 Jun 23, 06:52

here's the comment area for today's blog post found at
http://zabkat.com/blog/buffered-disk-access.htm

Post by **fgagnon** » 2013 Jun 23, 13:01

I've never been a fan of putting much significance in sample-of-one demonstrations. What happens with significantly different file sizes? 100K, 10M, 100M?

Post by **Kilmatead** » 2013 Jun 23, 13:54

We will also note that when stating "In fact, as disk access is orders of magnitude slower than memory access" in a world where NAND-based SSDs are de rigueur for pretty much all machines sold these days (at least as the primary drive), one needs to be more careful about what is "fact" and what is not - your order-of-magnitude assumption isn't as logarithmically sound as it used to be.

Post by **nikos** » 2013 Jun 23, 16:20

computer science is a big field, there's always room for more investigation by other curious authors

Post by **snemarch** » 2013 Aug 08, 16:40

That's a very simplistic benchmark, and it doesn't really fit any real-world data access patterns. Since you're repeatedly reading an extremely small file, with buffering on, you're essentially measuring the cost of security checks and usermode<>kernelmode switching.

If you want any kind of sensible benchmark, you need to pick real-world scenarios. There's quite a difference between the operations involved in copying a file, performing a hash of a file, searching for a text string within a file, or loading complex data structures from a file. For file copying and hashing, FILE_FLAG_NO_BUFFERING (FFNB) might make a lot of sense: do you really want to thrash the filesystem cache for data that's only going to be used once? (Note that FFNB doesn't cause read-through; if data is already cached it will be served hot.)

And if you're doing synchronous I/O and processing, then you might find that larger buffers end up causing poorer wall-clock performance in real-life scenarios - so you might want to reduce buffer sizes, switch to async I/O, or both. And of course the media you're using also affects optimal I/O sizes (you already touch on that, but you need to consider network and SSDs as well - possibly even ramdisks, if you want all bases covered).
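To illustrate the idea of overlapping reads with processing, here is a rough Python sketch. It is not true OS-level asynchronous (overlapped) I/O: a background thread stands in for it, reading ahead into a small bounded queue while the main thread hashes. The function name `hash_with_read_ahead` and the `buffer_size` and `depth` defaults are made up for the example, not taken from the blog post.

```python
import hashlib
import os
import queue
import tempfile
import threading


def hash_with_read_ahead(path, buffer_size=64 * 1024, depth=4):
    """SHA-256 a file while a background thread reads the next chunks."""
    chunks = queue.Queue(maxsize=depth)  # bounded: the reader stays at most `depth` chunks ahead
    errors = []

    def reader():
        try:
            # buffering=0 skips Python's own buffer, so each read() is one OS read call
            with open(path, "rb", buffering=0) as f:
                while True:
                    chunk = f.read(buffer_size)
                    if not chunk:
                        break
                    chunks.put(chunk)
        except OSError as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data (end of file or failure)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    digest = hashlib.sha256()
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        digest.update(chunk)  # processing overlaps with the reader's next read

    thread.join()
    if errors:
        raise errors[0]
    return digest.hexdigest()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(4 * 1024 * 1024))
        print(hash_with_read_ahead(path))
    finally:
        os.remove(path)
```

Whether this beats a plain synchronous loop depends on the device and on how expensive the processing step is, so it needs measuring on the hardware in question.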

On top of that, there are several things you can measure. One would be buffer size versus the wall-clock time to read in a file - what you're looking for there is the minimum buffer size that reaches your device's throughput (assuming the file isn't cached). But you should have a look at CPU usage as well - you might hit the max transfer rate at a small buffer size, but be able to reduce CPU consumption from the user<>kernel switches by bumping up the buffer size a bit. Perhaps you'll find this micro-benchmark interesting. And Mark Russinovich has an article about file copying in Vista's Explorer with some interesting observations - notice he mentions that the internal max I/O size done pre-Vista is 64 KB.
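A minimal sketch of measuring both numbers per buffer size, in Python rather than WinAPI so it runs anywhere. The helper names `time_read` and `make_test_file` are invented for the example. Note the big caveat: the test file is written just before it is read, so it will almost certainly be served from the filesystem cache, and the output says more about per-call overhead than about device throughput.

```python
import os
import tempfile
import time


def time_read(path, buffer_size):
    """Read `path` sequentially in `buffer_size` chunks.

    Returns (bytes_read, wall_seconds, cpu_seconds).
    """
    total = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    # buffering=0 disables Python's own buffer, so each read() is one OS read call
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            total += len(chunk)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return total, wall, cpu


def make_test_file(size_bytes):
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    path = make_test_file(8 * 1024 * 1024)
    try:
        for kb in (1, 4, 16, 32, 64, 256, 1024):
            n, wall, cpu = time_read(path, kb * 1024)
            print(f"{kb:5d} KB  wall={wall:.4f}s  cpu={cpu:.4f}s  bytes={n}")
    finally:
        os.remove(path)
```

Comparing the `wall` and `cpu` columns is the point: wall-clock time can flatten out at a buffer size where CPU time is still dropping.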

And then there's Memory-Mapped Files, of course. You're entirely at the mercy of Windows' caching strategy if you use them, so they're not a silver bullet for everything, but you do get the nice bonus of reading the cache system's buffers directly instead of having those buffers memcpy'd to your application buffers before processing - mostly an advantage when dealing with random I/O to data that has a high probability of being cached, though, rather than large sequential reads.
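As a small illustration of the memory-mapped approach, here is a Python sketch that searches a file through a read-only mapping instead of copying it into an application buffer first. Python's `mmap` module wraps the platform's mapping facility; the function name `find_in_mapped_file` is just an example name, and this makes no claim about how it performs against chunked reads on any particular system.

```python
import mmap
import os
import tempfile


def find_in_mapped_file(path, needle):
    """Return the offset of the first `needle` in the file, or -1 if absent."""
    with open(path, "rb") as f:
        # An empty file cannot be mapped, so treat it as "not found"
        if os.fstat(f.fileno()).st_size == 0:
            return -1
        # Length 0 maps the whole file; ACCESS_READ makes the view read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # find() scans the mapped pages in place, without a read() into our own buffer
            return view.find(needle)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"a" * 5000 + b"needle" + b"b" * 100)
        print(find_in_mapped_file(path, b"needle"))
    finally:
        os.remove(path)
```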

High-performance I/O isn't trivial

Post by **nikos** » 2013 Aug 09, 06:47

this is the age of reason but we need quantifiable data, not armchair philosophy

if you have a counterexample that you can back up with numbers, please enlighten us!

Post by **Hydranix** » 2015 Sep 25, 20:00

TL;DR: Benchmark data is flawed, do not trust.

Found this through Google when seeking info on the best buffer size for WinAPI file I/O.
I want to note for anybody reading this that the data here is completely unreliable and the methodology is flawed.

By reading a file into memory, you invoke the filesystem cache automatically. This cache stores data in memory outside the address space of the process which read the file.
When the memory is released by that process upon termination, the data does not leave the filesystem cache's memory.

Reading the same file again automatically invokes the filesystem cache, which returns the requested data from memory.
The author even observes this happening, yet still fails to realize that this invalidates all his data and renders the test worthless.

The only time you need to be concerned with buffered reads is when the file is large enough to present an undesired performance or resource impact on the system. Unless you're programming on an embedded or ancient machine (with WinAPI?), a <2MB JPEG isn't going to be worth reading part-by-part.

(there are some rare cases when you need to do buffered filesystem reads irrespective of file size, but that's beyond the scope of this post)

I suspect the author used a simple for-loop to read the file. I could be wrong, but I wouldn't be surprised if I was right.

Nearly all modern compilers, and even some script interpreters, will optimize a loop requesting the same data in the same way down to the least amount of work required. 1000 iterations won't show much difference from 100 iterations, or 100000 iterations. Any difference observed beyond 1 iteration will be unrelated to buffer size performance regardless. This can be proven by noting the dramatic, seemingly random fluctuations in the data, when the iterations and total time spent should be expected to scale linearly.

A better, perhaps proper way, to benchmark the ReadFile buffer would be:

Prepare a filesystem containing several thousand unique files.
Wipe the entire memory of the computer. (power off, power on)
Perform ReadFile with a defined buffer size on the files.
Record time spent.
Wipe the entire memory of the computer.
Repeat with next buffer size.
??????
profit
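The read-and-time part of the steps above could be sketched roughly like this in Python (standing in for ReadFile; the names `create_unique_files` and `read_all_once` and the file counts and sizes are placeholders). The memory wipe can't be done from the script: as written, the files are read right after being created, so they are still cached. For the procedure described, create the files once, reboot, run one buffer size, reboot again, and so on.

```python
import os
import tempfile
import time


def create_unique_files(directory, count, size_bytes):
    """Write `count` files of random bytes into `directory`; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


def read_all_once(paths, buffer_size):
    """Read every file exactly once; return (total_bytes, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    for path in paths:
        # buffering=0: no Python-level buffer, one OS read call per read()
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(buffer_size)
                if not chunk:
                    break
                total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        paths = create_unique_files(directory, count=200, size_bytes=64 * 1024)
        total, elapsed = read_all_once(paths, buffer_size=32 * 1024)
        print(f"read {total} bytes from {len(paths)} files in {elapsed:.4f}s")
```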


Post by **nikos** » 2015 Sep 26, 08:02

you have completely missed the point of the article. As you can see from the results when the "no buffering" flag is used, again after 32KB buffers the speed doesn't improve, which again is the same as in the cached reads.

Post by **snemarch** » 2015 Sep 30, 14:28

nikos wrote: you have completely missed the point of the article. As you can see from the results when the "no buffering" flag is used, again after 32KB buffers the speed doesn't improve, which again is the same as in the cached reads.

Nikos, did you get to perform some tests with fast drives, like modern-gen SSDs?
Did you get to look at how buffer size affects CPU load, especially on older machines?

I didn't reply to your post back in 2013 since I found "quantifiable data, not armchair philosophy" slightly offensive, given that both my StackOverflow post and Mark Russinovich's article (you do realize who he is, right?) contain quantifiable data in addition to theory.

Anyway, the important things to keep in mind are that there's a lot of different hardware out there, that maxing out speed on your hardware might not mean you can reach the max throughput on other hardware, that cached vs. uncached makes a difference, and finally that there's also CPU consumption to consider.

(Oh, and for some scenarios, memory-mapped I/O will give you the best performance - but that's a much more complex discussion).

Finally, apparently it might be possible to discard the Windows filesystem read cache without rebooting, which would make benchmarking a lot easier. Haven't gotten around to trying this suggestion yet, though.

zabkat support forum
