[C++] MultiThreaded file I/O

Started by
4 comments, last by Jan Wassenberg 15 years, 9 months ago
I am working with some very large lidar digital elevation models. Each file is around 150 MB, and at any one time I have five of these bad boys open. They are loaded into a tbb::concurrent_vector<T>(row*col), i.e. a single array indexed as a 2D array. These files are ASCII. Loading and then parsing them takes *a while*, especially on my slower (1.2 GHz Core Duo) laptop. The save can take a while too, as there are multiple versions of the file which I have to save (i.e. xy raster, xyz column, etc.).

Given this, I'm looking to speed up my code. I am going to parallelize the main for loop of my code (which does the work on the rasters) with tbb::parallel_for. However, the most time-consuming operations are still the reads and the saves. Therefore, my question is as follows: is there a way to parallelize the saving and loading of these large files? If so, how? Googling has failed me so far, so hopefully someone can point me in the correct direction.
Overlapped IO is about as good as it gets.

Overlapped IO allows you to do just that: overlap operations. You're reading, processing and writing at the same time, up to the available resources.

But with a slow disk, you can't really expect magic.
Quote:Original post by Antheus
Overlapped IO is about as good as it gets.

Overlapped IO allows you to do just that: overlap operations. You're reading, processing and writing at the same time, up to the available resources.

But with a slow disk, you can't really expect magic.


Anything cross platform?
//edit, ok I see the unix link -> does that work for mac too?

so no other options? No way for two threads to "read" the same file?

[Edited by - _Sigma on June 24, 2008 7:34:03 PM]
Thinking about it, is there any reason one can't just put each I/O operation into its own thread?
Two threads reading one file isn't what you want. That will just thrash the drive and make things slower.

The best you can hope for is to be doing the loading / saving in parallel with the processing. The easiest way to do that is with two threads. Here's a rough idea of how you want to set it up:

Thread1         Thread2
----------      --------------
                Load file #1
Process #1      Load file #2
Process #2      Save file #1 then load file #3
Process #3      Save file #2 then load file #4
Process #4      Save file #3
                Save file #4


Note that if you're CPU bottlenecked, then multiple threads doing the processing is an option, but you only want one of them touching the disk, as writing goes quicker if the drive heads don't need to seek.

To implement this you just need to make three thread safe queues - one of files to load, one of data to save, and one of data to process. The I/O thread just works through the load and save queues. The other threads just do processing.

Note that if you save to a different drive to the one you load from then having one read thread and one write thread will help performance.
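
The queue setup described above could be sketched roughly as follows in C++17. This is only an illustration under some assumptions: BlockingQueue, IoRequest, Raster and run_pipeline are made-up names, the load/save/process callbacks are stand-ins for the real parsing and writing code, and the separate load and save queues are merged into a single request queue so that the one I/O thread can block on one condition variable.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue: pop() waits until an item arrives or the queue is closed.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

using Raster = std::vector<float>;  // hypothetical in-memory form of one file

struct IoRequest {
    bool is_save;      // true: write `data` out, false: load `name`
    std::string name;
    Raster data;       // only used for saves
};

// Hypothetical callbacks standing in for the real load, save and processing code.
using LoadFn = std::function<Raster(const std::string&)>;
using SaveFn = std::function<void(const std::string&, const Raster&)>;
using ProcessFn = std::function<void(Raster&)>;

void run_pipeline(const std::vector<std::string>& names,
                  LoadFn load, SaveFn save, ProcessFn process) {
    BlockingQueue<IoRequest> io_requests;                   // work for the I/O thread
    BlockingQueue<std::pair<std::string, Raster>> loaded;   // data ready to process

    // The only thread that touches the disk.
    std::thread io_thread([&] {
        while (auto req = io_requests.pop()) {
            if (req->is_save) save(req->name, req->data);
            else loaded.push({req->name, load(req->name)});
        }
    });

    std::size_t next_to_load = 0;
    auto request_load = [&] {
        if (next_to_load < names.size())
            io_requests.push({false, names[next_to_load++], {}});
    };
    request_load();  // load file #1
    request_load();  // load file #2 while #1 is being processed

    for (std::size_t i = 0; i < names.size(); ++i) {
        auto item = loaded.pop();  // blocks until the next file has been loaded
        assert(item.has_value());
        process(item->second);
        io_requests.push({true, item->first, std::move(item->second)});  // save #i
        request_load();                                                  // then load #i+2
    }
    io_requests.close();
    io_thread.join();
}
```

Calling run_pipeline with the list of file names and the three callbacks reproduces the schedule in the table: the calling thread processes one file while the I/O thread saves the previous one and loads the next. If the processing itself is split across several threads, they can all pop from the same `loaded` queue, as long as only the I/O thread touches the disk.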
> These files are ascii.
I realize LIDAR data is often exchanged in ASCII form, but why not convert to binary (if only to speed up iteration during development)?
You could also speed up the parsing of floats from ASCII (which is a killer) by taking advantage of the known layout of the file.
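
As a rough sketch of what exploiting the known layout might look like: if the file is just row*col whitespace-separated numbers, the whole thing can be read into one buffer and walked with std::strtof, with the output vector reserved up front. The name parse_grid and the assumption that the text is already in memory (with any header stripped) are mine, not something from the original data format.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Parses up to `expected_count` whitespace-separated floats from `text`.
// Assumes the file contents are already in memory and contain only numbers.
std::vector<float> parse_grid(const std::string& text, std::size_t expected_count) {
    std::vector<float> values;
    values.reserve(expected_count);  // known size: no reallocation while parsing

    const char* cursor = text.c_str();
    char* end = nullptr;
    while (values.size() < expected_count) {
        // strtof skips leading whitespace (including newlines) itself.
        const float value = std::strtof(cursor, &end);
        if (end == cursor) break;  // nothing parsed: end of data or malformed input
        values.push_back(value);
        cursor = end;
    }
    return values;
}
```

The caller can compare the returned size against row*col to detect a short or malformed file. Whether this actually beats the stream-based parsing depends on the standard library and the data, so it is worth measuring on a real file.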

Even if you need to produce ASCII outputs, you can compress the data on-the-fly and write it directly into a Zip archive, which will probably result in an overall savings in write time.

Quote:Anything cross platform?
//edit, ok I see the unix link -> does that work for mac too?

Mac OS X is based on BSD, BSD supports most of POSIX, POSIX includes aio, so yes.

This topic is closed to new replies.
