
## Recommended Posts

I am working with some very large lidar digital elevation models. Each file is around 150 MB in size and at any one time, I have 5 of these bad boys open. They are loaded in as tbb::concurrent_vector<T>(row*col), i.e., a single array indexed as a 2D array. These files are ASCII. Loading them and then parsing them takes *a while*, especially on my slower (1.2 GHz Core Duo) laptop. Then the save can take a while too, as there are multiple versions of the file which I have to save (i.e., xy raster, xyz column, etc.).

Given this, I'm looking to speed up my code. I am going to parallelize the main for loop of my code (which is doing the work on the rasters) with tbb::parallel_for. However, the most time-consuming operations are still the read and then the saves. Therefore, my question is as follows: Is there a way to parallelize the saving and loading of these large files? If so, how? Googling has failed me so far, so hopefully someone can point me in the correct

##### Share on other sites
Overlapped IO is about as good as it gets.

Overlapped IO allows you to do just that: overlap operations. You're reading, processing and writing at the same time, up to the available resources.

But with a slow disk, you can't really expect magic.

##### Share on other sites
> Original post by Antheus
>
> Overlapped IO is about as good as it gets.
>
> Overlapped IO allows you to do just that: overlap operations. You're reading, processing and writing at the same time, up to the available resources.
>
> But with a slow disk, you can't really expect magic.

Anything cross platform?
//edit, ok I see the unix link -> does that work for mac too?

so no other options? No way for two threads to "read" the same file?

[Edited by - _Sigma on June 24, 2008 7:34:03 PM]

##### Share on other sites
Thinking about it, is there any reason one can't just put each I/O into its own thread?

##### Share on other sites
Two threads reading one file isn't what you want. That will just thrash the drive and make things slower.

The best you can hope for is to be doing the loading / saving in parallel with the processing. The easiest way to do that is with two threads. Here's a rough idea of how you want to set it up:

| Thread1 | Thread2 |
|---|---|
| | Load file #1 |
| Process #1 | Load file #2 |
| Process #2 | Save file #1 then load file #3 |
| Process #3 | Save file #2 then load file #4 |
| Process #4 | Save file #3 |
| | Save file #4 |

Note that if you're CPU bottlenecked then multiple threads doing the processing is an option, but you only want one of them touching the disk, as writing goes quicker if the drive heads don't need to seek.

To implement this you just need to make three thread-safe queues: one of files to load, one of data to save, and one of data to process. The I/O thread just works through the load and save queues. The other threads just do processing.
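A minimal sketch of that setup in standard C++, not a definitive implementation. Everything named here (`BlockingQueue`, `Raster`, `IoJob`, `run_pipeline`) is hypothetical, and the `load`, `process` and `save` callbacks stand in for the real file and raster code. One deliberate simplification: the load and save queues are merged into a single queue of I/O jobs, so the one I/O thread only ever has to block in one place.

```cpp
#include <cstddef>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue: pop() waits for an item or for close().
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(mutex_); items_.push(std::move(item)); }
        ready_.notify_one();
    }
    // Signal that no more items will be pushed.
    void close() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        ready_.notify_all();
    }
    // Returns false once the queue is closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical stand-in for one elevation model.
struct Raster { std::string name; std::vector<float> cells; };

// One unit of disk work: either load 'path' or save 'raster'.
struct IoJob {
    enum class Kind { Load, Save } kind;
    std::string path;
    Raster raster;
};

void run_pipeline(const std::vector<std::string>& paths,
                  const std::function<Raster(const std::string&)>& load,
                  const std::function<void(Raster&)>& process,
                  const std::function<void(const Raster&)>& save) {
    BlockingQueue<IoJob> io_jobs;      // loads and saves, handled by one thread
    BlockingQueue<Raster> to_process;  // loaded data waiting for the CPU

    for (const auto& p : paths) io_jobs.push({IoJob::Kind::Load, p, {}});

    // The only thread that touches the disk.
    std::thread io_thread([&] {
        std::size_t loads_left = paths.size();
        if (loads_left == 0) to_process.close();
        IoJob job;
        while (io_jobs.pop(job)) {
            if (job.kind == IoJob::Kind::Load) {
                to_process.push(load(job.path));
                if (--loads_left == 0) to_process.close();
            } else {
                save(job.raster);
            }
        }
    });

    // Processing thread; more of these could be added if CPU bound.
    std::thread worker([&] {
        Raster raster;
        while (to_process.pop(raster)) {
            process(raster);
            io_jobs.push({IoJob::Kind::Save, "", std::move(raster)});
        }
        io_jobs.close();  // nothing left to save after this point
    });

    worker.join();
    io_thread.join();
}
```

Because every load is queued up front, this sketch finishes all loads before the first save, which differs from the interleaved schedule in the table and keeps more data in memory at once. Queuing the next load only after a save completes would bring it closer to that schedule.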

Note that if you save to a different drive to the one you load from then having one read thread and one write thread will help performance.

##### Share on other sites
> These files are ascii.
I realize LIDAR data is often exchanged in ASCII form, but why not convert to binary (if only to speed up iteration during development)?
You could also speed up the parsing of floats from ASCII (which is a killer) by taking advantage of the known layout of the file.
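As a rough illustration of both suggestions, the sketch below parses a whole file's text in one pass with `std::strtof` and then caches the result as raw binary. It assumes the file contents are already in memory as one string, that any header has been stripped, and that the cell count (rows * cols) is known in advance. The names `parse_cells`, `save_binary` and `load_binary` are made up for this example, and the binary layout (a 64-bit count followed by raw floats) is an assumption that is not portable across machines with different endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Parse whitespace-separated numbers from a buffer holding the whole file.
// 'expected' is rows * cols, assumed known before parsing starts.
std::vector<float> parse_cells(const std::string& text, std::size_t expected) {
    std::vector<float> cells;
    cells.reserve(expected);            // one allocation instead of repeated growth
    const char* cursor = text.c_str();  // null-terminated, so strtof stops at the end
    char* end = nullptr;
    for (;;) {
        const float value = std::strtof(cursor, &end);
        if (end == cursor) break;       // nothing more could be parsed
        cells.push_back(value);
        cursor = end;
    }
    return cells;
}

// Write a 64-bit cell count followed by the raw float data.
bool save_binary(const std::string& path, const std::vector<float>& cells) {
    std::ofstream out(path, std::ios::binary);
    const std::uint64_t count = cells.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(cells.data()),
              static_cast<std::streamsize>(count * sizeof(float)));
    return static_cast<bool>(out);
}

// Read back a file written by save_binary; no text parsing involved.
bool load_binary(const std::string& path, std::vector<float>& cells) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    cells.resize(static_cast<std::size_t>(count));
    in.read(reinterpret_cast<char*>(cells.data()),
            static_cast<std::streamsize>(count * sizeof(float)));
    return static_cast<bool>(in);
}
```

The idea would be to pay the ASCII parsing cost once per source file, then use `load_binary` on later runs. How much time this saves depends on the disk and the data, so it is worth measuring before committing to it.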

Even if you need to produce ASCII outputs, you can compress the data on-the-fly and write it directly into a Zip archive, which will probably result in an overall savings in write time.

> Anything cross platform?
>
> //edit, ok I see the unix link -> does that work for mac too?

Mac OS X is based on BSD, BSD supports most of POSIX, POSIX includes aio, so yes.