File loader using boost::threads

Started by
8 comments, last by Antheus 13 years, 7 months ago
Hi all,

I just want to get some advice before I embark on a massive coding spree... After looking at boost::threads, I am going to try to code a simple file loader.

I am working on a real-time 3D simulation in which meshes can "stream" in and be rendered on screen. My plan is to spawn a thread for each mesh file and write the data into Mesh objects. Once completed, the thread will set Mesh.isLoaded to true. Then, in every frame's render(), I will read the Mesh.isLoaded flag before rendering the mesh. Meshes are created by a singleton MeshManager and stored on the heap.

My questions
1. Is this design sound?
2. Do I need a mutex on Mesh or Mesh.isLoaded? I am thinking I do not need any mutex at all, for anything... I think I am wrong, but I cannot think of an error-prone zone.
3. What will happen if 2 or more threads attempt to read the same mesh file? How to prevent this? Currently I am just going to use a std::map to store each filename loaded and check against it and reuse the Mesh.
4. If the program is terminated prematurely, e.g. Ctrl+Alt+Del, what will happen to all the heap resources in the running threads?

Thanks for any advice.
==============================================
Rage - Really Amateurish Graphics Engine
Collada Parser / Serializer
Creating threads is very expensive. I would not create one for each model you are going to load. You might create a couple of worker threads that are always in the background checking for jobs to run in a queue.
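A minimal sketch of the "couple of worker threads plus a job queue" idea, written with C++11 std::thread rather than boost::thread (the two APIs are very similar). LoaderPool and all of its members are hypothetical names made up for illustration, not something from boost.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical name: a fixed set of worker threads that pull jobs from a shared queue.
class LoaderPool {
public:
    explicit LoaderPool(std::size_t workerCount) {
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the jobs already queued, then joins every worker.
    ~LoaderPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    LoaderPool(const LoaderPool&) = delete;
    LoaderPool& operator=(const LoaderPool&) = delete;

    // Queue a job, e.g. "load this mesh file".
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached once stopping_ is set
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With something like this, the render thread would call submit() once per mesh file instead of spawning a thread per file. Because the destructor joins the workers, no loader thread outlives the pool.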

You also don't want to try and load tons of files at the same time from your loading medium. You'll probably end up reading even slower, since it is trying to access several things at once and may have to go all over the media to try and do each of the reads.

You would need a mutex if there is any chance you could read and write at the same time. If you create an object, then feed it data after the memory is allocated, you will probably want some kind of mutex.

Memory should all be reclaimed when the program terminates. If there are other kinds of resources (like windows handles), these are not always reclaimed on termination.

[Edited by - Rattrap on September 1, 2010 10:24:11 AM]

"I can't believe I'm defending logic to a turing machine." - Kent Woolworth [Other Space]

This is just speculation, but I wouldn't think using more than one thread to load files would help you anyway because they are reading off the same disk. Having 2+ threads reading different files "simultaneously" would cause the drive to have to jump around, giving worse performance than just doing one after the other.
--- krez ([email="krez_AT_optonline_DOT_net"]krez_AT_optonline_DOT_net[/email])
hehe. was about to post here... then noticed this is almost the same topic you posted earlier that I gave advice on...
http://www.gamedev.net/community/forums/topic.asp?topic_id=581179&whichpage=1


Quote:
1. Is this design sound?

The posters above covered most of the finer points.

Quote:
2. Do I need a mutex on Mesh or Mesh.isLoaded? I am thinking I do not need any mutex at all, for anything... I think I am wrong, but I cannot think of an error-prone zone.

Oh. But you DO need a mutex. Without a mutex or other memory barrier, the "isLoaded" flag may end up written out to memory before the last parts of your data are, so the other thread will see the flag is true, and read in garbage.
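One way to get that ordering guarantee without a full mutex is an atomic flag with release/acquire semantics. This is only a sketch and assumes a C++11 compiler with std::atomic, which is not part of boost::thread itself. Only isLoaded comes from this thread; the vertices member, loadMesh and tryRender are hypothetical stand-ins.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical Mesh: only isLoaded is from the thread; vertices is a stand-in.
struct Mesh {
    std::vector<float> vertices;
    std::atomic<bool> isLoaded{false};
};

// Loader thread: fill in the data first, then publish the flag.
void loadMesh(Mesh& mesh) {
    mesh.vertices.assign(3, 1.0f);  // stand-in for parsing the file
    // release: a thread that later sees 'true' via an acquire load
    // is guaranteed to also see the writes made above.
    mesh.isLoaded.store(true, std::memory_order_release);
}

// Render thread: touch the data only after observing the flag.
bool tryRender(const Mesh& mesh) {
    if (!mesh.isLoaded.load(std::memory_order_acquire))
        return false;  // not ready yet, skip this frame
    // Safe to read mesh.vertices from here on.
    return !mesh.vertices.empty();
}
```

The important part is the order: the loader writes all the mesh data before it sets the flag, and the renderer checks the flag before it reads any of the data. A plain bool gives no such guarantee.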

Quote:
3. What will happen if 2 or more threads attempt to read the same mesh file? How to prevent this? Currently I am just going to use a std::map to store each filename loaded and check against it and reuse the Mesh.

Two threads from the same process should have no problem reading the same file, unless you use the OS-specific file open commands to lock access to the file.
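For the std::map idea in the quoted question, note that the map itself is shared state, so lookups and inserts from several threads need a lock. A rough sketch, assuming C++11; MeshCache and getOrCreate are hypothetical names, and Mesh is an empty placeholder here:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Mesh {};  // placeholder for the real mesh type

// Hypothetical cache: one Mesh per filename, created on first request.
class MeshCache {
public:
    // Returns the mesh for 'filename'. 'created' is set to true only for the
    // caller that inserted the entry, which is the one that should load it.
    std::shared_ptr<Mesh> getOrCreate(const std::string& filename, bool& created) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::shared_ptr<Mesh>& slot = meshes_[filename];
        created = !slot;
        if (created) slot = std::make_shared<Mesh>();
        return slot;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Mesh>> meshes_;
};
```

Only the caller that gets created == true needs to schedule a load; everyone else reuses the same Mesh and waits for it to become ready.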

Quote:
4. If the program is terminated prematurely, e.g. Ctrl+Alt+Del, what will happen to all the heap resources in the running threads?

Taskman will kill off everything it can. Some resources it will not be able to return when it does this. Usually it will successfully return memory and file handles. What you really need to worry about is having your threads continue running after the main thread is done (they are deadlocked or something, and don't exit of their own accord).
@KulSeran Yeah, both posts are on the same issue. Many thanks for taking the time to write such a detailed reply. It is very helpful. I was initially looking for a ready-made file loader, but it led me to boost::thread instead, so I decided to give it a try.

@all With all your help I managed to put together something crude but working for my needs for now. Hopefully there are no major bugs. The point about the disk head jumping around was insightful; it would not have occurred to me if you had not brought it up.

Quote:Original post by krez
This is just speculation, but I wouldn't think using more than one thread to load files would help you anyway because they are reading off the same disk. Having 2+ threads reading different files "simultaneously" would cause the drive to have to jump around, giving worse performance than just doing one after the other.


It actually does help. The reason is that blocking reads will almost never read as fast as possible, even if the file is completely defragmented.

Non-SSD drives have multiple platters, and data can be read from multiple platters at the same time (literally). If you're only reading one file, even completely defragmented, chances are you aren't hitting all the platters at once, and some of the read heads of your disk will be idle. Reading from multiple threads can address this problem.

One problem that reading from multiple threads CAN'T address is that of request ordering. Only by using the native async i/o features provided by your kernel can you get the best performance. Consider the following two files:

File 1
  Block 0    |     Block 1     |    Block 2
---------------------------------------------
     1               100               5

File 2
  Block 0    |     Block 1     |    Block 2
---------------------------------------------
   200               10               2


This means that the first block of file 1 is at block 1 on the physical disk, the second block of file 1 is at block 100 on the physical disk, and so on. In other words, these two files are fragmented.

Now let's suppose you just read these all from the same thread, back to back. The reads will be ordered like this:

1 -> 100 -> 5 -> 200 -> 10 -> 2

Terrible right? That's 587 blocks seeked over total.

Now let's suppose you read from multiple threads, and the reads come in like this:

1 | 200
100 | 10
5 | 2

where | means that the 2 threads issue those reads at the same time. Let's just say thread 1 always wins and is first. Then the seeking will be like this:

1 -> 200 -> 100 -> 10 -> 5 -> 2

This is 199 + 100 + 90 + 5 + 3 = 397 blocks seeked over. A little better (although change the numbers around and it could just as easily have been worse).

How does async i/o come to the rescue? The reason is that you give the kernel information about ALL DESIRED BLOCKS immediately, before it issues anything.

You issue the first read, for block 1. The read queue is empty so it goes ahead and fires off the read and returns immediately. Then you immediately issue reads for 200, 100, 10, 5, and 2. Doesn't matter what order. All of these happen before 1 is finished. The kernel can re-order these for you since by definition of async you're giving it permission to return at a later date, and so the kernel can issue reads in the order 1 -> 2 -> 5 -> 10 -> 100 -> 200. This is a total of 199 blocks seeked over, which you can see is the theoretical minimum possible.
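The seek totals for the three orderings can be checked with a few lines of code. This is just the toy model from this post (distance = difference in block numbers, a single head), and seekDistance and the three vector names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Sum of the distances between consecutive block positions.
long seekDistance(const std::vector<long>& order) {
    long total = 0;
    for (std::size_t i = 1; i < order.size(); ++i)
        total += std::labs(order[i] - order[i - 1]);
    return total;
}

// One thread, file 1 then file 2: 99 + 95 + 195 + 190 + 8 = 587
const std::vector<long> sequentialOrder = {1, 100, 5, 200, 10, 2};

// Two threads interleaved, thread 1 first: 199 + 100 + 90 + 5 + 3 = 397
const std::vector<long> interleavedOrder = {1, 200, 100, 10, 5, 2};

// All requests known up front and sorted: 1 + 3 + 5 + 90 + 100 = 199
const std::vector<long> sortedOrder = {1, 2, 5, 10, 100, 200};
```

Calling seekDistance on each vector gives 587, 397 and 199 respectively. The sorted order wins because the head only ever sweeps in one direction.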


This is a contrived example, and when you take into consideration multiple heads, platters, etc it gets more complicated. But hopefully you can see that fast i/o is not as simple as it first seems :)

[Edited by - cache_hit on September 2, 2010 2:17:08 AM]
Here's some tests on multithreaded file i/o (more contrived examples heh heh), I guess it depends on your usage and what disk setup you have.

Async I/O is probably your best bet, although that is a whole other animal.
--- krez ([email="krez_AT_optonline_DOT_net"]krez_AT_optonline_DOT_net[/email])
Quote:Original post by krez
Here's some tests on multithreaded file i/o (more contrived examples heh heh), I guess it depends on your usage and what disk setup you have.

Async I/O is probably your best bet, although that is a whole other animal.


Interesting article. But yea, basically same conclusion. It might hurt, it might help, depending on variables such as file fragmentation and the location of the specific files you're trying to read.
If you are looking to use multiple threads with boost, you might want to check out threadpool, a sourceforge project. As previously stated, thread creation is expensive...

GCS584
boost comes with boost::asio. It supports either native asynchronous file IO or facilities to implement such yourself.

See the logger example on asio's tutorial page.

This topic is closed to new replies.
