Asynchronous Asset Loading (data streaming)

16 comments, last by mokaschitta 13 years, 5 months ago
Say I have two threads in my game, the main thread and an asset loader thread. For now, my system consists of the following:

1. a manager class that allows you to Get() assets by name
2. an actor class that references assets like textures and meshes by name
3. when an actor needs to render a mesh, it calls Get() on the manager. The manager checks within a <string,asset pointer> map to see if it has the asset. If it returns non-null, it uses this mesh. Else, it uses a default pre-loaded mesh and the manager kicks off an asset loading task on the loader thread.
4. same as #3 for textures when a mesh needs them.
5. When an asset loading task is done, the manager adds the asset to its <string,asset pointer> map. This means that the map has a mutex or critical section that is locked whenever the user calls Get() and also whenever the loader thread is done.
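For reference, the mutex-guarded map described in steps 3 and 5 could be sketched roughly as below. This is a minimal sketch, not the poster's actual code: `Asset`, `AssetManager`, and `OnLoaded()` are hypothetical names, it assumes C++11 (`std::mutex`, `std::shared_ptr`), and it leaves out the part where a miss kicks off a load task on the loader thread.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical asset type; a real one would hold mesh or texture data.
struct Asset {
    std::string name;
};

// Hypothetical manager: one map, one mutex shared by both threads.
class AssetManager {
public:
    // Main thread: returns the asset, or nullptr if it is not in the map yet.
    // On nullptr the caller would fall back to the default asset (step 3).
    std::shared_ptr<Asset> Get(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = assets_.find(name);
        if (it == assets_.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Loader thread: publishes a finished asset into the map (step 5).
    void OnLoaded(std::shared_ptr<Asset> asset) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::string key = asset->name;  // copy the key before moving the pointer
        assets_[key] = std::move(asset);
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Asset>> assets_;
};
```

The lock is held only for the lookup or the insert, never for the load itself, so each critical section stays short.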

I think locking a mutex every time Get() is called is not a good idea. I'm trying to think of alternate ways to solve this. Can anybody help?

One thing I can think of is to reference assets by pointer instead of by name. If an actor has a non-null pointer, it uses it. And the loader thread, when done, sets the pointer. This still requires a mutex though, for the actor's asset pointer.

One other thing is to keep some sort of state on the actor that indicates whether or not it has the asset. If it's loading, then it'll keep calling Get() on the manager. When it gets a valid pointer, it'll cache it, and stop calling Get(). The pointer would have to be a shared_ptr or weak_ptr because multiple actors could use the same assets.
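The caching idea in the paragraph above might look something like this. It is only a sketch under assumptions: `Mesh`, `MeshLookup`, and `Actor::MeshForRender()` are made-up names, and `MeshLookup` stands in for whatever the manager's Get() ends up being.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

struct Mesh { std::string name; };  // hypothetical asset type

// Stand-in for the manager's Get(): returns nullptr until the mesh is loaded.
using MeshLookup = std::function<std::shared_ptr<Mesh>(const std::string&)>;

class Actor {
public:
    Actor(std::string meshName, std::shared_ptr<Mesh> fallback)
        : meshName_(std::move(meshName)), fallback_(std::move(fallback)) {}

    // Called once per frame on the main thread.
    const Mesh& MeshForRender(const MeshLookup& lookup) {
        if (!cached_) {
            // Only ask the manager while the asset is still missing.
            cached_ = lookup(meshName_);
        }
        // Use the real mesh once cached, the default mesh until then.
        return cached_ ? *cached_ : *fallback_;
    }

private:
    std::string meshName_;
    std::shared_ptr<Mesh> fallback_;  // default pre-loaded mesh
    std::shared_ptr<Mesh> cached_;    // shared, since several actors may use it
};
```

Once the pointer is cached, the actor never touches the manager (or its mutex) again for that asset.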

Can someone please give me pointers on how to implement a good asynchronous loader? Also, would the implementation change if I had another thread that processes the loaded data (decompress/create object(s) from data)?

Thanks in advance.
Advertisement

Quote:Original post by gsamour
I think locking a mutex every time Get() is called is not a good idea. I'm trying to think of alternate ways to solve this. Can anybody help?


"Think" is a poor metric for making a choice in software development. Prove it with data or a use case rather than 'think'.

In reality the lock/unlock of a mutex is the safest and simplest method of doing this and should be the first method taken (and other methods can cause all manner of headaches you simply don't need).

As for writing a good async loader; worry about getting it done with two threads first, as scaling up from there is a deep hole of async IO, IO completion ports, async procedure call, thread states and all manner of fun things which require a decent grounding in threaded coding to do 'right'.

(I do intend to make a journal entry on this subject, in the context of a task based game system, at some point in the future, but as we are now about to hit 'beta crunch' at work it'll probably be a couple of weeks off now).
You're right, "think" is a lousy metric. I recently downloaded a trial of Intel's VTune. I'm still trying to understand the results it gives me, but here's what I've got so far:

VTune shows a list of "hotspots" which seem to be the functions that get the most cpu time. My manager's Get() is listed as a hotspot, which is why I think it's a bad idea to call it so much (especially when my program "knows" that the resource is already in memory).
If you allow the Actor to 'own' the mesh then you will avoid calling Get() so much. If your actor normally uses the same mesh each time then you could rewrite it so when the Actor requests a Mesh it gets an AssetHandle back. When it comes to render time it could do something like this:

    if (handle.isLoaded())
        return handle.Asset();
    else
        return ActorDefaultMesh();


AssetHandle::isLoaded() could examine a flag and if you're super careful with memory barriers etc, then you can use atomic operations to read and set the value of that flag from different threads. Bear in mind you are opening yourself up to all kinds of horrendous bugs when you start messing with this stuff.
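A rough sketch of what such a flag could look like, assuming C++11 `std::atomic` is available (the post itself does not name an API). `Publish()` and the `Mesh` type are invented for illustration, and the sketch assumes exactly one loader thread writes and that nobody unloads the asset afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

struct Mesh { std::string name; };  // hypothetical asset type

class AssetHandle {
public:
    // Loader thread: write the payload first, then set the flag.
    // The release store makes the payload write visible to any thread
    // that later observes loaded_ == true with an acquire load.
    void Publish(Mesh mesh) {
        mesh_ = std::move(mesh);
        loaded_.store(true, std::memory_order_release);
    }

    // Any thread: cheap check, no mutex involved.
    bool isLoaded() const {
        return loaded_.load(std::memory_order_acquire);
    }

    // Only valid to call after isLoaded() has returned true.
    const Mesh& Asset() const { return mesh_; }

private:
    Mesh mesh_;
    std::atomic<bool> loaded_{false};
};
```

The ordering matters: if the flag were set before the payload was written, a reader could see isLoaded() == true and read a half-built mesh.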

It would be so much easier if you ensured all assets are loaded during some kind of loading screen!
Quote:Original post by gsamour
You're right, "think" is a lousy metric. I recently downloaded a trial of Intel's VTune. I'm still trying to understand the results it gives me, but here's what I've got so far:

VTune shows a list of "hotspots" which seem to be the functions that get the most cpu time. My manager's Get() is listed as a hotspot, which is why I think it's a bad idea to call it so much (especially when my program "knows" that the resource is already in memory).


That might be, but how much other work is your program doing? If it's doing a trivial amount of work and hitting the code a lot then yes, it will show up as a hotspot; however, when it is doing a realistic workload that might well not be the case.
Thanks Noggs & phantom for your replies so far :)

@ Noggs:

I'd like to have streaming, and I found your comment about flags and memory barriers interesting. Could you please go into more detail? Or do you have a link to an article or tutorial?

@ phantom:

I'll see if I can post more information, like the VTune output. My program has hitches every now and then, and I'm trying to track them down. From the responses and my previous knowledge, I'm starting to doubt that mutexes in general are my problem.


---

One of the asset types I'm streaming is textures, and when a texture is loaded, my processing thread calls D3DXCreateTextureFromFileInMemoryEx(). The Direct3D device can't be accessed from multiple threads, which means I use a mutex for the device. This means my main thread blocks. If this function takes a long time, this would explain the hitches.


EDIT: I double checked my code as I was not sure about the "D3DXCreateTextureFromFileInMemoryEx" part (I wrote that section a while ago). I'm not using D3DXCreateTextureFromFileInMemoryEx anymore. I'm just using D3DXCreateTexture() to create an empty texture, then using LockRect() and UnlockRect() to fill it. Is there a faster way to do it?
If you are locking/unlocking the mutex you use to check if something is loaded then you are unlikely to see 'hitches' every now and then, it would be a pretty stable impact on your update rate.

That said, now that I'm more awake, it occurs to me that you technically shouldn't need a mutex; mutexes are good when you are going to have multiple threads writing to a resource and thus you need to prevent out-of-order updates and other data hazards.

However, you only have a single thread writing and a single thread reading, so this doesn't matter too much. Worst case, while you are reading the var in the 'get' function it gets updated in the 'loading' thread and you don't see it for an extra frame.

The only time you need to set up a mutex is if you are going to modify the map in the 'get' function.
I'm fairly certain a std::map is in no way thread safe, which means that if you have one thread adding to it while another is reading from it you may get a crash or other unexpected behaviour. For example, consider what might happen if one thread is rebalancing the tree while the other is trying to use it. Two threads reading the same std::map should be safe, although I don't think there are any guarantees in the standard.


My guess for what's causing the hitching would be that you're creating lots of textures in one frame. Try spreading the work out over multiple frames.
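One way to spread the work out, sketched under assumptions: keep a queue of pending creation jobs and let the main thread run only a few per frame. `FrameBudgetedQueue` and its methods are hypothetical names, the jobs are plain callables standing in for the real texture-creation calls, and the queue as written is meant to be touched by the main thread only (it would need a lock if the loader thread pushed into it directly).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical main-thread queue of deferred creation jobs.
class FrameBudgetedQueue {
public:
    // Queue a job, e.g. "create and fill texture X".
    void Push(std::function<void()> job) { jobs_.push_back(std::move(job)); }

    // Call once per frame: runs at most maxJobs jobs in FIFO order
    // and returns how many actually ran.
    std::size_t RunSome(std::size_t maxJobs) {
        std::size_t ran = 0;
        while (ran < maxJobs && !jobs_.empty()) {
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop_front();
            job();
            ++ran;
        }
        return ran;
    }

    std::size_t Pending() const { return jobs_.size(); }

private:
    std::deque<std::function<void()>> jobs_;
};
```

A count-based budget is the simplest version; a time-based budget (stop once some number of milliseconds has been spent this frame) would be a natural variation if texture sizes vary a lot.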

Smaller textures can also be loaded and created more quickly, so consider storing textures using D3DFMT_DXT1 or D3DFMT_DXT5 where appropriate.
Do some research on InterlockedCompareExchange(). Memory barriers are used to ensure that memory writes that you would expect to have occurred when you call the function have definitely occurred - but it looks like the MS Interlocked functions implicitly have memory barriers built in.

Basically:

Thread A requests asset X. This creates an Asset entry with mState=LoadPending.
Load Thread picks up the load request and uses CompareExchange to set the state to Loading
Load Thread finishes loading the asset and uses CompareExchange to set the state to Loaded

At any time any thread can query the mState and should only use it if the state becomes Loaded (that's what the isLoaded() function would do).
The reason CompareExchange must be used will become apparent when you come to handle unloading (imagine the case where Thread A decides to unload when the state is Loading). This is where things start to get complicated!

The key is to manage the state transitions very carefully and to always be aware of which thread has control of the asset in which state.
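The transitions above could be sketched like this. Note the swap: this uses C++11 `std::atomic::compare_exchange_strong` in place of the Win32 InterlockedCompareExchange() mentioned in the post, purely to keep the example portable. The `Unloaded` state and `CancelPending()` are assumptions added to show why the compare matters; real unloading needs more states than this.

```cpp
#include <atomic>
#include <cassert>

// LoadPending, Loading and Loaded come from the post; Unloaded is an assumption.
enum class AssetState { Unloaded, LoadPending, Loading, Loaded };

class AssetEntry {
public:
    // Requesting an asset creates the entry already in LoadPending.
    AssetEntry() : mState(AssetState::LoadPending) {}

    // Load thread: claim the request. Fails if it was cancelled meanwhile.
    bool BeginLoad() { return Transition(AssetState::LoadPending, AssetState::Loading); }

    // Load thread: publish the finished asset.
    bool FinishLoad() { return Transition(AssetState::Loading, AssetState::Loaded); }

    // Requesting thread: back out, but only while nobody has started loading.
    bool CancelPending() { return Transition(AssetState::LoadPending, AssetState::Unloaded); }

    bool isLoaded() const {
        return mState.load(std::memory_order_acquire) == AssetState::Loaded;
    }

private:
    // Succeeds only if the state is still 'from'; exactly one caller can win.
    bool Transition(AssetState from, AssetState to) {
        return mState.compare_exchange_strong(from, to,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire);
    }

    std::atomic<AssetState> mState;
};
```

Because every transition names the state it expects, a cancel that races with BeginLoad() cannot both succeed: whichever compare-exchange runs second sees the wrong starting state and returns false.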

Not that I'm trying to discourage you but lock-free async loading can be a big time sink while you try and iron out all the niggly little edge cases. I would suggest sticking with the mutex version until you're sure that it needs optimising.
Quote:Original post by Adam_42
I'm fairly certain a std::map is in no way thread safe, which means that if you have one thread adding to it while another is reading from it you may get a crash or other unexpected behaviour. For example consider what might happen if one thread is rebalancing the tree, while the other is trying to use it.


Hmmm, you make a good point I hadn't considered at the time of my posting...

That said, inserting/deleting doesn't seem to invalidate iterators into the container, which seems to imply that things shouldn't go wonky if you were reading while something was adding; but I wouldn't swear to that without seeing the code for it of course [smile]

Edit;
Truth be told I wouldn't touch the map for this anyways.

A simple solution would be;
- On requesting the data I would pass back a handle which has a bool you can query for 'ready'ness.
- Renderer/main thread queries this value when it goes to use it
- Loader sets it to 'true' when the data is ready

Another option is a callback to an object to let it know that 'data X is ready', which maps to the handle you obtained earlier.
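A minimal sketch of the callback option, with assumptions stated up front: `LoadNotifier`, `NotifyReady()`, and `Dispatch()` are invented names, and the sketch chooses to run callbacks on the main thread (once per frame) rather than on the loader thread, so the callbacks themselves need no locking.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical notifier: loader thread enqueues, main thread dispatches.
class LoadNotifier {
public:
    using Callback = std::function<void(const std::string&)>;

    // Loader thread: record that 'assetName' is ready and who wants to know.
    void NotifyReady(std::string assetName, Callback callback) {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_.emplace_back(std::move(assetName), std::move(callback));
    }

    // Main thread, once per frame: run the callbacks outside the lock.
    void Dispatch() {
        std::vector<std::pair<std::string, Callback>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(ready_);  // take everything queued so far
        }
        for (auto& entry : batch) {
            entry.second(entry.first);
        }
    }

private:
    std::mutex mutex_;
    std::vector<std::pair<std::string, Callback>> ready_;
};
```

The mutex is taken once per frame and once per finished load, not once per Get(), which is a much lighter pattern than locking on every lookup.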
