Asynchronous Asset Loading (data streaming)



#1 gsamour   Members   -  Reputation: 140


Posted 15 November 2010 - 06:33 AM

Say I have two threads in my game, the main thread and an asset loader thread. For now, my system consists of the following:

1. a manager class that lets you Get() assets by name
2. an actor class that references assets such as textures and meshes by name
3. when an actor needs to render a mesh, it calls Get() on the manager. The manager checks a <string, asset pointer> map to see whether it has the asset. If the lookup returns a non-null pointer, the actor uses that mesh; otherwise it uses a default pre-loaded mesh and the manager kicks off an asset loading task on the loader thread.
4. same as #3 for textures when a mesh needs them.
5. when an asset loading task is done, the manager adds the asset to its <string, asset pointer> map. This means the map has a mutex or critical section that is locked whenever the user calls Get() and also whenever the loader thread finishes.

I think locking a mutex every time Get() is called is not a good idea. I'm trying to think of alternate ways to solve this. Can anybody help?

One thing I can think of is to reference assets by pointer instead of by name. If an actor has a non-null pointer, it uses it. And the loader thread, when done, sets the pointer. This still requires a mutex though, for the actor's asset pointer.

One other thing is to keep some sort of state on the actor that indicates whether or not it has the asset. If it's loading, then it'll keep calling Get() on the manager. When it gets a valid pointer, it'll cache it, and stop calling Get(). The pointer would have to be a shared_ptr or weak_ptr because multiple actors could use the same assets.
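A minimal sketch of this caching idea in modern C++ (the Asset, AssetManager, and Actor types here are hypothetical stand-ins, and shared_ptr plays the role of the shared asset pointer mentioned above):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Asset { std::string name; };

// Hypothetical manager: the map is guarded by a mutex because the
// loader thread inserts while the main thread reads.
class AssetManager {
public:
    // Returns the asset if loaded, or nullptr (caller falls back to a default).
    std::shared_ptr<Asset> Get(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = assets_.find(name);
        return it != assets_.end() ? it->second : nullptr;
    }

    // Called when a load completes.
    void Add(const std::string& name, std::shared_ptr<Asset> asset) {
        std::lock_guard<std::mutex> lock(mutex_);
        assets_[name] = std::move(asset);
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Asset>> assets_;
};

// The actor caches the pointer once Get() returns non-null, so it stops
// paying for the lock on every frame afterwards.
struct Actor {
    std::string meshName;
    std::shared_ptr<Asset> cachedMesh;  // null until the load finishes

    std::shared_ptr<Asset> MeshFor(AssetManager& mgr) {
        if (!cachedMesh)
            cachedMesh = mgr.Get(meshName);
        return cachedMesh;  // may still be null -> caller uses default mesh
    }
};
```

The lock is only paid until the first successful Get(); after that the actor reads its cached pointer.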

Can someone please give me pointers on how to implement a good asynchronous loader? Also, would the implementation change if I had another thread that processes the loaded data (decompress/create object(s) from data)?

Thanks in advance.


#2 phantom   Moderators   -  Reputation: 7586


Posted 15 November 2010 - 10:27 AM

Quote:
Original post by gsamour
I think locking a mutex every time Get() is called is not a good idea. I'm trying to think of alternate ways to solve this. Can anybody help?


"Think" is a poor metric for making a choice in software development. Prove it with data or a use case rather than 'think'.

In reality, locking and unlocking a mutex is the safest and simplest method of doing this, and it should be the first method taken (other methods can cause all manner of headaches you simply don't need).

As for writing a good async loader: worry about getting it done with two threads first, as scaling up from there is a deep hole of async IO, IO completion ports, asynchronous procedure calls, thread states, and all manner of fun things which require a decent grounding in threaded coding to do 'right'.

(I do intend to write a journal entry on this subject, in the context of a task-based game system, at some point, but as we're about to hit 'beta crunch' at work it'll probably be a couple of weeks off.)


#3 gsamour   Members   -  Reputation: 140


Posted 15 November 2010 - 11:35 AM

You're right, "think" is a lousy metric. I recently downloaded a trial of Intel's VTune. I'm still trying to understand the results it gives me, but here's what I've got so far:

VTune shows a list of "hotspots", which seem to be the functions that get the most CPU time. My manager's Get() is listed as a hotspot, which is why I think it's a bad idea to call it so much (especially when my program "knows" the resource is already in memory).

#4 Noggs   Members   -  Reputation: 141


Posted 15 November 2010 - 11:54 AM

If you allow the Actor to 'own' the mesh then you will avoid calling Get() so much. If your actor normally uses the same mesh each time then you could rewrite it so when the Actor requests a Mesh it gets an AssetHandle back. When it comes to render time it could do something like this:


if (handle.isLoaded())
    return handle.Asset();
else
    return ActorDefaultMesh();



AssetHandle::isLoaded() could examine a flag and if you're super careful with memory barriers etc, then you can use atomic operations to read and set the value of that flag from different threads. Bear in mind you are opening yourself up to all kinds of horrendous bugs when you start messing with this stuff.

It would be so much easier if you ensured all assets are loaded during some kind of loading screen!
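As a sketch of this flag idea using today's std::atomic (which did not exist in 2010; the AssetHandle and Mesh types here are hypothetical), the release/acquire pair stands in for the explicit memory barriers mentioned above:

```cpp
#include <atomic>
#include <string>

struct Mesh { std::string name; };

// Hypothetical handle shared between the loader thread (writer) and the
// render thread (reader). The release/acquire pair guarantees that the
// mesh written before markLoaded() is visible to any reader that
// observes isLoaded() == true.
class AssetHandle {
public:
    bool isLoaded() const { return loaded_.load(std::memory_order_acquire); }

    // Loader thread: publish the finished mesh, then set the flag last.
    void publish(Mesh* m) {
        mesh_ = m;
        loaded_.store(true, std::memory_order_release);
    }

    Mesh* asset() const { return mesh_; }

private:
    Mesh* mesh_ = nullptr;
    std::atomic<bool> loaded_{false};
};

// Render-time fallback as in the snippet above.
Mesh* meshOrDefault(const AssetHandle& h, Mesh* defaultMesh) {
    return h.isLoaded() ? h.asset() : defaultMesh;
}
```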

#5 phantom   Moderators   -  Reputation: 7586


Posted 15 November 2010 - 12:00 PM

Quote:
Original post by gsamour
You're right, "think" is a lousy metric. I recently downloaded a trial of Intel's VTune. I'm still trying to understand the results it gives me, but here's what I've got so far:

VTune shows a list of "hotspots" which seem to be the functions that get the most cpu time. My manager's Get() is listed as a hotspot, which is why I think it's a bad idea to call it so much (especially when my program "knows" that the resource is already in memory).


That might be, but how much other work is your program doing? If it's doing a trivial amount of work and hitting that code a lot then yes, it will show up as a hotspot; under a realistic workload that might well not be the case.

#6 gsamour   Members   -  Reputation: 140


Posted 15 November 2010 - 12:43 PM

Thanks Noggs & phantom for your replies so far :)

@ Noggs:

I'd like to have streaming, and I found your comment about flags and memory barriers interesting. Could you please go into more detail? Or do you have a link to an article or tutorial?

@ phantom:

I'll see if I can post more information, like the VTune output. My program has hitches every now and then, and I'm trying to track them down. From the responses and my previous knowledge, I'm starting to doubt that mutexes in general are my problem.


---

One of the asset types I'm streaming is textures, and when a texture is loaded, my processing thread calls D3DXCreateTextureFromFileInMemoryEx(). The Direct3D device can't be accessed from multiple threads, so I use a mutex for the device. This means my main thread blocks; if this function takes a long time, that would explain the hitches.


EDIT: I double-checked my code, as I wasn't sure about the "D3DXCreateTextureFromFileInMemoryEx" part (I wrote that section a while ago). I'm not using D3DXCreateTextureFromFileInMemoryEx anymore. I'm just using D3DXCreateTexture() to create an empty texture, then LockRect() and UnlockRect() to fill it. Is there a faster way to do it?

#7 phantom   Moderators   -  Reputation: 7586


Posted 16 November 2010 - 04:38 AM

If you are locking/unlocking the mutex you use to check whether something is loaded, then you are unlikely to see 'hitches' every now and then; it would be a pretty stable impact on your update rate.

That said, now that I'm more awake, it occurs to me that you technically shouldn't need a mutex; mutexes are good when you have multiple threads writing to a resource and thus need to prevent out-of-order updates and other data hazards.

However, you only have a single thread writing and a single thread reading, so this doesn't matter too much. Worst case, while you're reading the variable in the 'get' function it gets updated by the loading thread and you don't see it for an extra frame.

The only time you need to set up a mutex is if you are going to modify the map in the 'get' function.

#8 Adam_42   Crossbones+   -  Reputation: 2619


Posted 16 November 2010 - 06:24 AM

I'm fairly certain a std::map is in no way thread-safe, which means that if you have one thread adding to it while another is reading from it, you may get a crash or other unexpected behaviour. For example, consider what might happen if one thread is rebalancing the tree while the other is trying to use it. Two threads reading the same std::map should be safe, although I don't think there are any guarantees in the standard.


My guess for what's causing the hitching would be that you're creating lots of textures in one frame. Try spreading the work out over multiple frames.

Smaller textures can also be loaded and created more quickly, so consider storing textures using D3DFMT_DXT1 or D3DFMT_DXT5 where appropriate.

#9 Noggs   Members   -  Reputation: 141


Posted 16 November 2010 - 06:52 AM

Do some research on InterlockedCompareExchange(). Memory barriers are used to ensure that memory writes that you would expect to have occurred when you call the function have definitely occurred - but it looks like the MS Interlocked functions implicitly have memory barriers built in.

Basically:

Thread A requests asset X. This creates an Asset entry with mState = LoadPending.
The load thread picks up the load request and uses CompareExchange to set the state to Loading.
The load thread finishes loading the asset and uses CompareExchange to set the state to Loaded.

At any time, any thread can query mState, and it should only use the asset once the state becomes Loaded (that's what the isLoaded() function would do).
The reason CompareExchange must be used becomes apparent when you come to handle unloading (imagine the case where Thread A decides to unload while the state is Loading). This is where things start to get complicated!

The key is to manage the state transitions very carefully and to always be aware of which thread has control of the asset in which state.

Not that I'm trying to discourage you but lock-free async loading can be a big time sink while you try and iron out all the niggly little edge cases. I would suggest sticking with the mutex version until you're sure that it needs optimising.
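The state machine above could be sketched with std::atomic's compare-exchange (the enum values and AssetEntry type are hypothetical; the MS InterlockedCompareExchange function mentioned above would serve the same role):

```cpp
#include <atomic>

// Hypothetical asset states. The thread whose compare-exchange succeeds
// owns the transition, which is what makes the Loading-vs-unload race
// detectable rather than silent.
enum State { LoadPending, Loading, Loaded, Unloading };

struct AssetEntry {
    std::atomic<int> mState{LoadPending};

    // Atomically move from 'from' to 'to'; returns true only if this
    // thread performed the transition.
    bool tryTransition(int from, int to) {
        return mState.compare_exchange_strong(from, to);
    }
};
```

If Thread A tries to set Loading -> Unloading but the load thread already moved the state to Loaded, A's compare-exchange fails and it can react accordingly.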

#10 phantom   Moderators   -  Reputation: 7586


Posted 16 November 2010 - 07:09 AM

Quote:
Original post by Adam_42
I'm fairly certain a std::map is in no way thread safe, which means that if you have one thread adding to it while another is reading from it you may get a crash or other unexpected behaviour. For example consider what might happen if one thread is rebalancing the tree, while the other is trying to use it.


Hmmm, you make a good point I hadn't considered at the time of my posting...

That said, inserting/deleting doesn't seem to invalidate iterators into the container, which seems to imply that things shouldn't go wonky if you were reading while something was being added; but I wouldn't swear to that without seeing the code, of course [smile]

Edit:
Truth be told, I wouldn't touch the map for this anyway.

A simple solution would be:
- On requesting the data, pass back a handle with a bool you can query for readiness.
- The renderer/main thread queries this value when it goes to use the asset.
- The loader sets it to true when the data is ready.

Another option is a callback to an object to let it know that 'data X is ready', which maps to the handle you obtained earlier.


#11 Noggs   Members   -  Reputation: 141


Posted 16 November 2010 - 07:29 AM

Quote:
Original post by phantom
A simple solution would be;
- On requesting the data I would pass back a handle which has a bool you can query for 'ready'ness.
- Renderer/main thread queries this value when it goes to use it
- Loader set to 'true' when the data is ready


This is a case where using a memory barrier is essential. This page has a good example of why under Fixing a Race Condition.

If you use InterlockedExchange to set the bool to true, that implicitly creates a memory barrier, whereas plain assignment won't. Additionally, please don't use the 'volatile' keyword to try to fix this: it creates a false sense of security and may not have the same effect on non-Microsoft compilers.

#12 voguemaster   Members   -  Reputation: 179


Posted 16 November 2010 - 08:56 PM

Hmm,

It sounds to me like this is the wrong way to address the problem. Not the synchronization using atomic operations, but the whole concept presented here.

I admit I haven't implemented such a scheme, but wouldn't it be easier if the loader thread loaded a portion of the scene graph along with its assets, and once it's loaded, either it or the master thread (depending on the design) attached the completed subtree to the current scene graph?

Attaching is as simple as assigning parent/child links, possibly in several places, but still. Connecting and disconnecting entire subtrees under a lock every once in a while is perfectly viable and shouldn't incur any performance problems.

The concept presented above, to my understanding, means the following:

1. I have an entire scene graph.
2. In order to conserve and recycle memory, I'm going to unload and load textures and other big resources based on several criteria.
3. All nodes that need access to assets must then be able to query a resource manager to verify their assets are loaded and, if not, request loading.

This method seems more difficult, not only to code properly but also to debug. Not just because of threading problems: we have also introduced loading states, and when an asset is loaded it is held by the graph node (albeit through a handle of some sort).

Now consider that you have specific unloading criteria; when you unload, you probably unload entire chunks of resources that fit entire subtrees in your scene graph. So it boils down to something along the lines of what I suggested earlier, but with a lot more code overhead.


I've done streaming of other things, not a scene graph, but that seems to be a better solution to me.
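The attach step described above could be as small as this sketch (SceneNode and the global scene mutex are hypothetical; the point is that the lock covers only a pointer hookup, not the load itself):

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical scene node: the loader builds a complete subtree offline,
// then the main thread attaches it with one brief lock per streamed chunk.
struct SceneNode {
    std::vector<std::shared_ptr<SceneNode>> children;
};

std::mutex gSceneMutex;

void attachSubtree(SceneNode& parent, std::shared_ptr<SceneNode> subtree) {
    std::lock_guard<std::mutex> lock(gSceneMutex);
    parent.children.push_back(std::move(subtree));  // O(1) pointer hookup
}
```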


#13 gsamour   Members   -  Reputation: 140


Posted 17 November 2010 - 10:38 AM

I'll definitely look into memory barriers and returning handles upon asset requests. I also find loading a whole portion of the scene graph interesting. I appreciate everyone posting their thoughts on this, I'll post any results as soon as I change my implementation. That being said, I encourage everyone to keep posting thoughts if you feel like it.

Thanks again!


P.S.

@voguemaster

Quote:
I've done streaming of other things


What have you streamed before?

#14 voguemaster   Members   -  Reputation: 179


Posted 17 November 2010 - 08:56 PM

I've worked on projects that stream medical images, with prioritization that is dynamic. It's a medical viewing application, and memory constraints also dictate how you unload images.

That's it in its simple form, of course :). Technically the data structures more likely resemble several trees when you manage them. Only at final rendering do you actually present images in a "list" of some sort.

#15 gwihlidal   Members   -  Reputation: 708


Posted 17 November 2010 - 09:28 PM

Some great replies here so far. One suggestion I wanted to add: avoid doing lookups into your data manager by string. This pattern never scales (it's a huge performance hit with a large catalog), uses a lot of memory, and often causes fragmentation. Instead, I would generate a hash of your asset names (32-bit or 64-bit, possibly with a bucketed hash to handle collisions) and make your requests against hash values instead.
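One common way to hash asset names is FNV-1a; this is an illustrative choice, not necessarily what the poster uses:

```cpp
#include <cstdint>

// FNV-1a over a C string. The 64-bit width makes accidental collisions
// unlikely, but a real asset pipeline should still detect them at build
// time. constexpr (C++14) lets the compiler hash literals at compile time,
// so the name strings never need to ship.
constexpr uint64_t fnv1a(const char* s) {
    uint64_t h = 1469598103934665603ull;          // FNV offset basis
    while (*s) {
        h ^= static_cast<unsigned char>(*s++);    // xor in one byte
        h *= 1099511628211ull;                    // FNV prime
    }
    return h;
}
```

Requests then key a map on uint64_t instead of comparing strings, turning each lookup into integer comparisons.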

Cheers!
Graham

----

Senior Software Engineer @ Frostbite \ DICE (Rendering)

Previously at BioWare

Author of Game Engine Toolset Development

http://blog.bioware.com/2013/07/25/staff-blog-graham-wihlidal-senior-software-engineer/

https://www.linkedin.com/in/gwihlidal


#16 voguemaster   Members   -  Reputation: 179


Posted 18 November 2010 - 01:30 AM

Heck, I'd like to work at BioWare too :). Can't say enough good things about their games. Kudos.

#17 Triton   Members   -  Reputation: 138


Posted 18 November 2010 - 04:47 AM

I've made an asynchronous loading system similar to what you want.

I use a task system that keeps a pool of threads (you can control how many) waiting for tasks and executes them in the background. I always use thread-safe message queues to communicate between threads; a queue blocks when it has nothing in it, so the threads only wake up when there's something to do.

I also have a ResourceManager class that the rest of the engine uses to request resources. If a resource is not already cached, I create a new task and submit it to the task system. When the background thread completes the loading, it queues a message in the ResourceManager saying the load finished; once that happens, the Resource itself is changed to the Loaded state and I notify whoever subscribed to the onResourceLoaded event.

It currently returns reference-counted pointers. These are safe for multithreaded use thanks to atomic operations, the InterlockedExchange functions that were talked about earlier. I'm thinking of switching to a handle-based system, as pointers make it harder to support resource reloading at runtime: currently I need to notify everything when the pointer of a resource changes.

Here is the code where the magic happens, hope it helps.

Task Manager
Concurrent Queue
Resource Manager
Resource Loader
Resource Task
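The blocking queue described above typically looks like this sketch (names are illustrative, not taken from the linked code): worker threads sleep in pop() until a task arrives.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Minimal thread-safe blocking queue: push() from any producer thread,
// pop() blocks the consumer until an item is available.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        cond_.notify_one();  // wake one sleeping worker
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form handles spurious wakeups for us.
        cond_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<T> items_;
};
```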



#18 mokaschitta   Members   -  Reputation: 124


Posted 18 November 2010 - 09:16 PM

I just started working on a threaded job queue as well, and I came across a GameDev thread where Hodgman proposed using double buffering for the job queue (vector, list, or whatever you prefer). Basically, you have one list that jobs are added to each frame and another that gets read from, and you only have to take a mutex when swapping the buffers. (That idea requires each worker thread to use its own queue, though; my manager class simply tries to distribute the work equally over all worker queues.) It has the advantage that you don't have to get into lock-free queues, which are hard to get right, and there is only one mutex lock per thread per frame instead of one every time you add to or read from the queue.

And as Triton points out, blocking the thread when no tasks are there is a great win.

Currently I am doing it like this in the while() loop running my job thread (the Boost way):

// Use a loop, not a plain if: condition variables can wake spuriously.
boost::mutex::scoped_lock l(m_mutex);
while (m_bPause)
{
    m_cond.wait(l);
}


m_bPause becomes true when there are no more jobs left in the queue, and goes back to false when new jobs are added to it.
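The double-buffering idea could be sketched like this (names hypothetical; the game thread owns the write list, so only the per-frame handoff needs the lock, matching the "one lock per thread per frame" claim above):

```cpp
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Double-buffered job queue: the game thread appends to its private write
// list with no locking, then moves it into the shared slot once per frame;
// the worker grabs the whole batch in one go.
template <typename Job>
class DoubleBufferedQueue {
public:
    void add(Job j) { write_.push_back(std::move(j)); }  // game thread only

    void publish() {                        // game thread, once per frame
        std::lock_guard<std::mutex> lock(mutex_);
        shared_.insert(shared_.end(),
                       std::make_move_iterator(write_.begin()),
                       std::make_move_iterator(write_.end()));
        write_.clear();
    }

    std::vector<Job> takeBatch() {          // worker thread, once per frame
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(shared_, {});  // swap out the whole batch
    }

private:
    std::vector<Job> write_;   // game-thread private, never locked
    std::mutex mutex_;
    std::vector<Job> shared_;  // handoff slot, guarded by mutex_
};
```

Each thread touches the mutex once per frame (publish/takeBatch) instead of once per job.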



