Multithreaded resource creation


Hey guys, so I'm currently working on the backend of my multithreaded framework and wanted to get some opinions on creating render resources asynchronously. I know that engines such as Unreal Engine 4 split their code base to allow an API abstraction via the "FRHIXXX" classes, with creation/update logic in FRenderResource. There is a subclass of FRenderResource for each type of GPU resource, so there is an FVertexBuffer class that instantiates/updates an FRHIVertexBuffer when on the render thread. The resource is sent to the render thread via BeginInitResource(...). This seems like a solid approach to render-thread resource creation, and I'm hard pressed to find an implementation cleaner than that. Anyways, I just wanted to know if anyone had any different implementations, and the possible benefits and weaknesses of such implementations. Thanks.
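For reference, the pattern looks roughly like this (a simplified sketch with names modeled on UE4's; this is hypothetical, not actual engine code):

//Simplified sketch of the UE4-style split described above (hypothetical, not engine code):
struct FRHIVertexBuffer;                               //API-side object
FRHIVertexBuffer* RHICreateVertexBuffer(size_t size);  //assumed RHI entry point

class FRenderResource
{
public:
  virtual ~FRenderResource() {}
  virtual void InitRHI() = 0;     //creation logic, executed on the render thread
  virtual void ReleaseRHI() = 0;
};

class FVertexBuffer : public FRenderResource
{
public:
  virtual void InitRHI() override    { VertexBufferRHI = RHICreateVertexBuffer(Size); }
  virtual void ReleaseRHI() override { /* release VertexBufferRHI */ }

  FRHIVertexBuffer* VertexBufferRHI = nullptr;
  size_t Size = 0;
};

//Enqueues a render command so that InitRHI() runs on the render thread.
void BeginInitResource(FRenderResource* Resource);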

Modern APIs allow resource creation to occur on any thread. It's only legacy APIs that force you to initialize resources on a "render thread".
So, I would make my high-level, platform-agnostic rendering API follow this modern convention, and simply allow free-threaded resource creation.
All the ugliness can then be hidden in my back-end code for those legacy APIs.

I follow D3D11's idea of splitting single-threaded functions into a context class, and thread-safe functions into a device class. On modern APIs, you can create many contexts and give one to each of your rendering threads.

#include <PrimitiveWrap.h>
TYPEDEF_ID( Texture ); //strongly-typed handle for textures

//Use by one thread only:
class GpuContext
{
public:
  void Draw(...);
};
 
//Safe for use by many threads concurrently:
class GpuDevice
{
public:
  GpuContext& CreateGpuContext(...);
 
  Texture CreateTexture(...);
};

On an old crusty API that doesn't support this modern abstraction, the implementation will create a new dummy texture handle and return it to the caller immediately, so it looks like the resource has been created; internally, it sends a message to the rendering thread to actually create the resource:
Texture GpuDevice::CreateTexture(..., Texture recycleExistingHandle)
{
  Texture newHandle;
  if( recycleExistingHandle == Texture(0) )
    newHandle = m_textures.AllocateHandle();
  else
  {
    m_textures[recycleExistingHandle]->FreeResources();
    newHandle = recycleExistingHandle;
  }

  if( CurrentThread() != m_theRendererThread )
  {
    m_resourceEventQueue.push_back( [=](){ CreateTexture(..., newHandle); } ); //capture by value: newHandle is a local
  }
  else
  {
    InternalTexture* actualTextureResource = ...;//platform specific code
    m_textures[newHandle] = actualTextureResource;
  }
  return newHandle;
}
As long as the rendering thread executes all the messages in m_resourceEventQueue before issuing any upcoming draw-calls, the user won't know that you've pulled a sneaky one over them.
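e.g. the render thread's loop could look like this (a minimal sketch; m_resourceEventQueue is assumed to be a thread-safe queue of std::function<void()>, and ExecuteQueuedDrawCalls is a hypothetical stand-in for your draw submission):

void GpuDevice::PumpRenderThread()
{
  //Drain every pending resource-creation message first...
  std::function<void()> event;
  while( m_resourceEventQueue.try_pop( event ) )
    event();
 
  //...so that every handle handed out earlier is backed by a real
  //resource by the time the draw-calls that reference it execute.
  ExecuteQueuedDrawCalls();
}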

On modern APIs, you can just create the damn resource immediately instead of bothering with the event queue.

Thank you for another great response, Hodgman! That makes a lot of sense and is really clean. If you don't mind me asking: I'm guessing that you would handle resource updating in the same manner? How would you handle releasing transient CPU data? If the user does not know that the creation may be pushed to the job system, they may delete the temporary memory passed to a ::CreateVertexBuffer call, for example, even though the job that will eventually run on the renderer thread still needs that data.


The easiest solution for the memory management (in the sense that the user is unaware), is for that creation message to allocate some temporary memory and copy the user's data into it -- e.g.
Texture GpuDevice::CreateTexture(..., const void* initialData, size_t initialDataSize, ...)
{
...
  if( CurrentThread() != m_theRendererThread )
  {
    void* cloned = malloc(initialDataSize);
    memcpy( cloned, initialData, initialDataSize );
    m_resourceEventQueue.push_back( [=](){ CreateTexture(..., cloned, initialDataSize, ...); free(cloned); } ); //capture by value: cloned is a local
  }
This is obviously a performance issue - the user fills in one blob of memory, you copy it into another for the message's sake, and then the render thread copies it into the API (which may involve copying into a driver allocation, which is later copied to the GPU)...

The way that D3D11 handles this is that you can query the device for a capability flag, which indicates whether the current driver is good at multi-threaded resource creation or not. If that flag is set, there are probably minimal extra copies, but if it's not, then creating resources on other threads will likely involve extra copying.
I do the same, and just let the user know if they'll be paying this hidden penalty for creating resources on threads other than the renderer thread.
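Concretely, the D3D11 query looks like this:

#include <d3d11.h>
 
//Returns true if the driver supports concurrent resource creation.
bool DriverCreatesConcurrently( ID3D11Device* device )
{
  D3D11_FEATURE_DATA_THREADING threading = {};
  if( SUCCEEDED( device->CheckFeatureSupport( D3D11_FEATURE_THREADING,
                                              &threading, sizeof(threading) ) ) )
    return threading.DriverConcurrentCreates != FALSE;
  return false;
}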

You could avoid this if you like, at the cost of added complexity: require the user to keep their initialData allocation valid until some later point in time, and either have a callback or a polling mechanism that informs the user when it's safe to free that allocation.
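e.g. a callback flavour might look like this (hypothetical signature and CreateTextureOnRenderThread helper, purely to illustrate):

//Hypothetical variant: the device promises to invoke onDataConsumed once the
//render thread has finished reading initialData, so the user knows when to free it.
Texture GpuDevice::CreateTexture( const void* initialData, size_t initialDataSize,
                                  void (*onDataConsumed)(void*), void* userArg )
{
  Texture newHandle = m_textures.AllocateHandle();
  m_resourceEventQueue.push_back( [=]()
  {
    CreateTextureOnRenderThread( newHandle, initialData, initialDataSize );
    onDataConsumed( userArg ); //the user may now free/reuse their allocation
  } );
  return newHandle;
}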

As for updating - yep, pretty much.
e.g. you could have:
UpdateResource( void* newData, size_t newDataSize )
which would actually create a new memory allocation as above, copy this data into it, and then enqueue this update operation for the "main thread" to perform later.
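i.e. something along these lines (same copy-and-enqueue pattern as CreateTexture above):

void GpuDevice::UpdateResource( Texture handle, const void* newData, size_t newDataSize )
{
  if( CurrentThread() != m_theRendererThread )
  {
    void* cloned = malloc( newDataSize );
    memcpy( cloned, newData, newDataSize );
    m_resourceEventQueue.push_back( [=](){ UpdateResource( handle, cloned, newDataSize ); free( cloned ); } );
  }
  else
  {
    //platform specific code: copy newData into the actual API resource
  }
}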

For "map" operations, it's a little more complicated. API's support different resource mapping modes - read, write, read&write, discard (a.k.a. orphan / rename), no-overwrite (a.k.a. unsynchronized), persistent, etc...
On "non-main" threads, I only support the discard and no-overwrite mapping modes, as these can be implemented by simply mallocing new memory inside map, and then enqueing the operation inside unmap.

Hmm, those seem like major drawbacks in API design, do they not? With subclassing you would have clear knowledge of when and where memory is used. It would avoid the need to allocate temporary memory just to hide information from the user, and you would also be able to support all features of the graphics API. The user would know that no matter what the backend API is, this code would execute in the same manner, i.e.:

class CSomeReadingVertexBuffer : public CRenderResource
{
public:

   virtual void InitResource() override
   {
       //create the transient data (VertexCount: however many vertices this buffer holds)
       CVec3* TransientCPUData = new CVec3[VertexCount];
       //... fill in the CPU-side data ...

       //create the api specific vertex buffer
       m_VertexBuffer = g_pGfx->CreateVertexBuffer(...);
       //lock the vertex buffer
       CVec3* GPUData = g_pGfx->LockVertexBuffer(m_VertexBuffer);
       //... process data into GPUData ...
       g_pGfx->UnlockVertexBuffer(m_VertexBuffer);

       //release cpu data
       delete [] TransientCPUData;
   }

   //api specific back end version of vertex buffer
   CGFXVertexBuffer m_VertexBuffer;
};

The user would know that no matter what rendering back end there is, the ::InitResource() function would work; some backends may decide to call it immediately, à la D3D11 if the driver supports it, while others would run it via a deferred task (see the dispatch sketch after the example below). The CRenderResource class effectively becomes the "logic" behind a task entry point to be run on the render thread. This approach would also allow features such as:

class CSomeRenderResource_Hub : public CRenderResource
{
public:

    virtual void InitResource() override
    {
        m_SourceVertexBuffer = g_pGfx->CreateVertexBuffer(...);
        //etc.. create the other buffers
    }

    virtual void UpdateResource() override
    {
        //lock all the buffers and read/write all of their data for kicks
        CVec3* Data1 = g_pGfx->LockVertexBuffer(m_SourceVertexBuffer, LOCK_READWRITE);
        CVec3* Data2 = ...;
        CVec3* Data3 = ...;

        memcpy(Data1, Data3, ...);
        memcpy(Data2, Data3, ...);
        CVec3* NewData = new CVec3[...];
        memcpy(Data3, NewData, ...);
        delete [] NewData; //release the temporary data

        //unlock the vertex buffers
        g_pGfx->UnlockVertexBuffer(....);
    }

private:
    CGfxVertexBuffer*   m_SourceVertexBuffer;
    CGfxVertexBuffer*   m_SourceVertexBuffer1;
    CGfxVertexBuffer*   m_SourceVertexBufferXXX;
};
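And the dispatch that makes ::InitResource() backend-agnostic could be as simple as this (a hypothetical sketch; SupportsConcurrentCreation, IsInRenderThread, and the task queue are assumptions):

//Call InitResource immediately when the backend allows free-threaded creation,
//otherwise defer it to the render thread as a task.
void BeginInitResource( CRenderResource* pResource )
{
    if( g_pGfx->SupportsConcurrentCreation() || IsInRenderThread() )
        pResource->InitResource();
    else
        g_RenderThreadTaskQueue.Push( [=](){ pResource->InitResource(); } );
}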
