Paul__

DX11
Map/unmap, CopyStructureCount and slowdown


Hey all,

Profiling has shown that there's a massive slowdown at one point in my game app.

In each frame, I use a compute shader to create vertices, which are written to a default-usage append buffer. The code then reads the number of vertices written by the compute shader with CopyStructureCount(). The target buffer for CopyStructureCount() is a four-byte D3D11_USAGE_STAGING buffer created with D3D11_CPU_ACCESS_READ. My app then calls Map() -> memcpy() -> Unmap(). This last step stalls the CPU for 4 ms and the GPU for 1 ms.
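For reference, here's a minimal sketch of the pattern being described (a reconstruction with illustrative names such as pStagingBuf and pAppendUAV, not the actual code):

[code]
// Assumed context: pContext is the immediate ID3D11DeviceContext,
// pAppendUAV is the UAV of the default-usage append buffer the compute
// shader writes into, and pStagingBuf is a 4-byte D3D11_USAGE_STAGING
// buffer created with D3D11_CPU_ACCESS_READ.
UINT vertexCount = 0;

// Ask the GPU to copy the append buffer's hidden counter into the
// staging buffer at offset 0.
pContext->CopyStructureCount(pStagingBuf, 0, pAppendUAV);

// Mapping the staging resource for read forces the CPU to wait until
// the GPU has produced the data; this Map() is where the stall shows up.
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(pContext->Map(pStagingBuf, 0, D3D11_MAP_READ, 0, &mapped)))
{
    memcpy(&vertexCount, mapped.pData, sizeof(UINT));
    pContext->Unmap(pStagingBuf, 0);
}
[/code]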

Without the call to the staging buffer's Map()/Unmap(), the other DX calls and the app in general seem to take the right amount of time.

It's possible for me to calculate from the game data how many verts should be written, and therefore avoid calling CopyStructureCount(). But it's a huge headache, involving tracking lots of data that I otherwise wouldn't need.

The length of the pause is directly related to the length of the compute shader call: more vertices to create, longer pause. It seems likely the CPU is waiting for the compute shader to finish.

Now, I know that with some DX calls the CPU is forced to wait for the GPU, because the GPU is still using the resource in question. But why does the GPU pause too? And surely double buffering won't help, because the *same* frame needs to know how many primitives to write in the soon-to-follow Draw() call?

Any other suggestions? I'm sort of guessing here, but could I swap the order of operations within each frame? Maybe:
- <Frame starts>
- Get the struct count from last frame
- Draw the verts
- Generate the next frame's verts
- Present

It's very hard to find *general* information about DX11 and the temporal relationship between the CPU and GPU, so any experienced help would be great!

Normally the CPU and GPU work asynchronously, with the CPU submitting commands way ahead of when the GPU actually executes them. When you read back a value on the CPU (which is what you're doing with the staging buffer), you force a sync point where the CPU flushes the command buffer and then sits around waiting for the GPU to execute all pending commands. The amount of time it has to wait depends on the number of pending commands and how long they take to execute, which means it could potentially get much worse as your frames get more complex. I'm not sure how you're determining that the GPU is "pausing", but I would doubt that is the actual case.

Swapping the order can potentially help, if you can keep the CPU busy enough to absorb some of the GPU latency.
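For what it's worth, one standard way to absorb that latency (a general technique, not something specific to this thread) is to keep a small ring of staging buffers and read back the count that was written a few frames ago, so that by the time Map() is called the GPU has almost certainly finished producing it. A sketch with illustrative names:

[code]
// Assumed context: stagingRing holds NUM_FRAMES 4-byte staging buffers
// created as in the original post; pContext and pAppendUAV as before;
// frameIndex is incremented once per frame.
const UINT NUM_FRAMES = 3;  // illustrative ring size

// Each frame, write this frame's count into one ring slot...
pContext->CopyStructureCount(stagingRing[frameIndex % NUM_FRAMES], 0, pAppendUAV);

// ...and read the oldest slot, written NUM_FRAMES - 1 frames ago.
// D3D11_MAP_FLAG_DO_NOT_WAIT makes Map() return
// DXGI_ERROR_WAS_STILL_DRAWING instead of stalling if the GPU is behind.
UINT oldCount = 0;
UINT oldest = (frameIndex + 1) % NUM_FRAMES;
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(pContext->Map(stagingRing[oldest], 0, D3D11_MAP_READ,
                            D3D11_MAP_FLAG_DO_NOT_WAIT, &mapped)))
{
    memcpy(&oldCount, mapped.pData, sizeof(UINT));
    pContext->Unmap(stagingRing[oldest], 0);
}
++frameIndex;
[/code]

The catch is that the count that comes back is a couple of frames stale, which is exactly the objection raised above: the Draw() call in the *same* frame wants the current count.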

Thanks for your answer. I'm not sure I can really reorganise the way a frame is structured, which means I might have to go the hard way and maintain counts of all the geometry rather than read the count from the append buffer. Damn!

So just to clarify: does an app *reading* a GPU buffer via Map()/Unmap() *always* cause the CPU to wait for the GPU? Compared to when an app *writes* to a dynamic buffer, which doesn't always cause the CPU to wait (I guess because, under the hood, DX maintains multiple buffers for dynamic writes).

Also, when you say that the CPU "sits around waiting for the GPU to execute all pending commands", does that truly mean all DX commands queued up for that frame have to execute before a buffer can be read, or only the commands involving the particular append buffer being read?

I'm using DX queries to time the GPU. I could well have made a mistake though!
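In case it's useful for comparing numbers, here's roughly what GPU timing with D3D11 timestamp queries looks like (a sketch, on the assumption that timestamp queries are what's meant by "DX queries"):

[code]
// Assumed context: pDevice / pContext are the D3D11 device and immediate
// context; the Dispatch() stands in for the compute work being measured.
ID3D11Query *pDisjoint = nullptr, *pStart = nullptr, *pEnd = nullptr;
D3D11_QUERY_DESC qd = {};
qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
pDevice->CreateQuery(&qd, &pDisjoint);
qd.Query = D3D11_QUERY_TIMESTAMP;
pDevice->CreateQuery(&qd, &pStart);
pDevice->CreateQuery(&qd, &pEnd);

pContext->Begin(pDisjoint);
pContext->End(pStart);              // timestamp before the work
pContext->Dispatch(64, 1, 1);       // illustrative compute dispatch
pContext->End(pEnd);                // timestamp after the work
pContext->End(pDisjoint);

// Note: spinning on GetData() like this is itself a CPU/GPU sync point;
// in practice the results should be fetched a frame or two later.
D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
while (pContext->GetData(pDisjoint, &dj, sizeof(dj), 0) == S_FALSE) {}
UINT64 t0 = 0, t1 = 0;
pContext->GetData(pStart, &t0, sizeof(t0), 0);
pContext->GetData(pEnd, &t1, sizeof(t1), 0);
if (!dj.Disjoint)   // timestamps are only valid if not disjoint
{
    double gpuMs = double(t1 - t0) / double(dj.Frequency) * 1000.0;
}
[/code]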

Thanks again.
Paul

Until the GPU has executed the instruction queue, the data that you are going to read doesn't exist.

However, you don't have to wait for read access to resources that are not used as targets of the currently running operations.

Okay, thanks Nik02, I think I understand the GPU/CPU relationship a bit better now.

It is best to think about the GPU as a remote machine to which you send requests, and from which you can then download the responses (if you need them). It actually is a remote machine, even though the physical distance from the CPU isn't usually very long.

[quote name='Nik02' timestamp='1337259480' post='4940920']
It is best to think about the GPU as a remote machine to which you send requests, and from which you can then download the responses (if you need them). It actually is a remote machine, even though the physical distance from the CPU isn't usually very long.
[/quote]
That's quite a good analogy.

Another analogy that may work is sending radio signals to the moon. Travelling at the speed of light, a signal arrives in about 1.3 seconds. If all you're doing is sending signals, you can send them as fast as you possibly can: one signal every millisecond if you so wish. However, if at any point you need to wait for a response before you can send the next signal, you have a 1.3-second wait for the signal to reach the moon, an unknown amount of time while it's being processed and acted on there, and another 1.3 seconds before the response can get back to you. During this time you're sitting there doing nothing; you can't send the next signal until you get the response.

[quote name='Paul__' timestamp='1337241494' post='4940863']
So just to clarify: does an app *reading* a GPU buffer via Map()/Unmap() *always* cause the CPU to wait for the GPU?
[/quote]

Yup. The data you need doesn't exist until the GPU actually writes it, which means the command that writes the data (and all previous dependent commands) has to execute before the data is available for readback.

[quote name='Paul__' timestamp='1337241494' post='4940863']
Compared to when an app *writes* to a dynamic buffer, which doesn't always cause the CPU to wait (I guess because, under the hood, DX maintains multiple buffers for dynamic writes).
[/quote]

Indeed, the driver can transparently swap between multiple buffers using a technique known as buffer renaming. This allows the CPU to write to one buffer while the GPU reads from a different one.
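For contrast with the read case, a sketch of the write side (pDynamicVB, newVerts and vertexBytes are illustrative; the buffer is assumed to be D3D11_USAGE_DYNAMIC with D3D11_CPU_ACCESS_WRITE):

[code]
// WRITE_DISCARD tells the driver the old contents are dead, so it can
// hand back a freshly renamed region; the CPU fills it without waiting
// for the GPU to finish reading the previous contents.
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(pContext->Map(pDynamicVB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
{
    memcpy(mapped.pData, newVerts, vertexBytes);  // no stall expected here
    pContext->Unmap(pDynamicVB, 0);
}
[/code]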

[quote name='Paul__' timestamp='1337241494' post='4940863']
Also, when you say that the CPU "sits around waiting for the GPU to execute all pending commands", does that truly mean all DX commands queued up for that frame have to execute before a buffer can be read, or only the commands involving the particular append buffer being read?
[/quote]

That would depend on the driver, I suppose. I couldn't answer that one for sure.

Do you actually need the number of vertices available on the CPU? If you could use a DrawIndirect call instead, you wouldn't need to read the buffer count back on the CPU, and you would avoid the sync.
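To illustrate the suggestion (with made-up names): CopyStructureCount() can write the append buffer's counter directly into an argument buffer for DrawInstancedIndirect(), so the count never has to travel through the CPU:

[code]
// One-time setup: an argument buffer holding the four DrawInstanced
// arguments { VertexCountPerInstance, InstanceCount, StartVertexLocation,
// StartInstanceLocation }. The GPU will overwrite the first field.
UINT initialArgs[4] = { 0, 1, 0, 0 };   // one instance, count filled on GPU
D3D11_BUFFER_DESC bd = {};
bd.ByteWidth = sizeof(initialArgs);
bd.Usage = D3D11_USAGE_DEFAULT;
bd.MiscFlags = D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS;
D3D11_SUBRESOURCE_DATA init = { initialArgs, 0, 0 };
ID3D11Buffer* pArgsBuf = nullptr;
pDevice->CreateBuffer(&bd, &init, &pArgsBuf);

// Each frame: copy the append buffer's counter into the args buffer
// (offset 0, i.e. VertexCountPerInstance), then draw. No CPU readback.
pContext->CopyStructureCount(pArgsBuf, 0, pAppendUAV);
pContext->DrawInstancedIndirect(pArgsBuf, 0);
[/code]

The vertex shader would then typically fetch the generated vertices from a shader resource view over the same buffer, indexed by SV_VertexID.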

Thanks for all your replies -- a great help.

MJP: thanks for clarifying what happens when an app reads GPU resources. I guess this means programmers avoid GPU readbacks where possible, because an app that reads back can't have the CPU working many frames ahead of the GPU. It effectively locks the CPU and GPU together every single frame, so they can't operate independently.

Also, given that the driver does buffer renaming, I guess there's no point in an app multi-buffering its own dynamic buffers, because it's already done for it?

About the primitive count and why it's important: in my app, the compute shader generates a variable number of primitives. Variable, because it's creating water tiles, and each chunk of terrain has a variable number of water tiles. On top of that, the number of water tiles in each chunk changes throughout the game, based on water physics and other factors. So regardless of whether I use DrawIndirect or Draw, I think I still need to know the number of water tiles in order to render them, either by reading back how many tiles the compute shader made, or by having the app keep track of each chunk's water tile count and updating those counts when the water behaviour changes. Keeping track is difficult, because the terrain data is duplicated in video RAM and is updated from the main RAM version only when there's a change. But I can and probably will maintain such a tile count, even though it'll be a bit of a pain.

Anyway, I thought I'd explain why reading back from the GPU would simplify the code so much. But I'm now persuaded it's probably not a good idea!

