jakovo

About GPU-Memory interaction

Hi all,

I want to better understand how the GPU works, especially what causes performance issues related to it, and when and why they happen.

For example:

A) If a texture needed for a triangle is stored in VRAM, does that mean that when a tex2D(...) instruction is used in the shader code, the GPU stalls while waiting to fetch the appropriate texel from VRAM? Or does the whole texture get stored in a cache? If so, does that mean all of the textures used (bump, diffuse, etc.) are stored in the cache?

B) When rendering, the GPU needs to write to the appropriate render target. Would the whole RT also be in a local cache? Does that mean that when changing RTs, it has to send the old RT to VRAM and bring the new one into the cache?

C) When changing render states, I believe it's just a matter of changing a flag in the GPU, so that shouldn't cause any performance issues, should it? That is, could I go crazy changing states (without changing RTs, textures, or shader code) without any relevant penalty?

D) If VRAM runs out of space, would the textures be stored in system RAM?

Thanks!

A) Yes, there will usually be a stall here, but rather than letting the GPU sit idle, it will start to work on other pixels/vertices instead. GPUs can have many thousands of pixels/vertices in some stage of execution at any point in time. One of the limiting factors is that each element currently in progress requires some registers to store intermediate values, so optimizing the shader to use fewer registers can help ensure there are enough elements in flight to hide these stalls.
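To get a feel for that trade-off, here's a back-of-the-envelope sketch in C++. The numbers (a 64 KB register file per shader unit, 64 threads scheduled together) are made up for illustration; real values vary by GPU:

    #include <cstdio>

    int main()
    {
        const int registerFileBytes = 64 * 1024; // register file per shader unit (assumed)
        const int threadsPerBatch   = 64;        // threads scheduled together (assumed)
        const int bytesPerRegister  = 4;         // one 32-bit register

        const int registersPerThread[] = { 16, 32, 64, 128 };
        for (int regs : registersPerThread)
        {
            int bytesPerBatch   = threadsPerBatch * regs * bytesPerRegister;
            int batchesInFlight = registerFileBytes / bytesPerBatch;
            printf("%3d registers/thread -> %2d batches in flight to hide latency\n",
                   regs, batchesInFlight);
        }
        return 0;
    }

Halving the registers per thread doubles how many batches the hardware can keep in flight, which is why shader compilers and GPU profilers report register usage.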

B) Typically RTs are not in the cache, but the GPU does have local ROP tiles which can cache data. These ROP tiles are flushed to VRAM when they are finished being written to or when there is an RT switch.

C) Some render states can be pipelined with the draw call. Some can't and are set in one of many state contexts. Potentially, some render state changes could cause the pipeline to flush or partially flush, leading to bubbles where the GPU goes idle. Which states can and can't be pipelined is very much hardware dependent. Also note that some render state switches could cause a lot of work in the driver on the CPU side, if the hardware doesn't directly support the feature or the CPU has to do some kind of processing on the data first.
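One cheap CPU-side mitigation is to filter out redundant state changes before they reach the API at all. A minimal sketch (RenderState, StateCache and IssueRenderState are hypothetical stand-ins for whatever SetRenderState-style calls your API exposes):

    #include <cstdio>
    #include <cstdint>
    #include <map>

    enum class RenderState : uint32_t { DepthTest, Blend, CullMode /* ... */ };

    // Hypothetical stand-in for the real API/driver call.
    void IssueRenderState(RenderState state, uint32_t value)
    {
        std::printf("driver: state %u = %u\n", static_cast<uint32_t>(state), value);
    }

    class StateCache
    {
    public:
        void Set(RenderState state, uint32_t value)
        {
            auto it = m_current.find(state);
            if (it != m_current.end() && it->second == value)
                return;                     // redundant change: skip the driver entirely
            m_current[state] = value;
            IssueRenderState(state, value); // only forward genuine changes
        }
    private:
        std::map<RenderState, uint32_t> m_current;
    };

This only saves CPU time in the driver; the genuine changes that get through can still break the GPU's work into smaller batches, as described above.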

D) If VRAM runs out of space, would the textures be stored in system RAM?

 

I'm not sure, actually. When I was doing OpenCL work, I seemed to observe some surprising memory paging effects (unused buffers getting swapped to system memory when required), so I suspect this would be the case. This could be implementation-defined behaviour, however.

WDDM on Vista/Win7/Win8 allows for limited paging of data to and from GPU memory. You'll know when it happens, because your performance will tank. :P
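If you want a rough idea of how much dedicated VRAM you have before that paging kicks in, the DXGI adapter description reports it. A quick sketch (plain DXGI, minimal error handling):

    #include <dxgi.h>
    #include <cstdio>
    #pragma comment(lib, "dxgi.lib")

    int main()
    {
        IDXGIFactory* factory = nullptr;
        if (FAILED(CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory)))
            return 1;

        IDXGIAdapter* adapter = nullptr;
        for (UINT i = 0; factory->EnumAdapters(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
        {
            DXGI_ADAPTER_DESC desc;
            adapter->GetDesc(&desc);
            printf("Adapter %u: %llu MB dedicated VRAM, %llu MB shared system memory\n", i,
                   (unsigned long long)(desc.DedicatedVideoMemory / (1024 * 1024)),
                   (unsigned long long)(desc.SharedSystemMemory  / (1024 * 1024)));
            adapter->Release();
        }
        factory->Release();
        return 0;
    }

Those numbers are what the adapter reports, not a guarantee of what WDDM will actually keep resident for your process.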

C) Pixels are batched up into "segments" on the GPU side. If multiple successive draw calls have the same state, then their pixels will probably end up in the same "segment". Some state changes will force the end of a segment and the start of a new one, while other state changes won't. There are no rules here; each card may be different. Generally, bigger changes, like changing the shader program, will definitely end a segment, while smaller changes, like changing a texture, may not.

 

Also, as mentioned by AliasBinman, changing states may have a significant CPU-side overhead within the driver or API code.
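The usual application-side mitigation is to sort draw calls so that draws sharing the most expensive state (shader first, then texture, and so on) are submitted back-to-back. A small illustrative sketch (the Draw struct and key layout are made up, not from any particular engine):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Draw
    {
        uint32_t shaderId;   // most expensive to change -> most significant bits
        uint32_t textureId;
        uint32_t meshId;

        uint64_t SortKey() const
        {
            return (uint64_t(shaderId)  << 40) |
                   (uint64_t(textureId) << 16) |
                    uint64_t(meshId & 0xFFFF);
        }
    };

    void SortDraws(std::vector<Draw>& draws)
    {
        std::sort(draws.begin(), draws.end(),
                  [](const Draw& a, const Draw& b) { return a.SortKey() < b.SortKey(); });
    }

Submitting in that order keeps the "segments" as large as possible and minimizes the number of expensive transitions the driver has to process.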

A) As above, when processing pixels, the GPU has a whole "segment" worth of pixels that need to be processed. It can break the pixel shader up into several "passes" of several instructions each, and then perform pass 1 over all pixels in the segment, then pass 2, and so on.
For example, given this code, with comments showing how it might be broken up into passes:

	float3 albedo = tex2D( s_albedo, input.texcoord ).rgb;//pass 1
	albedo = pow( albedo, 2.2 );//pass 2
	return float4(albedo,1) * u_multiplier;//pass 3

Say we've got 400 pixels and 40 shader units; the GPU would be doing something like:

for( int pass=0; pass != 3; ++pass )
  for( int i=0; i<400; i+=40 )
    RunShader( /*which instructions*/pass, /*which pixel range*/i, i+40 );

So to begin with, it executes pass #1, issuing all the texture fetch instructions, which will read texture data out of VRAM (or the cache) and write that data into the cache. Then, after it's issued the fetch instructions for pixels #360-400, it will move on to pass #2 for pixels #1-40. Hopefully by this point in time the fetch instructions for these pixels have completed, and there's no waiting around (if the fetches are still in progress, there will be a stall). Then, after this pass has performed all its pow calls, the next pass is run, which does some shuffling and multiplication, generating the final result. These results are then sent to the ROP stage.

 

The bigger your "segments", the better the GPU is able to hide latency by working on many pixels at once. Shaders that require a lot of temporary variables will reduce the maximum segment size, because the current state of execution for every pixel shader needs to be saved when moving on to other pixels (and more temporary variables == bigger state). Also, certain state changes -like changing shaders- will end a segment. So if you have a shader with lots of fetches, you want to draw hundreds (or thousands) of pixels before switching to a different shader.

 

B) Some GPUs work this way, especially older ones, or ones that boast having "EDRAM" -- there's a certain (small) bit of memory where render targets must exist to be written to. When setting a target, it has to be copied from VRAM into this area (unless you issue a clear command before drawing), and afterwards it has to be copied from this area back to VRAM (unless you issue a special "no resolve" request). On other GPUs, render-targets can exist anywhere in VRAM (or even main RAM) and there is no unnecessary copying. The ROP stage will perform buffering of writes to deal with the latency issues, similar to the above ideas in (A).
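On PC APIs you don't issue those EDRAM-style copies yourself; the closest explicit "resolve" you'll see in D3D11 is resolving a multisampled target into a plain texture. A small sketch (ResolveMsaaTarget is just a hypothetical wrapper; it assumes both textures already exist with matching size and format):

    #include <d3d11.h>

    // Hypothetical wrapper: resolve an MSAA render target into a plain texture,
    // e.g. so a later pass can sample it.
    void ResolveMsaaTarget(ID3D11DeviceContext* context,
                           ID3D11Texture2D* msaaTarget, // created with SampleDesc.Count > 1
                           ID3D11Texture2D* resolved,   // same size/format, SampleDesc.Count == 1
                           DXGI_FORMAT format)          // e.g. DXGI_FORMAT_R8G8B8A8_UNORM
    {
        context->ResolveSubresource(resolved, 0, msaaTarget, 0, format);
    }

Ordinary (non-MSAA) render target writes reach VRAM without any explicit step on your part.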

 

D) This depends on the API, driver and GPU. On some systems, the GPU may be able to read from main RAM just like it reads from VRAM, so storing textures in main RAM is not much of a problem. On other systems, the driver will have to reserve an area of VRAM and move textures back and forth between main RAM and VRAM as required... On other systems, texture allocation may just fail when VRAM is full.
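For the "allocation may just fail" case, the failure simply comes back from the creation call, so it's worth checking. A D3D11-flavoured sketch (TryCreateTexture is a hypothetical helper; assumes a valid device):

    #include <d3d11.h>

    // Hypothetical helper: returns null if the texture couldn't be created.
    ID3D11Texture2D* TryCreateTexture(ID3D11Device* device, UINT width, UINT height)
    {
        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_DEFAULT;
        desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;

        ID3D11Texture2D* texture = nullptr;
        HRESULT hr = device->CreateTexture2D(&desc, nullptr, &texture);
        if (FAILED(hr))
        {
            // E_OUTOFMEMORY (or a device-removed error) can show up here when
            // video memory is exhausted.
            return nullptr;
        }
        return texture;
    }

Whether you evict something and retry, drop mip levels, or just report the error is up to the application.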

 

 

* Disclaimer -- all of this post is highly GPU dependent, and the details will be different on different systems. This is just an illustration of how things can work.

Edited by Hodgman
