Some question about cpu, gpu communication in d3d


Hello!

I have some questions about CPU-GPU communication in D3D.

1) Does a D3D API call such as DrawXXX or SetXXXBuffer return immediately, or only after the command has completely executed?

2) Does UpdateSubresource modify the graphics card's RAM directly, or a temporary buffer on the CPU side?

3) If the buffers live in the graphics card's RAM, is SetXXXBuffer(buffer) just telling the GPU to use that buffer in graphics RAM?

4) What is the difference between context->Map/Unmap and context->UpdateSubresource? Does context->Map cause a lock?

5) If I have n cbuffers, it sounds like context->SetConstantBuffer(0, n, &buffers[0]) will be much faster than calling SetConstantBuffer one at a time, but some people say the per-frame cbuffer should only be set once per frame while the per-object cbuffer is set many times per frame. Do I need to set the per-frame cbuffer together with the per-object cbuffer to gain the "one call is faster than many calls" benefit, or should I set them independently?


I am going to answer this as accurately as I can.

1) Does a D3D API call such as DrawXXX or SetXXXBuffer return immediately, or only after the command has completely executed?

It returns immediately, unless it is an asynchronous one.

2) Does UpdateSubresource modify the graphics card's RAM directly, or a temporary buffer on the CPU side?

That is up to the driver, but AFAIK the driver generally stores the data in system RAM until an appropriate time to upload it to VRAM.

3) If the buffers live in the graphics card's RAM, is SetXXXBuffer(buffer) just telling the GPU to use that buffer in graphics RAM?

AFAIK, from D3D10 onwards the data can be moved to system RAM and sent back when needed due to VRAM constraints.

But yes, the data has to be in VRAM for the GPU to use it.

4) What is the difference between context->Map/Unmap and context->UpdateSubresource? Does context->Map cause a lock?

I don't recall the difference offhand; it would be best to read the API docs on what each of them does.

5) If I have n cbuffers, it sounds like context->SetConstantBuffer(0, n, &buffers[0]) will be much faster than calling SetConstantBuffer one at a time, but some people say the per-frame cbuffer should only be set once per frame while the per-object cbuffer is set many times per frame. Do I need to set the per-frame cbuffer together with the per-object cbuffer to gain the "one call is faster than many calls" benefit, or should I set them independently?

The fewer commands sent to the driver/GPU, the better.

A buffer stays bound to the D3D pipeline until something else is bound in its place, but you can update its contents without having to rebind it.

A per-object cbuffer HAS to be set (or at least have its contents updated) for each object, hence the name: it is useless to render object N+1 with object N's data.
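To make that concrete, here is a minimal D3D11 sketch of the pattern: bind once, then only update contents per object. It assumes perFrameCB and perObjectCB are DYNAMIC constant buffers with CPU write access created elsewhere, and the Object type and objects container are purely illustrative (requires <d3d11.h> and <cstring>):

// Bind both buffers once; the slots stay bound for the rest of the frame.
ID3D11Buffer* cbs[2] = { perFrameCB, perObjectCB };
context->VSSetConstantBuffers(0, 2, cbs);

// Per frame: update the per-frame buffer once.
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(context->Map(perFrameCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
{
    memcpy(mapped.pData, &perFrameData, sizeof(perFrameData));
    context->Unmap(perFrameCB, 0);
}

// Per object: only the contents change; no need to call VSSetConstantBuffers again.
for (const Object& obj : objects)
{
    if (SUCCEEDED(context->Map(perObjectCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        memcpy(mapped.pData, &obj.constants, sizeof(obj.constants));
        context->Unmap(perObjectCB, 0);
    }
    context->DrawIndexed(obj.indexCount, 0, 0);
}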

Also: Profile, profile, profile.

Outside of general good API usage, not everything will affect the performance of your game/app the same way it affects someone else's, so measure for yourself.

HTH

Never say Never, Because Never comes too soon. - ryan20fun

Disclaimer: Each post of mine is intended as an attempt at helping and/or bringing some meaningful insight to the topic at hand. Due to my nature, my good intentions will not always be plainly visible. I apologise in advance and assure you I mean no harm and do not intend to insult anyone.

2) Does UpdateSubresource modify the graphics card's RAM directly, or a temporary buffer on the CPU side?

That is up to the driver, but AFAIK the driver generally stores the data in system RAM until an appropriate time to upload it to VRAM.

3) If the buffers live in the graphics card's RAM, is SetXXXBuffer(buffer) just telling the GPU to use that buffer in graphics RAM?

AFAIK, from D3D10 onwards the data can be moved to system RAM and sent back when needed due to VRAM constraints.

But yes, the data has to be in VRAM for the GPU to use it.

So device->UpdateSubresource asks the driver to update a buffer, but the data is still stored on the RAM side; when the buffer is needed it is uploaded to VRAM, and when the graphics card runs out of VRAM it is downloaded back to RAM?

From https://msdn.microsoft.com/en-us/library/windows/desktop/ff476486(v=vs.85).aspx I read about CPU-GPU contention. What are the command buffer and non-mappable memory?

Here's another question: I read an article (I forget which one) saying the driver will pack something when drawing. Pack what?

1) Does a D3D API call such as DrawXXX or SetXXXBuffer return immediately, or only after the command has completely executed?

It depends on the API. Some APIs will wait for the GPU, while some will return immediately. A good rule of thumb is to check if you're allowed to use it on a deferred context - if so, it should return immediately on the immediate context. If not, it probably needs to wait for the GPU. One example is Map() with D3D11_MAP_READ.
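Reading GPU results back on the immediate context is the classic case that can stall. A rough sketch, assuming device, context, and a DEFAULT-usage gpuBuffer already exist (error handling trimmed):

// Create a CPU-readable staging copy of the GPU buffer.
D3D11_BUFFER_DESC desc = {};
gpuBuffer->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;
desc.BindFlags      = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags      = 0;

ID3D11Buffer* staging = nullptr;
device->CreateBuffer(&desc, nullptr, &staging);

context->CopyResource(staging, gpuBuffer);   // queued; returns immediately

D3D11_MAPPED_SUBRESOURCE mapped;
// This Map can stall: the runtime has to wait until the GPU has finished the copy.
if (SUCCEEDED(context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
{
    // ... read from mapped.pData ...
    context->Unmap(staging, 0);
}
staging->Release();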

2) Does UpdateSubresource modify the graphics card's RAM directly, or a temporary buffer on the CPU side?

Ryan was pretty correct on this one. I suppose it's technically possible that the update could be done in lock-step with the GPU, but I don't know of a driver that does that. Instead, they allocate some storage, copy what you pass in, and then record the equivalent of a CopySubresourceRegion() to get the contents into the destination on the GPU timeline.

3) If the buffers live in the graphics card's RAM, is SetXXXBuffer(buffer) just telling the GPU to use that buffer in graphics RAM?

It depends on the type of buffer. If it's a DEFAULT buffer, then yes it is expected to live in VRAM, but a DYNAMIC buffer probably does not. The GPU has the ability to read from system RAM. The Set*Buffer call typically just records some state in the driver, which gets translated into a series of operations at draw time to ensure the right resources are used by that draw.
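A quick sketch of those two usage patterns; the sizes and bind flags here are placeholders, and device/context are assumed to exist:

// DEFAULT: GPU-local, updated via UpdateSubresource or Copy* calls.
D3D11_BUFFER_DESC defaultDesc = {};
defaultDesc.ByteWidth = 256;                        // cbuffer sizes are multiples of 16 bytes
defaultDesc.Usage     = D3D11_USAGE_DEFAULT;
defaultDesc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;

// DYNAMIC: CPU-writable, updated via Map(WRITE_DISCARD); likely lives in
// CPU-visible memory that the GPU reads across the bus.
D3D11_BUFFER_DESC dynamicDesc = defaultDesc;
dynamicDesc.Usage          = D3D11_USAGE_DYNAMIC;
dynamicDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

ID3D11Buffer *defaultCB = nullptr, *dynamicCB = nullptr;
device->CreateBuffer(&defaultDesc, nullptr, &defaultCB);
device->CreateBuffer(&dynamicDesc, nullptr, &dynamicCB);

// Binding just records which resource the next draw should see.
context->PSSetConstantBuffers(0, 1, &defaultCB);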

4) What is the difference between context->Map/Unmap and context->UpdateSubresource? Does context->Map cause a lock?

You can think of UpdateSubresource (on a buffer) as a sequence of:

1. CreateBuffer() with type DYNAMIC.

2. Map(WRITE_DISCARD).

3. Memcpy().

4. Unmap().

5. CopySubresourceRegion().

Map with WRITE_DISCARD can technically block for the GPU to finish reading from previous contents, but typically performs "renaming", which gives you a new region of memory to write to. Future references to that buffer will refer to the new memory instead of the old contents. Map with READ or WRITE will block until the GPU is done with the buffer.
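Put together, a rough hand-rolled equivalent of UpdateSubresource on a buffer might look like the following. This is only an illustration of the five steps above, not what any particular driver actually does; it ignores bind-flag restrictions, and dst (an ID3D11Buffer* with DEFAULT usage), srcData, and srcSize are assumed to exist:

// 1. Create a temporary DYNAMIC buffer matching the destination.
D3D11_BUFFER_DESC desc = {};
dst->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_DYNAMIC;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

ID3D11Buffer* temp = nullptr;
device->CreateBuffer(&desc, nullptr, &temp);

// 2-4. Map (rename), copy the CPU data in, unmap.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(temp, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
memcpy(mapped.pData, srcData, srcSize);
context->Unmap(temp, 0);

// 5. Queue a GPU-side copy into the real destination.
context->CopySubresourceRegion(dst, 0, 0, 0, 0, temp, 0, nullptr);
temp->Release();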

5) If I have n cbuffers, it sounds like context->SetConstantBuffer(0, n, &buffers[0]) will be much faster than calling SetConstantBuffer one at a time, but some people say the per-frame cbuffer should only be set once per frame while the per-object cbuffer is set many times per frame. Do I need to set the per-frame cbuffer together with the per-object cbuffer to gain the "one call is faster than many calls" benefit, or should I set them independently?

I'd say the rule of thumb is:

1. Batch bindings together when you can (see the sketch after this list).

2. Avoid redundant API calls when you can.

3. Don't worry too much about this until you see it as a performance issue. Like Ryan said, profile.
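As an illustration of point 1, here is one call binding three constant-buffer slots versus three separate calls; cbPerFrame, cbPerView, and cbPerObject are hypothetical ID3D11Buffer* objects created elsewhere:

ID3D11Buffer* cbs[3] = { cbPerFrame, cbPerView, cbPerObject };

// Batched: one API call, one state-change record for the driver.
context->VSSetConstantBuffers(0, 3, cbs);

// Unbatched: three calls that end up in the same place, just with more call overhead.
context->VSSetConstantBuffers(0, 1, &cbPerFrame);
context->VSSetConstantBuffers(1, 1, &cbPerView);
context->VSSetConstantBuffers(2, 1, &cbPerObject);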

So device->UpdateSubresource asks the driver to update a buffer, but the data is still stored on the RAM side; when the buffer is needed it is uploaded to VRAM, and when the graphics card runs out of VRAM it is downloaded back to RAM?

The Windows memory manager does shuffle resources in and out of video memory depending on whether they're needed or not, but UpdateSubresource is unrelated. The memory that gets written during the API call doesn't get migrated like that, it gets copied. The preferred location of a given allocation is an immutable property, and whether or not the CPU can write to it depends on that. The CPU typically cannot write directly to VRAM, so if something prefers to live in VRAM, then CPU access is typically not requested.

I read about CPU-GPU contention. What are the command buffer and non-mappable memory?

The command buffer is a buffer that stores commands. It's allocated by drivers, and is the thing that is submitted to the GPU to trigger work to happen. Non-mappable memory is memory that can't have Map() called on it.

Here's another question: I read an article (I forget which one) saying the driver will pack something when drawing. Pack what?

You need to be more specific. I don't know what you're talking about.

So device->UpdateSubresource asks the driver to update a buffer, but the data is still stored on the RAM side; when the buffer is needed it is uploaded to VRAM, and when the graphics card runs out of VRAM it is downloaded back to RAM?


Yes. That's part of the magic that an OpenGL or D3D<=11 driver does for you.

The driver will batch resource uploads when it can to reduce the per-transfer overhead. Doing a big block of resource updates in one go is faster, on some hardware at least.

The driver also optimizes how and when resources are uploaded, which final formats they're stored in, and which GPU heaps they're in. An immutable resource with no CPU access flags for instance can perhaps be safely transcoded into a faster tiled format and copied into the fastest of the GPU memory regions.
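For example, an immutable resource is created with its data supplied up front and no CPU access, which gives the driver the most freedom to pick the fastest memory and layout. A sketch, assuming device and a filled vertices array exist:

D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = sizeof(vertices);
desc.Usage     = D3D11_USAGE_IMMUTABLE;      // never written again after creation
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
// desc.CPUAccessFlags left at 0: the CPU never touches this resource again.

D3D11_SUBRESOURCE_DATA init = {};
init.pSysMem = vertices;                     // initial contents, required for IMMUTABLE

ID3D11Buffer* vb = nullptr;
device->CreateBuffer(&desc, &init, &vb);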

The OS and driver collude to allow multiple apps to safely share a GPU. This requires ensuring that apps are never starved for GPU memory. Just like your OS will "swap" application memory to the harddrive if too many apps are running or using too much memory, they'll also "swap" things out of GPU memory into system memory, and swap them back in if they're needed again.

From https://msdn.microsoft.com/en-us/library/windows/desktop/ff476486(v=vs.85).aspx I read about CPU-GPU contention. What are the command buffer and non-mappable memory?


A command buffer is just a list of commands sent to the GPU. The GPU generally likes to operate on a few long command buffers rather than many little ones; there's overhead to starting any particular command buffer, and some GPUs can parallelize work inside a command buffer but not between command buffers.

Your GL/D3D driver will automatically generate such a buffer for you. Every call you make to the API is recorded into an active buffer; they are _not_ submitted to the GPU right away. At some point when the driver knows that the buffer must be executed (e.g., you asked for the results, or flushed the API, or called Present) the driver will then upload the work. It may also do this periodically based on various heuristics it has in an attempt to keep the GPU busy.
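You can make that buffering visible with an event query and an explicit Flush. A small sketch, assuming device and context exist:

D3D11_QUERY_DESC qd = { D3D11_QUERY_EVENT, 0 };
ID3D11Query* done = nullptr;
device->CreateQuery(&qd, &done);

// ... record some draws; nothing has necessarily reached the GPU yet ...

context->End(done);    // marks a point in the command stream
context->Flush();      // forces the driver to submit its current command buffer

// GetData returns S_FALSE until the GPU has executed everything up to End().
while (context->GetData(done, nullptr, 0, 0) == S_FALSE)
{
    // spin (or do other CPU work) - shown only to make the asynchrony visible
}
done->Release();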

Non-mappable memory is memory on the GPU that cannot be accessed by the CPU. This non-mappable memory might be faster or it might just be memory on a large card that can't be accessed by the bus.

Getting data into this memory from the CPU requires copying the resource to the mappable GPU memory and then copying it again from the mappable memory to the unmappable memory. This is slower on upload of course, but makes subsequent GPU-only access faster (or just enables more memory to be used). If you have a resource that you change rarely and use frequently, it's probably best to keep it in the unmappable GPU memory. The driver does this for you based on the flags you set when making the resource.
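The manual version of that two-hop upload looks roughly like this; gpuBuffer is assumed to be a DEFAULT-usage ID3D11Buffer* and cpuData points at ByteWidth bytes of source data:

// Hop 1: write into CPU-mappable (STAGING) memory.
D3D11_BUFFER_DESC desc = {};
gpuBuffer->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;
desc.BindFlags      = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags      = 0;

ID3D11Buffer* upload = nullptr;
device->CreateBuffer(&desc, nullptr, &upload);

D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(upload, 0, D3D11_MAP_WRITE, 0, &mapped);
memcpy(mapped.pData, cpuData, desc.ByteWidth);
context->Unmap(upload, 0);

// Hop 2: the GPU copies it into the non-mappable DEFAULT resource.
context->CopyResource(gpuBuffer, upload);
upload->Release();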

Here's another question: I read an article (I forget which one) saying the driver will pack something when drawing. Pack what?


Hard to say without seeing the article.

It likely meant that the driver will combine API commands into a single (or a small number of) command buffers in an attempt to better utilize the GPU.

It may also be referring to how a driver can make some decisions about how/when to upload resources and which format to store them in on the GPU based on various flags.

Sean Middleditch – Game Systems Engineer – Join my team!

This topic is closed to new replies.
