noodleBowl

DX11 Binding buffers and then updating them


I got a quick question about buffers when it comes to DirectX 11. If I bind a buffer using a command like:

IASetVertexBuffers
IASetIndexBuffer
VSSetConstantBuffers
PSSetConstantBuffers

 and then later on I update that bound buffer's data using commands like Map/Unmap or any of the other update commands.

Do I need to rebind the buffer again in order for my update to take effect? If I don't rebind, is that really bad, as in a performance hit? My thought process is that if the buffer is already bound, why would I need to rebind it? I'm using the same buffer; it just holds different data.

 


You don't need to rebind. There may be a hit if you Map a buffer that is still in use by the GPU, depending on the flags, so choose wisely between Discard and No Overwrite. Discard will only stall if you Map it a lot without letting frames complete. It's a good choice if you only Map that buffer once a frame. No Overwrite should not stall, but may corrupt if you overwrite in-use data. It's good for streaming purposes where you never go backwards. Remember that this still counts across frames: you should Discard once before No Overwrite, or else set GPU fences.

If you're lost at this stage, take a look here: https://msdn.microsoft.com/en-us/library/windows/desktop/dn508285(v=vs.85).aspx
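To make the flag choice concrete, here's a minimal sketch of the cursor logic behind "No Overwrite while the write fits, Discard when you wrap". This is plain C++; `RingCursor` and `MapMode` are my own stand-ins, not D3D11 API. In real code the return value maps straight to the flag you pass to `ID3D11DeviceContext::Map`, and `offset` is where you memcpy into `resource.pData`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cursor over one dynamic buffer. reserve() picks the map
// flag for the next write: NoOverwrite while the write still fits,
// Discard when we wrap back to the start of the buffer.
enum class MapMode { NoOverwrite, Discard };

struct RingCursor {
    size_t capacity; // buffer size in bytes
    size_t offset;   // next free byte; start it at capacity so the
                     // very first write issues a Discard

    MapMode reserve(size_t bytes) {
        if (offset + bytes > capacity) {
            offset = bytes;          // wrapped: orphan the old contents
            return MapMode::Discard; // -> D3D11_MAP_WRITE_DISCARD
        }
        offset += bytes;
        return MapMode::NoOverwrite; // -> D3D11_MAP_WRITE_NO_OVERWRITE
    }
};
```

If the buffer is sized for a whole frame, this naturally gives you Promit's "Discard roughly once per frame" pattern.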

16 hours ago, Promit said:

Discard will only stall if you Map it a lot without letting frames complete. It's a good choice if you only Map that buffer once a frame. No Overwrite should not stall, but may corrupt if you overwrite in-use data. It's good for streaming purposes where you never go backwards. Remember that this still counts across frames: you should Discard once before No Overwrite, or else set GPU fences.

In the article you linked there is a part where they go over using D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD when mapping: you use D3D11_MAP_WRITE_DISCARD starting off, then successive Map/Unmap calls use D3D11_MAP_WRITE_NO_OVERWRITE until your buffer becomes full, at which point you switch back to the D3D11_MAP_WRITE_DISCARD flag.

So now that has me wondering: should I be placing everything in one large buffer? That way I don't have to keep rebinding all these different buffers. Something like:

//The big buffer was already bound at this point (the initialization point)
for(std::vector<Renderable>::iterator renderable = renderables.begin(); renderable != renderables.end(); ++renderable)
{
	if(bigBufferIsNotFull)
	{
		D3D11_MAPPED_SUBRESOURCE resource = Map(bigBuffer, D3D11_MAP_WRITE_NO_OVERWRITE);
		memcpy(resource.pData, (*renderable).data, (*renderable).dataSize); //(*renderable).data in this case is a vector or array of vertex data
		Unmap();
	}
	else //Our big buffer is full
	{
		D3D11_MAPPED_SUBRESOURCE resource = Map(bigBuffer, D3D11_MAP_WRITE_DISCARD);
		memcpy(resource.pData, (*renderable).data, (*renderable).dataSize);
		Unmap();
	}
	Draw((*renderable).vertexCount);
}

As opposed to:

for(std::vector<Renderable>::iterator renderable = renderables.begin(); renderable != renderables.end(); ++renderable)
{
  	//(*renderable).buffer is an ID3D11Buffer and at some point (renderable creation) the buffer was mapped with the needed data
  	BindVertexBuffer((*renderable).buffer); 
  	Draw((*renderable).vertexCount);
}

 

41 minutes ago, noodleBowl said:

So now that has me wondering should I be placing everything in one large buffer? That way I don't have keep rebinding all these different buffers.

Yes, with caveats. You can only Discard a buffer so many times* before the driver stalls. Think of it like this: when you are doing this, your buffer is actually several buffers (say four) internally. Each time you discard, it switches from buffer 0 to 1, 1 to 2, 2 to 3, and 3 back to 0. But if that switch back to 0 arrives while 0 is still in use, you're going to stall. So the buffer has to be big enough that you don't do this more than once or maybe twice a frame, in order for the GPU to clear pending operations. It's best to make the buffer big enough to accommodate an entire frame's rendering in one go, with the occasional spill.

* The limit doesn't apply to constant buffers, which are magical buffers that can service thousands of discards per frame.

2 hours ago, Promit said:

Yes, with caveats

What is kind of tripping me up is how this big buffer exists, in the sense that I could have multiple vertex types like this:

class VertexTypeA
{
	Vector3 pos;
	Color color;
};

class VertexTypeB
{
	Vector3 pos;
	Color color;
	Vector2 texCoord;
};

class VertexTypeC
{
	//Some data to represent VertexTypeC
};

But surely these can't all go into the same buffer. I would need a VertexTypeA buffer, VertexTypeB buffer, VertexTypeC buffer, etc., right?

In the real world, are there renderers that solely focus on one kind of object? E.g. you have a SpriteRenderer whose only job is to render sprites, then a CarRenderer that only ever renders cars, a CharacterRenderer that only renders characters, and so on. Or is there some single renderer entity that does everything (I really don't think this is the case)?

Maybe a combo, where everything is fed into a single renderer but then it's really delegated out to sub-renderers (SpriteRenderer, CarRenderer, etc.) to handle what you actually want to draw?

4 hours ago, noodleBowl said:

In the real world, are there renderers that solely focus on one kind of object? E.g. you have a SpriteRenderer whose only job is to render sprites, then a CarRenderer that only ever renders cars, a CharacterRenderer that only renders characters, and so on. Or is there some single renderer entity that does everything (I really don't think this is the case)?

You can partition your objects based on what data you need in your vertices. Like you said, sprites will probably be different from static objects: sprite vertices can have a position, a pair of texture coordinates and a color, whereas a static object vertex could have a position, a pair of texture coordinates and a normal. In the end, you will have a few partitions. You could use some kind of "subrenderer" for each partition as you described, or at least for the partitions which are very different. Partitions which use, for instance, a different vertex shader but the same pixel shader (the same lighting calculations, for instance) could be handled by the same "subrenderer" which just switches the vertex shader, or sorts based on the vertex shader beforehand.

Note, however, that a dynamic rigid (or deformable) object is more abstract than a concrete "car". So if the car is not the main focus by itself in your game, you do not necessarily need special car-only treatment in your game engine.
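The sort-by-partition idea can be sketched in plain C++; `Renderable` and the `int` layout key below are my own placeholders (standing in for something like an `ID3D11InputLayout*`), not actual engine types:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical renderable with an opaque layout id. Sorting groups equal
// layouts together, so the layout/shader switch happens once per group
// instead of once per object.
struct Renderable { int layoutId; int meshId; };

void SortByLayout(std::vector<Renderable>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const Renderable& a, const Renderable& b) {
            return a.layoutId < b.layoutId;
        });
}

// Count how many layout switches a given draw order would cost.
int LayoutSwitches(const std::vector<Renderable>& items) {
    int switches = 0;
    for (size_t i = 0; i < items.size(); ++i)
        if (i == 0 || items[i].layoutId != items[i - 1].layoutId)
            ++switches;
    return switches;
}
```

With an alternating order like layouts 1, 2, 1, 2 this costs four switches; sorted it costs two, and the saving grows with the object count.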

Edited by matt77hias

15 hours ago, matt77hias said:

You can partition your objects based on what data you need in your vertices ... In the end, you will have few partitions. You could use some kind of "subrenderer" for each partition like you described it or at least for the partitions which are very different. 

Note, however, that a dynamic rigid (or deformable) object is more abstract than a concrete "car". So if the car is not the main focus by itself in your game, you do not necessarily need special car-only treatment in your game engine.

So it's not necessarily a renderer per thing (sprite, car, etc.), but really a renderer per vertex type (VertexTypeA renderer, VertexTypeB renderer, etc.) and/or per task (lighting).

 

Previously I also had some pseudocode like:

D3D11_MAPPED_SUBRESOURCE resource = Map(bigBuffer, D3D11_MAP_WRITE_NO_OVERWRITE);
memcpy(resource.pData, (*renderable).data, (*renderable).dataSize); //(*renderable).data in this case is a vector or array of vertex data
Unmap();

And the renderable's data was in a std::vector or array. But is it possible to have the vertex data already in an ID3D11Buffer and then copy it directly into the big buffer owned by the renderer? Or would that be really bad, because it could trigger a read from the GPU, which in turn would be super slow and cause stalling?

How should I be copying data over into the renderer's big buffer? Is it really just a simple memcpy from a std::vector (the renderable's vertex data) into the ID3D11Buffer (the renderer's buffer) using Map/Unmap?

Edited by noodleBowl

8 hours ago, noodleBowl said:

So it's not necessarily a renderer per thing (sprite, car, etc.), but really a renderer per vertex type (VertexTypeA renderer, VertexTypeB renderer, etc.) and/or per task (lighting).

Sort of. You can render static objects, skeleton animated objects, sprites, etc.

For static objects you can use a single vertex layout or a few vertex layouts (if you want to pre-compute tangents for some of them for instance). Your renderer for static objects knows which layouts are supported for static objects and is capable of selecting the appropriate one for each static object. Then you can for example sort static objects based on the layout and render all static objects with the same layout consecutively.

11 hours ago, matt77hias said:

For static objects you can use a single vertex layout or a few vertex layouts (if you want to pre-compute tangents for some of them for instance). Your renderer for static objects knows which layouts are supported for static objects and is capable of selecting the appropriate one for each static object. Then you can for example sort static objects based on the layout and render all static objects with the same layout consecutively.

Maybe I'm missing something here, but wouldn't a different input layout essentially mean a different vertex type?

//Vertex Type 1
D3D11_INPUT_ELEMENT_DESC layout1[] =
{
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
};

//Vertex Type 2
D3D11_INPUT_ELEMENT_DESC layout2[] =
{
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"TEXCOORDS", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 28, D3D11_INPUT_PER_VERTEX_DATA, 0},
};

 


You can have a single vertex shader that takes positions + colours as inputs, but create three different input layouts so that you can use that one shader with three different storage formats, e.g.

D3D11_INPUT_ELEMENT_DESC layout1[] =
{
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
};
//maps to:
struct Layout1Stream0 { float position[3]; float color[4]; };


D3D11_INPUT_ELEMENT_DESC layout2[] =
{
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
};
//maps to:
struct Layout2Stream0 { float position[3]; };
struct Layout2Stream1 { float color[4]; };


D3D11_INPUT_ELEMENT_DESC layout3[] =
{
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR", 0, DXGI_FORMAT_R8G8B8A8_UNORM, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0},
};
//maps to:
struct Layout3Stream0 { float position[3]; unsigned char color[4]; };

1 hour ago, Hodgman said:

You can have a single vertex shader that takes positions + colours as inputs, but create three different input layouts so that you can use that one shader with three different storage formats, e.g.

When it comes to your structs like Layout1Stream0 and my VertexTypeA, I think we are talking about the same thing here. E.g.:

//The vertex types
class VertexTypeA
{
	Vector3 pos; //Holds 3 floats: x, y, z
	Color color; //Holds 4 floats: r, g, b, a
};

class VertexTypeB
{
	Vector3 pos; //Holds 3 floats: x, y, z
};

class VertexTypeC
{
	Color color; //Holds 4 floats: r, g, b, a
};

//======== Create and use the first input layout
D3D11_INPUT_ELEMENT_DESC layout1[] = {
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
d3dDevice->CreateInputLayout(layout1, 2, shaderCode->GetBufferPointer(), shaderCode->GetBufferSize(), &inputLayout1);

D3D11_BUFFER_DESC bufferDescription;
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeA) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeA);

//Use the buffer/layout for input layout 1
UINT stride = sizeof(VertexTypeA);
UINT offset = 0;
graphicsDevice->deviceContext->IASetInputLayout(inputLayout1);
graphicsDevice->deviceContext->IASetVertexBuffers(0, 1, &bufferVertexTypeA, &stride, &offset);


//======== Create and use the second input layout
D3D11_INPUT_ELEMENT_DESC layout2[] = {
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
d3dDevice->CreateInputLayout(layout2, 2, shaderCode->GetBufferPointer(), shaderCode->GetBufferSize(), &inputLayout2);

//Position buffer
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeB) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeB);

//Color buffer
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeC) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeC);

//Use the buffers/layout for input layout 2
UINT strides[2];
strides[0] = sizeof(VertexTypeB); 
strides[1] = sizeof(VertexTypeC); 

UINT offsets[2];
offsets[0] = 0;
offsets[1] = 0;

ID3D11Buffer* buffers[2];
buffers[0] = bufferVertexTypeB;	
buffers[1] = bufferVertexTypeC;

graphicsDevice->deviceContext->IASetInputLayout(inputLayout2);
graphicsDevice->deviceContext->IASetVertexBuffers(0, 2, buffers, strides, offsets);

But yeah, that makes sense that you could just have the one vertex shader handle all of the different formats, assuming they are more or less the same.

I feel like it might be better to just create a new layout, but could you omit a property from use? E.g. you set up a layout to have 3 properties (Position, Color, Normal), but one of the things you want to draw only uses Position and Normal, where potentially everything else uses all 3 properties.

Edited by noodleBowl


It's totally fine to have a "fat" vertex with many attributes and only use a subset in various vertex shaders. For example the shadow map VS will only need the position and maybe uv0 (for alpha test), but the gbuffer VS will use all the attributes.

You don't need to prepare two vertex buffers of the same mesh, just reuse the same one in all the necessary passes. It doesn't have to be multiple "streams"; the data can be interleaved. Both options have slightly different performance characteristics, but I wouldn't be concerned with this at all in the beginning.

A layout is a "view" of the vertex data, which is just a bunch of bytes in memory. The layout tells the VS at which offset each "variable" rests, i.e. where to fetch it from.

Switching a vertex shader has a cost. Switching layouts also has a cost. Switching buffers has no cost.
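To illustrate the "layout is a view" point: a fat vertex keeps one storage format, and each pass's input layout only names the attributes its VS reads; the stride you pass to IASetVertexBuffers stays sizeof(FatVertex) either way, so the unused bytes are simply skipped. A sketch in plain C++ (`FatVertex` and `Element` are my own stand-ins, with `Element` mirroring only the semantic-plus-offset part of `D3D11_INPUT_ELEMENT_DESC`):

```cpp
#include <cassert>
#include <cstddef>

// A "fat" vertex: every pass reads from the same buffer with the same stride.
struct FatVertex {
    float position[3];
    float normal[3];
    float uv0[2];
    float color[4];
};

// Hypothetical element description: semantic name + byte offset.
struct Element { const char* semantic; size_t offset; };

// Shadow pass layout: position only. The rest of each vertex is dead weight
// for this pass, but the stride still covers the whole FatVertex.
const Element kShadowLayout[] = {
    { "POSITION", offsetof(FatVertex, position) },
};

// G-buffer pass layout: every attribute.
const Element kGBufferLayout[] = {
    { "POSITION", offsetof(FatVertex, position) },
    { "NORMAL",   offsetof(FatVertex, normal)   },
    { "TEXCOORD", offsetof(FatVertex, uv0)      },
    { "COLOR",    offsetof(FatVertex, color)    },
};
```

The offsets here (0, 12, 24, 32) are exactly the AlignedByteOffset values you would write into the real D3D11 element descriptions.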

On 10/16/2017 at 2:27 AM, noodleBowl said:

I got a quick question about buffers when it comes to DirectX 11. If I bind a buffer using a command like:


IASetVertexBuffers
IASetIndexBuffer
VSSetConstantBuffers
PSSetConstantBuffers

 and then later on I update that bound buffer's data using commands like Map/Unmap or any of the other update commands.

You can do that. What you cannot do is issue Draw commands (or compute dispatches) and update the buffers later, which is something you could do with D3D12 as long as the command buffer hasn't been submitted.

 

As for performance, if you use D3D11_MAP_WRITE_NO_OVERWRITE and then issue one D3D11_MAP_WRITE_DISCARD when bigBufferIsNotFull is false (do not forget to reset this bool! the pseudo code you posted doesn't reset it!) you'll be fine.

Also, allocating everything dynamic in one big pool is fine. Just a few caveats to be aware of:

  • For texture buffers you cannot use D3D11_MAP_WRITE_NO_OVERWRITE unless you're on D3D11.1 on Windows 8 or higher. You always have to issue D3D11_MAP_WRITE_DISCARD.
  • Discarding more than 4MB per frame overall will cause stalls on AMD drivers. And while NVIDIA drivers can handle more than 4MB, it will likely break in really bad ways (I've seen HW bugs pop up).
  • In Ogre3D 2.1 we do the following on D3D11 systems (i.e. not D3D11.1 on Win 8):
    • Dynamic vertex & index buffers in one big dynamic pool with the no_overwrite / then discard pattern.
    • Dynamic const buffers separately; one API const buffer per "buffer" as in our representations. Though the ideal with D3D11 is to reuse the same const buffer over and over again using MAP DISCARD. We do not use many const buffers though.
    • Dynamic texture buffers also separately, one API tex buffer per "buffer" as in our representations.

 

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

What you cannot do is to issue Draw commands (or compute dispatches) and update the buffers later

Not sure what you mean here? Wouldn't this be issuing draw commands and then updating the buffers?

//Draw a Cube
UpdateBufferWithCubeData();
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Pyramid
UpdateBufferWithPyramidData();
graphicsDevice->deviceContext->Draw(pyramid.vertexCount, 0);

//Present everything
graphicsDevice->swapChain->Present(0, 0);

 

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

one API const buffer per "buffer" as in our representations

one API tex buffer per "buffer" as in our representations

Not entirely sure what you mean here either. Do you mean that you have one const/tex buffer per set of APIs, as in you have a const buffer for the camera, a const buffer for setting colors on primitives, etc., instead of having a singular const/tex buffer that could handle all of that?

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

As for performance, if you use D3D11_MAP_WRITE_NO_OVERWRITE and then issue one D3D11_MAP_WRITE_DISCARD when bigBufferIsNotFull is false (do not forget to reset this bool! the pseudo code you posted doesn't reset it!) you'll be fine.

With the D3D11_MAP_WRITE_NO_OVERWRITE/D3D11_MAP_WRITE_DISCARD pattern, or even in general, is mesh/vertex data traditionally held in an intermediate place and then copied into the buffer?

I've seen a lot of tutorials, like this one Lesson 5: Drawing a Triangle, where they just place the data into an array and copy it into the buffer. I wasn't sure if this is just because it's a beginner's tutorial and they are showing the basics, or if there is a better way to do it.

Edited by noodleBowl

2 hours ago, noodleBowl said:

Not sure what you mean here? Wouldn't this be issuing draw commands and then updating the buffers?


//Draw a Cube
UpdateBufferWithCubeData();
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Pyramid
UpdateBufferWithPyramidData();
graphicsDevice->deviceContext->Draw(pyramid.vertexCount, 0);

//Present everything
graphicsDevice->swapChain->Present(0, 0);

 

The example you posted is fine. What I meant is that you cannot do the following:

//Draw a Cube
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);
UpdateBufferWithCubeData(); //Update the cube that will be used in the draw above^

This is not valid in D3D11, but it is possible (with certain care taken) in D3D12 and Vulkan.

 

2 hours ago, noodleBowl said:

Not entirely sure what you mean here either. Do you mean that you have one const/tex buffer per set of APIs, as in you have a const buffer for the camera, a const buffer for setting colors on primitives, etc., instead of having a singular const/tex buffer that could handle all of that?

No, I meant what is explained here and here. Basically the following is preferred:

//Draw a Cube
void *data = constBuffer->Map( DISCARD );
memcpy( data, ... );
bindVertexBuffer( constBuffer );
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer->Map( DISCARD );
memcpy( data, ... );
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);

over the following:

//Draw a Cube
void *data = constBuffer0->Map( DISCARD );
memcpy( data, ... );
bindVertexBuffer( constBuffer0 );
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer1->Map( DISCARD ); //Notice it's constBuffer1, not constBuffer0
memcpy( data, ... );
bindVertexBuffer( constBuffer1 );
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);

This difference makes sense if we're talking about lots of const buffer DISCARDS per frame (e.g. 20k const buffer discards per frame). It doesn't make a difference if you have like 20 const buffer discards per frame.

Btw I personally never have 20k const buffer discards, as I prefer to keep large data (such as world matrices) in texture buffers.

 

2 hours ago, noodleBowl said:

With the D3D11_MAP_WRITE_NO_OVERWRITE/D3D11_MAP_WRITE_DISCARD pattern, or even in general, is mesh/vertex data traditionally held in an intermediate place and then copied into the buffer?

I've seen a lot of tutorials, like this one Lesson 5: Drawing a Triangle, where they just place the data into an array and copy it into the buffer. I wasn't sure if this is just because it's a beginner's tutorial and they are showing the basics, or if there is a better way to do it.

This pattern is used with D3D11_USAGE_DYNAMIC buffers. These buffers are visible to both CPU and GPU. This means the actual memory is either stored in GPU RAM, and your writes from the CPU go directly through the PCIE bus, or the buffer is stored in CPU RAM, and GPU reads fetch directly via the PCIE bus. Whether it is one or the other is controlled by the driver, though D3D11_CPU_ACCESS_READ and D3D11_CPU_ACCESS_WRITE probably provide good hints (a buffer that needs read access will likely end up CPU side; a buffer that has no read access will likely end up GPU side, but this is not a guarantee!).

 

The intermediate place you're describing must be done by hand via staging buffers. Create the buffer with D3D11_USAGE_STAGING instead of DYNAMIC. Staging buffers are visible to both CPU and GPU, but the GPU can only use them in copy operations.

The idea is that you copy to the staging area from CPU, and then you copy from staging area to the final GPU RAM that is only visible to the GPU (i.e. the final buffer was created with D3D11_USAGE_DEFAULT). Or vice versa as well (copy from GPU to staging area, then read from CPU).

There's a gotcha: with staging buffers you can't use D3D11_MAP_WRITE_NO_OVERWRITE nor D3D11_MAP_WRITE_DISCARD. But you have the D3D11_MAP_FLAG_DO_NOT_WAIT flag. If you get a DXGI_ERROR_WAS_STILL_DRAWING when you tried to map the staging buffer with this flag, then the GPU is not done yet copying from/to the staging buffer and you must use another one (i.e. create a new one, or reuse an old one from a pool).
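The DO_NOT_WAIT dance can be sketched as a tiny pool. This is my own sketch in plain C++ (not Ogre's code): `tryMap` stands in for `ID3D11DeviceContext::Map` with `D3D11_MAP_FLAG_DO_NOT_WAIT`, returning false when the GPU is still copying (i.e. when the real call would return DXGI_ERROR_WAS_STILL_DRAWING):

```cpp
#include <cassert>
#include <functional>

// Hypothetical staging-buffer pool: reuse a buffer the GPU is done with,
// or grow the pool when every buffer is still in flight.
struct StagingPool {
    int buffers = 0; // how many staging buffers currently exist

    // Returns the index of a mappable buffer. tryMap(i) answers "did the
    // DO_NOT_WAIT map of buffer i succeed?".
    int Acquire(const std::function<bool(int)>& tryMap) {
        for (int i = 0; i < buffers; ++i)
            if (tryMap(i))
                return i; // GPU finished with this one, safe to reuse
        return buffers++;  // all busy: create a fresh staging buffer
    }
};
```

In real code the grown slot would correspond to a new CreateBuffer call with D3D11_USAGE_STAGING, and the returned index to a successfully mapped buffer you can fill and then CopyResource into the DEFAULT buffer.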

What's the difference between the STAGING and DYNAMIC approaches? The PCIE bus has lower bandwidth than the GPU's dedicated memory (and probably higher latency). If you write from the CPU once, and the GPU reads that data once, then use DYNAMIC.

But if the data will be read by the GPU over and over again, you may end up fetching the data multiple times from CPU RAM through the PCIE; therefore use the STAGING approach to perform the transfer through the PCIE once, and then the data is kept in the fastest RAM available.

This advice holds for dedicated GPUs. On integrated GPUs, using staging aggressively may hurt: since there is no PCIE bus, you'll just be burning CPU RAM bandwidth doing useless copies.

And for reading GPU -> CPU, you have no choice but to use staging.

So it's a good idea to write a system that can switch between strategies based on what's faster on each system.
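The rule of thumb above can be condensed into one function. A hedged sketch; `ChooseUpload` and its parameters are my own names, and real engines would of course also profile rather than rely on a heuristic alone:

```cpp
#include <cassert>

// Which upload strategy to use for a buffer.
enum class Upload { Dynamic, Staging };

// Integrated GPU: no PCIE bus to avoid, so DYNAMIC is fine.
// Dedicated GPU, data read many times: STAGING -> copy into a DEFAULT buffer.
// Write-once/read-once: DYNAMIC.
Upload ChooseUpload(bool integratedGpu, int gpuReadsPerWrite) {
    if (integratedGpu) return Upload::Dynamic;
    return gpuReadsPerWrite > 1 ? Upload::Staging : Upload::Dynamic;
}
```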

Edited by Matias Goldberg

On 10/23/2017 at 12:30 AM, Matias Goldberg said:

The intermediate place you're describing must be done by hand via staging buffers. Create the buffer with D3D11_USAGE_STAGING instead of DYNAMIC. Staging buffers are visible to both CPU and GPU, but the GPU can only use them in copy operations.

The idea is that you copy to the staging area from CPU, and then you copy from staging area to the final GPU RAM that is only visible to the GPU (i.e. the final buffer was created with D3D11_USAGE_DEFAULT). Or vice versa as well (copy from GPU to staging area, then read from CPU).

I'm not sure if the intermediate place you are describing and the one that I'm thinking about are the same, based on the description of CPU to staging area to GPU. I was thinking of something like this:

//Declared in renderItem's class. Intermediate place that holds the renderItem's vertex data
Vertex vertices[6];

//Init vertex data for the renderItem in its constructor
vertices[0] = Vertex(Vector3(0.0f, 0.0f,     -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[1] = Vertex(Vector3(0.0f, 100.0f,   -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[2] = Vertex(Vector3(100.0f, 100.0f, -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[3] = Vertex(Vector3(0.0f, 0.0f,     -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[4] = Vertex(Vector3(100.0f, 100.0f, -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[5] = Vertex(Vector3(100.0f, 0.0f,   -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));


//Somewhere else in the application outside of the renderItem class
//Create the vertex buffer
VertexBuffer *vertexBuffer = createVertexBuffer(D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE, sizeof(Vertex) * 6);

//Place the vertex data from the renderItem ( data in vertex array [the intermediate place] ) in the vertex buffer
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(D3D11_MAP_WRITE_DISCARD);
Vertex *data = (Vertex*)resource.pData;
data[0] = renderItem.vertices[0];
data[1] = renderItem.vertices[1];
data[2] = renderItem.vertices[2];
data[3] = renderItem.vertices[3];
data[4] = renderItem.vertices[4];
data[5] = renderItem.vertices[5];
vertexBuffer->unmap();

 

On 10/23/2017 at 12:30 AM, Matias Goldberg said:

(such as world matrices)

Wait! Speaking of the model matrix (another name for the world matrix, right?): when using this big-buffer D3D11_MAP_WRITE_NO_OVERWRITE / D3D11_MAP_WRITE_DISCARD pattern, I need to pretransform everything before I put it in the big buffer, don't I? That's how I'd achieve one draw call per full buffer, excluding any state changes (shader change, texture change, etc.). Otherwise I'd need to issue multiple draw calls because of potential differences in the model matrix per renderable. Right?

I'm guessing this D3D11_MAP_WRITE_NO_OVERWRITE / D3D11_MAP_WRITE_DISCARD pattern is only good for certain situations, such as rendering sprites and particles, whereas rendering things like meshes/models should use some other technique.

Edited by noodleBowl


Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now


  • Forum Statistics

    • Total Topics
      628282
    • Total Posts
      2981822
  • Similar Content

    • By GreenGodDiary
      I'm attempting to implement some basic post-processing in my "engine" and the HLSL part of the Compute Shader and such I think I've understood, however I'm at a loss at how to actually get/use it's output for rendering to the screen.
      Assume I'm doing something to a UAV in my CS:
      RWTexture2D<float4> InputOutputMap : register(u0); I want that texture to essentially "be" the backbuffer.
       
      I'm pretty certain I'm doing something wrong when I create the views (what I think I'm doing is having the backbuffer be bound as render target aswell as UAV and then using it in my CS):
       
      DXGI_SWAP_CHAIN_DESC scd; ZeroMemory(&scd, sizeof(DXGI_SWAP_CHAIN_DESC)); scd.BufferCount = 1; scd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; scd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_SHADER_INPUT | DXGI_USAGE_UNORDERED_ACCESS; scd.OutputWindow = wndHandle; scd.SampleDesc.Count = 1; scd.Windowed = TRUE; HRESULT hr = D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, NULL, NULL, NULL, D3D11_SDK_VERSION, &scd, &gSwapChain, &gDevice, NULL, &gDeviceContext); // get the address of the back buffer ID3D11Texture2D* pBackBuffer = nullptr; gSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&pBackBuffer); // use the back buffer address to create the render target gDevice->CreateRenderTargetView(pBackBuffer, NULL, &gBackbufferRTV); // set the render target as the back buffer CreateDepthStencilBuffer(); gDeviceContext->OMSetRenderTargets(1, &gBackbufferRTV, depthStencilView); //UAV for compute shader D3D11_UNORDERED_ACCESS_VIEW_DESC uavd; ZeroMemory(&uavd, sizeof(uavd)); uavd.Format = DXGI_FORMAT_R8G8B8A8_UNORM; uavd.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D; uavd.Texture2D.MipSlice = 1; gDevice->CreateUnorderedAccessView(pBackBuffer, &uavd, &gUAV); pBackBuffer->Release();  
      After I render the scene, I dispatch like this:
      gDeviceContext->OMSetRenderTargets(0, NULL, NULL); m_vShaders["cs1"]->Bind(); gDeviceContext->CSSetUnorderedAccessViews(0, 1, &gUAV, 0); gDeviceContext->Dispatch(32, 24, 0); //hard coded ID3D11UnorderedAccessView* nullview = { nullptr }; gDeviceContext->CSSetUnorderedAccessViews(0, 1, &nullview, 0); gDeviceContext->OMSetRenderTargets(1, &gBackbufferRTV, depthStencilView); gSwapChain->Present(0, 0); Worth noting is the scene is rendered as usual, but I dont get any results from the CS (simple gaussian blur)
      I'm sure it's something fairly basic I'm doing wrong, perhaps my understanding of render targets / views / what have you is just completely wrong and my approach just makes no sense.

      If someone with more experience could point me in the right direction I would really appreciate it!

      On a side note, I'd really like to learn more about this kind of stuff. I can really see the potential of the CS aswell as rendering to textures and using them for whatever in the engine so I would love it if you know some good resources I can read about this!

      Thank you <3
       
      P.S I excluded the .hlsl since I cant imagine that being the issue, but if you think you need it to help me just ask

      P.P.S. As you can see this is my first post. I do have another account, but I can't log in with it because gamedev.net just keeps asking me to accept the terms and then logs me out when I do, over and over.
    • By noodleBowl
      I was wondering if anyone could explain the depth buffer and the depth stencil state comparison function to me, as I'm a little confused.
      So I have set up a depth stencil state where the DepthFunc is set to D3D11_COMPARISON_LESS, but what am I actually comparing here? What is actually written to the buffer, the pixel that should show up in the front?
      I have these 2 quad faces, a Red Face and a Blue Face. The Blue Face is further away from the Viewer with a Z value of -100.0f, while the Red Face is close to the Viewer with a Z value of 0.0f.
      When DepthFunc is set to D3D11_COMPARISON_LESS, the Red Face shows up in front of the Blue Face, like it should based on the Z values. BUT if I change the DepthFunc to D3D11_COMPARISON_LESS_EQUAL, the Blue Face shows in front of the Red Face. That does not make sense to me: I would think that when the function is set to D3D11_COMPARISON_LESS_EQUAL, the Red Face would still show up in front of the Blue Face, as the Z value for the Red Face is still closer to the viewer.
      Am I thinking of this comparison function all wrong?
      Vertex data just in case
      //Vertex data that make up the 2 faces
      Vertex verts[] =
      {
          //Red face
          Vertex(Vector4(0.0f, 0.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),
          Vertex(Vector4(100.0f, 100.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),
          Vertex(Vector4(100.0f, 0.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),
          Vertex(Vector4(0.0f, 0.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),
          Vertex(Vector4(0.0f, 100.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),
          Vertex(Vector4(100.0f, 100.0f, 0.0f), Color(1.0f, 0.0f, 0.0f)),

          //Blue face
          Vertex(Vector4(0.0f, 0.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
          Vertex(Vector4(100.0f, 100.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
          Vertex(Vector4(100.0f, 0.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
          Vertex(Vector4(0.0f, 0.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
          Vertex(Vector4(0.0f, 100.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
          Vertex(Vector4(100.0f, 100.0f, -100.0f), Color(0.0f, 0.0f, 1.0f)),
      };
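      To illustrate what the comparison function actually compares: the depth test compares each incoming fragment's depth (after projection, typically 0 = near, 1 = far) against the value already stored in the depth buffer, and on a pass the stored value is overwritten. One practical difference between LESS and LESS_EQUAL is what happens when two fragments resolve to the *same* depth value: with LESS_EQUAL the later draw wins. A minimal sketch in plain C++ (hypothetical types, not the D3D11 API, and assuming equal post-projection depths):

      ```cpp
      #include <cassert>

      // Hypothetical simulation of the per-pixel depth test, not D3D11 itself.
      enum class DepthFunc { Less, LessEqual };

      struct Pixel { float depth; int color; };

      // Attempt to write `color` at `incomingDepth`; returns true if the test passed.
      bool depthTest(Pixel& dst, float incomingDepth, int color, DepthFunc f)
      {
          bool pass = (f == DepthFunc::Less) ? incomingDepth <  dst.depth
                                             : incomingDepth <= dst.depth;
          if (pass) { dst.depth = incomingDepth; dst.color = color; }
          return pass;
      }

      int main()
      {
          // Depth buffer cleared to 1.0 (far plane). Color 1 = red, 2 = blue.
          Pixel p{1.0f, 0};
          depthTest(p, 0.4f, 1, DepthFunc::Less);     // red drawn first, passes
          depthTest(p, 0.4f, 2, DepthFunc::Less);     // blue fails: not strictly less
          assert(p.color == 1);                       // red stays visible

          Pixel q{1.0f, 0};
          depthTest(q, 0.4f, 1, DepthFunc::LessEqual);
          depthTest(q, 0.4f, 2, DepthFunc::LessEqual); // blue passes: equal depth
          assert(q.color == 2);                        // later draw wins
          return 0;
      }
      ```

      So if both faces end up at the same depth value after the projection transform (for example, if the projection collapses or clamps Z), LESS_EQUAL makes draw order decide which face is visible.
      
      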
    • By Rannion
      Hi,
      I'm trying to fill a win64 Console with ASCII char.
      At the moment I have 2 solutions: one using std::cout for each line, let's say 30 lines at once using std::endl at the end of each one.
      The second solution is using FillConsoleOutputCharacter. This method seems a lot more robust and has less flickering. But I'm guessing that internally it uses a different character table than the one used by std::cout. I'm trying to fill the console with the unsigned char 0xB0, which is a sort of grey square when I use std::cout, but when using FillConsoleOutputCharacter it is output as the character '°'.
      I tried using SetConsoleOutputCP before but could not find a proper way to force it to only use the non-extended ASCII code page...
      Has anyone a hint on this one?
      Cheers!
    • By Vortez
      Hi guys, I know this is stupid, but I've been trying to convert this block of asm code to C++ for an hour or two and I'm stuck.
      //////////////////////////////////////////////////////////////////////////////
      // This routine writes the value returned by GetProcAddress() at the address p
      //////////////////////////////////////////////////////////////////////////////
      bool SetProcAddress(HINSTANCE dll, void *p, char *name)
      {
          UINT *res = (UINT*)ptr;
          void *f = GetProcAddress(dll, name);
          if(!f)
              return false;

          _asm
          {
              push ebx
              push edx
              mov ebx, f
              mov edx, p
              mov [ebx], edx // <--- put edx at the address pointed by ebx
              pop edx
              pop ebx
          }

          return res != 0;
      }

      // ie: SetProcAddress(hDll, &some_function, "function_name");

      I tried:
      memcmp(p, f, sizeof(p));

      and

      UINT *i1 = (*UINT)p;
      UINT *i2 = (*UINT)f;
      *f = *p;

      The first one doesn't seem to give the right result, and the second one won't compile.
      Any idea?
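      For what it's worth, the asm block is a single pointer store: ebx holds the destination address p, edx holds the function pointer f, and `mov [ebx], edx` writes f to *p. In C++ that is just `*(void**)p = f;`. A minimal portable sketch (an ordinary function stands in for GetProcAddress, and the hypothetical setProcAddress name mirrors the original):

      ```cpp
      #include <cassert>

      static int answer() { return 42; }

      // Equivalent of the asm routine: store f at the address p points to.
      bool setProcAddress(void* p, void* f)
      {
          if (!f)
              return false;
          *reinterpret_cast<void**>(p) = f;  // same effect as: mov [ebx], edx
          return true;
      }

      int main()
      {
          int (*fn)() = nullptr;
          // Casting a function pointer through void* is technically
          // implementation-defined, but it is what GetProcAddress relies on.
          bool ok = setProcAddress(&fn, reinterpret_cast<void*>(&answer));
          assert(ok);
          assert(fn() == 42); // fn now points at answer()
          return 0;
      }
      ```

      Note that the memcmp attempt only compares bytes without writing anything, and `(*UINT)p` won't compile because the cast syntax is `(UINT*)p`.
      
      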