
Drawing with a nullptr vertex and index buffer in D3D11


Apparently, you can render a quad directly in Direct3D 10 and later without using vertices and indices. The approach is:

"In order to render a full screen quad, you will need to set both index and vertex buffers to null. Set the topology to triangle strip and call Draw with four vertices starting from position zero."

device_context->IASetVertexBuffers(0, 1, nullptr, {???}, {0});

device_context->IASetIndexBuffer(nullptr, ???, 0);

device_context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP);

device_context->Draw(4, 0);

What index format and stride size do you need to use?

 

Is it a good practice to use this approach for other "frequently" used meshes as well (such as a cube of lines for visualizing bounding boxes)?

Edited by matt77hias


You don't need to unbind the buffers, just the input layout. And also make sure that your shaders don't try to access the buffers. Skip the vertex buffer read by leaving vertex-buffer input parameters out of the vertex shader signature. You can still use system-generated attributes like SV_VertexID or SV_InstanceID. You can skip reading the index buffer by just calling a regular Draw() instead of DrawIndexed().

I am using this technique very frequently, for debug geometries, full screen triangles (better than full screen quad!) and deferred light volumes for example.

Edited by turanszkij

6 minutes ago, turanszkij said:

better than full screen quad!

What do you mean with "better"? You need the quad for the unpacking in deferred shading?

 

8 minutes ago, turanszkij said:

You don't need to unbind the buffers, just the input layout

And the Topology?

Assuming I have a VS with signature VS(SV_VertexID), couldn't I just use the InputLayout obtained while creating the VS? Or does it have to be nullptr?


You will still need a correct topology, that is something you can't unbind. You are probably good with the input layout reflected from the shader, though I am not used to doing that. I have a few layouts which I create by hand and share across different shaders.

A full screen triangle is also good for unpacking the deferred G-buffer, and even better than a quad because a quad has worse pixel occupancy along the intersection of triangles. This means that the pixel shader could be running on pixels which will be discarded later (these are called helper pixels because pixel shaders actually run in 2x2 pixel blocks so that derivatives can be obtained for instance). Though it probably is not a real problem. :) 

23 minutes ago, matt77hias said:

What do you mean with "better"? You need the quad for the unpacking in deferred shading?

Look at fullscreen triangle here: 

It also has the null vertex buffer trick, but no example.


This is all pretty standard: you just use SV_VertexID and build whatever you want from it.
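For readers following along, here is a minimal sketch of what building geometry from the vertex index alone can look like. It is plain C++ that mirrors arithmetic a vertex shader might perform on SV_VertexID, not shader code taken from this thread. The names `Float2`, `FullscreenTrianglePos` and `FullscreenQuadStripPos` are made up for illustration, and the output is assumed to be a clip-space position with x pointing right and y pointing up.

```cpp
#include <cstdint>

// Illustrative 2D clip-space position (hypothetical helper type).
struct Float2 {
    float x;
    float y;
};

// One oversized triangle that covers the whole [-1, 1] x [-1, 1] viewport.
// Vertex ids 0, 1, 2 map to (-1, 1), (3, 1), (-1, -3).
inline Float2 FullscreenTrianglePos(std::uint32_t id) {
    const float u = static_cast<float>((id << 1) & 2u);  // 0, 2, 0
    const float v = static_cast<float>(id & 2u);         // 0, 0, 2
    return Float2{u * 2.0f - 1.0f, 1.0f - v * 2.0f};
}

// Four vertices for a triangle strip that covers exactly the viewport.
// Vertex ids 0..3 map to (-1, 1), (1, 1), (-1, -1), (1, -1).
inline Float2 FullscreenQuadStripPos(std::uint32_t id) {
    return Float2{(id & 1u) ? 1.0f : -1.0f,
                  (id & 2u) ? -1.0f : 1.0f};
}
```

The triangle version would be drawn with three vertices and a triangle list, the quad version with four vertices and a triangle strip, as in the original question.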

 

1 hour ago, turanszkij said:

You will still need a correct topology, that is something you can't unbind. You are probably good with the input layout reflected from the shader, though I am not used to doing that. I have a few layouts which I create by hand and share across different shaders.

A full screen triangle is also good for unpacking the deferred G-buffer, and even better than a quad because a quad has worse pixel occupancy along the intersection of triangles. This means that the pixel shader could be running on pixels which will be discarded later (these are called helper pixels because pixel shaders actually run in 2x2 pixel blocks so that derivatives can be obtained for instance). Though it probably is not a real problem.  

The triangle trick does not apply anymore; it was a PS3/X360-era trick. GPUs these days apply the triangle clipping earlier and you end up with two triangles anyway.

49 minutes ago, galop1n said:

The triangle trick does not apply anymore; it was a PS3/X360-era trick. GPUs these days apply the triangle clipping earlier and you end up with two triangles anyway.

Are you sure about that? Do you have a source? The document I posted was from 2013 and 14... it must've still been relevant. Also, aren't triangles rasterized in tiles? That would imply that when the triangle trick is used it stays one triangle.

5 hours ago, Infinisearch said:

Are you sure about that? Do you have a source? The document I posted was from 2013 and 14... it must've still been relevant. Also, aren't triangles rasterized in tiles? That would imply that when the triangle trick is used it stays one triangle.

I do not have anything public that I can think of; this is knowledge from observing GPU counters with a very detailed profiler at work. In short, the triangle trick was possible thanks to the guard band settings on X360/PS3, and it does not exist anymore. Of course, what I say is very AMD/GCN related, so take it with a pinch of salt before transposing to NVIDIA or Intel!

6 hours ago, galop1n said:

I do not have anything public that I can think of; this is knowledge from observing GPU counters with a very detailed profiler at work. In short, the triangle trick was possible thanks to the guard band settings on X360/PS3, and it does not exist anymore. Of course, what I say is very AMD/GCN related, so take it with a pinch of salt before transposing to NVIDIA or Intel!

That's not my experience of full-screen triangles on GCN.

Using probably the same tools you're talking about, I can see that a single full-screen triangle at 1080p on an Xbox One generates exactly 32,400 waves and the Radeon RGP Profiler measures exactly 14,400 waves for a 720p full-screen triangle on PC.

Two triangles produce 14,433 waves for 720p on PC and 32,446 waves for 1080p on Xbox.

YMMV on other consoles, but it's not an AMD/GCN thing, and it's definitely not true to say it won't work on AMD GPUs on Windows.

1 hour ago, matt77hias said:

What is one wave?

 A wave is a group of threads that run in lockstep with one another when running shaders. NVIDIA call it a "Warp", it's also called a "Wavefront".

On NVIDIA hardware this has traditionally been 32 threads all running in lockstep, on AMD GCN hardware this number is 64.

To render a full-screen triangle at 1920x1080 you have to render 2,073,600 pixels. Once those get batched up into 'waves' (groups) of 64, you require 32,400 of them to render the entire screen. If there's any inefficiency due to using two triangles, that would increase the number of 2x2 quads that get launched and consequently the number of waves would go up as well. Even at 720p the amount of wasted work is only about 0.2% (33 extra waves on top of 14,400), and at 1080p it's even lower at 0.14% (46 extra waves on top of 32,400).
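The arithmetic above can be checked with a few lines of C++. This is only a sketch of the calculation: it assumes every wave is completely filled with covered pixels (the ideal case), and the function names `IdealWaveCount` and `OverheadPercent` are made up for illustration. The measured wave counts are the ones quoted earlier in the thread.

```cpp
#include <cstdint>

// Number of waves needed if every wave is fully packed with covered pixels.
constexpr std::uint64_t IdealWaveCount(std::uint64_t width,
                                       std::uint64_t height,
                                       std::uint64_t waveSize) {
    return (width * height + waveSize - 1) / waveSize;  // round up
}

// Extra waves relative to the ideal count, as a percentage.
inline double OverheadPercent(std::uint64_t measuredWaves,
                              std::uint64_t idealWaves) {
    return 100.0 * static_cast<double>(measuredWaves - idealWaves) /
           static_cast<double>(idealWaves);
}
```

For example, `IdealWaveCount(1920, 1080, 64)` gives 32,400 and `IdealWaveCount(1280, 720, 64)` gives 14,400, matching the single-triangle measurements, while `OverheadPercent(14433, 14400)` and `OverheadPercent(32446, 32400)` come out at roughly 0.23% and 0.14% for the two-triangle case.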

22 hours ago, turanszkij said:

You are probably good with the input layout reflected from the shader, though I am not used to doing that. I have a few layouts which I create by hand and share across different shaders.

Apparently, no input layout needs to be created. You can just use and bind nullptr.
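Putting the advice in this thread together, a bufferless draw might be set up roughly as below. This is a sketch, assuming a vertex shader whose only input is SV_VertexID and assuming render targets, viewport and the remaining pipeline state are already set by the caller. `DrawFullscreenTriangle` and the two constants are hypothetical names, and the D3D11 part is guarded so that `d3d11.h` is only pulled in on Windows.

```cpp
#include <cstdint>

// Vertex counts for the two bufferless full-screen draws discussed here.
constexpr std::uint32_t kQuadStripVertexCount = 4;  // quad as a triangle strip
constexpr std::uint32_t kTriangleVertexCount  = 3;  // single oversized triangle

#ifdef _WIN32
#include <d3d11.h>

// Hypothetical helper: issues a full-screen triangle without touching any
// vertex or index buffer. Whatever buffers are still bound are simply not read.
inline void DrawFullscreenTriangle(ID3D11DeviceContext* ctx,
                                   ID3D11VertexShader* vs,
                                   ID3D11PixelShader* ps) {
    ctx->IASetInputLayout(nullptr);  // no per-vertex inputs, so no layout
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    ctx->VSSetShader(vs, nullptr, 0);
    ctx->PSSetShader(ps, nullptr, 0);
    ctx->Draw(kTriangleVertexCount, 0);  // non-indexed draw: no index fetch
}
#endif
```

For the quad from the original question, the same pattern would use `D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP` and `Draw(kQuadStripVertexCount, 0)`.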

I use the same template class (but with different instantiations) for all shader types except for the vertex shader since I store the input layout (which feels a bit redundant) as well.


@galop1n

Could you please respond to the following post by ajmiles:

14 hours ago, ajmiles said:

That's not my experience of full-screen triangles on GCN.

Using probably the same tools you're talking about, I can see that a single full-screen triangle at 1080p on an Xbox One generates exactly 32,400 waves and the Radeon RGP Profiler measures exactly 14,400 waves for a 720p full-screen triangle on PC.

Two triangles produce 14,433 waves for 720p on PC and 32,446 waves for 1080p on Xbox.

YMMV on other consoles, but it's not an AMD/GCN thing, and it's definitely not true to say it won't work on AMD GPUs on Windows.

I'm very curious what the consensus on this technique is.


I'm personally using the fullscreen triangle trick. I haven't profiled it against the fullscreen quad (two triangles with a diagonal seam).

While we're on the topic, NV has a vendor-specific extension that rasterizes every pixel within the 2D AABB of a primitive -- so if you drew a triangle that covered one diagonal half of the screen, its AABB would cover the whole screen, and if this extension is active at the time, your shader would run for every pixel.

Another option is to implement your full-screen passes via compute shaders ;)


Hmm, I would have to double-check. If someone says it works, then I don't see the point of camping on my position. It is possible we set a scissor rectangle to the size of the viewport for convenience or something like that, and it screws up the triangle.

I would not even bother fixing it if that is the case: the gains are unlikely to rise above the level of noise in the profiling, and most of our full-screen passes are compute. The latter is not always the best choice, to be fair, as it is a trade-off of L2 cache bandwidth versus the ROPs, manual sRGB conversions and a few other considerations, but full-screen passes are rarely alone and we can save a few flushes/invalidations by staying compute all the way.

4 hours ago, matt77hias said:

What would be the benefits of the CS? Is it just skipping the pipeline until PS (VS and RS, specifically)?

The upside is that you skip the IA->VS->RS->PS->OM pipeline with all the state setup involved and instead you have a simple CS pipeline. In the compute shader, it is you who assigns the workload, so there should be no confusion as to how the threads are dispatched. The CS pipeline can sometimes be run in parallel with the graphics pipeline in newer APIs (I've no experience with that yet).

The downside is that if you are running CS on the graphics pipe, you have to switch execution from the rasterizing pipeline to compute, which involves flushing the graphics buffer, then executing your compute, waiting for it to finish and resuming your graphics processing after that.


The other advantage is that the CS allows for implementation of different styles of algorithms that aren't possible in a PS.

The CS has access to (up to) 32KiB of local storage within a workgroup, which allows different threads/pixels to share data with each other.

e.g. for a 3x3 pixel blur filter in a PS, every pixel will fetch 9 texels from the source texture, blend them, and output a single result.
In a CS implementation, you might get 64 threads to read a 10x10 area of pixels into the local store (~1.5 texel reads per pixel instead of 9!), and then each thread can fetch the 9 texels that it requires from the local store (which is basically free compared to reading from memory), blend them, and output a single result each / an 8x8 area of results total.
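To make the read counts concrete, here is a small CPU-side C++ sketch that mimics both approaches for one 8x8 tile and counts simulated memory fetches. It is not GPU code: the `Image` type, the clamp addressing at the borders and all function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int kTile  = 8;          // 8x8 outputs per group (64 "threads")
constexpr int kLocal = kTile + 2;  // plus a 1-texel border: 10x10 local store

// Illustrative image with a counter for simulated memory fetches.
struct Image {
    int width;
    int height;
    std::vector<float> texels;
    mutable int reads;

    Image(int w, int h)
        : width(w), height(h),
          texels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h)),
          reads(0) {}

    float Load(int x, int y) const {  // clamp addressing (an assumption)
        ++reads;
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Pixel-shader style: every output fetches its own 3x3 neighbourhood.
inline void BlurTileGather(const Image& src, int x0, int y0,
                           float out[kTile][kTile]) {
    for (int y = 0; y < kTile; ++y)
        for (int x = 0; x < kTile; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src.Load(x0 + x + dx, y0 + y + dy);
            out[y][x] = sum / 9.0f;
        }
}

// Compute-shader style: load 10x10 once, then filter from the local copy.
inline void BlurTileLocalStore(const Image& src, int x0, int y0,
                               float out[kTile][kTile]) {
    float local[kLocal][kLocal];
    for (int y = 0; y < kLocal; ++y)
        for (int x = 0; x < kLocal; ++x)
            local[y][x] = src.Load(x0 + x - 1, y0 + y - 1);
    for (int y = 0; y < kTile; ++y)
        for (int x = 0; x < kTile; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += local[y + 1 + dy][x + 1 + dx];
            out[y][x] = sum / 9.0f;
        }
}

struct ReadCounts {
    int gather;
    int localStore;
    bool sameResult;
};

// Runs both variants on one tile of a 32x32 test image and compares them.
inline ReadCounts CompareBlurReads() {
    Image img(32, 32);
    for (std::size_t i = 0; i < img.texels.size(); ++i)
        img.texels[i] = static_cast<float>(i % 17);
    float a[kTile][kTile];
    float b[kTile][kTile];
    img.reads = 0;
    BlurTileGather(img, 8, 8, a);
    const int gatherReads = img.reads;
    img.reads = 0;
    BlurTileLocalStore(img, 8, 8, b);
    const int localReads = img.reads;
    bool same = true;
    for (int y = 0; y < kTile; ++y)
        for (int x = 0; x < kTile; ++x)
            same = same && (a[y][x] == b[y][x]);
    return ReadCounts{gatherReads, localReads, same};
}
```

In this sketch the gather variant performs 64 x 9 = 576 fetches for the tile, while the local-store variant performs 100, i.e. about 1.56 per output pixel, which is where the "~1.5 texel reads per pixel" figure comes from.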

Also, compute shaders don't hard-code the destination / result location. You can write code that works like a normal PS, where each thread outputs to a specific pixel on the screen... or you can write code where each thread writes to a random pixel, or multiple pixels, or writes to a pixel using a dynamic offset (scattered writes). 
e.g. the normal way of writing a blur is to "gather" (above, each destination pixel reads from 9 source pixels), but a "scatter" version would have every source pixel write to 9 destination pixels!
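The gather/scatter distinction can be sketched on the CPU as well. The C++ below computes an unnormalized 3x3 box sum both ways on a small integer grid, ignoring out-of-bounds neighbours; `Grid`, `SampleGrid`, `BoxSumGather` and `BoxSumScatter` are illustrative names, not anything from a real API. One caveat that this serial sketch hides: on a GPU, several threads scattering into the same destination pixel would need atomics or some other synchronization.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Small illustrative input.
inline Grid SampleGrid() {
    return Grid{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
}

// Gather: each destination pixel reads from up to 9 source pixels.
inline Grid BoxSumGather(const Grid& src) {
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Grid dst(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = x + dx;
                    const int sy = y + dy;
                    if (sx >= 0 && sx < w && sy >= 0 && sy < h)
                        dst[y][x] += src[sy][sx];
                }
    return dst;
}

// Scatter: each source pixel writes to up to 9 destination pixels.
inline Grid BoxSumScatter(const Grid& src) {
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Grid dst(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int tx = x + dx;
                    const int ty = y + dy;
                    if (tx >= 0 && tx < w && ty >= 0 && ty < h)
                        dst[ty][tx] += src[y][x];
                }
    return dst;
}
```

Both functions produce the same grid for a symmetric kernel like this one; they differ only in which side of the copy each loop iteration "owns".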

When you first start using CS instead of PS, you get very small improvements (e.g. from skipping the IA->VS->RS costs)... but afterwards, many new doors open for completely different approaches to problems.
