chiffre

Member
  • Content count

    5
  • Joined

  • Last visited

Community Reputation

0 Neutral

About chiffre

  • Rank
    Newbie

Personal Information

  • Role
    Programmer
  • Interests
    Education
    Programming
  1. Thanks so much for this post. The information is very valuable to me, even if I only pursue D3D11 projects as a hobby (to learn C++ in the process of writing a little game engine). I hope I didn't, quite literally, ask for too much here.

Quick edit: in the second third of your post you mention this: I have tried out rendering with structured buffers by pulling vertices directly from the buffer with SV_VertexID, and while I couldn't find a performance difference between rendering from structured buffers and the standard method (D3D11_BIND_SHADER_RESOURCE vs D3D11_BIND_VERTEX_BUFFER etc.), I am curious whether I understand you correctly here, as I think UAVs and structured buffers are similar and this info could be quite relevant to me.

What I understand is: it is not a given that all the DXGI_FORMAT_ values that are supported by the IA, or by the equivalent preamble in the shader code for traditional vertex-buffer usage (D3D11_BIND_VERTEX_BUFFER), are also supported when loading data from structured buffers or UAVs in the shader.
  2. Introduction: In general my questions pertain to the differences between floating- and fixed-point data. Additionally, I would like to understand when it can be advantageous to prefer a fixed-point representation over a floating-point representation in the context of vertex data, and how the hardware deals with the different data types. I believe I should be able to reduce the amount of data (bytes) necessary per vertex by choosing the most suitable representations for my vertex attributes. Thanks ahead of time if you, the reader, are considering the effort of reading this and helping me. I found an old topic that shows this is possible in principle, but I am not sure I understand what the pitfalls are when using a fixed-point representation and whether there are any hardware-based performance advantages/disadvantages. (TLDR at bottom)

The Actual Post: To my understanding, HLSL/D3D11 offers not just the traditional floating-point model in half, single, and double precision, but also the fixed-point model in the form of signed/unsigned normalized integers in 8-, 10-, 16-, 24-, and 32-bit variants. Both models offer a finite sequence of "grid-points". The obvious difference between the two models is that the fixed-point model offers a constant spacing between values in the normalized range of [0,1] or [-1,1], while the floating-point model allows for smaller "deltas" as you get closer to 0, and larger "deltas" the further you are away from 0.

To add some context, let me define a struct as an example:

    struct VertexData
    {
        float position[3]; // 3 x 32 bits
        float texCoord[2]; // 2 x 32 bits
        float normals[3];  // 3 x 32 bits
    }; // Total of 32 bytes

Every vertex gets a position, a coordinate on my texture, and a normal to do some light calculations. In this case we have 8 x 32 = 256 bits per vertex.
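To put rough numbers on the "constant spacing versus shrinking deltas" comparison above, here is a minimal C++ sketch. It is not part of any D3D11 API: the function names `unormSpacing`, `halfSpacing` and `floatSpacing` are made up for illustration, and the half-precision spacing is derived analytically under the assumption of the IEEE 754 binary16 layout (10 stored mantissa bits, smallest normal exponent -14).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Distance between adjacent UNORM codes with `bits` bits; constant over [0,1].
inline double unormSpacing(int bits) {
    assert(bits >= 1 && bits <= 32);
    return 1.0 / static_cast<double>((std::uint64_t{1} << bits) - 1);
}

// Distance between adjacent 16-bit floats around x, assuming IEEE 754 binary16.
inline double halfSpacing(double x) {
    assert(x > 0.0 && x < 65504.0);
    int exponent = 0;
    (void)std::frexp(x, &exponent); // x = m * 2^exponent with m in [0.5, 1)
    // Normal numbers: 2^(exponent - 11); subnormals bottom out at 2^-24.
    return std::ldexp(1.0, std::max(exponent - 11, -24));
}

// Distance from x to the next larger 32-bit float.
inline double floatSpacing(float x) {
    const float next = std::nextafter(x, std::numeric_limits<float>::infinity());
    return static_cast<double>(next) - static_cast<double>(x);
}
```

Under these assumptions, `unormSpacing(16)` is about 1.5e-5 everywhere, `halfSpacing(0.75)` is about 4.9e-4, and `floatSpacing(0.5f)` is about 6.0e-8. So at the same 16 bits, a UNORM is noticeably finer than a half float near 1, while a 32-bit float is finer than both across the whole [0,1] range.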
Since the texture coordinates lie in the interval [0,1] and the normal vector components are in the interval [-1,1], it would seem useful to use a normalized representation as suggested in the topic linked at the top of the post. The texture coordinates might as well be represented in a fixed-point model, because it seems most useful to be able to sample the texture in a uniform manner, as the pixels don't get any "denser" as we get closer to 0. In other words, the "delta" does not need to become any smaller as the texture coordinates approach (0,0). A similar argument can be made for the normal vector: a normal vector should be normalized anyway, so we want as many points as possible on the sphere around (0,0,0) with a radius of 1, and we don't care about precision around the origin.

Even if we have large textures such as 4k by 4k (or the maximum allowed by D3D11, 16k by 16k), we only need as many grid-points on one axis as there are pixels on one axis. An unsigned normalized 14-bit integer would be ideal, but because it is both unsupported and impractical, we will stick to an unsigned normalized 16-bit integer. The same type should take care of the normal vector coordinates, and might even be a bit overkill.

    struct VertexData
    {
        float    position[3]; // 3 x 32 bits
        uint16_t texCoord[2]; // 2 x 16 bits
        uint16_t normals[3];  // 3 x 16 bits
    }; // Total of 22 bytes

Seems like a good start, and we might even be able to take it further, but before we pursue that path, here is my first question: can the GPU even work with the data in this format, or is all I have accomplished minimizing CPU-side RAM usage? Does the GPU have to convert the texture coordinates back to a floating-point model when I hand them over to the sampler in my pixel shader? I have looked up the data types for HLSL and I am not sure I even comprehend how to declare the vertex input type in HLSL. Would the following work?
    struct VertexInputType
    {
        float3 pos;         // this one is obvious
        unorm half2 tex;    // half corresponds to a 16-bit float, so I assume this is wrong,
                            // but this is the only 16-bit type I found on the linked MSDN site
        snorm half3 normal; // same as above
    };

I assume this is possible somehow, as I have found input element formats such as DXGI_FORMAT_R16G16B16A16_SNORM and DXGI_FORMAT_R16G16B16A16_UNORM (also available with a different number of components, as well as different component lengths). I might have to avoid 3-component vectors because there is no 3-component 16-bit input element format, but that is the least of my worries.

The next question would be: what happens with my normals if I try to do lighting calculations with them in such a normalized fixed-point format? Is there no issue as long as I take care not to mix floating- and fixed-point data? Or would that work as well? In general this gives rise to the question: how does the GPU handle fixed-point arithmetic? Is it the same as integer arithmetic, and/or is it faster/slower than floating-point arithmetic?

Assuming that we still have a valid and useful VertexData format, how far could I take this while remaining on the sensible side of what could be called optimization? Theoretically I could use an input element format such as DXGI_FORMAT_R10G10B10A2_UNORM to pack my normal coordinates into a 10-bit fixed-point format, and my vertices (in object space) might even be representable in a 16-bit unsigned normalized fixed-point format. That way I could end up with something like the following struct:

    struct VertexData
    {
        uint16_t pos[3];        // 3 x 16 bits
        uint16_t texCoord[2];   // 2 x 16 bits
        uint32_t packedNormals; // 10 + 10 + 10 + 2 bits
    }; // Total of 14 bytes

Could I use a vertex structure like this without too much performance loss on the GPU side? If the GPU has to execute some sort of unpacking algorithm in the background, I might as well let it be.
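For the CPU side of such a packed layout, a minimal C++ sketch could look like the following. Everything here is an assumption for illustration rather than D3D11 API: the helper names `toUnorm`, `fromUnorm`, `packNormal` and `unpackNormal` are hypothetical, normals are biased from [-1,1] into [0,1] before quantizing, and the x component is placed in the lowest 10 bits to match the component order of DXGI_FORMAT_R10G10B10A2_UNORM. Because there is no 3-component 16-bit format (as noted above), the sketch pads the position to four components, which gives 16 bytes rather than 14.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize v in [0,1] to an unsigned normalized integer with `bits` bits.
inline std::uint32_t toUnorm(float v, int bits) {
    assert(bits >= 1 && bits <= 16);
    const float maxCode = static_cast<float>((1u << bits) - 1u);
    const float clamped = std::fmin(std::fmax(v, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(std::lround(clamped * maxCode));
}

// Decode the way UNORM is defined: code / (2^bits - 1).
inline float fromUnorm(std::uint32_t code, int bits) {
    assert(bits >= 1 && bits <= 16);
    return static_cast<float>(code) / static_cast<float>((1u << bits) - 1u);
}

// Pack a unit normal into 10 + 10 + 10 bits (top 2 bits left at zero).
inline std::uint32_t packNormal(float x, float y, float z) {
    const std::uint32_t r = toUnorm(x * 0.5f + 0.5f, 10);
    const std::uint32_t g = toUnorm(y * 0.5f + 0.5f, 10);
    const std::uint32_t b = toUnorm(z * 0.5f + 0.5f, 10);
    return r | (g << 10) | (b << 20);
}

// Reverse of packNormal; the result is only approximately unit length.
inline void unpackNormal(std::uint32_t packed, float out[3]) {
    for (int i = 0; i < 3; ++i) {
        const std::uint32_t code = (packed >> (10 * i)) & 0x3FFu;
        out[i] = fromUnorm(code, 10) * 2.0f - 1.0f;
    }
}

// Position padded to 4 components because no 3 x 16-bit format exists.
struct PackedVertex {
    std::uint16_t pos[4];       // 4 x 16 bits (w unused)
    std::uint16_t texCoord[2];  // 2 x 16 bits
    std::uint32_t packedNormal; // 10 + 10 + 10 + 2 bits
};
static_assert(sizeof(PackedVertex) == 16, "expected a 16-byte vertex");
```

One thing this makes visible: with a biased 10-bit UNORM, a component of exactly 0 is not representable (0.5 * 1023 is not an integer), so the decoded value is off by up to about 1/1023. Whether that matters for lighting is something to test rather than assume.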
In the end I have a functioning deferred renderer, but I would like to reduce the memory footprint of the huge number of vertices involved in rendering my landscape.

TLDR: I have a lot of vertices that I need to render and I want to reduce the RAM usage without introducing crazy compression/decompression algorithms to the CPU or GPU. I am hoping to find a solution by involving fixed-point data types, but I am not exactly sure how that would work.
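As a quick sanity check of the "as many grid-points as pixels" argument for texture coordinates in this post, here is a small C++ sketch. The function names `bitsForGridPoints` and `unormStepsPerTexel` are hypothetical, and the criterion (one code per texel along an axis) is simply the one stated in the post, not a rule taken from the D3D11 documentation.

```cpp
#include <cassert>
#include <cstdint>

// Smallest bit count whose number of codes (2^bits) is at least textureSize.
inline int bitsForGridPoints(std::uint32_t textureSize) {
    assert(textureSize > 0);
    int bits = 0;
    while ((std::uint64_t{1} << bits) < textureSize) {
        ++bits;
    }
    return bits;
}

// How many UNORM steps fall within one texel along a single axis.
inline double unormStepsPerTexel(int bits, std::uint32_t textureSize) {
    assert(bits >= 1 && bits <= 32 && textureSize > 0);
    const double maxCode = static_cast<double>((std::uint64_t{1} << bits) - 1);
    return maxCode / static_cast<double>(textureSize);
}
```

With these definitions, `bitsForGridPoints(16384)` gives 14, matching the estimate in the post, and `unormStepsPerTexel(16, 16384)` is just under 4. So a 16-bit UNORM leaves roughly four positions per texel on a 16k texture; whether that sub-texel resolution is enough for filtered sampling probably depends on the content.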
  3. You are absolutely correct, I feel like an idiot now. Should have looked that up/tried it in code before. At least I learned something. This pretty much concludes my barrage of questions, so I'll get back to the practical side. Thanks again to everyone!
  4. First of all, thanks for all the answers! I see that I only have indirect influence on the residency of resources, at least in D3D11.

The way I understand the first underlined portion of SoldierOfLight's post is that I can trust the video memory manager to "do the right thing" as long as I don't (how do I word this? ask the D3D11 API politely?) attempt to overcommit VRAM. In the scenario where "something else needs to be there instead", the video memory manager will prioritize resources based on my usage/access flags and change residencies as it sees fit.

To the second underlined bit: "DEFAULT resources with CPU access flags generally don't reside in VRAM". Does this mean I might as well use a dynamic buffer with CPU access flags, because I can then take advantage of Map/Unmap over UpdateSubresource?
  5. Heads up: this question is more theoretical than practical. My (minute) knowledge about D3D11 is self-taught, so please take any premise I make with additional care. I invite everyone to correct anything I say. Now to the actual post.

I have a question about the lifetime of a D3D11_USAGE_DEFAULT buffer, used with an ID3D11ShaderResourceView as a StructuredBuffer, in GPU memory. First I need to make sure I am understanding the difference between DEFAULT and DYNAMIC buffers correctly. The way I understand the difference between DEFAULT and DYNAMIC buffers comes from here:

D3D11_USAGE_DEFAULT

D3D11_USAGE_DEFAULT tells the API to store my buffer in memory that is fast to access for the GPU. This absolutely does not guarantee (?) that it is located in VRAM, however it is more likely to be located there. I can update the buffer (partially) by using UpdateSubresource. Here is some info from the previously mentioned thread.

D3D11_USAGE_DYNAMIC

D3D11_USAGE_DYNAMIC tells the API to store my buffer in memory that is fast to access for the CPU. This guarantees (?) it will be located in system RAM and not VRAM. Whenever the GPU needs to access the data, it will upload the data to VRAM. Assuming the hardware can handle buffers larger than 128 MB (see footnote 1 on here), this theoretically means the size of the buffer is limited by the amount of data that can be transferred from CPU memory to GPU memory in the desired frame time. An estimate for the upper bound, ignoring the time necessary for actually processing the data, would be the PCIe bandwidth available to the GPU divided by the desired framerate (can we estimate a more precise upper bound?).

I can update the buffer using Map/Unmap with one of the following flags:

    • D3D11_MAP_WRITE
    • D3D11_MAP_READ_WRITE
    • D3D11_MAP_WRITE_DISCARD
    • D3D11_MAP_WRITE_NO_OVERWRITE
    • (D3D11_MAP_READ <- this would not be for updating, but simply for reading)

Nvidia suggests using D3D11_MAP_WRITE_DISCARD (for constant buffers).
The reason for this (as I understand from here) is that buffers may still be in use when you are trying to update them, and MAP_WRITE_DISCARD will let you write to a different region of memory so that the GPU can discard the old buffer when it is done with it, and grab the new one when it needs it. All of this is still under my personal, possibly wrong, premise that the USAGE_DYNAMIC buffer is stored in system RAM and grabbed by the GPU over PCIe lanes when it needs it.

If I were to use MAP_WRITE_NO_OVERWRITE, I could write to the buffer that is in use, but I would have to guarantee that my implementation does not overwrite anything the GPU is currently using. I assume something undefined happens otherwise. Here I really would need to understand the intricacies of how DX11 manages CPU/GPU memory. So if you happen to know about these intricacies in relation to the map flags, please share your knowledge.

Back to my initial question: a structured buffer is nothing but an ID3D11Buffer wrapped by an ID3D11ShaderResourceView. As I understand it, this means the memory management by D3D11 should be no different. Of course that assumption could be fatally flawed, but that is why I am posting here asking for help. Nonetheless I have to bind and unbind shader resources, for example for the vertex shader via VSSetShaderResources. How is binding/unbinding (either implicitly by binding a new resource, or explicitly by binding a nullptr) related to the memory management of my ID3D11Buffer by the D3D11 API?

Assuming I have used a USAGE_DEFAULT buffer, I would hope my structured buffer stays in VRAM until I Release() the resources explicitly, meaning I can bind/unbind without the cost of having to move the buffer from RAM to VRAM. I guess this question can be generalized to the following: do I ever get a guarantee from D3D11 that something is stored in VRAM until I decide to remove/release it?
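One way to picture how the two write flags discussed above are commonly combined is an append cursor: keep writing to unused regions with no-overwrite semantics, and ask for a fresh buffer with discard semantics once the current one is full. The C++ sketch below only models that bookkeeping and makes no D3D11 calls; `MapMode`, `MapRequest` and `AppendCursor` are hypothetical names standing in for the choice between D3D11_MAP_WRITE_DISCARD and D3D11_MAP_WRITE_NO_OVERWRITE.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two D3D11 map flags discussed in the post.
enum class MapMode { WriteDiscard, WriteNoOverwrite };

// Which flag to map with, and the byte offset to write at.
struct MapRequest {
    MapMode mode;
    std::size_t offset;
};

// Tracks the next free byte of a dynamic buffer of fixed capacity.
class AppendCursor {
public:
    explicit AppendCursor(std::size_t capacity)
        : capacity_(capacity), next_(capacity) {} // "full" forces a discard first

    MapRequest reserve(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        if (next_ + bytes > capacity_) {
            // Not enough untouched space left: start over in a fresh buffer.
            next_ = bytes;
            return {MapMode::WriteDiscard, 0};
        }
        // Region has not been handed out since the last discard.
        const std::size_t offset = next_;
        next_ += bytes;
        return {MapMode::WriteNoOverwrite, offset};
    }

private:
    std::size_t capacity_;
    std::size_t next_;
};
```

For a 100-byte buffer, four requests of 40, 40, 40 and 60 bytes would come back as discard at 0, no-overwrite at 40, discard at 0, no-overwrite at 40. The promise not to touch data the GPU may still be reading is kept here only because each region is handed out once per discard; that part remains the application's responsibility.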
Of course I still need clarification/answers for the rest of the questions in my post, but my difficulties with D3D11 are summarized by a lack of understanding of the lifetime of objects in VRAM, and how I can influence these lifetimes. Thanks for reading this far, hope someone can help me.
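The upper-bound estimate mentioned in this post (available bus bandwidth divided by the desired framerate) is simple enough to write down directly. In the C++ sketch below, `maxBytesPerFrame` and its `efficiency` parameter are hypothetical, and any concrete bandwidth figure plugged in is an assumed placeholder, not a measured value.

```cpp
#include <cassert>

// Rough ceiling on bytes that can cross the bus in one frame.
// `efficiency` in (0,1] is a fudge factor for protocol overhead and contention.
inline double maxBytesPerFrame(double bandwidthBytesPerSecond,
                               double framesPerSecond,
                               double efficiency = 1.0) {
    assert(bandwidthBytesPerSecond > 0.0);
    assert(framesPerSecond > 0.0);
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return bandwidthBytesPerSecond * efficiency / framesPerSecond;
}
```

Assuming, for example, a theoretical 16 GB/s link and 60 frames per second, `maxBytesPerFrame(16.0e9, 60.0)` comes out at roughly 267 MB per frame. This ignores everything else that shares the bus and all GPU processing time, so it is only a ceiling.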