

Similar Content

By chiffre
Introduction:
In general my questions pertain to the differences between floating-point and fixed-point data. Additionally, I would like to understand when it can be advantageous to prefer a fixed-point representation over a floating-point representation in the context of vertex data, and how the hardware deals with the different data types. I believe I should be able to reduce the amount of data (bytes) needed per vertex by choosing the most opportune representation for each vertex attribute. Thanks ahead of time if you, the reader, are considering the effort of reading this and helping me.
I found an old topic that shows this is possible in principle, but I am not sure I understand what the pitfalls are when using a fixed-point representation, and whether there are any hardware-based performance advantages/disadvantages.
(TL;DR at bottom)
The Actual Post:
To my understanding, HLSL/D3D11 offers not just the traditional floating-point model in half-, single-, and double-precision, but also a fixed-point model in the form of signed/unsigned normalized integers in 8-, 10-, 16-, 24-, and 32-bit variants. Both models offer a finite sequence of "grid points". The obvious difference between the two models is that the fixed-point model offers a constant spacing between values in the normalized range of [0,1] or [-1,1], while the floating-point model allows for smaller "deltas" as you get closer to 0, and larger "deltas" the further you are away from 0.
To add some context, let me define a struct as an example:
struct VertexData
{
    float position[3]; // 3 x 32 bits
    float texCoord[2]; // 2 x 32 bits
    float normals[3];  // 3 x 32 bits
}; // Total of 32 bytes

Every vertex gets a position, a coordinate on my texture, and a normal to do some light calculations. In this case we have 8 x 32 = 256 bits per vertex. Since the texture coordinates lie in the interval [0,1] and the normal vector components are in the interval [-1,1], it would seem useful to use a normalized representation, as suggested in the topic linked at the top of the post. The texture coordinates might as well be represented in a fixed-point model, because it seems most useful to be able to sample the texture in a uniform manner; the pixels don't get any "denser" as we get closer to 0. In other words, the "delta" does not need to become any smaller as the texture coordinates approach (0,0). A similar argument can be made for the normal vector: a normal vector should be normalized anyway, and we want as many points as possible on the sphere of radius 1 around (0,0,0), while we don't care about precision around the origin. Even if we have large textures such as 4k by 4k (or the maximum allowed by D3D11, 16k by 16k), we only need as many grid points on one axis as there are pixels on one axis. An unsigned normalized 14-bit integer would be ideal, but because it is both unsupported and impractical, we will stick to an unsigned normalized 16-bit integer. The same type should take care of the normal vector coordinates, and might even be a bit overkill.
struct VertexData
{
    float    position[3]; // 3 x 32 bits
    uint16_t texCoord[2]; // 2 x 16 bits
    uint16_t normals[3];  // 3 x 16 bits
}; // Total of 22 bytes

Seems like a good start, and we might even be able to take it further, but before we pursue that path, here is my first question: can the GPU even work with the data in this format, or is all I have accomplished minimizing CPU-side RAM usage? Does the GPU have to convert the texture coordinates back to a floating-point model when I hand them over to the sampler in my pixel shader? I have looked up the data types for HLSL and I am not sure I even comprehend how to declare the vertex input type in HLSL. Would the following work?
struct VertexInputType
{
    float3 pos;         // this one is obvious
    unorm half2 tex;    // half corresponds to a 16-bit float, so I assume this is wrong,
                        // but this is the only 16-bit type I found on the linked MSDN site
    snorm half3 normal; // same as above
};

I assume this is possible somehow, as I have found input element formats such as DXGI_FORMAT_R16G16B16A16_SNORM and DXGI_FORMAT_R16G16B16A16_UNORM (also available with a different number of components, as well as different component lengths). I might have to avoid 3-component vectors because there is no 3-component 16-bit input element format, but that is the least of my worries. The next question would be: what happens with my normals if I try to do lighting calculations with them in such a normalized fixed-point format? Is there no issue as long as I take care not to mix floating- and fixed-point data? Or would that work as well? In general this gives rise to the question: how does the GPU handle fixed-point arithmetic? Is it the same as integer arithmetic, and/or is it faster/slower than floating-point arithmetic?
Assuming that we still have a valid and useful VertexData format, how far could I take this while remaining on the sensible side of what could be called optimization? Theoretically I could use an input element format such as DXGI_FORMAT_R10G10B10A2_UNORM to pack my normal coordinates into a 10-bit fixed-point format, and my vertices (in object space) might even be representable in a 16-bit unsigned normalized fixed-point format. That way I could end up with something like the following struct:
struct VertexData
{
    uint16_t pos[3];        // 3 x 16 bits
    uint16_t texCoord[2];   // 2 x 16 bits
    uint32_t packedNormals; // 10 + 10 + 10 + 2 bits
}; // Total of 14 bytes

Could I use a vertex structure like this without too much performance loss on the GPU side? If the GPU has to execute some sort of unpacking algorithm in the background, I might as well let it be. In the end I have a functioning deferred renderer, but I would like to reduce the memory footprint of the huge number of vertices involved in rendering my landscape.
TL;DR: I have a lot of vertices that I need to render and I want to reduce the RAM usage without introducing crazy compression/decompression algorithms on the CPU or GPU. I am hoping to find a solution involving fixed-point data types, but I am not exactly sure how that would work.

By Nikita Sidorenko
I'm making render just for fun (c++, opengl)
Want to add decals support. Here is what I found:
A couple of slides from Doom
http://advances.realtimerendering.com/s2016/Siggraph2016_idTech6.pdf Decals but deferred
http://martindevans.me/gamedevelopment/2015/02/27/DrawingStuff… spaceDecals/
No implementation details here
https://turanszkij.wordpress.com/2017/10/12/forwarddecalrendering/
As I see it, there should be a list of decals for each tile, the same as for light sources. But what to do next?
Let's assume that all decals are packed into a spritesheet. The decal will substitute the diffuse and normal.
- What data should be stored for each decal on the GPU?
- The articles above describe decals as OBBs. Why an OBB if decals seem to be flat?
- How do you actually render a decal during the object render pass (since it's forward)? Is it projected somehow? I don't understand this part completely.
Are there any papers for this topic?

By cozzie
Hi all,
I was wondering if it matters in which order you draw 2D and 3D items, looking at the BeginDraw/EndDraw calls on a D2D render target.
The order in which you do the actual draw calls is clear: 3D first, then 2D, which means the 2D (DrawText in this case) is in front of the 3D scene.
The question is mainly about when to call the BeginDraw and EndDraw.
Note that I'm drawing D2D stuff through a DXGI surface linked to the 3D RT.
Option 1:
A - Begin frame, clear D3D RT
B - Draw 3D
C - BeginDraw D2D RT
D - Draw 2D
E - EndDraw D2D RT
F - Present
Option 2:
A - Begin frame, clear D3D RT + BeginDraw D2D RT
B - Draw 3D
C - Draw 2D
D - EndDraw D2D RT
E - Present
Would there be a difference (performance/issue?) in using option 2? (versus 1)
Any input is appreciated.

By Ming-Lun "Allen" Chou
Here is the original blog post.
Edit: Sorry, I can't get embedded LaTeX to display properly.
The pinned tutorial post says I have to do it in plain HTML without embedded images?
I actually tried embedding prerendered equations and they seemed fine when editing,
but once I submit the post it just turned into a huge mess.
So...until I can find a proper way to fix this, please refer to the original blog post for formatted formulas.
I've replaced the original LaTex mess in this post with something at least more readable.
Any advice on fixing this is appreciated.
This post is part of my Game Math Series.
Source files are on GitHub.
Shortcut to sterp implementation.
Shortcut to code used to generate animations in this post.
An Alternative to Slerp
Slerp, spherical linear interpolation, is an operation that interpolates from one orientation to another, using a rotational axis paired with the smallest angle possible.
Quick note: Jonathan Blow explains here how you should avoid using slerp if normalized quaternion linear interpolation (nlerp) suffices. Long story short: nlerp is faster but does not maintain constant angular velocity, while slerp is slower but maintains constant angular velocity. Use nlerp if you're interpolating across small angles or you don't care about constant angular velocity; use slerp if you're interpolating across large angles and you care about constant angular velocity. But for the sake of using a more commonly known building block, the rest of this post will only mention slerp. Replacing all following occurrences of slerp with nlerp would not change the validity of this post.
In general, slerp is considered superior over interpolating individual components of Euler angles, as the latter method usually yields orientational sways.
But, sometimes slerp might not be ideal. Look at the image below showing two different orientations of a rod. On the left is one orientation, and on the right is the resulting orientation of rotating around the axis shown as a cyan arrow, where the pivot is at one end of the rod.
If we slerp between the two orientations, this is what we get:
Mathematically, slerp takes the "shortest rotational path". The quaternion representing the rod's orientation travels along the shortest arc on a 4D hypersphere. But, given the rod's elongated appearance, the rod's moving end seems to deviate from the shortest arc on a 3D sphere.
My intended effect here is for the rod’s moving end to travel along the shortest arc in 3D, like this:
The difference is more obvious if we compare them side-by-side:
This is where swing-twist decomposition comes in.
Swing-Twist Decomposition
Swing-twist decomposition is an operation that splits a rotation into two concatenated rotations, swing and twist. Given a twist axis, we would like to separate out the portion of a rotation that contributes to the twist around this axis; what's left behind is the remaining swing portion.
There are multiple ways to derive the formulas, but this particular one by Michaele Norel seems to be the most elegant and efficient, and it's the only one I've come across that does not involve any trigonometric functions. I will show the formulas first and then paraphrase his proof:
Given a rotation represented by a quaternion R = [W_R, vec{V_R}] and a twist axis vec{V_T}, combine the scalar part from R with the projection of vec{V_R} onto vec{V_T} to form a new quaternion: T = [W_R, proj_{vec{V_T}}(vec{V_R})]. We want to decompose R into a swing component and a twist component. Let S denote the swing component, so we can write R = ST. The swing component is then calculated by multiplying R with the inverse (conjugate) of T: S = R T^{-1}. Beware that S and T are not yet normalized at this point. It's a good idea to normalize them before use, as unit quaternions are just cuter.

Below is my code implementation of swing-twist decomposition. Note that it also takes care of the singularity that occurs when the rotation to be decomposed represents a 180-degree rotation.

public static void DecomposeSwingTwist
(
  Quaternion q,
  Vector3 twistAxis,
  out Quaternion swing,
  out Quaternion twist
)
{
  Vector3 r = new Vector3(q.x, q.y, q.z);

  // singularity: rotation by 180 degrees
  if (r.sqrMagnitude < MathUtil.Epsilon)
  {
    Vector3 rotatedTwistAxis = q * twistAxis;
    Vector3 swingAxis = Vector3.Cross(twistAxis, rotatedTwistAxis);

    if (swingAxis.sqrMagnitude > MathUtil.Epsilon)
    {
      float swingAngle = Vector3.Angle(twistAxis, rotatedTwistAxis);
      swing = Quaternion.AngleAxis(swingAngle, swingAxis);
    }
    else
    {
      // more singularity: rotation axis parallel to twist axis
      swing = Quaternion.identity; // no swing
    }

    // always twist 180 degrees on singularity
    twist = Quaternion.AngleAxis(180.0f, twistAxis);
    return;
  }

  // meat of swing-twist decomposition
  Vector3 p = Vector3.Project(r, twistAxis);
  twist = new Quaternion(p.x, p.y, p.z, q.w);
  twist = Normalize(twist);
  swing = q * Quaternion.Inverse(twist);
}

Now that we have the means to decompose a rotation into swing and twist components, we need a way to use them to interpolate the rod's orientation, replacing slerp.
SwingTwist Interpolation
Replacing slerp with the swing and twist components is actually pretty straightforward. Let Q_0 and Q_1 denote the quaternions representing the rod's two orientations we are interpolating between. Given the interpolation parameter t, we use it to find "fractions" of the swing and twist components and combine them together. Such fractions can be obtained by performing slerp from the identity quaternion, Q_I, to the individual components. So we replace:

Slerp(Q_0, Q_1, t)

with:

Slerp(Q_I, S, t) Slerp(Q_I, T, t)

From the rod example, we choose the twist axis to align with the rod's longest side. Let's look at the effect of the individual components Slerp(Q_I, S, t) and Slerp(Q_I, T, t) as t varies over time below, swing on the left and twist on the right:
And as we concatenate these two components together, we get a swing-twist interpolation that rotates the rod such that its moving end travels along the shortest arc in 3D. Again, here is a side-by-side comparison of slerp (left) and swing-twist interpolation (right):
I decided to name my swing-twist interpolation function sterp. I think it's cool because it sounds like it belongs to the function family of lerp and slerp. Here's hoping that this name catches on.
And here’s my code implementation:
public static Quaternion Sterp
(
  Quaternion a,
  Quaternion b,
  Vector3 twistAxis,
  float t
)
{
  Quaternion deltaRotation = b * Quaternion.Inverse(a);

  Quaternion swingFull;
  Quaternion twistFull;
  QuaternionUtil.DecomposeSwingTwist
  (
    deltaRotation,
    twistAxis,
    out swingFull,
    out twistFull
  );

  Quaternion swing = Quaternion.Slerp(Quaternion.identity, swingFull, t);
  Quaternion twist = Quaternion.Slerp(Quaternion.identity, twistFull, t);

  return twist * swing;
}

Proof
Lastly, let’s look at the proof for the swing-twist decomposition formulas. All that needs to be proven is that the swing component S does not contribute to any rotation around the twist axis, i.e. the rotational axis of S is orthogonal to the twist axis.

Let vec{V_{R_para}} denote the component of vec{V_R} parallel to vec{V_T}, which can be obtained by projecting vec{V_R} onto vec{V_T}:

vec{V_{R_para}} = proj_{vec{V_T}}(vec{V_R})

Let vec{V_{R_perp}} denote the component of vec{V_R} orthogonal to vec{V_T}:

vec{V_{R_perp}} = vec{V_R} - vec{V_{R_para}}

So the scalar-vector form of T becomes:

T = [W_R, proj_{vec{V_T}}(vec{V_R})] = [W_R, vec{V_{R_para}}]

Using the quaternion multiplication formula, here is the scalar-vector form of the swing quaternion:

S = R T^{-1}
  = [W_R, vec{V_R}] [W_R, -vec{V_{R_para}}]
  = [W_R^2 - vec{V_R} ‧ (-vec{V_{R_para}}), vec{V_R} X (-vec{V_{R_para}}) + W_R vec{V_R} + W_R (-vec{V_{R_para}})]
  = [W_R^2 - vec{V_R} ‧ (-vec{V_{R_para}}), vec{V_R} X (-vec{V_{R_para}}) + W_R (vec{V_R} - vec{V_{R_para}})]
  = [W_R^2 - vec{V_R} ‧ (-vec{V_{R_para}}), vec{V_R} X (-vec{V_{R_para}}) + W_R vec{V_{R_perp}}]

Take notice of the vector part of the result:

vec{V_R} X (-vec{V_{R_para}}) + W_R vec{V_{R_perp}}

This is a vector parallel to the rotational axis of S. Both vec{V_R} X (-vec{V_{R_para}}) and vec{V_{R_perp}} are orthogonal to the twist axis vec{V_T}, so we have shown that the rotational axis of S is orthogonal to the twist axis. Hence, we have proven that the formulas for S and T are valid for swing-twist decomposition.

Conclusion
That’s all.
Given a twist axis, I have shown how to decompose a rotation into a swing component and a twist component.
Such decomposition can be used for swing-twist interpolation, an alternative to slerp that interpolates between two orientations, which can be useful if you’d like some point on a rotating object to travel along the shortest arc.
I like to call such interpolation sterp.
Sterp is merely an alternative to slerp, not a replacement. Also, slerp is definitely more efficient than sterp. Most of the time slerp should work just fine, but if you find unwanted orientational sway on an object’s moving end, you might want to give sterp a try.

By Sebastian Werema
Do you know of any papers that cover custom data structures, like lists or binary trees, implemented in HLSL (without CUDA) that work correctly no matter how many threads try to use them at any given time?

