# ReaperSMS

1. ## Projection Offset Problem

Because that is what makes a perspective projection have perspective. The division is what makes things shrink as they move further from the camera.
2. ## Projection Offset Problem

Taking the example case of Far = 10, Near = 1, just dividing by (Far - Near) would put points at the far plane at 10/9, and points at the near plane at 1/9. Subtracting Near / (Far - Near) changes that so that points on the far plane become 1, and points on the near plane become 0. The scale by Far is there to counteract the perspective divide.
3. ## Projection Offset Problem

The intended result is to transform the coordinate such that the range [Near, Far] maps to [0, 1], but after the perspective divide. Ignoring the divide to start with, we begin by translating by -Near, so that Near maps to 0:

    Zout = Zin - Near

Now, in the given case, Z values at the near plane become 0, and Z values at the far plane become 9. We rescale by 1/(Far - Near) to bring that to the range [0, 1]:

    Zout = (Zin - Near) / (Far - Near)

To make this easier to calculate with a matrix, we want it in the form A * z + D, so we distribute and rearrange things:

    Zout = Zin * 1/(Far - Near) - Near / (Far - Near)

If it is an orthographic projection, we're done. If it is a perspective projection, we must take into account the divide by Zin that will happen:

    Zclip = Zin * 1/(Far - Near) - Near / (Far - Near)
    Zout = Zclip / Zin

For Zin = Near, Zclip is 0.0, and nothing would change, but for Zin = Far, we would get a result of:

    Zclip = Far * 1/(Far - Near) - Near / (Far - Near) = 10 / 9 - 1 / 9 = 9 / 9 = 1
    Zout = Zclip / Zin = Zclip / Far = 1 / 10

To get a Zout of 1, we have to scale things by Far, which will give the correct result of Zin = Near -> 0.0, Zin = Far -> 1.0. Distributing it across:

    Zclip = Zin * Far / (Far - Near) - (Near * Far) / (Far - Near)
    Zout = Zclip / Zin
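For anyone who wants to poke at the numbers, here is a minimal Python sketch of the final mapping above. The function names are made up for illustration, and it only models the Z row of the matrix plus the divide, not a full projection.

```python
def orthographic_depth(z_in, near, far):
    """Orthographic case: a plain linear remap of [near, far] to [0, 1]."""
    return z_in * (1.0 / (far - near)) - near / (far - near)


def perspective_depth(z_in, near, far):
    """Perspective case: scale by far so the later divide by z_in lands in [0, 1]."""
    scale = far / (far - near)             # the "A" term of A * z + D
    offset = -(near * far) / (far - near)  # the "D" term of A * z + D
    z_clip = z_in * scale + offset
    return z_clip / z_in                   # the perspective divide


if __name__ == "__main__":
    for z in (1.0, 5.5, 10.0):
        print(z, orthographic_depth(z, 1.0, 10.0), perspective_depth(z, 1.0, 10.0))
```

Both versions hit 0 at Near and 1 at Far; the perspective one is not linear in between, which is the price of the divide.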
4. ## Painters Algorithm

The algorithm is literally "render things in depth order", but it doesn't work that order out for you; you have to provide it. Things get complicated when the objects start intersecting, and more so when they are concave, but there are plenty of production particle systems that can boil their sorting down to a simple qsort() on Z. These days it is mostly applicable to translucent rendering, as opaque geometry can rely on z-buffering to get correct results without regard to draw order.
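As a rough illustration of the "sort on Z, then draw" case, here is a small Python sketch. The sprite tuple layout, the function names, and the "larger depth is farther away" convention are all assumptions for the example; it composites a single pixel rather than driving a real renderer.

```python
def blend_over(dst, src, alpha):
    """Standard 'over' blend of one translucent color onto what is already there."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def painters(background, sprites):
    """Composite sprites back to front.

    sprites is a list of (depth, rgb, alpha) tuples; larger depth means
    farther from the camera (an assumption of this sketch).
    """
    color = background
    # The sort is the whole algorithm: farthest first, nearest last.
    for _depth, rgb, alpha in sorted(sprites, key=lambda s: s[0], reverse=True):
        color = blend_over(color, rgb, alpha)
    return color


if __name__ == "__main__":
    red = (5.0, (1.0, 0.0, 0.0), 0.5)    # farther
    green = (2.0, (0.0, 1.0, 0.0), 0.5)  # nearer
    print(painters((0.0, 0.0, 0.0), [green, red]))
```

Because of the sort, the submission order of the sprites does not matter; without it, swapping the two would change the result.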
5. ## Matrix palette skinning, blending matrices

Welcome to the wonderful world of linear transformations. For the usual weighted skinning approach, this is indeed a valid way to do it.

The short version is that the matrices in this case are linear transforms, which have the helpful properties that, for any particular linear function F(), values u, v, and scalar c, the following hold true:

    F(c * u) = c * F(u)
    F(u + v) = F(u) + F(v)

Assuming matrices bone0, bone1, bone2, bone3, weights weight0..3, and shrinking down to only looking at the x value of the result:

    float result = 0.0;
    result += (bone0 * pos).x * weight0;
    result += (bone1 * pos).x * weight1;
    result += (bone2 * pos).x * weight2;
    result += (bone3 * pos).x * weight3;

(bone0 * pos).x expands out to something like (bone0._11 * pos.x + bone0._21 * pos.y + bone0._31 * pos.z + bone0._41), and similar for the rest (apologies for playing very fast and loose with column vs row major, it doesn't particularly matter for the linearity of things):

    result += (bone0._11 * pos.x + bone0._21 * pos.y + bone0._31 * pos.z + bone0._41) * weight0;
    result += (bone1._11 * pos.x + bone1._21 * pos.y + bone1._31 * pos.z + bone1._41) * weight1;
    result += (bone2._11 * pos.x + bone2._21 * pos.y + bone2._31 * pos.z + bone2._41) * weight2;
    result += (bone3._11 * pos.x + bone3._21 * pos.y + bone3._31 * pos.z + bone3._41) * weight3;

If you distribute the weight# multiplies through, roll all the sums together, and then pull pos.x, pos.y, and pos.z out accordingly, you get something like:

    result = pos.x * (bone0._11 * weight0 + bone1._11 * weight1 + bone2._11 * weight2 + bone3._11 * weight3)
           + pos.y * (etc...)
           + pos.z * (etc...)

and get exactly the second formulation.
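If you'd rather convince yourself numerically, here is a small Python sketch comparing the two formulations. The helper names, the column-vector convention, and the sample bones and weights are all made up for the example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_per_bone(bones, weights, pos):
    """Transform by each bone, then blend the transformed positions."""
    result = [0.0] * 4
    for bone, weight in zip(bones, weights):
        transformed = mat_vec(bone, pos)
        result = [acc + weight * t for acc, t in zip(result, transformed)]
    return result


def skin_blended(bones, weights, pos):
    """Blend the matrices first, then transform once."""
    blended = [[sum(w * b[r][c] for b, w in zip(bones, weights))
                for c in range(4)] for r in range(4)]
    return mat_vec(blended, pos)


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def scaling(s):
    return [[s, 0.0, 0.0, 0.0], [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0], [0.0, 0.0, 0.0, 1.0]]


if __name__ == "__main__":
    bones = [translation(1, 0, 0), scaling(2), translation(0, 3, 0), scaling(0.5)]
    weights = [0.4, 0.3, 0.2, 0.1]
    pos = [1.0, 2.0, 3.0, 1.0]
    print(skin_per_bone(bones, weights, pos))
    print(skin_blended(bones, weights, pos))
```

The two printouts should agree up to floating point rounding, which is all the linearity argument promises.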
6. ## Line segment intersection seems to work but then other times is horribly wrong

It looks like your segment intersection test is actually an infinite line test. It will only return false if they are parallel or coincident...
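To make the difference concrete, here is a Python sketch of both tests side by side. This is not the original poster's code; the function names are made up, and parallel or coincident segments are simply reported as not intersecting, which a real implementation may want to handle more carefully.

```python
def cross(a, b):
    """2D cross product (z component of the 3D cross product)."""
    return a[0] * b[1] - a[1] * b[0]


def lines_intersect(p1, p2, p3, p4, eps=1e-12):
    """Infinite line test: only fails for parallel or coincident lines."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    return abs(cross(d1, d2)) >= eps


def segments_intersect(p1, p2, p3, p4, eps=1e-12):
    """Segment test: also requires the hit to lie within both segments."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    denom = cross(d1, d2)
    if abs(denom) < eps:
        return False  # parallel or coincident; treated as a miss in this sketch
    w = (p3[0] - p1[0], p3[1] - p1[1])
    t = cross(w, d2) / denom  # parameter along p1 -> p2
    u = cross(w, d1) / denom  # parameter along p3 -> p4
    return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0


if __name__ == "__main__":
    # The lines cross, but well past the end of the first segment.
    print(lines_intersect((0, 0), (1, 1), (2, 3), (3, 2)))
    print(segments_intersect((0, 0), (1, 1), (2, 3), (3, 2)))
```

The extra range check on t and u is the part that turns a line test into a segment test.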
7. ## What's the difference between dot and * in HLSL

That all looks fine, assuming diffuse and ambient are float4's, which they almost certainly should be if you want lights that aren't just white.
8. ## C++ Self-Evaluation Metrics

Assuming it's for a deep magic code ninja type position, ask why, and likely be satisfied with a coherent answer. If it's not a position that involves staring at hex dumps for bugs, it probably doesn't even come up... unless someone claims they have a better grasp of C++ than Stroustrup or Sutter.

Or, on bad days, be very relieved, as it means I don't have to dig that bit of the standard out of cold storage.
9. ## C++ Self-Evaluation Metrics

Anything over an 8 means one of two things. They've either written a solid, production ready compiler frontend and runtime support library, or they're a 4. 7-8 from someone with a background that matches means "I've seen horrible things, and know how to avoid/diagnose them, but there are still fell and terrible things lurking in the dark corners of the earth".

An approach we used from time to time, at least for people that claim to be Really Good and Technical with it, is to just have them start drawing out the memory layout of an instance of a class object, working up from the trivial case, through to the virtual diamond one, and see where the floundering starts. Bonus points for knowing how dynamic_cast and RTTI work (and a slight bit of walking through the process usually serves as a good reminder of why they aren't exactly free).
10. ## SH directional lights, what am I missing?

It's a 3D scene, but with the view direction restricted to slightly off-axis, and camera motion restricted to a 2D plane.

The main area of play is about 400 units in front of the camera, with some near-field objects about 200 units past that, which can accept shadows. Tons and tons of background objects lie far beyond that; the far plane is set to around 100,000. It isn't particularly ideal.

That soup gets thrown at a deferred lighting renderer, which is all fine and great up until it needs to light things that don't write depth.
11. ## SH directional lights, what am I missing?

We have a game here using a straightforward deferred lighting approach, but we'd like to get some lighting on our translucent objects. In an attempt to avoid recreating all the horrible things that came from shader combinations for every light combination, I've been trying to implement something similar to the technique Bungie described in their presentation on Destiny's lighting.

The idea is to collapse the light environment at various probe points into a spherical harmonic representation, that the shader would then use to compute lighting. Currently it's doing all of this on the CPU, but I've run into what seems to be a fundamental issue with projecting a directional light into SH.

After digging through all of the fundamental papers, everything seems to agree that the way to project a directional light into SH, convolved with the cosine response, is

    // SH holds one float3 (RGB) per coefficient
    void project_directional( float3* SH, float3 color, float3 dir )
    {
        SH[0] = 0.282095f * color * pi;
        SH[1] = -0.48603f * color * dir.y * (pi * 2/3);
        SH[2] = 0.48603f * color * dir.z * (pi * 2/3);
        SH[3] = -0.48603f * color * dir.x * (pi * 2/3);
    }

    float3 eval_normal( float3* SH, float3 dir )
    {
        float3 result = 0;

        result = SH[0] * 0.282095f;
        result += SH[1] * -0.48603f * dir.y;
        result += SH[2] * 0.48603f * dir.z;
        result += SH[3] * -0.48603f * dir.x;
        return result;
    }

    // result is then scaled by diffuse

There's a normalization term or two, but the problem I've been running into, that I haven't seen any decent way to avoid, is that ambient term in SH[0]. If I plug in a simple light pointing down Z, normals pointing directly at it, or directly away from it behave reasonably, but a normal pointing down, say, the X axis will always be lit by at least 1/4 of the light color. It's produced a directional light that generates significant amounts of light at 90 degrees off-axis.

I'm not seeing how this could ever behave differently.
I can get vaguely reasonable results if I ignore the ambient term while merging diffuse lights in, but that breaks down the moment I try summing in two lights pointing in opposite directions. Expanding out to the 9-term quadratic form does not help much either.

I get the feeling I've missed some fundamental thing to trim down the off-axis directional light response, but I'll be damned if I can see where it would come from. Is this just a basic artifact of using a single light as a test case? Is this likely to behave better by keeping the main directional lights out, and just using the SH set to collapse point lights in as sphere lights or attenuated directionals? Have I just royally screwed up my understanding of how to project a directional light into SH?

The usual pile of papers and articles from SCEE, Tom Forsyth, Sebastien Lagarde, etc. have not helped. Someone had a random shadertoy that looked like it worked better in posted screenshots, but actually running it produces results more like what I've seen.
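For reference, here is the same projection and evaluation as a quick Python sketch, reduced to a single scalar intensity instead of an RGB color so the numbers are easy to read. The constants are the ones from the post; treating `direction` as the unit vector the light arrives from is an assumption of this sketch.

```python
import math

Y0 = 0.282095  # band 0 basis constant
Y1 = 0.48603   # band 1 basis constant, as written in the post


def project_directional(intensity, direction):
    """Project a directional light into 4 SH coefficients, cosine-convolved."""
    x, y, z = direction
    return [
        Y0 * intensity * math.pi,
        -Y1 * intensity * y * (math.pi * 2 / 3),
        Y1 * intensity * z * (math.pi * 2 / 3),
        -Y1 * intensity * x * (math.pi * 2 / 3),
    ]


def eval_normal(sh, normal):
    """Evaluate the SH set for a surface normal (before any diffuse scaling)."""
    x, y, z = normal
    return (sh[0] * Y0
            + sh[1] * -Y1 * y
            + sh[2] * Y1 * z
            + sh[3] * -Y1 * x)


if __name__ == "__main__":
    sh = project_directional(1.0, (0.0, 0.0, 1.0))
    for name, n in (("toward", (0, 0, 1)), ("side", (1, 0, 0)), ("away", (0, 0, -1))):
        print(name, eval_normal(sh, n))
```

With the light along +Z, the sideways normal only picks up the SH[0] term, and 0.282095 * 0.282095 * pi works out to roughly 0.25, which is the quarter of the light color described above.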
12. ## SH directional lights, what am I missing?

I was afraid of that.

The divide by pi is in there on the real code side, I left out some of the normalization to get down to just the SH bits. The lighting model for this project is ridiculously ad-hoc, as we didn't get a real PBS approach set up in the engine until a few months into production. Another project is using a much more well behaved setup, but it has the advantage of still being in preproduction.

For this project the scenes are sparse space-scapes, with a strong directional light, and an absurd number of relatively small radius point lights for effects, and only about three layers of objects (ships, foreground, and background). I suppose a brute force iteration over the light list might do the job well enough, as there might not be enough of these around to justify a fancy approach.
13. ## Help finding error in base 62 converter

The sites are the ones in the wrong. They're probably implemented in JavaScript, which I believe treats all numbers as floats, and thus are losing precision. As an example, your third number, punched into Windows calc, as the first step would be:

22236810928128038 % 62 = 42, which should be 'g'. If we subtract 42 out of there, we get 22236810928127996, which on the second site properly ends up with a final digit of '0'. If you give it 22236810928127997, it still ends in '0', and if you give it 22236810928127998, it jumps to '4'. Double precision floats only give about 16 digits of precision, so feeding it a 17 digit number of this size means it starts rounding in units of 4.

The entire idea seems a bit odd however, as for this to be reasonable, you have to convert before encrypting, and need to know exactly where numbers live in the output to parse them back properly. It seems like it would be better to encrypt directly from binary, and base-64 convert the output if you need to send it over a restricted channel.
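The rounding is easy to reproduce. Here is a Python sketch that does the conversion with exact integers, and then again after squeezing the value through a double the way a float-only implementation would. The digit alphabet (0-9, then A-Z, then a-z) is an assumption, picked because it matches 42 being 'g'; the function names are made up.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(n):
    """Convert a non-negative integer to base 62 using exact integer math."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def through_double(n):
    """Round-trip an integer through a 64-bit float, as a float-only converter would."""
    return int(float(n))


if __name__ == "__main__":
    for n in (22236810928128038, 22236810928127996,
              22236810928127997, 22236810928127998):
        print(n, to_base62(n), through_double(n), to_base62(through_double(n)))
```

Anything above 2^53 stops being exactly representable, and at this magnitude the representable doubles are 4 apart, so neighbouring inputs collapse onto the same value before the conversion even starts.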
14. ## StartInstanceLocation, and SV_InstanceID

Bleh, I see GL does the same thing. I suppose I shall have to put up with being terribly disappointed in the PC APIs again.
15. ## StartInstanceLocation, and SV_InstanceID

So, I'm trying to use SV_InstanceID as an extra input to a shader, to pick from a small set of vertex colors in code.

It seems to completely ignore the last argument of DrawIndexedInstanced(), and start at 0 per draw call. This seems less than useful, as it would make it impossible to transparently split up an instanced draw call, and defeat a lot of the purpose of having the system value at all.

How would one be expected to use SV_InstanceID properly in this case? The vertex shader looks about like so:

    struct VertexInput
    {
        float4 position : POSITION;
        uint instanceid : SV_InstanceID;
    };

    struct VertexOutput
    {
        float4 projPos : SV_Position;
        float4 color : COLOR0;
    };

    VertexOutput vs_main( const VertexInput input )
    {
        VertexOutput output = (VertexOutput)0;
        output.projPos = mul( float4( input.position.xyz, 1.0f ), g_ViewProjection );
        if ( input.instanceid == 0 )
        {
            output.color = float4(1,0,0,1);
        }
        else if ( input.instanceid == 1 )
        {
            output.color = float4(0,1,0,1);
        }
        else
        {
            output.color = float4(0.5,0.5,0.5,1);
        }
        return output;
    }

This results in it always picking red. If I instead dig a color out of a separate vertex buffer, via D3D11_INPUT_PER_INSTANCE_DATA, it works as expected.

How do I make d3d useful?
16. ## Sharing violations and "Network optomisers"

Or that the driver's just a little old, and the QoS is busted. We had an issue with devkit connectivity, where one machine could talk to a kit after an update, but not another machine. The initial webconfig page would start loading, and then come to a dead halt, and kill the http connection.

That turned out to be related to jumbo packets. The update enabled them for the devkit, and the machine that didn't work had a Realtek driver dated ~5 days earlier than the other machine. That caused it to drop any and all jumbo packets, and the second packet the devkit tried sending over was about 20 bytes over the jumbo threshold...
17. ## Unordered access view woes with non-structured buffers

Typed UAVs have some restrictions, check the DXGI programming guide under Hardware Support for Direct3D 11 Formats.

Column 22 on mine is Typed UAV, and it does apply to most of the types. Conspicuously absent from it, however, are 96-bit RGB, 64-bit depth/stencil, 32-bit depth (use R32), packed 24/8 depth/stencil, shared exponent and odd RG_BG/GR_GB modes, and all of the block compressed formats.

tl;dr: DXGI_FORMAT_R32G32B32_FLOAT doesn't work for typed UAVs. The rest do.
18. ## Simulating lighting using volumetric meshes.

The renderer side is going to treat it as slices. If you really want to go this route, you're probably looking at using a geometry shader to replicate the light volume geometry out to all slices covered by it, doing the appropriate projections and such.

The practicality of all that seems questionable. Memory restrictions are going to keep your lighting exceedingly low-res, and you're blowing the vast majority of it on empty or useless space.
19. ## Floating point accuracy across computers?

A certain PC RTS title of years past tried this, including sending raw floats over the wire. They hit issues between Intel and AMD, and after sorting some of those out, between Debug and Release. They tried the usual compiler options and floating point control word magic (that still needed resetting after every D3D call).

We got to port it to Linux, and tried very hard to keep it netplay compatible. All of the above applied, plus the fun of Visual Studio vs GCC when it came to fp codegen behavior. Rounding everything to ~3 decimal places mostly dealt with it, but not all of it. In particular, the AI code had some float comparisons lying around, on data that was never sent over the wire, that could change the number of calls to the RNG, and that *was* state that was tracked closely.

I managed to come up with a method that definitively solved the compiler issues -- eyeball the VC output assembly, and reimplement the function on the GCC side with the VC floating point translated to AT&T syntax, pasted in, and add some shim code around it to fix up differences in the calling convention. This is not how one should define C++ class methods, but such was life.

It even worked, and solved it definitively for that case. The next case that came up was the same sort of thing, two steps higher on the callstack. At that point I gave up, because we did not have the time to rewrite the entire AI system in assembly, as that was clearly going to be the end result.

This way lies madness. Stick to fixed point for anything that actually matters to the game simulation. You should probably also make sure your system is set up to be able to detect synchronization loss as immediately as possible, and even better, have a mechanism for resynchronizing. Otherwise you're in for debugging issues that only happen in 5+ player games, after 2 hours, with the bulk of the useful data being gigs upon gigs of value logs and callstack traces.
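A minimal Python sketch of the two suggestions, fixed point for simulation state and a cheap checksum for spotting desyncs early. The 16.16 format, the helper names, and the use of CRC32 are all choices made for the example, not anything the title in question did.

```python
import struct
import zlib

FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in 16.16 fixed point


def to_fixed(x):
    """Convert a float to 16.16 fixed point (do this at load time, not mid-simulation)."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """Multiply two 16.16 values using integer math only.

    The shift floors toward negative infinity, identically on every machine.
    """
    return (a * b) >> FRAC_BITS


def state_checksum(values):
    """CRC32 over the simulation state, to be compared between peers each tick."""
    crc = 0
    for v in values:
        crc = zlib.crc32(struct.pack("<q", v), crc)
    return crc


if __name__ == "__main__":
    position = to_fixed(1.5)
    speed = to_fixed(2.0)
    state = [position, speed, fixed_mul(position, speed)]
    print(state, hex(state_checksum(state)))
```

Exchanging the checksum every tick (or every few ticks) means a desync shows up within moments of the divergence, instead of two hours later when a unit is visibly in the wrong place.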
20. ## Just how alright will I be if I were to skip normal-mapping?

Fillrate and memory bandwidth are not quite the same thing. Normalmaps don't really hit fillrate outside of a deferred or light prepass renderer, just memory bandwidth. However, their access patterns are fairly predictable, and scale better (assuming mipmaps) than random vertex access.

Given that every card imaginable these days shades and rasterizes in units larger than a pixel, 2x2 quads at the least, and far larger in practice, any ALU gains you get by not bothering with normalmaps will be consumed by small triangle overhead. 1x1 pixel triangles will generally compute as 2x2 quads, or worse, and throw away most of the results, so pixel for pixel they're 4-16x more expensive than a more reasonably sized triangle.

Lastly, mipmaps provide a more automatic method of LOD. With discrete triangles, lighting, texturing, etc. will almost certainly break down into a flickery, sparkly mess as the triangles shrink to sub-pixel resolution.
21. ## Problem on physical material

It's the Schlick formula, but with k = roughness^2 / 2 to fit the Smith GGX D function. The one I mentioned was equation 4 in Epic's notes from the physically based shading course at SIGGRAPH.

For most things, the saturate works, but for the case of m=0, it depends on what the card does for 0/0. Epic avoids that case for regular lights, as they remap their roughness to (roughness+1)/2 first, so it became

    float G_Schlick(float v, float m)
    {
        float k = (m + 1) * (m + 1) / 8.0f;
        return v / (v * (1 - k) + k);
    }

Their notes are a bit thin on some of the other details.
22. ## Problem on physical material

Additionally, your D term formulation might start misbehaving as NoH approaches 1, but the artifacting in that term is probably supposed to get masked off by the missing NoL * NoV in the G term.
23. ## Problem on physical material

Looks like the G term is the likely culprit.

For starters, Schlick's G is

    G1(x) = dot(n, x) / ( dot(n, x) * (1-k) + k )
    G(L, V, H) = G1(L) * G1(V)

I don't know where that 0.25 in your numerator came from, but at the least you're missing an NoL * NoV, and I'm pretty sure that doesn't cancel out. Epic's paper mentions using a remapping of roughness to (roughness+1)/2 before squaring it, which guarantees a non-zero denominator.

Your math for G is probably exploding for surfaces that don't have normals facing one of the two directions, and a roughness of 0.
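To see where the blow-up comes from, here is a small Python sketch of the G term as written above, with the roughness remap as an option. The function names and the `remap` flag are made up for the example, and k = roughness^2 / 2 is assumed; this is not your shader, just the formula.

```python
def k_from_roughness(roughness, remap=True):
    """k = roughness^2 / 2, optionally after remapping roughness to (roughness + 1) / 2."""
    if remap:
        roughness = (roughness + 1.0) / 2.0
    return roughness * roughness / 2.0


def g1_schlick(n_dot_x, k):
    """G1(x) = dot(n, x) / (dot(n, x) * (1 - k) + k)."""
    return n_dot_x / (n_dot_x * (1.0 - k) + k)


def g_schlick(n_dot_l, n_dot_v, roughness, remap=True):
    """G(L, V, H) = G1(L) * G1(V)."""
    k = k_from_roughness(roughness, remap)
    return g1_schlick(n_dot_l, k) * g1_schlick(n_dot_v, k)


if __name__ == "__main__":
    # Grazing light on a perfectly smooth surface.
    print(g_schlick(0.0, 0.5, 0.0, remap=True))
    try:
        print(g_schlick(0.0, 0.5, 0.0, remap=False))
    except ZeroDivisionError:
        print("0/0 without the remap")
```

Python raises on the 0/0 where a GPU would quietly hand back a NaN or something else, but it is the same hole: with the remap, k never drops below 1/8, so the denominator stays positive.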
24. ## Are mutexes really fool-proof?

Because your mutex is too simplistic to actually work.

Here are some of the various possible failures it could run into:

- Both threads could try locking at the same time, both read it as unlocked, and both write it locked and enter.
- The compiler could be clever, and cache locked in a register, resulting in an infinite loop if one tries locking an already locked mutex.
- The compiler could inline the lock, and shuffle code around such that part of the block you're trying to protect happens before locking the mutex.
- The CPU could speculatively execute past the lock.
- etc.
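The first failure is easy to show without relying on timing luck. The Python sketch below models a flag-based lock as a separate read and write, and drives two "threads" by hand through the bad interleaving; the class and function names are made up, and generators stand in for real threads so the result is deterministic. A real lock from the standard library is shown next to it for contrast.

```python
import threading


class NaiveMutex:
    """A plain flag: the check and the set are two separate steps."""

    def __init__(self):
        self.locked = False


def naive_lock(mutex, name, entered):
    """Generator that pauses between reading the flag and writing it."""
    seen_locked = mutex.locked  # step 1: read the flag
    yield
    if not seen_locked:         # step 2: act on a possibly stale read
        mutex.locked = True
        entered.append(name)
    yield


def run_bad_interleaving():
    mutex = NaiveMutex()
    entered = []
    a = naive_lock(mutex, "A", entered)
    b = naive_lock(mutex, "B", entered)
    next(a)
    next(b)  # both have now read "unlocked"
    next(a)
    next(b)  # both write "locked" and walk in
    return entered


def run_real_lock():
    lock = threading.Lock()
    first = lock.acquire(blocking=False)   # succeeds
    second = lock.acquire(blocking=False)  # fails, the lock is already held
    lock.release()
    return first, second


if __name__ == "__main__":
    print(run_bad_interleaving())
    print(run_real_lock())
```

The real lock works because the test and the set happen as one atomic operation; the compiler and CPU reordering issues in the list need proper barriers on top of that, which the library lock also takes care of.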
25. ## OpenGL color interpolation not linear?

You should be using glBlendFunc(GL_SRC_ALPHA, GL_ONE).

At a point where both of them are interpolated to 50% alpha, using SRC_ALPHA, ONE_MINUS_SRC_ALPHA, you will get 0.5 green, 0.25 red, 0.25 background, because you're doing this:

    Output = 0.5 * Green + (1-0.5) * ( 0.5 * red + (1-0.5) * background );
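The same arithmetic as a quick Python sketch, for both blend functions. It only models the per-pixel math (no clamping to [0, 1], which the real framebuffer would apply), and the function names and the blue background are made up for the example.

```python
def blend(src, dst, alpha, additive):
    """src * alpha + dst * dst_factor, per channel.

    additive=False models (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA),
    additive=True models (GL_SRC_ALPHA, GL_ONE).
    """
    dst_factor = 1.0 if additive else 1.0 - alpha
    return tuple(alpha * s + dst_factor * d for s, d in zip(src, dst))


def draw_red_then_green(background, additive):
    """Draw red at 50% alpha, then green at 50% alpha on top of it."""
    red = (1.0, 0.0, 0.0)
    green = (0.0, 1.0, 0.0)
    color = blend(red, background, 0.5, additive)
    return blend(green, color, 0.5, additive)


if __name__ == "__main__":
    blue = (0.0, 0.0, 1.0)
    print(draw_red_then_green(blue, additive=False))
    print(draw_red_then_green(blue, additive=True))
```

With ONE_MINUS_SRC_ALPHA the second draw halves whatever the first one contributed, so red ends up at 0.25; with GL_ONE both colors keep their 0.5 and the gradient stays symmetric.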