


#5224144 Localizing image based reflections (issues / questions)

Posted by lipsryme on 18 April 2015 - 04:28 AM

I'm currently trying to localize my IBL probes, but I've come across some issues that I couldn't find answers to in any paper or elsewhere.


1. How do I localize my diffuse IBL (irradiance map)? The same method as with specular doesn't really seem to work:

    the locality does not carry over to the light bleed.


As you can see here, the light bleed is spread over a large portion of the entire plane even though the rectangular object is very thin.

Also, the reddish tone doesn't become increasingly stronger as the sphere moves closer to the red object.

If I move the sphere further to the side of the thin red object, the red reflection is still visible. So there's no real locality to it.



2. How do I solve the case for objects that aren't rectangular, or where objects aren't exactly at the edge of the AABB that I intersect? (Or am I missing a line or two to handle that?)




As you can see here, the rectangular red object's reflection works perfectly (but then again, only if it's exactly at the edge of the AABB).

If an object is something like a sphere, or is moved closer to the probe (so not at the edge), the reflection will still be completely flat and projected onto the side of the AABB.

Here's the code snippet showing how I localize my probe reflection vector...



float3 LocalizeReflectionVector(in float3 R, in float3 PosWS,
                                in float3 AABB_Max, in float3 AABB_Min,
                                in float3 ProbePosition)
{
	// Find the ray intersections with the box planes
	float3 FirstPlaneIntersect = (AABB_Max - PosWS) / R;
	float3 SecondPlaneIntersect = (AABB_Min - PosWS) / R;

	// Get the furthest of these intersections along the ray
	float3 FurthestPlane = max(FirstPlaneIntersect, SecondPlaneIntersect);

	// Find the closest far intersection
	// (renamed from "distance", which shadows the HLSL intrinsic)
	float Distance = min(min(FurthestPlane.x, FurthestPlane.y), FurthestPlane.z);

	// Get the intersection position
	float3 IntersectPositionWS = PosWS + Distance * R;

	return IntersectPositionWS - ProbePosition;
}
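To double-check the intersection math on the CPU, here's a minimal C++ port of the snippet above (the Vec3 struct and helper names are my own; per-component max/min replace the HLSL intrinsics):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Same math as the HLSL snippet: intersect the reflection ray with the
// probe's AABB and return the probe-relative lookup direction.
Vec3 LocalizeReflectionVector(Vec3 R, Vec3 PosWS,
                              Vec3 AABB_Max, Vec3 AABB_Min,
                              Vec3 ProbePosition)
{
    // Per-axis: furthest of the two slab intersections along the ray
    auto furthest = [](float r, float pos, float mx, float mn) {
        float t0 = (mx - pos) / r;
        float t1 = (mn - pos) / r;
        return std::max(t0, t1);
    };
    float fx = furthest(R.x, PosWS.x, AABB_Max.x, AABB_Min.x);
    float fy = furthest(R.y, PosWS.y, AABB_Max.y, AABB_Min.y);
    float fz = furthest(R.z, PosWS.z, AABB_Max.z, AABB_Min.z);

    // Closest of the "far" intersections = exit point of the box
    float dist = std::min(fx, std::min(fy, fz));
    return { PosWS.x + dist * R.x - ProbePosition.x,
             PosWS.y + dist * R.y - ProbePosition.y,
             PosWS.z + dist * R.z - ProbePosition.z };
}
```

For a point at the box center with R pointing towards a corner, the returned direction is exactly that corner, which is a quick sanity check for the parallax correction.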

#5203884 Motion Blur flickering highlights unavoidable ?

Posted by lipsryme on 13 January 2015 - 04:59 AM

With motion blur: https://www.youtube.com/watch?v=IV7HLkXjy2I&feature=youtu.be

Without motion blur: https://www.youtube.com/watch?v=h7Ej5KWKNHM


I was wondering if this artifact/flickering is unavoidable with these common types of motion blur techniques?

Because I've seen this happening in several other applications (e.g. CryEngine).

Reducing the velocity amount kind of reduces the amount of flickering but never completely fixes it.

#5199658 Rendering blurred progress lines

Posted by lipsryme on 23 December 2014 - 02:18 AM

Looks like a 2D animation to me.

You could just blur it manually in Photoshop.

Or maybe do something more fancy with UI middleware like Flash (I think UE3 does UI animations with Scaleform).

#5185341 Forward+ Rendering - best way to updating light buffer ?

Posted by lipsryme on 06 October 2014 - 12:44 PM

Ah, so I can basically do the same as with constant buffers!

Having some issues copying the data due to heap corruption, but that might be related to something else.

Is this the correct way of doing it?


Update: yep, that heap corruption was something else. Seems to be working now.

if (!pointLights_center_and_radius.empty())
{
	this->numActiveLights = static_cast<unsigned int>(pointLights_center_and_radius.size());

	D3D11_MAPPED_SUBRESOURCE pointLightResource = {};
	if (SUCCEEDED(this->context->Map(this->pointLightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &pointLightResource)))
	{
		DirectX::XMFLOAT4 *pData = (DirectX::XMFLOAT4*)pointLightResource.pData;
		memcpy(pData, &pointLights_center_and_radius[0], sizeof(DirectX::XMFLOAT4) * numActiveLights);
		this->context->Unmap(this->pointLightBuffer, 0);
	}
}

#5172591 Alpha-Test and Forward+ Rendering (Z-prepass questions)

Posted by lipsryme on 10 August 2014 - 07:17 AM


Alpha-tested geometry tends to mess up z-buffer compression and hierarchical z representations. So usually you want to render it after your "normal" opaques, so that the normal geometry can get the benefit of full-speed depth testing.


As for why they return a color from their pixel shader...I have no idea. In our engine we use a void return type for our alpha-tested depth-only pixel shader.


Could it be beneficial to skip the z-prepass for alpha-tested geometry completely, to avoid those z-buffer compression mess-ups?


The issue here is that you can't, since you need the depth information for light culling in the compute shader.

By the way, it seems they have alpha-to-coverage enabled in the AMD sample, so maybe the color return type has something to do with that?

#5163235 UE4 IBL / shading confusion

Posted by lipsryme on 27 June 2014 - 07:55 AM

A few months back I had a thread here about trying to implement their approach to IBL using this so-called split-sum approximation, and I thought I understood what it was trying to do (same with other aspects of their paper)... I guess I was wrong, and I'm even more confused now.


For anyone who doesn't know which paper I'm referring to: http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf


1. Metallic shading parameter: I'm still unsure whether I understand what this parameter actually does. What I thought it does is take some value in [0-1] and attenuate the diffuse term like so:

Diffuse *= 1.0f - metallic;

Because if we want the material to be more metallic, the diffuse term needs to decrease until it's fully gone for an actual metal. Just attenuating the Fresnel reflectance does not completely remove it. Looking more closely at the pictures in the paper, it looks as if they control the actual FresnelReflectance with it, something like:

FresnelReflectance = lerp(0.04, 1.0f, metallic);

Does anyone know what the parameter actually does?
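For what it's worth, my understanding of the now-common "metallic workflow" (this is my reading of it, not something the paper states outright) is that the parameter does both things at once: it fades out the diffuse term and lerps F0 from a dielectric constant towards the albedo. A hedged, single-channel C++ sketch:

```cpp
#include <cassert>
#include <cmath>

// One color channel only, for illustration; a real implementation would
// operate on RGB. The 0.04 dielectric F0 is the usual convention.
struct Surface { float albedo; float metallic; };

// metallic = 1 kills the diffuse term entirely
float DiffuseColor(const Surface& s) { return s.albedo * (1.0f - s.metallic); }

// metallic = 0 -> dielectric F0 (0.04); metallic = 1 -> albedo becomes F0
float F0(const Surface& s) { return 0.04f + (s.albedo - 0.04f) * s.metallic; }
```

So both formulas from the post above are facets of the same parameter, under this interpretation.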


2. Cavity map: They describe this as small-scale shadowing... so is this just an AO map? But then they talk about it being a replacement for specular (do they mean specular intensity?)


3. Split-sum approximation: Now this is the most confusing part to me. My understanding was that they additionally precompute a 2D texture that handles the geometry and Fresnel portions of the BRDF, and that this lookup texture is used during the realtime lighting pass to attenuate the Fresnel reflectance so that it looks more realistic(?), e.g. removes the glowing edges at high roughness. Am I wrong?

I've generated this 2D lookup texture and it looks almost precisely like the picture in his paper, except that it's rotated 45 degrees for some odd reason. I've tried rotating it in Photoshop so it looks the same as in the paper, but then the looked-up values seem completely wrong (did he rotate the texture just for the paper?), while my original texture does produce reasonable(?) results: applied to a sphere, the edge reflectance becomes increasingly stronger the smoother the surface is.

Let me give you a few screenshots:


This is the texture generated by me:



This is the texture shown in his paper:



And here's a quick video showing what the Fresnel reflectance looks like using the formula described in the paper:

FresnelReflectance * EnvBRDF.x + EnvBRDF.y




Also, what does he mean by "can only afford a single sample for each"? (Why would you have more?)


Even with importance sampling, many samples still need to be taken. The sample count can be
reduced significantly by using mip maps [3], but counts still need to be greater than 16 for sufficient
quality. Because we blend between many environment maps per pixel for local reflections, we can only
practically afford a single sample for each


In his sample code he uses a sample count of 1024, which in my tests produces lots of dots on the env map from the random distribution; it only gets better using at least 5k samples. I don't see how he does that. Is this just a case of making the precomputation faster because of the hundreds/thousands of probes?
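As an aside, the distribution in his reference code isn't purely random: the sample directions come from a Hammersley low-discrepancy sequence, whose uniformity is part of what keeps dot artifacts down at a given sample count. A C++ port of the usual radical-inverse construction (the function names follow the common convention, not necessarily his exact code):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Van der Corput radical inverse in base 2, computed by reversing the bits
// of the sample index. This is the standard trick used for Hammersley
// points in IBL prefiltering code.
float RadicalInverse_VdC(std::uint32_t bits)
{
    bits = (bits << 16u) | (bits >> 16u);
    bits = ((bits & 0x55555555u) << 1u) | ((bits & 0xAAAAAAAAu) >> 1u);
    bits = ((bits & 0x33333333u) << 2u) | ((bits & 0xCCCCCCCCu) >> 2u);
    bits = ((bits & 0x0F0F0F0Fu) << 4u) | ((bits & 0xF0F0F0F0u) >> 4u);
    bits = ((bits & 0x00FF00FFu) << 8u) | ((bits & 0xFF00FF00u) >> 8u);
    return float(bits) * 2.3283064365386963e-10f; // * 2^-32
}

// 2D Hammersley point i of N: evenly spaced in x, bit-reversed in y
std::pair<float, float> Hammersley(std::uint32_t i, std::uint32_t N)
{
    return { float(i) / float(N), RadicalInverse_VdC(i) };
}
```

These (u, v) pairs are then warped into GGX importance-sampled directions, so a stratified point set matters more than raw sample count.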

#5154785 why the alphablend is a better choice than alphatest to implement transparent...

Posted by lipsryme on 20 May 2014 - 05:28 AM

discard is GLSL (the OpenGL-based shading language) / clip is HLSL (the DirectX-based shading language) -> both refer to the clipping operation that can be used for alpha testing.


The reason alpha blend is "sometimes" the better choice is that, for example, PowerVR GPUs use so-called "tile-based deferred rendering". The GPU collects triangle data and at some point executes pixel processing. But before going there, PowerVR chips implement an additional optimization stage (this is what the "deferred" part refers to) that determines which parts of the tile should actually be drawn, so we don't shade them multiple times for no reason, aka overdraw (overdraw mostly refers to the redundant multiple framebuffer writes, but shading is also part of the problem).

So when using clipping operations, the GPU can no longer perform this optimization. Note that on every GPU this results in a performance reduction because of early-Z, since you can't determine which pixels should be culled before going through the pixel pipeline. But on PowerVR chips it is even more of a problem due to the aforementioned "pixel overlap determination stage".

Why exactly alpha blend is faster in this case I'm not quite sure... my guess is that the blend operation in a tile-based environment is fairly fast, since you don't blend into the actual framebuffer but into the small on-chip memory that holds the tile, which may still be faster than opaque rendering without the hidden-surface-removal stage.


I hope I got all of this right. I'm still in the process of learning, so if I made a mistake, please correct me.

#5151423 Specular Mapping on terrain

Posted by lipsryme on 04 May 2014 - 09:31 AM

Your shader code looks a little odd to me (but maybe I'm missing something).

The WoW example is just regular Blinn-Phong specular shading, which is done like this:

// This is the "half vector", because it's half-way between the light direction and the view direction
float3 H = normalize(L + V);

// L being the vector from the surface towards the light, N the surface normal
float NdotL = saturate(dot(N, L));
float NdotH = saturate(dot(N, H));

// This is the Blinn-Phong distribution (D)
float D = pow(NdotH, glossiness);

// Multiply by the angle between the surface normal and the light direction
// to avoid light leaking at some angles.
float3 specular = D * LightColor * NdotL;

Notice that D is a single float value / scalar (it is basically just the shape and strength of your specular highlight); multiplying by the light color then makes the final specular term a color.

You may also want to apply at least an additional Fresnel term so the intensity varies more realistically with the viewing angle.


P.S. A reflection vector is calculated like so:

float3 R = 2 * (NdotL * N) - L;

or with the HLSL intrinsic (note that reflect() expects the incident vector to point towards the surface, hence the negation):

float3 R = reflect(-L, N);
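A quick C++ sanity check that the two forms agree, keeping in mind that reflect() expects the incident vector to point towards the surface, so the light vector needs to be negated when you pass it in (the vector helpers below are my own):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
float Dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Sub(Vec3 a, Vec3 b)    { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 Scale(Vec3 v, float s) { return { v.x * s, v.y * s, v.z * s }; }

// R = 2 * dot(N, L) * N - L: the light vector reflected about the normal
Vec3 ReflectAboutNormal(Vec3 L, Vec3 N)
{
    return Sub(Scale(N, 2.0f * Dot(N, L)), L);
}

// HLSL-style reflect(): i - 2 * dot(i, n) * n, with i pointing at the surface
Vec3 ReflectHLSL(Vec3 i, Vec3 n)
{
    return Sub(i, Scale(n, 2.0f * Dot(i, n)));
}
```

With L at 45 degrees to the normal, ReflectAboutNormal(L, N) and ReflectHLSL(-L, N) produce the same mirrored direction.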

#5121213 [Software Rasterizer] Vertex Clipping & Guard Bands

Posted by lipsryme on 04 January 2014 - 04:06 PM

@Tim: I actually do something similar with the clipping to viewport bounds and stuff but I was trying to implement actual vertex clipping if a vertex is outside the guard bands.


Anyhow, I've managed to get it working (at least flawlessly with a single triangle).

For those who might want to know and/or find this years later, this is how I do it:


I'm now translating my post-transform coordinates to basically get [-1,1] coordinates and then check these against the guard-band boundaries.

Then I count how many vertices are outside the guard band. If there are two vertices outside, it's an easy case: I just use the Cohen-Sutherland algorithm to find the two intersection points with the guard-band boundary and replace the original positions with the intersection points.

If there's just one vertex outside, it gets a little trickier, because you now have to generate an additional triangle and "reconnect" the original vertices with the new ones from the intersections.

#5118341 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 20 December 2013 - 06:21 AM

Now for the multithreaded approach.....


To put this into some perspective here's my test setup:

Rendering of a mesh with ~70k polygons (fairly close to the camera): http://d.pr/i/2lml

CPU: Intel Core i5 2.8GHz (4 physical cores, no hyperthreading / 256KB L2 cache each)

I've not yet converted everything to SIMD/SSE processing (only the most important bits, e.g. the VS).

And I'm only outputting a color value (so no pixel shading or depth-buffer test/write).






What I tried to do was give each thread a split of the index buffer (so for 4 threads -> 4 splits) so that each thread works on its own batch of vertices.

What I'm then doing is filling an IntermediateTriangle buffer local to each thread like so:

this->intermediateTriangles[threadIdx][triangleNr] = IntermediateTriangle(v0, v1, v2,
                                                     v0_out, v1_out, v2_out, triangleNr);

The IntermediateTriangle includes the transformed vertices that make up the triangle, plus its additional VS outputs and its ID inside this buffer.

This process takes around 3.4ms







Next up in the same thread I'm doing the Binning process.

I loop through each intermediateTriangle that is inside the local buffer for this thread and do binning like seen in my last post:

// Add triangle to bin
bin[ii][binCountPerTile[ii]++] = std::make_pair(tri.index, threadIdx);

I store an std::pair of two unsigned integers because I not only need to know the triangle's index into the intermediate buffer, but also which local thread's buffer those vertices are located in.

binCountPerTile is an atomic<unsigned int>.

Note: I switched everything from push_back to direct writes, which gave me a small speedup.

This process takes around 11.4ms (probably the culprit?)


Then I rejoin my threads and prepare a quick queue that holds indices to the bins:

// Prepare tile queue
for (unsigned short i = 0; i < NUM_TILES; ++i)
{
	if (this->binCountPerTile[i] > 0)
		this->tileQueue.push(i);
}






Now I launch another threadgroup for rasterization / pixel processing.

In here I run a while loop that checks if the queue is empty.

I take the first tile from the queue like so:

// As long as the tile queue isn't empty
while (!this->tileQueue.empty())
{
	unsigned int tile = this->tileQueue.front();
	this->tileQueue.pop();
	// ... process each triangle inside this tile
}


And then process each triangle inside the tile.

This process takes around 4.7ms


So as you can see, my frame time adds up to around 19-20ms, which definitely needs improvement.

Tests without binning run the whole process in around 9-10ms.


I'd love to get your input.


Edit: um... what? Why is this getting voted down?

#5113784 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 10:09 AM

Okay, so I loop through the triangle's bounding box, do these calculations, and then store the per-pixel interpolants? Is that correct?

#5113770 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 09:38 AM

But calculating the barycentric weights/coords is part of the rasterization process and is done (partly) per pixel. Even more so, interpolated data like texture coordinates also has to be computed per pixel. Which part of that process do you actually store?

This is how I do the rasterization part in my renderer:

// Triangle setup
int A01, A12, A20, B01, B12, B20;
int w0_row, w1_row, w2_row;

A01 = v0.y - v1.y, B01 = v1.x - v0.x;
A12 = v1.y - v2.y, B12 = v2.x - v1.x;
A20 = v2.y - v0.y, B20 = v0.x - v2.x;

// Edge functions evaluated at the bounding box origin
p.x = tri.boundingBox.minX;
p.y = tri.boundingBox.minY;
w0_row = orient2D(v1, v2, p);
w1_row = orient2D(v2, v0, p);
w2_row = orient2D(v0, v1, p);

for (p.y = tri.boundingBox.minY; p.y <= tri.boundingBox.maxY; ++p.y)
{
	// Barycentric coordinates at the start of the row
	int w0 = w0_row;
	int w1 = w1_row;
	int w2 = w2_row;

	for (p.x = tri.boundingBox.minX; p.x <= tri.boundingBox.maxX; ++p.x)
	{
		// If p is on or inside all edges, render the pixel
		// (skipping the degenerate case where all edge functions are zero)
		if ((w0 | w1 | w2) >= 0 && !(w0 == 0 && w1 == 0 && w2 == 0))
		{
			unsigned int currentPixel = p.x + p.y * viewportWidth;

			// Calc final color
			this->OM->RT->texture[currentPixel] =
				OutputMerger::ToUnsigned(Vector4(1, 1, 1, 1));
		}

		// One step to the right
		w0 += A12;
		w1 += A20;
		w2 += A01;
	}

	// One row step
	w0_row += B12;
	w1_row += B20;
	w2_row += B01;
}
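For reference, orient2D here is the usual 2D edge function, i.e. twice the signed area of the triangle formed by the edge and the point (this is my assumption about the helper, but it is the only definition consistent with the incremental A/B steps above):

```cpp
#include <cassert>

struct Point2D { int x, y; };

// Twice the signed area of triangle (a, b, c); positive when c lies to the
// left of the directed edge a->b, zero when c is exactly on the edge.
int orient2D(const Point2D& a, const Point2D& b, const Point2D& c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}
```

The per-pixel w0/w1/w2 values are these edge functions, so normalizing them by the full triangle area gives the barycentric weights used for attribute interpolation.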

#5113706 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 04:46 AM

Damn, that sounds quite complicated...

I'm still not quite sure how the actual binning process would work in detail (and efficiently).

Adding items (even something small like an unsigned int) to a dynamic array with a size of e.g. framebuffer size / 8 is extremely slow per frame and per vertex in my tests.

Especially if a single tile contains a lot of vertices.

#5113463 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 01 December 2013 - 06:33 AM

Hey guys,


I've been thinking about some optimizations on my rasterizer and I've come across the concept of "binning" the triangles to tiles of the framebuffer. Which then are processed in parallel using several threads.


But there are still some open questions for me on how to do this efficiently...


1. I've seen a sample where they use a tile size small enough to fit into the cache, e.g. 320x90. What I don't quite get is why they make the height screen-height/8 and the width screen-width/4. My pixels are stored contiguously in an std::vector container, so doesn't that mean one full scanline is much faster to access in memory than two half scanlines? So why not the other way around?


2. And these binning containers, are they vectors that I push_back into? Wouldn't that be extremely slow per frame?


3. What happens if a triangle overlaps several tiles? Do I just push the triangle into all overlapping containers? Doesn't that produce a lot of unneeded processing?


4. With 320x90 tiles at a screen resolution of 1280x720, I've divided my buffer into 4 * 8 = 32 tiles. That's probably not an effective 1:1 ratio to CPU threads. So would I run 4 threads, and then when they're done, the next 4?
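Regarding question 3: the usual answer is yes, you push the triangle into every bin whose tile its screen-space bounding box overlaps, and accept some redundant work for triangles spanning tile borders. A hedged sketch of that bin assignment (the flat grid layout, tile constants, and types are my own, not taken from any particular sample):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative tile layout: 1280x720 screen cut into 320x90 tiles = 4x8 grid
constexpr int TILE_W = 320, TILE_H = 90;
constexpr int SCREEN_W = 1280, SCREEN_H = 720;
constexpr int TILES_X = SCREEN_W / TILE_W;   // 4
constexpr int TILES_Y = SCREEN_H / TILE_H;   // 8

// Bin one triangle (given its screen-space bounding box in pixels) into
// every tile that box touches.
void BinTriangle(int minX, int minY, int maxX, int maxY,
                 unsigned triIndex,
                 std::vector<std::vector<unsigned>>& bins)
{
    int tx0 = std::clamp(minX / TILE_W, 0, TILES_X - 1);
    int tx1 = std::clamp(maxX / TILE_W, 0, TILES_X - 1);
    int ty0 = std::clamp(minY / TILE_H, 0, TILES_Y - 1);
    int ty1 = std::clamp(maxY / TILE_H, 0, TILES_Y - 1);
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            bins[ty * TILES_X + tx].push_back(triIndex);
}
```

The rasterizer then re-clips each binned triangle to its tile, so the duplicate entries cost a little setup work but never shade a pixel twice.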

#5111025 Design help on software rasterizer / renderer

Posted by lipsryme on 21 November 2013 - 11:17 AM

Yep, that gave me some idea of how to go about it. Thanks, guys.