

lipsryme

Member Since 02 Mar 2010

#5185341 Forward+ Rendering - best way to update light buffer?

Posted by lipsryme on 06 October 2014 - 12:44 PM

Ah, so I can basically just do the same as with constant buffers! :)

Having some issues copying the data due to heap corruption but that might be related to something else.

Is this the correct way of doing it?

 

Update: yep, that heap corruption was something else. Seems to be working now :)

// Re-upload the packed point-light data (xyz = center, w = radius) each frame
// via Map with WRITE_DISCARD, which hands back fresh memory to fill.
if (!pointLights_center_and_radius.empty())
{
	this->numActiveLights = static_cast<unsigned int>(pointLights_center_and_radius.size());
	D3D11_MAPPED_SUBRESOURCE pointLightResource = {};
	if (SUCCEEDED(this->context->Map(this->pointLightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &pointLightResource)))
	{
		DirectX::XMFLOAT4* pData = static_cast<DirectX::XMFLOAT4*>(pointLightResource.pData);
		memcpy(pData, pointLights_center_and_radius.data(), sizeof(DirectX::XMFLOAT4) * numActiveLights);
		this->context->Unmap(this->pointLightBuffer, 0);
	}
}
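
In case it helps anyone: for Map(WRITE_DISCARD) to work, the buffer itself has to be created with dynamic usage and CPU write access. Here's a minimal sketch of creating such a buffer plus an SRV for the culling compute shader (maxLights, device and the structured-buffer layout are assumptions, not taken from the code above):

// Sketch: a dynamic, CPU-writable structured buffer for the point-light data,
// plus an SRV so the light-culling compute shader can read it.
// maxLights is an assumed upper bound chosen at initialization time.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth           = sizeof(DirectX::XMFLOAT4) * maxLights;
desc.Usage               = D3D11_USAGE_DYNAMIC;               // required for Map(WRITE_DISCARD)
desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags      = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = sizeof(DirectX::XMFLOAT4);

ID3D11Buffer* pointLightBuffer = nullptr;
device->CreateBuffer(&desc, nullptr, &pointLightBuffer);

D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format             = DXGI_FORMAT_UNKNOWN;             // structured buffers use UNKNOWN
srvDesc.ViewDimension      = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.NumElements = maxLights;

ID3D11ShaderResourceView* pointLightSRV = nullptr;
device->CreateShaderResourceView(pointLightBuffer, &srvDesc, &pointLightSRV);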



#5172591 Alpha-Test and Forward+ Rendering (Z-prepass questions)

Posted by lipsryme on 10 August 2014 - 07:17 AM

 

Alpha-tested geometry tends to mess up z-buffer compression and hierarchical z representations. So usually you want to render it after your "normal" opaques, so that the normal geometry can get the benefit of full-speed depth testing.

 

As for why they return a color from their pixel shader...I have no idea. In our engine we use a void return type for our alpha-tested depth-only pixel shader.

 

Could it be beneficial to skip the z-prepass for alpha-tested geometry completely to avoid those z-buffer compression mess-ups?

 

The issue here is that you can't, since you need the depth information for light culling in the compute shader.

By the way, it seems they have alpha-to-coverage enabled in the AMD sample, so maybe the color return type has something to do with that?
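
For anyone reading along: alpha-to-coverage is just a flag on the D3D11 blend state that turns the alpha written to render target 0 into an MSAA coverage mask, which would explain why the pixel shader still needs to output a color/alpha. A minimal sketch of enabling it (variable names are placeholders):

// Sketch: enabling alpha-to-coverage in D3D11. With this set, the alpha the pixel
// shader writes to render target 0 is converted into an MSAA coverage mask.
D3D11_BLEND_DESC blendDesc = {};
blendDesc.AlphaToCoverageEnable                 = TRUE;
blendDesc.RenderTarget[0].BlendEnable           = FALSE;
blendDesc.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;

ID3D11BlendState* alphaToCoverageBS = nullptr;
device->CreateBlendState(&blendDesc, &alphaToCoverageBS);

const float blendFactor[4] = { 0, 0, 0, 0 };
context->OMSetBlendState(alphaToCoverageBS, blendFactor, 0xFFFFFFFF);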




#5163235 UE4 IBL / shading confusion

Posted by lipsryme on 27 June 2014 - 07:55 AM

A few months back I had a thread here about trying to implement their approach to IBL using this so-called split-sum approximation, and I thought I understood what it was trying to do (same with other aspects of their paper)... I guess I was wrong, and I'm even more confused now.

 

For anyone who doesn't know which paper I'm referring to: http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

 

1. Metallic shading parameter: I'm still unsure whether I get what this parameter actually does. What I thought it did was take some value in [0, 1] that just attenuates the diffuse term, like so:

Diffuse *= 1.0f - metallic;

Because if we want the material to be more metallic, the diffuse term needs to decrease until it's fully gone for an actual metal; just attenuating the Fresnel reflectance does not completely remove it. Looking more closely at the pictures in the paper, it looks as if they control the actual FresnelReflectance with it, something like:

FresnelReflectance = lerp(0.04, 1.0f, metallic);

Does anyone know what the parameter actually does?

 

2. Cavity Map: They describe this as small-scale shadowing... so is this just an AO map? But then they talk about it being a replacement for specular (do they mean specular intensity?).

 

3. Split-sum approximation: Now this is the most confusing thing to me. My understanding was that they additionally precompute a 2D texture that handles the geometry and Fresnel portion of the BRDF, and that this lookup texture is used during the real-time lighting pass to attenuate the FresnelReflectance so that it looks more realistic(?), e.g. removes the glowing edges at high roughness. Am I wrong?

I've generated this 2D lookup texture and it looks almost precisely like the picture in his paper, except that it's rotated 45 degrees for some odd reason. I've tried rotating it in Photoshop so it looks the same as in the paper, but the result of looking up the values then seems completely wrong (did he artificially rotate the texture just for the paper?), while the original texture does produce reasonable(?) results, in that if applied to a sphere the edges become increasingly stronger the smoother it is.

Let me give you a few screenshots:

 

This is the texture generated by me:

[image: my generated 2D lookup texture]

 

This is the texture shown in his paper:

 

[image: the lookup texture shown in the paper]

And here's a quick video showing what the Fresnel reflectance looks like using the formula described in the paper:

FresnelReflectance * EnvBRDF.x + EnvBRDF.y

https://www.youtube.com/watch?v=bdY1rvDPCB8&feature=youtu.be

 

 

Also, what does he mean by "can only practically afford a single sample for each" (why would you have more?):

 

Even with importance sampling, many samples still need to be taken. The sample count can be
reduced significantly by using mip maps [3], but counts still need to be greater than 16 for sufficient
quality. Because we blend between many environment maps per pixel for local reflections, we can only
practically afford a single sample for each

 

In his sample code he uses a sample_count of 1024, which in my tests produces lots of dots on the env map from the random distribution, and it only gets better using at least 5k samples. I don't see how he does that. Is this just a case of making the precomputation faster because of the hundreds/thousands of probes?




#5154785 why the alphablend is a better choice than alphatest to implement transparent...

Posted by lipsryme on 20 May 2014 - 05:28 AM

discard is GLSL (OpenGL's shading language) / clip is HLSL (DirectX's shading language) -> both refer to the clipping operation that can be used for alpha testing.

 

The reason why alpha blend is "sometimes" the better choice is that, for example, PowerVR GPUs use so-called "deferred tile-based rendering". The GPU collects triangle data and at some point executes pixel processing. But before getting there, PowerVR chips run an additional optimization stage (this is what the "deferred" part refers to) that determines which parts of the tile should actually be drawn, so we don't shade them multiple times for no reason, aka overdraw (overdraw mostly refers to the redundant multiple framebuffer writes, but shading is also part of the problem).

So when using clipping operations, the GPU won't be able to do this optimization anymore. Note that on every GPU this results in a performance reduction because of early-Z, since you can't determine which pixels should be culled before going through the pixel pipeline. But on PowerVR chips this is even more of a problem due to the aforementioned "pixel overlap determination stage".

 

Why exactly alpha blend is faster in this case I'm not quite sure... my guess is that the blend operation in a tile-based rendering environment is fairly fast, since you don't blend into the actual framebuffer but into the small on-chip memory that holds the tile, which it seems may still be faster than opaque rendering without the hidden-surface-removal stage.

 

I hope I got all of this right since I'm still in the process of learning, so if I made a mistake please correct me :)




#5151423 Specular Mapping on terrain

Posted by lipsryme on 04 May 2014 - 09:31 AM

Your shader code looks a little odd to me (but maybe I'm missing something).

The WoW example is just regular Blinn-Phong specular shading, which is done like this:

// This is the "half-vector", because it's half-way between the light direction and the view direction
float3 H = normalize(L + V); 

// LightDirection being the vector from the surface towards the light
float NdotH = saturate(dot(Normal, LightDirection));

// This is the blinn-phong distribution (D)
float D = pow(NdotH, glossiness);

// You should multiply it by the angle between the surface normal and the light direction
// to avoid light leaking at some angles.
float3 specular = D * LightColor * NdotL; 

Notice that the distribution D is a single float value / scalar (this is basically just the shape and strength of your specular highlight), which then gets tinted by the light color.

You may also want to apply at least an additional Fresnel term to it so its intensity varies more realistically with the viewing angle.

 

P.S. A reflection vector is calculated like so:

float3 R = 2 * (NdotL * N) - L;

or just use the HLSL intrinsic (note that reflect() expects the incident vector to point towards the surface, so L has to be negated):

float3 R = reflect(-L, N);



#5121213 [Software Rasterizer] Vertex Clipping & Guard Bands

Posted by lipsryme on 04 January 2014 - 04:06 PM

@Tim: I actually do something similar with the clipping to viewport bounds and such, but I was trying to implement actual vertex clipping for when a vertex is outside the guard bands.

 

Anyhow, I've managed to get it working (at least flawlessly with a single triangle :) ).

For those who might want to know and/or find this years later :P, this is how I do it:

 

I'm now translating my post-transform coordinates to basically get [-1, 1] coordinates and then checking these against the guard-band boundaries.

Then I count how many vertices are outside the guard band. If there are two vertices outside, it's an easy case: I just use the Cohen-Sutherland algorithm to find the two intersection points with the guard-band boundary and replace the original positions with the intersection points.

If there's just one vertex outside, it gets a little trickier, because you now basically have to generate an additional triangle and "reconnect" the original vertices with the new ones from the intersections.
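
To make the intersection step concrete, here's a minimal sketch of intersecting one triangle edge against a vertical guard-band boundary (names are hypothetical; the horizontal boundaries and the interpolation of the other vertex attributes use the same parametric t):

struct Vec2 { float x, y; };

// Intersect the edge a -> b with the vertical guard-band line x = boundary.
// Assumes the Cohen-Sutherland outcodes already told us the edge straddles it
// (one endpoint inside, one outside), so b.x - a.x is never zero here.
Vec2 IntersectVerticalBoundary(const Vec2& a, const Vec2& b, float boundary)
{
    const float t = (boundary - a.x) / (b.x - a.x);   // parametric position along the edge
    return Vec2{ boundary, a.y + t * (b.y - a.y) };
}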




#5118341 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 20 December 2013 - 06:21 AM

Now for the multithreaded approach.....

 

To put this into some perspective here's my test setup:

Rendering of a mesh with ~70k polygons (fairly close to the camera) http://d.pr/i/2lml

CPU: Intel Core i5 @ 2.8 GHz (4 physical cores, no hyperthreading / 256 KB L2 cache each)

I've not yet converted everything to SIMD / SSE processing (only the most important bits e.g. VS).

And I'm only writing a flat color as the output (so no pixel shading or depth buffer test/write).

 

 

 

VERTEX PROCESSING:

-----------------------------------

What I tried to do was give each thread a split of the index buffer (so for 4 threads -> 4 splits) so that each thread works on its own batch of vertices.

What I'm then doing is filling an IntermediateTriangle buffer local to each thread like so:

this->intermediateTriangles[threadIdx][triangleNr] = IntermediateTriangle(v0, v1, v2,
                                                     v0_out, v1_out, v2_out, triangleNr);
triangleNr++;

The IntermediateTriangle includes the transformed vertices that make up the triangle, plus its additional VS outputs and its ID inside this buffer.

This process takes around 3.4ms
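
The index-buffer split itself is just a range partition. A rough sketch (numThreads, indexCount and the workers container, e.g. a std::vector<std::thread>, are placeholders):

// Sketch of partitioning the index buffer across threads: each thread gets a
// contiguous range of triangles (3 indices per triangle) to transform into its
// own thread-local intermediate buffer.
const unsigned int numTriangles       = indexCount / 3;
const unsigned int trianglesPerThread = (numTriangles + numThreads - 1) / numThreads;

for (unsigned int threadIdx = 0; threadIdx < numThreads; ++threadIdx)
{
    const unsigned int firstTri = threadIdx * trianglesPerThread;
    const unsigned int lastTri  = std::min(firstTri + trianglesPerThread, numTriangles);

    workers.emplace_back([=]
    {
        unsigned int triangleNr = 0;
        for (unsigned int t = firstTri; t < lastTri; ++t)
        {
            // Fetch the 3 indices, run the vertex shader on each vertex and emit
            // the result into this thread's intermediateTriangles buffer,
            // as in the snippet above.
            ++triangleNr;
        }
    });
}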

 

 

 

 

BINNING:

-----------------------------------

Next up in the same thread I'm doing the Binning process.

I loop through each intermediate triangle inside this thread's local buffer and do the binning as seen in my last post:

// Add triangle to bin
bin[ii][binCountPerTile[ii]++] = std::make_pair(tri.index, threadIdx);

I store an std::pair of 2 unsigned integers because I not only need to know the triangle's index into the intermediate buffer, but also which local thread's buffer these vertices are located in.

binCountPerTile holds one std::atomic<unsigned int> per tile.

Note: I switched everything from push_back to direct writes, which gave me a little speedup.

This process takes around 11.4ms (probably the culprit?)
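
Spelled out a bit more, the binning loop essentially maps each triangle's screen-space bounding box to tile coordinates and registers the triangle in every bin it overlaps. A rough sketch (TILE_WIDTH, TILE_HEIGHT and NUM_TILES_X are placeholder constants, not the exact names from my code):

// Sketch of the bin-overlap test: find the range of tiles the triangle's
// bounding box touches and add the triangle to each of those bins.
const int minTileX = tri.boundingBox.minX / TILE_WIDTH;
const int maxTileX = tri.boundingBox.maxX / TILE_WIDTH;
const int minTileY = tri.boundingBox.minY / TILE_HEIGHT;
const int maxTileY = tri.boundingBox.maxY / TILE_HEIGHT;

for (int ty = minTileY; ty <= maxTileY; ++ty)
{
    for (int tx = minTileX; tx <= maxTileX; ++tx)
    {
        const unsigned int ii = ty * NUM_TILES_X + tx;
        // Atomic post-increment reserves a slot in this tile's bin.
        bin[ii][binCountPerTile[ii]++] = std::make_pair(tri.index, threadIdx);
    }
}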

 

Then I rejoin my threads and prepare a quick queue that holds indices to the bins:

// Prepare tile queue
for(unsigned short i = 0; i < NUM_TILES; ++i)
{
	if(this->binCountPerTile[i] > 0)
		this->tileQueue.push(i);
}


PIXEL PROCESSING:

-----------------------------------

Now I launch another thread group for rasterization / pixel processing.

In here I run a while loop that checks if the queue is empty.

I take the first tile from the queue like so:

// Keep pulling tiles until the shared queue is empty. The emptiness check has to
// happen under the lock too, otherwise another worker could drain the queue
// between the check and the pop.
while(true)
{
	m.lock();
	if(this->tileQueue.empty())
	{
		m.unlock();
		break;
	}
	unsigned int tile = this->tileQueue.front();
	this->tileQueue.pop();
	m.unlock();

        ...
}

And then process each triangle inside the tile.

This process takes around 4.7ms

 

So as you can see, my frame time adds up to around 19-20ms, which definitely needs improvement.

Tests without binning run the whole process in around 9-10ms.

 

I'd love to get your input :)

 

edit: um... what? Why is this getting voted down?




#5113784 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 10:09 AM

Okay, so I loop through the triangle's bounding box, do these calculations and then store the per-pixel interpolants? Is that correct?




#5113770 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 09:38 AM

But calculating the barycentric weights/coords is part of the rasterizing process and is done (partly) per pixel. Even more so, interpolated data like texture coordinates also has to be interpolated per pixel. Which part of that process do you actually store?

This is how I do the rasterization part in my renderer:

// Triangle setup: edge-function increments for stepping one pixel in x (A) and one row in y (B)
int A01, A12, A20, B01, B12, B20;
int w0_row, w1_row, w2_row;

A01 = v0.y - v1.y, B01 = v1.x - v0.x;
A12 = v1.y - v2.y, B12 = v2.x - v1.x;
A20 = v2.y - v0.y, B20 = v0.x - v2.x;

// Evaluate the edge functions at the bounding box's min corner
p.x = tri.boundingBox.minX;
p.y = tri.boundingBox.minY;
w0_row = orient2D(v1, v2, p);
w1_row = orient2D(v2, v0, p);
w2_row = orient2D(v0, v1, p);


for(p.y = tri.boundingBox.minY; p.y <= tri.boundingBox.maxY; ++p.y)
{
	// Barycentric coordinates at the start of the row
	int w0 = w0_row;
	int w1 = w1_row;
	int w2 = w2_row;

	for(p.x = tri.boundingBox.minX; p.x <= tri.boundingBox.maxX; ++p.x)
	{
		// If p is on or inside all edges, render the pixel
		if((w0 | w1 | w2) >= 0)
		{
			// Degenerate case
			if(w0 == 0 && w1 == 0 && w2 == 0)
				continue;

			unsigned int currentPixel = p.x + p.y * viewportWidth;

			// Calc final color
			this->OM->RT->texture[currentPixel] =
				OutputMerger::ToUnsigned(Vector4(1, 1, 1, 1));
		}

		// One step to the right
		w0 += A12;
		w1 += A20;
		w2 += A01;
	}

	// One row step
	w0_row += B12;
	w1_row += B20;
	w2_row += B01;
}
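
For reference, orient2D above is the usual integer 2D edge function (twice the signed triangle area). A minimal sketch, with the point/vertex type name being an assumption:

// Twice the signed area of triangle (a, b, c): positive when c lies to the left
// of the edge a -> b, i.e. counter-clockwise winding.
int orient2D(const Point2D& a, const Point2D& b, const Point2D& c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}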



#5113706 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 02 December 2013 - 04:46 AM

Damn that sounds quite complicated...

I'm still not quite sure how the actual binning process would work in detail (and efficiently).

Adding items (even something as small as an unsigned int) to a dynamic array with a size of e.g. framebuffer size / 8 is extremely slow per frame and per vertex in my tests.

Especially if a single tile contains a lot of vertices.




#5113463 [Software Rasterizer] Triangle Binning Question

Posted by lipsryme on 01 December 2013 - 06:33 AM

Hey guys,

 

I've been thinking about some optimizations for my rasterizer, and I've come across the concept of "binning" triangles into tiles of the framebuffer, which are then processed in parallel using several threads.

 

But there are still some open questions for me on how to do this efficiently...

 

1. I've seen a sample where they use a tile size that is small enough to fit into the cache, e.g. 320x90. What I don't quite get is why the height is divided by 8 and the width by 4. My pixels are stored contiguously in an std::vector container, so doesn't that mean one full scanline is much faster to access in memory than two half scanlines? So why not the other way around?

 

2. And these binning containers, are they vectors that I push the triangles back into? Wouldn't that be extremely slow per frame?

 

3. What happens if a triangle overlaps several tiles? Do I just push the triangle into all overlapping containers? Doesn't that produce a lot of unneeded processing?

 

4. In the case of 320x90 tiles at a screen resolution of 1280x720, that would mean I've divided my buffer into 4 * 8 = 32 tiles. That probably isn't effective as a 1:1 mapping onto CPU threads. So would I run 4 threads and then, when they're done, the next 4?




#5111025 Design help on software rasterizer / renderer

Posted by lipsryme on 21 November 2013 - 11:17 AM

Yep that gave me some idea on how to go about it. Thanks guys.


#5110825 Design help on software rasterizer / renderer

Posted by lipsryme on 20 November 2013 - 02:24 PM

Like I said, for fun/learning I guess. But I take it from your answer that trying to emulate the GPU is a bad idea?


#5110814 Design help on software rasterizer / renderer

Posted by lipsryme on 20 November 2013 - 01:24 PM

Hey guys, I've recently started working on a software renderer/rasterizer (for the fun of it / portfolio) and I've been thinking of trying to emulate the way the GPU pipeline works.

My questions are:

 

1. Is it even reasonable to try to emulate the GPU pipeline? [e.g. IA-VS-PA-RS-PS-OM]

2. How would I design the shading part (vertex and pixel shading) to make it flexible enough that I can write shaders with different inputs and outputs (currently having a hard time figuring that out...)? Or should I just settle for a static approach using only what I need for the type of scene I'm trying to display?




#5095523 Performance optimization SSE vector dot/normalize

Posted by lipsryme on 20 September 2013 - 11:25 AM

@achild: As you can see above, I'm already using that intrinsic for the dot product.

@hodgman: I was thinking about that... I'll give it a try and see if it gives me some boost.
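
For context, the kind of dot-product-based normalize being discussed usually boils down to _mm_dp_ps plus a reciprocal square root. A sketch of that general pattern, not the exact code from the thread:

#include <xmmintrin.h>  // SSE (_mm_rsqrt_ps, _mm_mul_ps)
#include <smmintrin.h>  // SSE4.1 (_mm_dp_ps)

// Normalize the xyz components of a vector held in an SSE register.
// 0x7F: multiply the lower three lanes, broadcast the sum to all four lanes.
// _mm_rsqrt_ps is only an approximation (~12 bits); one Newton-Raphson step
// can refine it if more precision is needed.
static inline __m128 NormalizeApprox(__m128 v)
{
    const __m128 lenSq  = _mm_dp_ps(v, v, 0x7F);
    const __m128 invLen = _mm_rsqrt_ps(lenSq);
    return _mm_mul_ps(v, invLen);
}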





