# 3D optimization in barycentric technique of rasterisation

## Recommended Posts

I am coding the rasterization of triangles by the baricentric coordinate method.

Look a lot of code and tutorials that are on the web about the optimization of this algorithm.

I found a way to optimize it that I did not see it anywhere.

I Only code the painting of triangles without Zbuffer and without textures. I am not comparing speeds and I am not interested in doing them, I am simply reducing the amount of instructions that are executed in the internal loop.

The idea is simple, someone must have done it before but he did not publish the code or maybe in hardware it is already that way too.

It should be noted that for each horizontal line drawn, of the three segments, you only need to look at one when going from negative to positive to start drawing and you only need to look at one when it goes from positive to negative when you stop drawing.

I try it and it works well, now I am implementing a regular version with texture and zbuffer to realize how to add it to this optimization.

Does anyone know if this optimization is already done?

From line 92 to 155

##### Share on other sites
46 minutes ago, pabloreda said:

It should be noted that for each horizontal line drawn, of the three segments, you only need to look at one when going from negative to positive to start drawing and you only need to look at one when it goes from positive to negative when you stop drawing.

Are you talking about halfspaces?  and by segment you mean the edges of the triangle?

46 minutes ago, pabloreda said:

or maybe in hardware it is already that way too.

Hardware is done more like this: http://forum.devmaster.net/t/advanced-rasterization/6145

##### Share on other sites

yes, the halfspaces.

the code in inner loop is

   for(int x = minx; x < maxx; x++)
{
if(Cx1 > 0 && Cx2 > 0 && Cx3 > 0)
{
colorBuffer[x] = 0x00FFFFFF;<< // White
}

Cx1 -= Dy12;
Cx2 -= Dy23;
Cx3 -= Dy31;
}

don't need calculate the 3 vars, only one, when you know, for example x1 for in and x2 for out

			  while (Cx1<0)
{ Cx1-=Dy12;Cx2-=Dy23;pVideo++; }
while (Cx2>0)
{ Cx2-=Dy23;*pVideo++=Color; }

Edited by pabloreda

##### Share on other sites

Your optimization moves complexity to the outer loop, similar to the sorting of edges to walk them along for classical scanline conversation. This likely makes your algorithm equivalent to it, you just got there from another point of view. If you would optimize your first while away (you should - it just counts until real work can be started), then it boils down to classical scanline conversation inner loop, and outer loop should be similar to the edge setup / sorting we see in classical scanline conversation.

(I can't read you code because i don't know that language, so i'm not 100% sure)

Note: Scanline conversation does not process 'useless' pixels outside the triangle but inside the bounding box. And it can be easily extended to polygons. If you have planar polygons that's another big win because edge setup is expensive. When not using SIMD i think it's still the fastest in general.

##### Share on other sites

JoeJ:

I don't need sort this order in every scanline, only in setup stage, then make 2 outer loop, top and bottom triangle (like clasical scanline algo)

Useless pixels outside is the other room for optimization, this algo not scan rigth useless pixels

##### Share on other sites
37 minutes ago, pabloreda said:

JoeJ:

I don't need sort this order in every scanline, only in setup stage, then make 2 outer loop, top and bottom triangle (like clasical scanline algo)

That's what i assumed.

You could use multiple variants for different sized triangles, trading setup cost vs. per pixel cost.

For large triangles do the sorting and complex edge setup, for tiny triangles do all per pixel, and in between your optimization might do best.

16 hours ago, pabloreda said:

Does anyone know if this optimization is already done?

Back the days when software rasterization was used triangles where large and polys dominated over triangles, so classic scanline conversation was surely faster. So even halfspace method is easier to implement, i doupt it has been used a lot or at all for games.

Today you would look for a simd implementation similar to infinisearchs link. Your idea would make sense here, for instance if you process 4*1 spans instead 2*2 blocks. I assume GPUs use 2*2 blocks because their framebuffers are tiled and Z-curved, while on CPU 4*1 spans should do much better.

The only remaining usecase of software rasterization today i can think of is software occlusion culling. But there again you likely use large polys approximating good occluders, and more important you likely test a whole scanline vs. a spanlist instead individual pixels. I made such a system and classic non simd scanlines work best here.

##### Share on other sites

What language did you use btw, I've never seen it before.

##### Share on other sites

a colorforth dialect. is my lang

##### Share on other sites

for the record.

The problem here is I need the 3 vars because this is amount of parameter in vertex (z,u,v) to calculate in every point.

then, this optimization not have future!!

thank's to all

## Create an account

Register a new account

1. 1
2. 2
Rutin
19
3. 3
JoeJ
16
4. 4
5. 5

• 30
• 22
• 13
• 13
• 17
• ### Forum Statistics

• Total Topics
631700
• Total Posts
3001800
×