changoo

Multi-core rendering


Hello all,

I have been researching the Larrabee architecture in an effort to learn how to design an efficient multi-threaded software renderer. I've also found Nate's version of the half-space algorithm, which is similar to Intel's approach. However, vectorization is a huge part of the Larrabee algorithm, and everything is done in batches of 16 wherever possible. On a typical desktop machine, where SSE registers are only 128 bits wide (four 32-bit floats), it's not the same story.

What I'm interested in is a good algorithm for mapping triangles to tiles (and bins, as Intel calls them).
I wonder what a good approach would be?

Obviously there's the brute-force approach: test every tile against the triangle and add the triangle to that tile's bin if they overlap. That's not very efficient.
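A cheap way to cut the brute-force loop down is to visit only the tiles under the triangle's screen-space bounding box. This is a minimal sketch of that idea, not something taken from the Larrabee material; `Vec2i`, `TileRange`, and `tileRangeForTriangle` are hypothetical names, and it assumes integer pixel coordinates and square tiles.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2i { int x, y; };

// Inclusive range of tile indices. The range is empty when tx0 > tx1 or ty0 > ty1.
struct TileRange { int tx0, ty0, tx1, ty1; };

// Returns the tiles covered by the triangle's bounding box, clamped to the screen.
// Assumption: the screen is tilesX * tilesY square tiles of tileSize pixels each.
TileRange tileRangeForTriangle(Vec2i a, Vec2i b, Vec2i c,
                               int tileSize, int tilesX, int tilesY) {
    assert(tileSize > 0 && tilesX > 0 && tilesY > 0);

    const int minX = std::min({a.x, b.x, c.x});
    const int maxX = std::max({a.x, b.x, c.x});
    const int minY = std::min({a.y, b.y, c.y});
    const int maxY = std::max({a.y, b.y, c.y});

    const int maxPixelX = tilesX * tileSize - 1;
    const int maxPixelY = tilesY * tileSize - 1;

    // Bounding box entirely off-screen: report an empty range.
    if (maxX < 0 || maxY < 0 || minX > maxPixelX || minY > maxPixelY) {
        return {0, 0, -1, -1};
    }

    // Clamp to the screen before dividing, so every dividend is non-negative.
    return {
        std::max(minX, 0) / tileSize,
        std::max(minY, 0) / tileSize,
        std::min(maxX, maxPixelX) / tileSize,
        std::min(maxY, maxPixelY) / tileSize,
    };
}
```

The per-tile overlap test then only has to run over `tx0..tx1` by `ty0..ty1`. For long thin diagonal triangles the box still contains many tiles the triangle never touches, so a finer test per tile remains useful.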

One thing that's been on my mind lately is a quad-tree approach, where the polygon is first tested against the entire screen split into 2x2 rectangular regions. Each region that overlaps is then split again, and so on. At the leaf level it would have to account for the square tile size (be it 128x128 or 64x64) and test each tile individually.
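The quad-tree idea could look roughly like the sketch below. It works in tile units rather than pixels so that the leaves line up with tiles even when the screen is not a power-of-two number of tiles across. `Rect`, `TileId`, and `binQuadtree` are hypothetical names, and the `overlaps` callback stands in for whatever region-versus-triangle test you choose; it is an assumption of this sketch, not a fixed part of the technique.

```cpp
#include <cassert>
#include <vector>

// Half-open rectangle [x0, x1) x [y0, y1), measured in tiles.
struct Rect { int x0, y0, x1, y1; };

struct TileId { int tx, ty; };

// Recursively subdivides `region` into up to four children and appends every
// single-tile leaf for which `overlaps` returns true. Whole subtrees are
// skipped as soon as their region fails the test.
template <class Overlaps>
void binQuadtree(const Rect& region, const Overlaps& overlaps,
                 std::vector<TileId>& out) {
    const int w = region.x1 - region.x0;
    const int h = region.y1 - region.y0;
    if (w <= 0 || h <= 0) return;      // empty child from an odd split
    if (!overlaps(region)) return;     // reject the whole subtree

    if (w == 1 && h == 1) {            // leaf: exactly one tile
        out.push_back({region.x0, region.y0});
        return;
    }

    // Split points; when a side is 1 tile long the second half is empty.
    const int mx = region.x0 + (w + 1) / 2;
    const int my = region.y0 + (h + 1) / 2;

    binQuadtree(Rect{region.x0, region.y0, mx, my}, overlaps, out);
    binQuadtree(Rect{mx, region.y0, region.x1, my}, overlaps, out);
    binQuadtree(Rect{region.x0, my, mx, region.y1}, overlaps, out);
    binQuadtree(Rect{mx, my, region.x1, region.y1}, overlaps, out);
}
```

Calling `binQuadtree(Rect{0, 0, tilesX, tilesY}, test, bins)` fills `bins` with the overlapping tiles. Whether this beats a plain loop over the bounding box depends on how large triangles are relative to tiles: for small triangles the recursion overhead can easily outweigh the tiles it skips.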

I could also mimic the technique used for Larrabee, just without the 16-wide vectors. Basically I'd be testing each tile's trivial-reject and trivial-accept corners against the half-space equation for each edge. Trivially accepted tiles don't need to be rasterized (just filled), trivially rejected tiles can be ignored, and partially covered tiles are rasterized.
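A scalar version of the corner test might look like the sketch below. All names here (`Edge`, `makeEdge`, `classifyTile`, `TileCoverage`) are hypothetical. It assumes integer pixel coordinates and a vertex order for which the edge function is non-negative inside the triangle, so swap two vertices first if the signed area comes out negative.

```cpp
#include <cassert>
#include <cstdint>

struct Vec2i { int x, y; };

// Half-space (edge) function E(x, y) = A*x + B*y + C for the edge a -> b.
struct Edge { std::int64_t A, B, C; };

Edge makeEdge(Vec2i a, Vec2i b) {
    return { std::int64_t(a.y) - b.y,
             std::int64_t(b.x) - a.x,
             std::int64_t(a.x) * b.y - std::int64_t(a.y) * b.x };
}

std::int64_t evalEdge(const Edge& e, int x, int y) {
    return e.A * x + e.B * y + e.C;
}

enum class TileCoverage { Rejected, Partial, Accepted };

// Classifies the square tile covering pixels [x0, x0+size-1] x [y0, y0+size-1].
TileCoverage classifyTile(const Edge (&edges)[3], int x0, int y0, int size) {
    assert(size > 0);
    const int x1 = x0 + size - 1;
    const int y1 = y0 + size - 1;

    bool fullyInside = true;
    for (const Edge& e : edges) {
        // Trivial-reject corner: where E is largest. If even that corner is
        // outside this edge, the whole tile is outside the triangle.
        const int rx = e.A >= 0 ? x1 : x0;
        const int ry = e.B >= 0 ? y1 : y0;
        if (evalEdge(e, rx, ry) < 0) return TileCoverage::Rejected;

        // Trivial-accept corner: where E is smallest. If it is inside, the
        // whole tile is inside this edge.
        const int ax = e.A >= 0 ? x0 : x1;
        const int ay = e.B >= 0 ? y0 : y1;
        if (evalEdge(e, ax, ay) < 0) fullyInside = false;
    }
    return fullyInside ? TileCoverage::Accepted : TileCoverage::Partial;
}
```

Note that `Partial` is conservative: a tile near a sharp vertex can pass all three reject tests without actually touching the triangle, so the per-pixel rasterizer still has to cope with tiles that turn out empty. Since the corner choice depends only on the signs of `A` and `B`, the corner offsets can be computed once per edge rather than once per tile.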

Have you guys thought about this much? What do you think?
