Hodgman

Clustering rectangles on a grid


I'm looking for some advice, or even just name-dropping of algorithms or topics to research, regarding a particular clustering problem.
I did a class on data-mining about 5 years ago, but have pretty much forgotten it all... i.e. I don't really have any theoretical knowledge in the realm of clustering algorithms.


[b]The inputs[/b] to my problem are:
* A grid of some (small) known size, say 50x50 cells.
* Within the grid, many rectangles of varying size/shape -- probably about 1000 of these, but maybe up to 65k max.
[img]http://i.imgur.com/kdeoB.png[/img]


[b]The problem[/b] is that I need to group rectangles together into clusters, where each cluster:
* has a maximum number of members -- e.g. 8 rectangles per cluster.
* the bounds of a cluster are the bounding rectangle of all its members.
* the bounds should be as small as possible.
* each cell within a cluster's bounding region should contain as many overlapping rectangles ([i]that are members of the cluster[/i]) as possible.
* the above implies that the bounding region should contain as few empty cells as possible.
Also, I'd like to produce as few clusters as possible -- i.e. clusters should contain as many rectangles as possible ([i]up to the maximum member limit above[/i]).

Ideally, I'd like to be able to control/tweak a few heuristics, so I can balance how aggressively the algorithm [i]minimizes empty cells[/i] [b]vs[/b] [i]minimizes cluster count[/i] [b]vs[/b] [i]maximizes overlap[/i] [b]vs[/b] [i]minimizes bounding size[/i], etc...
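To make the trade-off concrete, here is a minimal sketch (not from the original post) of how a candidate clustering could be scored with tunable weights. The rectangle representation, the function names `cluster_stats` and `total_cost`, and the weights `w_area`, `w_empty`, `w_overlap` and `w_count` are all assumptions made for illustration. The maximum member count is treated as a hard constraint and is not part of the score.

```python
from typing import Dict, List, Tuple

# Assumed representation: (x0, y0, x1, y1) in inclusive grid-cell coordinates.
Rect = Tuple[int, int, int, int]


def cluster_stats(rects: List[Rect]) -> Dict[str, float]:
    """Measure one cluster: bounding area, empty cells, mean overlap per cell."""
    bx0 = min(r[0] for r in rects)
    by0 = min(r[1] for r in rects)
    bx1 = max(r[2] for r in rects)
    by1 = max(r[3] for r in rects)
    width, height = bx1 - bx0 + 1, by1 - by0 + 1

    # Count how many member rectangles cover each cell of the bounding region.
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                counts[y - by0][x - bx0] += 1

    flat = [c for row in counts for c in row]
    return {
        "area": float(len(flat)),
        "empty": float(flat.count(0)),
        "mean_overlap": sum(flat) / len(flat),
    }


def total_cost(clusters: List[List[Rect]],
               w_area: float = 1.0, w_empty: float = 1.0,
               w_overlap: float = 1.0, w_count: float = 1.0) -> float:
    """Lower is better. Each weight corresponds to one of the competing goals."""
    cost = w_count * len(clusters)
    for rects in clusters:
        s = cluster_stats(rects)
        cost += (w_area * s["area"]
                 + w_empty * s["empty"]
                 - w_overlap * s["mean_overlap"])
    return cost
```

Raising `w_empty` relative to `w_count` would favour many tight clusters; raising `w_count` would favour fewer, looser ones.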


For example, the above diagram could result in two clusters:
[i][img]http://i.imgur.com/kCMko.png[/img][/i]

Or if avoiding empty cells, it could split the pink/yellow/teal rectangles out into their own single-member clusters, like:
[img]http://i.imgur.com/PWcVM.png[/img]
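One simple way to get either of the two outcomes above is a greedy bottom-up merge with an adjustable empty-cell budget. The following is only a sketch under assumptions, not a method proposed in this thread: the names `greedy_cluster`, `empty_cells` and the `max_empty` threshold are hypothetical, and the naive pair search is far too slow for 65k rectangles without a spatial index.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Assumed representation: (x0, y0, x1, y1) in inclusive grid-cell coordinates.
Rect = Tuple[int, int, int, int]


def cells(rect: Rect) -> Set[Tuple[int, int]]:
    """All grid cells covered by one rectangle."""
    x0, y0, x1, y1 = rect
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def empty_cells(rects: List[Rect]) -> int:
    """Cells inside the bounding rectangle that no member covers."""
    x0 = min(r[0] for r in rects)
    y0 = min(r[1] for r in rects)
    x1 = max(r[2] for r in rects)
    y1 = max(r[3] for r in rects)
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    covered = set().union(*(cells(r) for r in rects))
    return area - len(covered)


def greedy_cluster(rects: List[Rect], max_members: int = 8,
                   max_empty: int = 0) -> List[List[Rect]]:
    """Start with one cluster per rectangle, then repeatedly merge the pair
    whose merged bounds contain the fewest empty cells, until no merge fits
    within max_members and max_empty."""
    clusters = [[r] for r in rects]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] + clusters[j]
            if len(merged) > max_members:
                continue
            e = empty_cells(merged)
            if e <= max_empty and (best is None or e < best[0]):
                best = (e, i, j)
        if best is None:
            return clusters
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
```

A large `max_empty` behaves like the first diagram (few, loose clusters), while `max_empty=0` behaves like the second (outliers stay in their own single-member clusters).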


[size=2][i][b]Off-topic motivation:[/b][/i][/size]
[size=2]The background for this problem is optimizing light splatting in an experimental SM3.0 tiled deferred renderer. Each of the rectangles is the bounds of the screen-space tiles that a particular light will affect. If lights are drawn individually, the G-buffer attributes must be read once per light. However, if overlapping lights are grouped together into a 'cluster light', the G-buffer attributes can instead be read once per cluster.[/size]

A good source for data mining algorithms is [url="http://www.cs.waikato.ac.nz/ml/weka/"]WEKA[/url].
It is open source, so you may find something useful in its source, under clusterers :)

It is all written in Java. Edited by kuramayoko10

This problem is very similar to one that comes up when updating a raster display: finding the optimal set of dirty rectangles to blit, so that as few unchanged pixels as possible are blitted. So you might look up published solutions to the dirty rectangle problem (although I googled around a little bit and didn't really find anything).

Hodgman, what did you end up coming up with regarding an algorithm for this problem? (It kind of interested me, but I didn't have time to think about it, so now I just want to hear what the answer is :) ) Edited by jwezorek

Thanks for the replies, I remember using that WEKA software back in university!

@jwezorek, I ended up deciding it was a premature optimisation and basically limiting 'clusters' to the size of a cell :D

In terms of the renderer that this was going to be used with -- the CPU collects a list of all lights that overlap each tile in screen space, and then these lists are broken down into groups of 8 (or fewer) light IDs per tile per pass. The GPU then takes each 'pass' of IDs (up to 8) and checks if those lights actually affect their tile ([i]this time based on min/max depth, which the CPU didn't know[/i]), and outputs a compacted list of IDs ([i]with the non-visible lights removed[/i]). Then when lighting each pass, the tile is discarded if the compacted list is empty; otherwise it loops through the (possibly shortened) list and does the deferred shading logic for up to 8 lights at once. Edited by Hodgman
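As a rough CPU-side illustration of the scheme described above (a sketch only, not the actual renderer code): the function names `build_tile_passes` and `compact`, the dictionary layout, and the per-light depth ranges are assumptions, and `compact` merely stands in for the depth test that the post says runs on the GPU.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tile = Tuple[int, int]
# Assumed representation: (x0, y0, x1, y1) in inclusive screen-space tile coordinates.
Rect = Tuple[int, int, int, int]


def build_tile_passes(light_rects: Dict[int, Rect],
                      batch_size: int = 8) -> Dict[Tile, List[List[int]]]:
    """For every tile, list the lights whose bounds overlap it, then split
    that list into passes of at most batch_size light IDs."""
    tile_lights: Dict[Tile, List[int]] = defaultdict(list)
    for light_id, (x0, y0, x1, y1) in light_rects.items():
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_lights[(tx, ty)].append(light_id)

    return {
        tile: [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
        for tile, ids in tile_lights.items()
    }


def compact(ids: List[int], tile_min_depth: float, tile_max_depth: float,
            light_depth: Dict[int, Tuple[float, float]]) -> List[int]:
    """Stand-in for the GPU step: keep only lights whose (assumed) depth
    range intersects the tile's min/max depth. An empty result means the
    tile can be discarded for this pass."""
    return [i for i in ids
            if light_depth[i][0] <= tile_max_depth
            and light_depth[i][1] >= tile_min_depth]
```

For example, a tile overlapped by 11 lights would get two passes, one with 8 IDs and one with 3, and each pass would then be shortened by the depth check before shading.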
