Clustering rectangles on a grid

I'm looking for some advice, or even just name-dropping of algorithms or topics to research, in regards to a particular clustering problem.
I did a class on data-mining about 5 years ago, but have pretty much forgotten it all... i.e. I don't really have any theoretical knowledge in the realm of clustering algorithms.


The inputs to my problem are:
* A grid of some (small) known size, say 50x50 cells.
* Within the grid, many rectangles of varying size/shape -- probably about 1000 of these, but maybe up to 65k max.
[attached image: kdeoB.png -- the example grid of rectangles]


The problem is that I need to group rectangles together into clusters, where each cluster:
* has a maximum number of members -- e.g. 8 rectangles per cluster.
* the bounds of a cluster are the bounding rectangle of all its members.
* the bounds should be as small as possible.
* each cell within a cluster's bounding region should contain as many overlapping rectangles (that are members of the cluster) as possible.
* the above implies that the bounding region should contain as few empty cells as possible.
Also, I'd like to produce as few clusters as possible -- i.e. clusters should contain as many rectangles as possible (up to the maximum member limit above).

Ideally, I'd like to be able to control/tweak a few heuristics, so I can balance how aggressively the algorithm minimizes empty cells vs minimizes cluster count vs maximizes overlap vs minimizes bounding size, etc...
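To make the 'tweakable heuristics' idea concrete, here's the rough shape of a greedy merge-based scoring I have in mind (purely an illustrative sketch -- the struct names, weights and the greedy strategy are all made up): start with one cluster per rectangle, then repeatedly apply the lowest-cost merge, where the cost penalises empty cells and bounding area, rewards dropping the cluster count, and treats the member limit as a hard constraint.

```cpp
// Illustrative sketch only -- names, weights and strategy are placeholders.
#include <vector>
#include <algorithm>
#include <cfloat>

struct Rect { int x0, y0, x1, y1; };            // inclusive cell bounds

struct Cluster
{
    Rect bounds;                                // bounding rect of all members
    std::vector<int> members;                   // indices into the input rect list
};

static Rect Union(const Rect& a, const Rect& b)
{
    return { std::min(a.x0, b.x0), std::min(a.y0, b.y0),
             std::max(a.x1, b.x1), std::max(a.y1, b.y1) };
}

static int Area(const Rect& r) { return (r.x1 - r.x0 + 1) * (r.y1 - r.y0 + 1); }

// The tweakable heuristics: how heavily empty cells and large bounds are
// penalised, versus how much reducing the cluster count is rewarded.
struct Weights { float emptyCell = 1.0f; float boundsArea = 0.25f; float mergeBonus = 4.0f; };

// Cells inside 'u' covered by at least one member of cluster a or b.
static int CoveredCells(const Rect& u, const Cluster& a, const Cluster& b,
                        const std::vector<Rect>& rects)
{
    int covered = 0;
    for (int y = u.y0; y <= u.y1; ++y)
        for (int x = u.x0; x <= u.x1; ++x)
        {
            bool hit = false;
            const Cluster* both[2] = { &a, &b };
            for (const Cluster* c : both)
                for (int i : c->members)
                {
                    const Rect& r = rects[i];
                    if (x >= r.x0 && x <= r.x1 && y >= r.y0 && y <= r.y1) { hit = true; break; }
                }
            covered += hit ? 1 : 0;
        }
    return covered;
}

// Cost of merging clusters a and b: lower is better, negative means "worth doing".
static float MergeCost(const Cluster& a, const Cluster& b,
                       const std::vector<Rect>& rects, const Weights& w)
{
    if (a.members.size() + b.members.size() > 8) return FLT_MAX;   // hard member limit
    Rect u = Union(a.bounds, b.bounds);
    int empty = Area(u) - CoveredCells(u, a, b, rects);
    return w.emptyCell * empty + w.boundsArea * Area(u) - w.mergeBonus;
}
```

A driver loop would just keep applying the best-scoring merge while any pair scores negative; the three weights are the knobs for trading off empty cells vs. bounding size vs. cluster count.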


For example, the above diagram could result in two clusters:
[attached image: kCMko.png -- the example rectangles grouped into two clusters]

Or if avoiding empty cells, it could split the pink/yellow/teal rectangles out into their own single-member clusters, like:
[attached image: PWcVM.png -- the pink/yellow/teal rectangles split out into single-member clusters]


Off-topic motivation:
The background for this problem is optimizing light splatting in an experimental SM3.0 tiled deferred renderer. Each of the rectangles is the bounds of the screen-space tiles that a particular light will affect. If lights are drawn individually, the G-buffer attributes must be read once per light. However, if overlapping lights are grouped together into a 'cluster light', the G-buffer attributes can instead be read once per cluster.
A good source for data mining algorithms is WEKA.
It is open source so you may find in its source, under clusterers, something useful :)

It is all written in java.
Programming is an art. Game programming is a masterpiece!
This problem is very similar to the one you hit when updating a raster display: finding the optimum set of dirty rectangles that need to be blitted while minimizing the blitting of pixels that haven't changed. So you might look up published solutions to the dirty-rectangle problem (although I googled around a little bit and didn't really find anything).
Hodgman, what did you end up coming up with regarding an algorithm for this problem? (It kind of interested me, but I didn't have time to think about it, so now I just want to hear what the answer is :) )
Thanks for the replies, I remember using that WEKA software back in university!

@jwezorek, I ended up deciding it was a premature optimisation and basically limiting 'clusters' to the size of a cell :D

In terms of the renderer that this was going to be used with -- the CPU collects a list of all lights that overlap each tile in screen space, and these lists are broken down into groups of 8 (or fewer) light IDs per tile per pass. The GPU then takes each 'pass' of IDs (up to 8), checks whether those lights actually affect their tile (this time based on min/max depth, which the CPU didn't know about), and outputs a compacted list of IDs with the non-visible lights removed. Then, when lighting each pass, the tile is discarded if the compacted list is empty; otherwise it loops through the (possibly shortened) list and does the deferred shading logic for up to 8 lights at once.
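For the curious, the CPU-side binning described above boils down to something like this (simplified sketch; the names and types are just for illustration, not the actual engine code):

```cpp
// Simplified sketch of splitting per-tile light lists into passes of <= 8 IDs.
// Names/types are illustrative only.
#include <vector>
#include <algorithm>
#include <cstdint>

static const size_t kMaxLightsPerPass = 8;

// lightIdsPerTile[tile] = all lights whose screen-space rect overlaps that tile.
// Returns passes[p][tile] = the (up to 8) light IDs that tile processes in pass p.
std::vector<std::vector<std::vector<uint16_t>>>
SplitIntoPasses(const std::vector<std::vector<uint16_t>>& lightIdsPerTile)
{
    const size_t tileCount = lightIdsPerTile.size();
    std::vector<std::vector<std::vector<uint16_t>>> passes;

    for (size_t tile = 0; tile < tileCount; ++tile)
    {
        const std::vector<uint16_t>& ids = lightIdsPerTile[tile];
        const size_t passCount = (ids.size() + kMaxLightsPerPass - 1) / kMaxLightsPerPass;

        // Make sure enough passes are allocated (each pass holds one list per tile).
        if (passes.size() < passCount)
            passes.resize(passCount, std::vector<std::vector<uint16_t>>(tileCount));

        for (size_t p = 0; p < passCount; ++p)
        {
            const size_t begin = p * kMaxLightsPerPass;
            const size_t end   = std::min(begin + kMaxLightsPerPass, ids.size());
            passes[p][tile].assign(ids.begin() + begin, ids.begin() + end);
        }
    }
    return passes;
}
```

The GPU side (per-tile depth culling, list compaction and the up-to-8-light shading loop) then consumes one of these passes at a time, discarding tiles whose compacted list comes back empty.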
Looks like a breadth-first search problem
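i.e. treat each rectangle as a node, put edges between overlapping rectangles, and grow each cluster breadth-first until the member limit is hit. A rough sketch of that reading (names made up, and it ignores the empty-cell/bounds heuristics entirely):

```cpp
// Hedged sketch of the BFS reading of the problem: rectangles are nodes,
// overlaps are edges, each cluster is grown breadth-first up to 8 members.
#include <vector>
#include <queue>

struct Rect { int x0, y0, x1, y1; };

static bool Overlaps(const Rect& a, const Rect& b)
{
    return a.x0 <= b.x1 && b.x0 <= a.x1 && a.y0 <= b.y1 && b.y0 <= a.y1;
}

std::vector<std::vector<int>> BfsClusters(const std::vector<Rect>& rects, int maxMembers = 8)
{
    std::vector<std::vector<int>> clusters;
    std::vector<bool> used(rects.size(), false);
    for (size_t seed = 0; seed < rects.size(); ++seed)
    {
        if (used[seed]) continue;
        std::vector<int> cluster;
        std::queue<int> frontier;
        frontier.push((int)seed);
        used[seed] = true;
        while (!frontier.empty() && (int)cluster.size() < maxMembers)
        {
            int cur = frontier.front(); frontier.pop();
            cluster.push_back(cur);
            for (size_t j = 0; j < rects.size(); ++j)   // enqueue overlapping neighbours
                if (!used[j] && Overlaps(rects[cur], rects[j]))
                {
                    used[j] = true;
                    frontier.push((int)j);
                }
        }
        // Anything left in the frontier was claimed but never clustered; release it
        // so a later seed can pick it up.
        while (!frontier.empty()) { used[frontier.front()] = false; frontier.pop(); }
        clusters.push_back(cluster);
    }
    return clusters;
}
```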

