I took a class on data mining about 5 years ago but have pretty much forgotten it all, so I don't really have any theoretical knowledge of clustering algorithms.
The inputs to my problem are:
* A grid of some (small) known size, say 50x50 cells.
* Within the grid, many rectangles of varying size/shape -- probably about 1000 of these, but maybe up to 65k max.

The problem is that I need to group rectangles together into clusters, where each cluster:
* has a maximum number of members -- e.g. 8 rectangles per cluster.
* the bounds of a cluster are the bounding rectangle of all its members.
* the bounds should be as small as possible.
* each cell within a cluster's bounding region should contain as many overlapping rectangles (that are members of the cluster) as possible.
* the above implies that the bounding region should contain as few empty cells as possible.
Also, I'd like to produce as few clusters as possible -- i.e. clusters should contain as many rectangles as possible (up to the maximum member limit above).
Ideally, I'd like to be able to control/tweak a few heuristics, so I can balance how aggressively the algorithm minimizes empty cells vs minimizes cluster count vs maximizes overlap vs minimizes bounding size, etc...
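To make the criteria and trade-offs above a bit more concrete, here is a rough sketch of how a candidate grouping might be scored. It is only an illustration, not a clustering algorithm: the `Weights` fields, their default values, the function names, and the half-open `(x0, y0, x1, y1)` rectangle convention are all assumptions made up for this example. The idea is that any search strategy (greedy merging, local swaps, etc.) could compare candidate groupings by this one number, with the weights acting as the tweakable heuristics.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumed convention: (x0, y0, x1, y1) in grid cells, half-open on x1/y1.
# Rectangles are assumed to be non-empty (x1 > x0 and y1 > y0).
Rect = Tuple[int, int, int, int]


@dataclass
class Weights:
    """Hypothetical tuning knobs; names and defaults are illustrative only."""
    area: float = 1.0     # penalty per cell of the bounding region
    empty: float = 4.0    # extra penalty per empty cell inside the bounds
    overlap: float = 1.0  # reward for mean rectangles-per-cell inside the bounds
    cluster: float = 8.0  # flat penalty per cluster (favours fewer clusters)


def bounds(rects: Sequence[Rect]) -> Rect:
    """Bounding rectangle of all members."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def cluster_stats(rects: Sequence[Rect]) -> Tuple[int, int, float]:
    """Return (bounding area, empty cell count, mean overlap depth per cell)."""
    bx0, by0, bx1, by1 = bounds(rects)
    width, height = bx1 - bx0, by1 - by0
    # depth[y][x] = number of member rectangles covering that cell
    depth = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                depth[y - by0][x - bx0] += 1
    area = width * height
    empty = sum(row.count(0) for row in depth)
    mean_depth = sum(map(sum, depth)) / area
    return area, empty, mean_depth


def cluster_cost(rects: Sequence[Rect], w: Weights, max_members: int = 8) -> float:
    """Lower is better; clusters over the member limit are rejected outright."""
    if len(rects) > max_members:
        return float("inf")
    area, empty, mean_depth = cluster_stats(rects)
    return w.cluster + w.area * area + w.empty * empty - w.overlap * mean_depth


def partition_cost(clusters: Sequence[Sequence[Rect]], w: Weights) -> float:
    """Total cost of a complete grouping of the rectangles."""
    return sum(cluster_cost(c, w) for c in clusters)
```

As a quick sanity check, take two 2x2 rectangles that overlap in a single cell, `(0, 0, 2, 2)` and `(1, 1, 3, 3)`. With the default weights above, `partition_cost` is lower when they are kept as two single-member clusters, because the merged bounds contain two empty cells. Raising `cluster` to 20 makes the merged grouping the cheaper one, which is the kind of control over empty cells vs cluster count I'm after.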
For example, the above diagram could result in two clusters:

Or if avoiding empty cells, it could split the pink/yellow/teal rectangles out into their own single-member clusters, like:

Off-topic motivation:
The background for this problem is optimizing light splatting in an experimental SM3.0 tiled deferred renderer. Each rectangle is the bounds of the screen-space tiles that a particular light will affect. If lights are drawn individually, the G-buffer attributes must be read once per light. However, if overlapping lights are grouped together into a 'cluster light', the G-buffer attributes can instead be read once per cluster.






