For the first solution is there an algorithm to speed up?
You could start with all the triangles in your scene and remove a triangle when you've fully explored all the cells that might contain it, then you have less and less triangles to process as you work through the algorithm. I would personally start by walking the grid along the triangle's edges, and then exploring the cells "in between". Your code might end up looking somewhat like a 3D "scanline" (or, more appropriately, scanplane) algorithm.
A more efficient method could be to partition your triangles into an octree (this might require your grid to be a power of two). Once you've done that it's a simple matter of looking up each triangle at the right depth (depending on your grid's resolution) and binning it inside the corresponding cells. It might be possible to do it all in one step by building the grid at the same time you build the octree, but I can't think of a way right now. This seems to handle enormous triangles which overlap lots of cells much better than the first approach.
There's probably a better way to do it, but that could be a starting point. I'd go with a divide and conquer approach anyway, anything else is likely to not scale well with the number of triangles.