Ummm, please read at least the Wikipedia article on the given topic BEFORE trying to implement something.
To answer your questions:
They didn't save the diffuse color from step 1, and so they can have a smaller G-buffer?
They didn't save it because that is how the given technique (deferred lighting) works. Yes, this way the G-buffer is only 64 bits per pixel, which is VERY friendly. And yes, you do need to render the geometry twice, BUT:
-since you optimize rendering, you'd only render the contents of the view frustum twice
-the second time, this geometry is perfectly occlusion culled by early-z culling, because the depth buffer is already filled from the first pass
-therefore you don't sample the material buffers too much, which makes this (3.) pass easy on the GPU
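To make the early-z point concrete, here is a toy CPU-side sketch, not real GPU code. It counts how many fragments would get shaded with and without a depth buffer that was already filled by an earlier geometry pass. The function names and the fragment list are made up for illustration, and a less-or-equal depth test is assumed.

```python
def depth_prepass(fragments):
    """Return the nearest depth per pixel, like the depth buffer after pass 1."""
    nearest = {}
    for pixel, z in fragments:
        nearest[pixel] = min(z, nearest.get(pixel, float("inf")))
    return nearest


def shaded_fragments(fragments, prefilled_depth=None):
    """Count fragments that pass a less-or-equal depth test and get shaded.

    fragments: list of (pixel, depth) pairs in submission order.
    prefilled_depth: optional dict pixel -> depth from an earlier pass.
    """
    depth = dict(prefilled_depth) if prefilled_depth else {}
    shaded = 0
    for pixel, z in fragments:
        if z <= depth.get(pixel, float("inf")):
            depth[pixel] = z  # fragment is visible so far, so it is shaded
            shaded += 1
    return shaded


# Hypothetical worst case: fragments arrive back to front, so without a
# filled depth buffer every one of them passes the test and is shaded.
frags = [((0, 0), 0.9), ((0, 0), 0.5), ((0, 0), 0.2),
         ((1, 0), 0.7), ((1, 0), 0.3)]

without_prepass = shaded_fragments(frags)
with_prepass = shaded_fragments(frags, depth_prepass(frags))
print(without_prepass, with_prepass)
```

With the depth buffer already filled, only the nearest fragment per pixel survives, so the expensive material sampling happens once per visible pixel.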
Thanks for the comment, it was interesting. I didn't know about the distinction between "deferred shading" and "deferred lighting". I suppose the key point is that "deferred lighting" renders the geometry twice. Though I think it should be feasible to save the diffuse color from the first phase these days, shouldn't it?
See the wiki. If you save the diffuse colors, then it would be just as feasible to save everything else that belongs to shading, right? Then you'd be doing deferred shading. So please read the wiki article.

The point is: if you do deferred lighting, the G-buffer is 64 bpp, whereas with deferred shading it'd be at least 128 bpp, BUT you don't need to re-render the geometry.
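A quick back-of-the-envelope sketch of what those two numbers mean in memory. The 1920x1080 resolution is just an assumed example, and the helper name is made up.

```python
def gbuffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of a G-buffer with the given per-pixel footprint."""
    return width * height * bits_per_pixel // 8


WIDTH, HEIGHT = 1920, 1080  # assumed resolution, for illustration only

deferred_lighting = gbuffer_bytes(WIDTH, HEIGHT, 64)   # 64 bpp G-buffer
deferred_shading = gbuffer_bytes(WIDTH, HEIGHT, 128)   # 128 bpp, the lower bound quoted above

for name, size in (("deferred lighting", deferred_lighting),
                   ("deferred shading", deferred_shading)):
    print(f"{name}: {size / (1024 * 1024):.2f} MiB")
```

So the trade-off is roughly double the G-buffer memory and bandwidth against one extra geometry pass.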
In my case, I have a large voxel-based geometry that is difficult to cut down. On the other hand, being a voxel-based world, it doesn't need much GPU memory for predefined textures.
So here comes the answer to "which is better?": the one that suits your case. In your case I came to the conclusion that you're having trouble drawing lots of stuff. This means that deferred shading would suit you better, since it only renders the geometry once.
You say CryEngine 3 uses an 8-bit stencil with the depth buffer. What is that used for?
Well, it's rather 24 bits of depth with 8 bits of stencil. Depth is used for storing the distance between the viewer and the surface visible at the given pixel (this is why it is called a depth buffer). Stencil is used for masking out screen areas that don't need to be drawn. This way you can optimize stuff out.
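To show how 24 bits of depth and 8 bits of stencil share one 32-bit value, here is a minimal sketch. It assumes the depth is already normalized to [0, 1] and that depth sits in the upper 24 bits with the stencil in the lower 8. The real bit layout depends on the hardware and API, and the function names are made up.

```python
DEPTH_MAX = (1 << 24) - 1  # largest value a 24-bit depth can hold


def pack_d24s8(depth, stencil):
    """Pack a normalized depth and an 8-bit stencil into one 32-bit integer."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not 0 <= stencil <= 0xFF:
        raise ValueError("stencil must fit in 8 bits")
    quantized = round(depth * DEPTH_MAX)  # quantize depth to 24 bits
    return (quantized << 8) | stencil     # assumed layout: depth high, stencil low


def unpack_d24s8(word):
    """Split a packed 32-bit value back into (depth, stencil)."""
    return (word >> 8) / DEPTH_MAX, word & 0xFF


packed = pack_d24s8(0.25, 0x80)
depth, stencil = unpack_d24s8(packed)
print(hex(packed), depth, stencil)
```

The stencil byte could then act as a per-pixel mask, for example marking which pixels a later pass is allowed to touch.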