Since all frame buffer access will hit the cache, rendering should be faster at the cost of
Isn't that assuming that frame-buffer latency is actually a bottleneck in the first place?
If you're doing any sort of modern/fancy shading, your shading time per pixel will likely be higher than your frame-buffer write time. The GPU can then pipeline the writes behind the shading of other pixels, which makes the buffer writes effectively 'free'...
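As a rough model of that argument (an idealisation that assumes each pixel's write fully overlaps the shading of later pixels, and ignores caches, queue depths and stalls), the per-pixel cost goes from a sum to a maximum:

```latex
t_{\text{serial}} = t_{\text{shade}} + t_{\text{write}}, \qquad
t_{\text{pipelined}} \approx \max\left(t_{\text{shade}},\ t_{\text{write}}\right)
```

So when t_shade is much larger than t_write, t_pipelined is roughly t_shade, and speeding up the writes (e.g. by routing them through a cache) buys you little.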
Deferred/tiled (PowerVR-style) triangle binning works well on PowerVR because, instead of performing frame-buffer writes to RAM, the GPU can perform them to a tiny-but-super-fast on-chip tile buffer (similar in spirit to ESRAM), and then later bulk-flush that local storage to RAM.
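For reference, the binning step itself can be sketched as below. This is a minimal sketch, not PowerVR's actual scheme: it assumes screen-space triangles in pixel coordinates and uses a conservative bounding-box test, so a triangle may be listed in a tile it doesn't really cover. `Tri` and `binTriangles` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical screen-space triangle: three vertices in pixel coordinates.
struct Tri { float x[3]; float y[3]; };

// Bin triangles into fixed-size screen tiles using each triangle's bounding
// box. Returns, for every tile (row-major), the indices of the triangles
// that may touch it, in submission order.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;   // ceil division
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> bins(static_cast<std::size_t>(tilesX) * tilesY);

    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Tri& t = tris[i];
        const float minX = std::min({t.x[0], t.x[1], t.x[2]});
        const float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        const float minY = std::min({t.y[0], t.y[1], t.y[2]});
        const float maxY = std::max({t.y[0], t.y[1], t.y[2]});

        // Skip triangles whose bounding box lies entirely off-screen.
        if (maxX < 0 || maxY < 0 || minX >= width || minY >= height) continue;

        // Clamp the box to the screen, then convert pixels to tile coordinates.
        const int tx0 = std::clamp(static_cast<int>(std::floor(minX)), 0, width - 1) / tileSize;
        const int tx1 = std::clamp(static_cast<int>(std::floor(maxX)), 0, width - 1) / tileSize;
        const int ty0 = std::clamp(static_cast<int>(std::floor(minY)), 0, height - 1) / tileSize;
        const int ty1 = std::clamp(static_cast<int>(std::floor(maxY)), 0, height - 1) / tileSize;

        // Append this triangle to every tile its bounding box overlaps.
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[static_cast<std::size_t>(ty) * tilesX + tx].push_back(i);
    }
    return bins;
}
```

Each tile's list is then rasterised into the local tile buffer, and only the finished tile is written out to RAM.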
Implementing the algorithm without the hardware to suit it may not be the best idea...
The Xbox 360 has fast local EDRAM, the Xbox One has fast local ESRAM, and some Intel GPUs (Iris Pro) have fast local eDRAM, so if you're targeting them you may be able to find some benefit. These local storage areas are measured in the tens of megabytes or more though (10 MB on Xbox 360, 32 MB on Xbox One), so you could get away with using very large tiles.
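To get a feel for how large "very large" is, the number of tiles is just the total render-target footprint divided by the local memory size, rounded up. A minimal sketch, assuming the 10 MB and 32 MB figures mean MiB, that all bound targets are summed into `bytesPerPixel`, and ignoring any tile-alignment rules a real platform imposes (`tilesNeeded` is a hypothetical helper):

```cpp
#include <cassert>
#include <cstdint>

// Number of equal tiles needed so that each tile's render targets fit in a
// fixed-size local memory. bytesPerPixel is the sum over all bound targets
// and samples, e.g. 4 B colour + 4 B depth/stencil = 8 B without MSAA.
int tilesNeeded(std::int64_t width, std::int64_t height,
                std::int64_t bytesPerPixel, std::int64_t localBytes) {
    assert(width > 0 && height > 0 && bytesPerPixel > 0 && localBytes > 0);
    const std::int64_t total = width * height * bytesPerPixel;
    return static_cast<int>((total + localBytes - 1) / localBytes);  // ceil division
}
```

With 8 bytes per pixel, 1280x720 fits in 10 MiB as a single tile, 1280x720 with 4x MSAA (32 bytes per pixel) needs 3 tiles, and 1920x1080 needs 2 tiles in 10 MiB but only 1 in 32 MiB.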
One other benefit of PowerVR-style tiling is that the hardware sorts polygons to eliminate overdraw, and allows order-independent transparency (OIT). By sorting your mesh chunks yourself (front to back for opaque, back to front for translucent) you'd gain similar benefits, but with coarser accuracy, since the sort is per chunk rather than per pixel.
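The chunk ordering described above can be sketched as follows. This is a CPU-side illustration only; `Chunk`, `viewDepth` and `sortForDraw` are hypothetical, and it assumes each chunk already has a single view-space depth (e.g. of its bounding-sphere centre), which is exactly where the coarser-than-per-pixel accuracy comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical mesh chunk with one representative view-space depth.
struct Chunk { int id; float viewDepth; bool translucent; };

// Opaque chunks first, front to back (so early-Z can reject hidden pixels),
// then translucent chunks back to front (so alpha blending composites correctly).
std::vector<Chunk> sortForDraw(std::vector<Chunk> chunks) {
    for (const Chunk& c : chunks)
        assert(std::isfinite(c.viewDepth));  // NaN would break the sort's ordering
    auto firstTranslucent = std::stable_partition(
        chunks.begin(), chunks.end(), [](const Chunk& c) { return !c.translucent; });
    std::sort(chunks.begin(), firstTranslucent,
              [](const Chunk& a, const Chunk& b) { return a.viewDepth < b.viewDepth; });
    std::sort(firstTranslucent, chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.viewDepth > b.viewDepth; });
    return chunks;
}
```

On the GPU the same ordering would typically come from a parallel sort over (depth, id) keys rather than `std::sort`.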
You'd want to perform this sorting on the GPU though, so I'd make use of indirect draws rather than relying on thousands of individual CPU-driven draw calls.
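As an example of what "indirect draws" means in practice: with OpenGL's `glMultiDrawElementsIndirect`, a GPU pass writes an array of draw commands into a buffer and a single call consumes them all. The sketch below models on the CPU what such a compute pass would write. The command struct follows the `DrawElementsIndirectCommand` layout from the OpenGL spec; `ChunkRange` and `buildIndirect` are hypothetical, and storing the chunk id in `baseInstance` is just one common way (an assumption here) to let shaders look up per-chunk data.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout of OpenGL's DrawElementsIndirectCommand (five 32-bit values).
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Hypothetical location of one chunk inside shared index/vertex buffers.
struct ChunkRange { std::uint32_t indexCount; std::uint32_t firstIndex; std::int32_t baseVertex; };

// Emit one draw command per chunk, in the already-sorted order.
std::vector<DrawElementsIndirectCommand> buildIndirect(
        const std::vector<ChunkRange>& ranges, const std::vector<std::uint32_t>& sortedIds) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(sortedIds.size());
    for (std::uint32_t id : sortedIds) {
        assert(id < ranges.size());
        const ChunkRange& r = ranges[id];
        // One instance per chunk; baseInstance carries the chunk id.
        cmds.push_back({r.indexCount, 1u, r.firstIndex, r.baseVertex, id});
    }
    return cmds;
}
```

Uploaded to a `GL_DRAW_INDIRECT_BUFFER` (or written there directly by a compute shader), the whole sorted list is then issued with one `glMultiDrawElementsIndirect` call instead of one CPU draw per chunk.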