Pixel shaders are run in 2x2 blocks.
When a triangle covers only one pixel, the pixel shader wastes up to 75% of its power on dummy pixels (the other 3 lanes of the 2x2 block still run). If this triangle also happens to be occluded by a bigger triangle in front of it, all of that work was unnecessary.
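To put numbers on the 2x2 waste, here's a rough sketch that counts how many shader lanes get launched for a given set of covered pixels. The function names are made up for illustration, and it assumes quads are aligned to even pixel coordinates and that every touched quad launches all 4 lanes; real hardware has more details than this.

```python
def quad_utilization(covered_pixels):
    """Return (useful_lanes, launched_lanes) for a set of (x, y) pixels.

    Assumption: 2x2 quads are aligned to even coordinates, and any quad
    touched by at least one covered pixel launches all 4 lanes.
    """
    covered = set(covered_pixels)
    quads = {(x // 2, y // 2) for x, y in covered}
    return len(covered), 4 * len(quads)


def wasted_fraction(covered_pixels):
    """Fraction of launched lanes that only shade dummy pixels."""
    useful, launched = quad_utilization(covered_pixels)
    return 0.0 if launched == 0 else 1.0 - useful / launched


# A one-pixel triangle: 1 useful lane out of 4 launched.
print(wasted_fraction({(5, 7)}))

# A fully covered 8x8 block: every lane does useful work.
print(wasted_fraction({(x, y) for x in range(8) for y in range(8)}))
```

Two one-pixel triangles landing in different quads launch 8 lanes for 2 useful pixels, which is why lots of tiny triangles hurt much more than one big triangle covering the same area.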
Single-pixel triangles are also a PITA for concurrency, since there might not be enough pixels gathered to fill a whole wavefront; and when there are, divergence could be higher (since there may be little spatial coherence between those pixels).
Also, triangles that occupy no pixels at all (because they're too tiny) can, if there are too many of them, drown the rasterizer and starve the pixel shader.
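A common way to detect such "covers nothing" triangles is a bounding-box test against pixel centers. The sketch below is only an illustration of that idea, not necessarily what any particular culling shader does. It assumes screen-space vertex positions in pixels, one sample per pixel located at (k + 0.5), and no MSAA or conservative rasterization; the function name is made up.

```python
import math


def covers_no_pixel_center(v0, v1, v2):
    """True if the triangle's bounding box contains no pixel center.

    Assumptions: vertices are (x, y) in pixel units, pixel centers sit at
    k + 0.5, one sample per pixel. The interval is treated as closed, so a
    center lying exactly on the box edge keeps the triangle (conservative).
    """
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])

    def centers_in(lo, hi):
        # Number of values k + 0.5 (k integer) inside [lo, hi].
        return math.floor(hi - 0.5) - math.ceil(lo - 0.5) + 1

    return (centers_in(min(xs), max(xs)) <= 0
            or centers_in(min(ys), max(ys)) <= 0)


# Tiny triangle tucked between pixel centers: safe to drop.
print(covers_no_pixel_center((10.1, 10.1), (10.4, 10.1), (10.1, 10.4)))

# Its bounding box reaches the center at (10.5, 10.5): keep it.
print(covers_no_pixel_center((10.1, 10.1), (10.9, 10.1), (10.1, 10.9)))
```

The test only ever says "definitely nothing" or "maybe something", so it can't throw away a visible triangle under the assumptions above; triangles it keeps still go through the normal rasterizer.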
Due to this, GPUs love triangles covering large areas, and hate super small triangles. Note that on AMD's GCN, the compute shader can be run async while the shadow maps are being rendered (which barely occupies the compute units), thus making this pass essentially "free".
It's like a Z prepass but on steroids: instead of only populating the Z-buffer for the 2nd pass, it also removes useless triangles (lowering vertex shader usage and relieving the rasterizer).
So yeah... it's a performance optimization.
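The "removes useless triangles" part can be pictured as compacting the index buffer before the real draw. This is a CPU-side sketch of the idea only: the names are hypothetical, the actual pass would run on the GPU, and the predicate used in the example (repeated indices, i.e. a zero-area triangle) is just a stand-in for whatever visibility tests the culling pass applies.

```python
def compact_indices(indices, is_useless):
    """Return a new index list without the triangles rejected by is_useless.

    indices:    flat list, 3 vertex indices per triangle.
    is_useless: caller-supplied predicate taking (i0, i1, i2).
    """
    kept = []
    for t in range(0, len(indices) - 2, 3):
        tri = tuple(indices[t:t + 3])
        if not is_useless(*tri):
            kept.extend(tri)
    return kept


def is_degenerate(i0, i1, i2):
    # Stand-in test: a triangle that repeats an index has zero area.
    return i0 == i1 or i1 == i2 or i0 == i2


print(compact_indices([0, 1, 2, 2, 2, 3, 3, 4, 5], is_degenerate))
```

Everything downstream (vertex shader, rasterizer, pixel shader) only ever sees the triangles that survived, which is where the savings come from.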