As far as I understand it, Uniform Buffer Objects were created precisely for bulk-updating multiple uniforms in a single call. Submitting a single UBO that contains a mere five matrices should be a trivial workload. Refactoring that into multiple UBOs, e.g. one containing the model matrices and another containing the projection matrices, as was suggested above, sounds like a heavy anti-optimization. Don't do that! (Unless profiling shows that two UBO uploads are faster than one in this case :o)
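To make the "single call" point concrete, here is a rough sketch of what such a block could look like on the C++ side. The five matrix names are my own guesses (the thread only says "five matrices"), and `Mat4` and `MatrixBlock` are hypothetical types. The actual GL calls are left in a comment because they need a live context and a buffer object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain 4x4 float matrix, column-major, 64 bytes.
struct Mat4 { float m[16]; };

// Hypothetical CPU-side mirror of a std140 uniform block holding five mat4s.
// The member names are assumptions made for illustration only.
struct MatrixBlock {
    Mat4 model;
    Mat4 view;
    Mat4 projection;
    Mat4 modelView;
    Mat4 modelViewProjection;
};

// Under std140 a mat4 is four 16-byte columns, so the members pack back to
// back and the C++ struct matches the GLSL block byte for byte.
static_assert(sizeof(Mat4) == 64, "mat4 must be 64 bytes");
static_assert(sizeof(MatrixBlock) == 5 * 64, "five mat4s must be 320 bytes");
static_assert(offsetof(MatrixBlock, projection) == 128, "third mat4 starts at byte 128");

// Number of bytes handed to the one upload call.
constexpr std::size_t matrixBlockBytes() { return sizeof(MatrixBlock); }

// With a live OpenGL context and an existing buffer object `ubo` (both
// assumed, not shown here), the whole block goes up in one call:
//   glBindBuffer(GL_UNIFORM_BUFFER, ubo);
//   glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(MatrixBlock), &block);
```

That is 320 bytes per upload, which is why I would not expect splitting it into two buffers to help.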
Or perhaps the discussion has confused plain uniforms, set with glUniformMatrix4fv without UBOs, with UBOs themselves. If you are not using UBOs and are manually updating uniform matrices with glUniformMatrix4fv, then there is a benefit in not redundantly re-setting matrices that haven't changed.
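For the non-UBO case, skipping redundant updates could look roughly like the sketch below. `UniformMatrixCache` is a hypothetical helper, not anything from OpenGL itself. The upload is passed in as a callback so the snippet compiles without GL headers; in real code the callback would wrap glUniformMatrix4fv.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

using Mat4 = std::array<float, 16>;

// Hypothetical helper: remembers the last matrix sent to each uniform
// location and skips the upload when the new value is identical.
// Uniform values belong to a program object, so keep one cache per program.
class UniformMatrixCache {
public:
    using UploadFn = std::function<void(int location, const Mat4& value)>;

    explicit UniformMatrixCache(UploadFn upload) : upload_(std::move(upload)) {}

    // Returns true if the value was actually uploaded, false if it was skipped.
    bool set(int location, const Mat4& value) {
        auto it = last_.find(location);
        if (it != last_.end() && it->second == value) {
            return false;  // unchanged since the last upload, nothing to do
        }
        upload_(location, value);
        last_[location] = value;
        return true;
    }

private:
    UploadFn upload_;
    std::unordered_map<int, Mat4> last_;
};

// In real code the callback would be something like:
//   [](int loc, const Mat4& m) { glUniformMatrix4fv(loc, 1, GL_FALSE, m.data()); }
```

Whether the comparison is cheaper than the driver call it avoids is again something to profile rather than assume.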
Hodgman's suggestion is the sanest here:
- Stop measuring FPS and start measuring milliseconds per frame. This gives a better sense of the actual difference in workload.
- Use a CPU profiler on the old code and the new code to compare where the extra time is being spent. E.g. AMD CodeAnalyst is good (it works on non-AMD CPUs as well). If it turns out not to be a CPU-side slowdown (the profiles are identical), then use e.g. NVIDIA Nsight or AMD CodeXL to debug and profile the GPU-side operation.
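
On the first point, the conversion is simple enough to sketch. The function names here are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Frame time in milliseconds for a given frames-per-second reading.
double frameTimeMs(double fps) { return 1000.0 / fps; }

// Extra milliseconds per frame when the rate drops from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

A drop from 1000 FPS to 500 FPS comes out as 1 ms of added work per frame, while a drop from 60 FPS to 30 FPS is about 16.7 ms. Both look like "half the frame rate", but the second is a far bigger change in workload, which is why milliseconds are the better unit for comparing the old and new code.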