That is correct. Sort only by shader and textures. If you add depth to that test your FPS will go down.
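As a rough illustration of that sort order, here is a minimal sketch in C. The DrawItem struct and its field names are hypothetical, not taken from any particular engine; the only point is that the comparator looks at the shader first, then the texture, and never at depth.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-draw record; the field names are assumptions for this sketch. */
typedef struct {
    uint32_t shader_id;   /* primary sort key */
    uint32_t texture_id;  /* secondary sort key */
    float    depth;       /* stored, but deliberately NOT part of the sort */
} DrawItem;

/* qsort() comparator: order by shader, then by texture. Depth is ignored. */
static int compare_draw_items(const void *a, const void *b)
{
    const DrawItem *x = (const DrawItem *)a;
    const DrawItem *y = (const DrawItem *)b;
    if (x->shader_id != y->shader_id)
        return x->shader_id < y->shader_id ? -1 : 1;
    if (x->texture_id != y->texture_id)
        return x->texture_id < y->texture_id ? -1 : 1;
    return 0;
}

/* Sorts the draw list in place so that state changes are grouped together. */
static void sort_draw_items(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof items[0], compare_draw_items);
}
```

Note that qsort() is not a stable sort, so items sharing the same shader and texture may end up in any relative order.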
A few slides were a bit ambiguous. You want to align to 4 bytes manually, or let GPU do it for you?
StiX already answered but I wanted to give some more detail and an idea of how critical this is.
When you call glDrawElements() or glDrawArrays(), a few things can push the driver onto a slower path in which it copies your entire vertex buffer to a new location in “GPU” RAM (of course there is no such thing in a unified memory model, but it is easier to think of memory managed by the driver as GPU RAM).
One way is to simply not use VBOs. Another way is to pass misaligned data (attributes not aligned to 4 bytes).
These copies obviously cost a lot of extra cycles, even though the driver uses an optimized memcpy() when possible (it can’t when realignment is necessary). To give you an idea of just how much overhead that is, in an average game it means the difference between 20 and 45 FPS.
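To make the alignment point concrete, here is a small sketch in C that compares two vertex layouts. Both structs and the helper functions are hypothetical names made up for this example. It assumes your compiler supports #pragma pack (GCC, Clang and MSVC do) and that float is 4 bytes. The stride is checked as well, because a stride that is not a multiple of 4 misaligns the attributes of every vertex after the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tightly packed layout: the 3-byte color pushes uv to offset 15. */
#pragma pack(push, 1)
typedef struct {
    float   pos[3];   /* offset 0,  12 bytes */
    uint8_t rgb[3];   /* offset 12,  3 bytes */
    float   uv[2];    /* offset 15: not a multiple of 4 */
} PackedVertex;        /* 23 bytes, so the stride is misaligned too */
#pragma pack(pop)

/* Same data with the color padded to 4 bytes, so every attribute is aligned. */
typedef struct {
    float   pos[3];   /* offset 0  */
    uint8_t rgba[4];  /* offset 12 (4th byte is padding or alpha) */
    float   uv[2];    /* offset 16 */
} AlignedVertex;       /* 24 bytes */

/* Catch layout regressions at compile time. */
_Static_assert(offsetof(AlignedVertex, uv) % 4 == 0, "uv must be 4-byte aligned");
_Static_assert(sizeof(AlignedVertex) % 4 == 0, "stride must be a multiple of 4");

static int is_aligned4(size_t n)
{
    return (n % 4) == 0;
}

/* Returns 1 if the stride and every attribute offset are multiples of 4, else 0. */
static int layout_is_aligned4(const size_t *offsets, size_t count, size_t stride)
{
    assert(offsets != NULL);
    if (!is_aligned4(stride))
        return 0;
    for (size_t i = 0; i < count; ++i) {
        if (!is_aligned4(offsets[i]))
            return 0;
    }
    return 1;
}
```

With a VBO bound, the offsetof() values and sizeof(AlignedVertex) are what you would pass as the pointer (byte offset) and stride arguments of glVertexAttribPointer().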
Going into extreme detail, if you benchmark with Time Profiler and you see a function called glDraw[Arrays|Elements]_ACC_ES2Exec() taking a large amount of time, check your vertex alignments.
If you see glDraw[Arrays|Elements]_IMM_ES2Exec() taking a lot of time then your problem is likely the lack of a VBO.
L. Spiro