DLs are still faster than VBOs on NV hardware, but only for static geometry.
To quote your own reply to V-man above:
"It seems you are stuck in pre-OpenGL3 era. Just kidding!"
Display lists are legacy. It is irrelevant whether they are faster (which is highly debatable): they are deprecated and should not be used in non-legacy code. While both NV and AMD have promised to continue supporting them in the compatibility profile, there is absolutely no guarantee that future drivers will keep optimizing them as much as technically possible. VBOs, however, are the one path guaranteed to remain the optimal way to submit geometry.
Oh, and while I can't go into details for legal reasons, at least one major GPU manufacturer applies a heuristic during DL compilation that transforms larger geometry chunks from a DL into an internal VBO... So much for DLs being faster. When DLs do win, the difference usually comes from non-optimal data formats and alignment on the VBO side.
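To make the last point concrete, here is a sketch of a static VBO with a tightly packed, interleaved, 4-byte-aligned layout (the kind of format drivers consume directly). It is a fragment, not a complete program: it assumes a current GL 3.x context, and `vertices`/`vertex_count` are placeholder names.

```c
/* Sketch: static geometry in a VBO, interleaved and aligned.
   Assumes a current OpenGL 3.x context; error checking omitted. */
typedef struct {
    float pos[3];     /* 12 bytes */
    float normal[3];  /* 12 bytes */
    float uv[2];      /*  8 bytes -> 32-byte stride, well aligned */
} Vertex;

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertex_count * sizeof(Vertex),
             vertices, GL_STATIC_DRAW);  /* upload once, draw many times */

glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void *)offsetof(Vertex, pos));
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void *)offsetof(Vertex, normal));
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void *)offsetof(Vertex, uv));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);
```

With a layout like this there is little left for a DL-to-VBO heuristic to improve on.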
Whenever I use triangle strips they are indexed. That enables efficient rendering with minimal index buffers. It is generally hard to generate optimized indices for triangle strips, which is why indexed triangle lists are usually proposed as the best solution. They have much larger index buffers, but are easier to create.
It is obviously implied that indexed tri-lists are preprocessed for maximal vertex cache efficiency. Comparing indexed strips (which require pre-processing) against "easy", raw, random, cache-thrashing tri-list data is naive. As always, the real world is a bit more complicated. Performance comparisons between unindexed tri-strips, indexed tri-strips and indexed tri-lists depend heavily on the mesh topology, vertex shader complexity, the depth and replacement strategy of the vertex caches (i.e. on the GPU), pipeline bottlenecks and many more factors. For common closed models with typically connected topology (i.e. NOT terrains or similar regular quad-grids), running on PC consumer hardware with deep caches, properly preprocessed tri-lists often achieve much better cache usage. Embedded devices, on the other hand, tend to prefer unindexed tri-strips.
Why it is implemented this way, and not in glFenceSync() itself, don't ask me.
Because an implicit flush (or an explicit glFlush call) directly after glFenceSync is inefficient if the issuing thread has other work to do after setting the fence. In that case it is better to keep issuing commands into the FIFO after glFenceSync (possibly with a glFlush at some later time) in order to avoid stalling.
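As a sketch of that pattern (assuming a GL 3.2+ context; `timeout_ns` is a placeholder), the flush can be deferred to the point where you actually wait:

```c
/* Issue a fence, keep feeding the command FIFO, and only ensure the
   fence reaches the GPU when the result is actually needed. */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* ... continue issuing other GL commands here; nothing forces a flush ... */

/* When waiting, GL_SYNC_FLUSH_COMMANDS_BIT flushes once, and only if
   the fence has not already been submitted to the GPU. */
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 timeout_ns);
glDeleteSync(fence);
```

This is exactly why the spec put the flush behavior on the wait call rather than on glFenceSync.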
It seems you are stuck in pre-OpenGL3 era. Just kidding!
There are very few scenarios where these synchronisation primitives actually increase performance, while there are plenty where they will make performance worse. Top-of-the-line next-gen 3D engines can be written without ever using any of these primitives.