Cross-platform, no. But on NVidia, bindless can easily exceed the performance you get from VAOs, and the reason is intuitive.
There is no viable alternative to VAOs though, which is why we are all so confused.
No you don't. Having a bazillion little VAOs floating around with all the cache misses that go with them isn't necessarily the best approach. Best case, use bindless (does an end-run around the cache issues, but is NV-only) or a streaming VBO approach with reuse (which keeps the bind count down, and works cross-platform).
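For what it's worth, here is a minimal sketch of the offset bookkeeping behind a streaming VBO with reuse. It is only an illustration, not anyone's actual implementation: `StreamCursor`, `reserve`, and `orphans` are made-up names, the GL calls are only mentioned in comments, and it assumes no single batch is larger than the buffer.

```cpp
#include <cstddef>

// Hypothetical helper: tracks where the next batch goes in one big streaming VBO.
// Only the offset bookkeeping is shown; the actual GL calls are noted in comments.
struct StreamCursor {
    std::size_t capacity;   // size of the VBO in bytes
    std::size_t head = 0;   // first unused byte
    unsigned orphans = 0;   // how many times the buffer has been orphaned

    explicit StreamCursor(std::size_t cap) : capacity(cap) {}

    // Returns the byte offset at which `bytes` of vertex data should be written.
    // `align` must be nonzero; requests larger than `capacity` are not handled.
    std::size_t reserve(std::size_t bytes, std::size_t align = 4) {
        // Round the write position up to the requested alignment.
        std::size_t start = (head + align - 1) / align * align;
        if (start + bytes > capacity) {
            // Buffer is full: orphan it (e.g. glBufferData with a null pointer)
            // so the driver can hand back fresh storage, then restart at 0.
            ++orphans;
            start = 0;
        }
        // The caller would now map or upload [start, start + bytes).
        head = start + bytes;
        return start;
    }
};
```

The `orphans` counter doubles as a generation number: a batch uploaded at some offset can be drawn again without re-uploading as long as the counter hasn't changed since, which is where the reuse comes from.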
Ideally you'd just "enable" your 5 attribs once, and then proceed to render 300 VBOs. You can't. It sucks.
Instead I have to create 300 VAOs, with 5 "enabled" attribs each. Then render 300 VAOs.
If you have a number of separate, preloaded VBOs, you can't (or refuse to) make them large enough to keep you from being CPU bound, and you can't/won't use bindless, then fall back on one of the following, in order of the performance I've observed on NVidia: 1) VBOs with VAOs, one per batch state combination, 2) client arrays, or 3) VBOs without VAOs or with one VAO for all.
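To make option 1 concrete, here is a rough sketch of a "one VAO per batch state combination" cache. Everything in it is hypothetical (`BatchState`, `formatId`, `VaoCache`, and the `build` callback are names I made up), and it leaves out the GL calls entirely: `build` stands in for whatever code generates a VAO, binds the buffers, and sets up the attribs.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical key: whatever uniquely identifies a batch's vertex setup.
struct BatchState {
    std::uint32_t vbo;       // vertex buffer name
    std::uint32_t ibo;       // index buffer name
    std::uint32_t formatId;  // made-up id for the attribute layout

    bool operator<(const BatchState& o) const {
        return std::tie(vbo, ibo, formatId) < std::tie(o.vbo, o.ibo, o.formatId);
    }
};

class VaoCache {
public:
    using Builder = std::function<std::uint32_t(const BatchState&)>;

    // `build` is supplied by the caller and is assumed to create and fill
    // a VAO for the given state, returning its name.
    explicit VaoCache(Builder build) : build_(std::move(build)) {}

    // Returns the VAO for this state, building it only on first use.
    std::uint32_t get(const BatchState& s) {
        auto it = cache_.find(s);
        if (it == cache_.end())
            it = cache_.emplace(s, build_(s)).first;
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Builder build_;
    std::map<BatchState, std::uint32_t> cache_;
};
```

At draw time you'd look up the VAO with `get()` and bind it, so the attrib setup cost is paid once per distinct state rather than once per draw.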
In case it's not obvious, I do what gives me the best performance. I'm not a core purist.
It's really unfortunate AMD hasn't stepped up and supported bindless in their OpenGL drivers, at least for launching batches (vertex attribute and index list specification).