I've used LWJGL, LibGDX and OpenTK for 3D before. You place vertices into memory, get the handle, draw the model, change lighting when needed, load vertex and fragment shaders. LibGDX even allows programmatic mesh adjustment, building and moving mesh parts around.
Modern 3D renderers are so very, very complex compared to that.
Rendering a modern AAA game scene is _hard_. There are billions of triangles in a scene, multiple lighting passes, heavy post-processing, etc. If you were to just toss everything into vertex buffers and make draw calls the naive way you're describing, a 4x SLI GTX 1080 rig couldn't handle half the newer games on their lowest settings. Getting things to look as good as they do and still actually run on mere mortals' machines is an intensely complicated process and requires a number of special tricks, passes, techniques, and content pipeline alterations.
To get a better idea, just consider this small handful of techniques a modern game might use: procedural skinning, tiled deferred rendering, atmospheric lighting, global illumination, physically-based materials, dense foliage rendering, parallax mapping, temporal reprojection, fluid rendering, physics-enabled particles, HDR tone mapping, etc. A new graphics engine might have added any number of those (or other) techniques, or completely rewritten how many of them work for better efficiency/scalability or faster content iteration.
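To make just one item on that list concrete, here's a minimal sketch of HDR tone mapping using the simple Reinhard operator. That operator is my pick for illustration, not something any particular engine is confirmed to use, and the function name `reinhard_tonemap` is made up. Real renderers do this per pixel in a shader, usually with a fancier curve plus exposure control.

```python
def reinhard_tonemap(luminance):
    """Compress an HDR luminance in [0, inf) into the displayable range [0, 1).

    Uses the basic Reinhard curve L / (1 + L). Chosen for illustration only.
    """
    if luminance < 0:
        raise ValueError("luminance must be non-negative")
    return luminance / (1.0 + luminance)


# Dark values pass through almost unchanged; very bright values
# are squeezed toward (but never reach) 1.0.
for hdr in (0.0, 0.5, 1.0, 4.0, 100.0):
    print(hdr, "->", round(reinhard_tonemap(hdr), 3))
```

And that's about the easiest technique on the list; most of the others involve multiple render passes and changes to how content is authored.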
And then do all that (and tons more!) in an open-world environment where the core game data for the entire world can't even fit into main memory. And make it run at 30+ Hz on a PS4/XBone's hardware (somewhat equivalent to a 2012-era laptop). :)
Maybe there are some 2D "graphical engines" you know of?
Nothing comes to mind. There are a lot of interesting stylized 2D games out there that certainly require some neat tricks, but they still don't really rival what a modern 3D renderer has to do. :)
2D still deals with a significantly smaller number of objects and a much more constrained view frustum than 3D. In 2D, a player view on a mountain top requires just as many tiles/whatever to be drawn to fill up the screen as a view from the corner of a basement does. In 3D, standing on a mountain top requires rendering miles upon miles worth of scenery with atmospheric effects and dense forests or seas or the like.
The number of objects/effects in more complex 2D games just doesn't scale up the same way that it does in more complex 3D scenes.
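A rough back-of-the-envelope sketch of the mountain-top versus basement comparison above, in Python. Everything here is an illustrative assumption: the function names, the 32-pixel tiles, the 90-degree field of view, and especially the idea that visible 3D content is proportional to a flat circular sector of ground. Real scenes have occlusion, LOD, and so on, so treat the numbers as a trend rather than a measurement.

```python
import math


def visible_tiles(screen_w, screen_h, tile_size):
    """Tiles needed to fill the screen in a tile-based 2D view.

    One extra column and row cover the partial tiles that appear when
    the camera is not aligned to the tile grid.
    """
    cols = math.ceil(screen_w / tile_size) + 1
    rows = math.ceil(screen_h / tile_size) + 1
    return cols * rows


def visible_ground_area(view_distance, horizontal_fov_deg):
    """Ground area inside a 3D camera's horizontal field of view.

    Approximated as a circular sector on flat terrain:
    area = 0.5 * angle_in_radians * radius^2.
    """
    fov = math.radians(horizontal_fov_deg)
    return 0.5 * fov * view_distance ** 2


# 2D: the tile count depends only on the screen, not on where the player stands.
print(visible_tiles(1920, 1080, 32))  # 61 columns * 35 rows = 2135

# 3D: a basement corner (about 10 m of visibility) versus a mountain top (about 10 km).
basement = visible_ground_area(10, 90)
mountain = visible_ground_area(10_000, 90)
print(round(mountain / basement))  # roughly a million times more ground to cover
```

The 2D count is a constant for a given resolution, while the 3D area grows with the square of the view distance, which is why 3D renderers lean so heavily on culling, LOD, and streaming.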