FWIW, GPU-based scene traversal and culling is a state-of-the-art engine design topic.
I've been working in graphics engines for 10 years and it's the kind of thing that would cause me to sit down for a solid week of planning. There are a bunch of GDC presentations from people who are currently doing it, but you're not going to find a tutorial that will hold your hand through it yet.
The short version, though: you're going to want to merge as much of your pipeline state (fixed function / shaders) and resources as possible. That means using texture atlases, texture arrays, and giant buffers that hold geometry for many meshes at once. This will let you reduce the draw count substantially. Then you're going to want to split every mesh into many smaller clusters, each associated with culling structures such as a bounding volume (for frustum culling) and a normal cone (for backface culling). Then you write a compute shader (CS) to cull your clusters and produce a list of visible clusters. Then you compact that list. Then you write a CS to step inside each visible cluster, cull the triangles it's made of, and produce a list of visible triangles, and then compact that list. Then you use draw-indirect to draw your list of visible triangles.
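
To make the two culling passes a bit more concrete, here's a rough CPU-side reference sketch in C++ of what the compute shaders would be doing. Treat it as an illustration under assumptions, not how any particular engine does it: every name in it (`Cluster`, `cullClusters`, `cullTriangles`, `DrawIndexedArgs`) is made up for this post, the bounding volume is assumed to be a sphere, the normal cone cutoff is assumed to be stored as sin(half-angle), the cone test is a simplified one that ignores the cluster's extent, and front faces are assumed to be counter-clockwise. On the GPU each loop iteration would be a thread, and the `push_back` calls would be an atomic append or a prefix-sum compaction.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Frustum plane: a point p is on the inside when dot(n, p) + d >= 0.
struct Plane { Vec3 n; float d; };

// Hypothetical per-cluster record living in one big shared buffer.
struct Cluster {
    Vec3     center;      // bounding sphere center
    float    radius;      // bounding sphere radius
    Vec3     coneAxis;    // unit axis of the normal cone
    float    coneCutoff;  // assumed convention: sin(cone half-angle)
    uint32_t firstIndex;  // range of this cluster in the shared index buffer
    uint32_t indexCount;
};

// Assumed argument layout for an indexed indirect draw.
struct DrawIndexedArgs {
    uint32_t indexCount, instanceCount, firstIndex;
    int32_t  baseVertex;
    uint32_t firstInstance;
};

// Pass 1: one "thread" per cluster; returns the compacted visible-cluster list.
std::vector<uint32_t> cullClusters(const std::vector<Cluster>& clusters,
                                   const std::vector<Plane>& frustum, Vec3 eye) {
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < clusters.size(); ++i) {
        const Cluster& c = clusters[i];
        bool keep = true;
        // Frustum test: reject if the sphere is fully outside any plane.
        for (const Plane& p : frustum)
            if (dot(p.n, c.center) + p.d < -c.radius) { keep = false; break; }
        // Simplified normal-cone test: reject if every normal faces away.
        if (keep) {
            Vec3  v   = sub(c.center, eye);
            float len = std::sqrt(dot(v, v));
            if (len > 0.0f && dot(v, c.coneAxis) >= c.coneCutoff * len) keep = false;
        }
        if (keep) visible.push_back(i);  // GPU: atomic append / prefix sum
    }
    return visible;
}

// Pass 2: walk the triangles of each visible cluster, keep the front-facing
// ones, and write their indices to a compacted index buffer.
DrawIndexedArgs cullTriangles(const std::vector<Cluster>& clusters,
                              const std::vector<uint32_t>& visible,
                              const std::vector<Vec3>& positions,
                              const std::vector<uint32_t>& indices, Vec3 eye,
                              std::vector<uint32_t>& outIndices) {
    for (uint32_t ci : visible) {
        const Cluster& c = clusters[ci];
        for (uint32_t i = c.firstIndex; i + 2 < c.firstIndex + c.indexCount; i += 3) {
            Vec3 a = positions[indices[i]];
            Vec3 b = positions[indices[i + 1]];
            Vec3 d = positions[indices[i + 2]];
            Vec3 n = cross(sub(b, a), sub(d, a));  // CCW front faces assumed
            if (dot(n, sub(a, eye)) < 0.0f) {      // normal points toward the eye
                outIndices.push_back(indices[i]);
                outIndices.push_back(indices[i + 1]);
                outIndices.push_back(indices[i + 2]);
            }
        }
    }
    // These arguments are what the final draw-indirect call would consume.
    return {static_cast<uint32_t>(outIndices.size()), 1u, 0u, 0, 0u};
}
```

The point of the shape above is that nothing comes back to the CPU between passes: pass 1 feeds pass 2 through the visible-cluster list, and pass 2 feeds the draw through the compacted index buffer plus the argument struct. A real implementation would also want small-triangle and occlusion tests, which are left out here.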