FWIW, GPU-based scene traversal and culling is a state-of-the-art engine design topic.
I've been working in graphics engines for 10 years and it's the kind of thing that would cause me to sit down for a solid week of planning. There are a bunch of GDC presentations from people who are currently doing it, but you're not going to find a tutorial that will hold your hand through it yet.
The short version though -- you're going to want to merge as much of your pipeline state (fixed function / shaders) and resources as possible. That means using texture atlases, texture arrays, and giant buffers that hold geometry for many meshes at once. This will let you reduce the draw count substantially. Then you're going to want to split every mesh into many smaller clusters, which are associated with different culling structures such as bounding volumes and normal cones for backface culling. Then you write a CS to cull your clusters and produce a list of visible clusters. Then you compact that list. Then you write a CS to step inside each cluster and cull the triangles that it's made of and produce a list of visible triangles, and then compact that list. Then you use draw-indirect to draw your list of visible triangles.
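To make the cluster-culling step a bit more concrete, here's a CPU-side reference sketch of what such a compute shader might do per cluster. The `Cluster` layout, the `CullClusters` name, and the exact cone test are my own assumptions for illustration, not something taken from a particular engine; on the GPU, each loop iteration would be one thread and the `push_back` would be an atomic append or a prefix-sum compaction.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Plane stored as dot(n, p) + d = 0, with n pointing towards the inside of the frustum.
struct Plane { Vec3 n; float d; };

// Hypothetical per-cluster culling data: a bounding sphere plus a normal cone.
struct Cluster {
    Vec3 center; float radius;      // bounding sphere
    Vec3 coneApex; Vec3 coneAxis;   // normal cone (axis is unit length)
    float coneCutoff;               // threshold for the back-face test below
};

// True if the bounding sphere lies completely behind any one frustum plane.
inline bool OutsideFrustum(const Cluster& c, const std::vector<Plane>& planes) {
    assert(c.radius >= 0.0f);
    for (const Plane& p : planes)
        if (Dot(p.n, c.center) + p.d < -c.radius) return true;
    return false;
}

// True if every triangle in the cluster faces away from the camera.
inline bool BackFacing(const Cluster& c, Vec3 cameraPos) {
    Vec3 v = Sub(c.coneApex, cameraPos);
    float len = std::sqrt(Dot(v, v));
    if (len == 0.0f) return false;  // camera sits on the apex: keep the cluster
    return Dot(v, c.coneAxis) / len >= c.coneCutoff;
}

// Culls all clusters and returns a compacted list of the visible cluster indices.
std::vector<uint32_t> CullClusters(const std::vector<Cluster>& clusters,
                                   const std::vector<Plane>& planes,
                                   Vec3 cameraPos) {
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < clusters.size(); ++i) {
        if (OutsideFrustum(clusters[i], planes)) continue;
        if (BackFacing(clusters[i], cameraPos)) continue;
        visible.push_back(i);       // compaction: only survivors are written out
    }
    return visible;
}
```

The per-triangle pass would have the same shape: test each triangle of each surviving cluster, append the survivors' indices to one big index buffer, and feed the resulting count to the indirect draw arguments.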
Right, I guess I just can't mentally work out how that operates given the code that you show, because (for example) the writing code has appeared to add a stringtable at the end that didn't exist prior to serialisation. No doubt there's something tied up in the StringOffset and Offset types that does the magic!
Yes when serializing, I had a C# string object that basically had to become a char* into the Foo structure (or my equivalent: StringOffset). The runtime Foo structure has a StringOffset, which is basically a u32 -- with 0 being "null" and any other value being "there are some char's at (((char*)this) + *this)". The serialization code initially writes a zero for this field into the stream, but retains the stream cursor inside the StringTable strings helper object.
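For what it's worth, the runtime side of that can be very small. This is only a sketch of how such a `StringOffset` might look in C++; the `Get` accessor and the two-field `Foo` layout are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical runtime-side type: a u32 byte offset measured from its own address.
// 0 means "null"; anything else means "the chars live at ((char*)this) + offset".
struct StringOffset {
    uint32_t offset;

    const char* Get() const {
        if (offset == 0) return nullptr;
        return reinterpret_cast<const char*>(this) + offset;
    }
};

// Hypothetical runtime structure that lives inside the loaded file blob.
struct Foo {
    uint32_t     id;
    StringOffset name;
};

// Reminder to fix the tool-side writer whenever the layout changes.
static_assert(sizeof(Foo) == 8, "Foo layout changed: update the serializer too");
```

Because the offset is relative to the field itself, the blob can be loaded at any address and used immediately, with no pointer fix-up pass.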
After writing out all the structures, strings.WriteChars(w); writes the actual char contents of the C# string objects to the stream, and patches up all the offsets (which were written as placeholder 0's earlier) to point to the start of each string's char data. The runtime isn't aware of the existence of a string table, it just knows that it has an offset to some chars, and the tool has made sure that when the runtime follows this offset it will find some zero-terminated chars.
Likewise, an array of Foo structures is written out initially, each of which must contain a link to a Bar structure. The Bar structures can't be written out on demand, as we're in the process of writing a contiguous Foo array. So, the Foo::bar member is written out with a placeholder value of zero, but the stream cursor is retained in tocBars (with tocBars = w.WriteTemp32();). After writing the contiguous Foo array, all the Bar structures are written out, and the Foo::bar members are patched up with the appropriate offsets.
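Here's a rough sketch of that placeholder-then-patch flow. The real tool code is C#; this is a C++ approximation, and `PatchOffset`, `WriteOffset`, `FooSrc` and `WriteFoos` are names I've invented for the example (only `WriteTemp32`, `WriteChars` and `tocBars` mirror the code discussed above). It writes host-endian values and ignores alignment to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

class Writer {
public:
    uint32_t Cursor() const { return static_cast<uint32_t>(bytes_.size()); }
    void WriteBytes(const void* p, size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes_.insert(bytes_.end(), b, b + n);
    }
    void Write32(uint32_t v) { WriteBytes(&v, 4); }
    // Writes a placeholder zero and returns its position for later patching.
    uint32_t WriteTemp32() { uint32_t at = Cursor(); Write32(0); return at; }
    // Overwrites the placeholder at `at` with the distance from itself to `target`.
    void PatchOffset(uint32_t at, uint32_t target) {
        assert(target > at && at + 4 <= bytes_.size());
        uint32_t rel = target - at;
        std::memcpy(&bytes_[at], &rel, 4);
    }
    const std::vector<unsigned char>& Bytes() const { return bytes_; }
private:
    std::vector<unsigned char> bytes_;
};

class StringTable {
public:
    // Emits a placeholder now and remembers which string it should point at.
    void WriteOffset(Writer& w, const std::string& s) {
        pending_.push_back(std::make_pair(w.WriteTemp32(), s));
    }
    // Emits the zero-terminated chars and patches every remembered placeholder.
    void WriteChars(Writer& w) {
        for (size_t i = 0; i < pending_.size(); ++i) {
            w.PatchOffset(pending_[i].first, w.Cursor());
            w.WriteBytes(pending_[i].second.c_str(), pending_[i].second.size() + 1);
        }
        pending_.clear();
    }
private:
    std::vector<std::pair<uint32_t, std::string> > pending_;
};

struct FooSrc { std::string name; uint32_t barValue; };  // tool-side input

std::vector<unsigned char> WriteFoos(const std::vector<FooSrc>& foos) {
    Writer w;
    StringTable strings;
    std::vector<uint32_t> tocBars;
    for (size_t i = 0; i < foos.size(); ++i) {  // contiguous Foo array
        strings.WriteOffset(w, foos[i].name);   // Foo::name placeholder
        tocBars.push_back(w.WriteTemp32());     // Foo::bar placeholder
    }
    for (size_t i = 0; i < foos.size(); ++i) {  // Bar structures, patched in
        w.PatchOffset(tocBars[i], w.Cursor());
        w.Write32(foos[i].barValue);
    }
    strings.WriteChars(w);                      // string table goes last
    return w.Bytes();
}
```

Nothing in the output marks where the Foo array ends or where the strings begin; the patched offsets are the only thing tying the pieces together.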
The string table could be written before the Bar structures, making the two arrays non-contiguous with each other (i.e. strings.WriteChars(w); could be moved in between the two for loops) and the runtime would continue working unaffected, since as long as the offsets are correct, the runtime will interpret the data correctly.
This code is basically acting as a hand-written malloc, where the order that they go into the stream is deciding their memory address at runtime... which can be a good thing, as the serialization code can be written with knowledge of the data access patterns, e.g. tweaked to place "hot" data close together.
You say this is 2 separate problems, but I don't see it that way, because even "largely-static data" can have a "complex, richly-typed, and dynamic [..] in-memory representation" (forgive the quote-juggling). If I have a 3D model, with vertices/indices/materials/shaders/textures/animations, some of which are obviously composed of the others, others merely associating with the others, that's all static data (as far as running the game is concerned), but is going to have a relatively complex in-memory representation. No?
I believe when ApochPiQ says "richly typed and dynamic", he means, regular, mutable C++ objects full of pointers and vtables and moving parts and things that grow and shrink and change. This stuff obviously isn't applicable to a static data format, as typically the whole file is loaded with one (or a handful of) malloc's, so moving parts aren't very feasible. That said, I know some middleware that contains a nice big blob of 0x00000000 values in their files (which compress to nothing), so that after your engine loads their data file, they've already got a memory area that they can give to their internal allocator to placement-new complex C++ objects within.
As to your example though: an asset that contains vertices/indices/materials/shaders/textures/animations is static, and though it can have a complex in-memory representation, it is simple to handle such complex formats using this low-level serialization technique. That kind of asset data is the perfect candidate for this kind of system.
Let me steer this towards a practical question then; for those using these low-overhead methods, what are you doing to ensure that the layout of data is exactly what you expect - endianness, platform dependent types, 32bit vs 64bit, alignment (inc. SIMD requirements), padding, etc?
There's a whole load of approaches.
Do it manually, like I posted earlier. This is actually way simpler than it sounds. For each field in a struct, you have a matching Write call. You know the padding/alignment rules and explicitly account for them (Write the invisible padding fields when required). When you call something like Write32 (which matches u32 member;) it can internally endian swap if required (hint: it's not, the PPC consoles are dead). Same for pointers, it can write 32 or 64 bits depending on the platform. If you're writing an asset that will be used on Win32 and Win64 (hint: you're not, nobody supports Win32 any more), then you can just write 8 byte pointers and use PadTo64<T*>::Type m_ptr; instead of T* m_ptr; in your structures and suffer the waste in your Win32 build. In my case though, I prefer to not serialize pointers if possible, and leave them as offsets. Include static_assert(sizeof(T)==42) in the runtime and assert(writer.cursor == cursor_at_start_of_struct + 42) in the tools to act as reminders to fix both code-bases when the data format changes.
Use the metadata from your compiler. If you're compiling your code with clang, then also use clang to parse your code and generate a description of your structures automatically. Write a serialization library that can automagically write data into the memory format that it's discovered by parsing your code.
Use the metadata from your binding system. If you've got your structures bound to a scripting language like Lua, etc, then you've probably already annotated your structures somehow, as a DIY reflection system for C++. Give that binding data to a serialization library that can automagically write data into the right memory format.
Don't manually write the C++ structures at all. Author your structures in a different data definition language / schema, and have it generate the C++ headers that will be used by the runtime, and also generate the serialization code for your tools.
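As a sketch of the first (manual) approach, here's one way the `PadTo64` helper and the paired size checks might look. The union-based `PadTo64`, the `Material` struct and `CountingWriter` are guesses for illustration only; a real tool-side writer would emit bytes rather than just count them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: stores a T* in exactly 8 bytes on 32-bit and 64-bit builds.
template <class P> struct PadTo64;
template <class T> struct PadTo64<T*> {
    union Type {
        T*       ptr;   // valid after load-time fix-up
        uint64_t bits;  // forces the field to occupy 8 bytes
    };
};

struct Texture;  // only pointed to, so the definition isn't needed here

// Runtime structure with its padding spelled out as a real field.
struct Material {
    uint32_t                flags;
    uint32_t                pad0;     // explicit padding; the writer emits it too
    PadTo64<Texture*>::Type texture;
};

// Runtime-side reminders that fire when the data format changes.
static_assert(sizeof(PadTo64<Texture*>::Type) == 8, "pointer slot must be 8 bytes");
static_assert(offsetof(Material, texture) == 8, "unexpected padding before texture");
static_assert(sizeof(Material) == 16, "Material changed: update the tool-side writer");

// Minimal stand-in for the tool-side stream: it only tracks the cursor.
struct CountingWriter {
    uint64_t cursor = 0;
    void Write32(uint32_t) { cursor += 4; }
    void Write64(uint64_t) { cursor += 8; }
};

// One Write call per field, padding included, followed by the matching size check.
void WriteMaterial(CountingWriter& w, uint32_t flags) {
    const uint64_t start = w.cursor;
    w.Write32(flags);
    w.Write32(0);   // pad0
    w.Write64(0);   // texture slot, filled in at load time
    assert(w.cursor == start + 16);  // tool-side twin of the static_assert above
}
```

The literal 16 appears on both sides on purpose: when someone adds a field to `Material`, one of the two checks fails and points them at the other code-base.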
until windows 8, you had to install dx to get both the d3d and d3dx files. this was usually done by installing some game that included the redistributable. as a consequence, pretty much everyone who had ever installed a dx game had dx installed already. so there was almost never a need to install dx. instead of increasing the game size by adding the dx redistributables which were almost never needed, and slowing down the install by running the dx install, which basically did nothing because it was already installed, i would simply provide links to the redistributable for those very few users who didn't already have dx installed. things haven't really changed much with win8 and win10. anyone who's ever run a dx9 game has the runtime installed already. and for the few who haven't, the choices are to provide a link to the files, or to increase the file size and add an extra setup step unnecessarily for everyone else. as dx9 games become less common, and win 8 and later become more common, adding the runtime setup to the game would probably be a good idea. but at this point in time, i think we're still at the point where including it would be an inconvenience for the majority of users - as they've installed a dx9 game before.
On all versions of Windows, you need to install the appropriate D3D runtimes to ensure you've got the right version of the d3d9 and d3dx9 dlls as used by a particular game. Different games may also require different versions.
Simply assuming that your customer has probably installed the right D3D9 version already because they play games, therefore have likely installed a game that happens to use the same version of the runtime as you is not very professional™. It's the equivalent of not vaccinating because the rest of the community will probably give you herd immunity, or eating dog food because it's probably safe since the manufacturer is forced to maintain human-grade processing after they got sued by that guy who chose to eat dog food. If I met a doctor or a chef doing such things, the impression they would give me is that they are not very professional™.
Sure, if you're still targeting people on dial-up internet, then having a separate RAR or 7Z archive, with no installer or runtimes (in order to minimize download size to the extreme) is a very nice thing to do... If you're afraid of file-size, there's also the online version of the installer, which is under a few hundred KB.
In the general case though, it's just a customer service nightmare waiting to happen. If someone runs your game and it complains about a missing DLL, they're not going to blame themselves and they're not going to blame Microsoft, they're going to blame you for writing "such a buggy game".
If you're distributing via a modern platform such as Steam, bundling D3D9/D3DX with your game is as simple as ticking a single checkbox in the setup page for your repository. Steam will avoid downloading it if another game has already forced it to, or will download and install it automatically when required.