The simple answer is that everyone still sucks at writing multithreaded software. Games only took it seriously 10 years ago, when Sony/MS released multicore systems. Universities still largely suck at teaching it -- introducing the shared-memory paradigm and then stopping, as if that's the big picture. The number of junior engineers who answer "Uh, use a mutex?" when asked how to make a game loop multithreaded is embarrassing. The message-passing paradigm is supposed to be the default choice, but it isn't anywhere near common enough in the programming zeitgeist.
Why on earth would you ever want to load on the main thread?
...
Certainly, if your rendering is bottlenecked, you're kind of screwed.
I mentioned one reason above -- D3D/GL have traditionally forced you to load graphical resources on the main thread. I wasn't talking about the GPU at all, above, just how the API forces CPU-side work to be done on certain threads.
Traditionally, games and game engines have all been single-threaded too. You'd think that 10 years after Sony/MS forced multicore onto us we'd have adapted... but a damn lot of games are still written with a single-threaded mindset. Shitty threading support in extension languages like Lua doesn't help... My last company wrote all their gameplay code in Lua, which meant it was stuck on one thread, relying on the engine to magically push the heavy lifting onto all the other cores somehow.
Most engines now are based around the model of there being a singular "main thread", accompanied by N "worker" threads whose logic is basically "while(1) { Job j; if( g_jobs->pop(j) ) j.Run(); }". The main thread then tries to break up any parallel workloads into jobs and hands them off to the workers. Some things are easy to port to this model, but anything dealing with very large and complex data structures (with a lot of synchronization, or random write locations) is a lot harder. Loading a game world and populating it with disparate entity types might fall into that latter category for a lot of games/engines.
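To make that concrete, here's a bare-bones C++ sketch of that model -- the JobQueue type and the chunked workload are made up for illustration, not lifted from any particular engine:

    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Illustrative job queue -- names are hypothetical, not from a real engine.
    struct JobQueue {
        std::mutex m;
        std::condition_variable cv;
        std::queue<std::function<void()>> jobs;
        bool quit = false;

        void push(std::function<void()> j) {
            { std::lock_guard<std::mutex> lock(m); jobs.push(std::move(j)); }
            cv.notify_one();
        }
        // Blocks until a job is available; returns false once shut down and drained.
        bool pop(std::function<void()>& out) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return quit || !jobs.empty(); });
            if (jobs.empty()) return false;
            out = std::move(jobs.front());
            jobs.pop();
            return true;
        }
        void shutdown() {
            { std::lock_guard<std::mutex> lock(m); quit = true; }
            cv.notify_all();
        }
    };

    int main() {
        JobQueue g_jobs;

        // The N workers: literally "while(pop) run".
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < std::thread::hardware_concurrency(); ++i)
            workers.emplace_back([&] {
                std::function<void()> j;
                while (g_jobs.pop(j)) j();
            });

        // Main thread: break a parallel workload into jobs, hand them off.
        for (int chunk = 0; chunk < 64; ++chunk)
            g_jobs.push([chunk] { /* process this chunk of the workload */ (void)chunk; });

        g_jobs.shutdown();
        for (auto& w : workers) w.join();
    }

The hard part isn't this boilerplate, it's the bit the snippet waves away in the lambda body: making each chunk independent enough that the workers don't spend all their time fighting over locks.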
Yeah, the actual part of streaming the bytes from disk into RAM should almost universally be asynchronous by now, but many engines still have massive amounts of deserialization work to do after loading the data, which may have stupid amounts of dependencies across multiple game systems. In our engine, we try to deal with this by keeping the on-disk and in-RAM resource formats identical where possible, removing the need for most deserialization tasks, and breaking any remaining instantiation/deserialization work into clear phases so that dependencies between threads can be easily resolved and scheduled.
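As a rough illustration of the "identical on-disk/in-RAM format" idea (all type names here are hypothetical, not our actual engine code): if the asset pipeline writes out a header with offsets instead of pointers, "deserialization" collapses to a few pointer fix-ups, and the phases are easy to schedule as jobs:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Hypothetical load-in-place mesh: the layout on disk matches the layout in
    // RAM, with offsets instead of pointers, so there's nothing to "parse".
    struct MeshHeader {
        uint32_t vertexCount;
        uint32_t indexCount;
        uint32_t verticesOffset;   // byte offset from the start of the blob
        uint32_t indicesOffset;
    };

    struct MeshView {              // the "deserialized" form: just resolved pointers
        const MeshHeader* header;
        const float*      vertices;
        const uint16_t*   indices;
    };

    // Phase 1 (async I/O): the raw bytes land in `blob`.
    // Phase 2 (any worker): bind the view -- pure pointer fix-up, no allocation.
    MeshView BindMesh(const std::vector<uint8_t>& blob) {
        const auto* h = reinterpret_cast<const MeshHeader*>(blob.data());
        return {
            h,
            reinterpret_cast<const float*>(blob.data() + h->verticesOffset),
            reinterpret_cast<const uint16_t*>(blob.data() + h->indicesOffset),
        };
    }
    // Phase 3 (whatever thread the graphics API demands): create GPU buffers
    // from view.vertices / view.indices.

    int main() {
        // Fake a tiny blob the way an asset pipeline would have written it.
        std::vector<uint8_t> blob(sizeof(MeshHeader) + 3 * sizeof(float) + 3 * sizeof(uint16_t));
        MeshHeader h{1, 3, sizeof(MeshHeader), sizeof(MeshHeader) + 3 * sizeof(float)};
        std::memcpy(blob.data(), &h, sizeof(h));
        MeshView view = BindMesh(blob);
        (void)view;
    }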
There's also a lot of different ways to design a main loop for a game. Often the simulation loop and the rendering loop are tied together somehow -- e.g. they're the same loop. The simplest loop runs one simulation step, followed by one render step. If either of them goes over its time budget, the framerate suffers.
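That simplest coupled loop looks roughly like this (Simulate/Render/the timer are stand-ins, obviously):

    #include <chrono>

    // Stand-ins for the real engine calls (purely illustrative).
    void Simulate(float /*dtSeconds*/) { /* step the game state */ }
    void Render()                      { /* build and submit draw calls */ }
    void Present()                     { /* swap buffers */ }

    int main() {
        using clock = std::chrono::steady_clock;
        auto last = clock::now();
        for (;;) {
            auto now = clock::now();
            float dt = std::chrono::duration<float>(now - last).count();
            last = now;
            Simulate(dt);   // one simulation step...
            Render();       // ...then one render step.
            Present();
            // If either step blows its time budget, this whole frame is late.
        }
    }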
Different games/genres will have different types of loops. e.g. in my current game, simulation is partially decoupled from rendering, but a long simulation frame will still block the next graphics frame... This isn't an issue for me usually, as simulation is so fast that I often run 2+ simulation updates at a time, followed by one rendering update! However, this breaks down on loading screens if any one asset takes 10+ms to deserialize... causing that frame to go over budget.
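A partially decoupled loop in that spirit is basically the standard fixed-timestep pattern -- this is the generic version, not my engine's exact code, but it shows both the "2+ sim updates per render" behaviour and why one slow deserialize on the sim side still delays the next render:

    #include <chrono>

    void Simulate(float /*dtSeconds*/) { /* fixed-step game update */ }
    void Render()                      { /* draw whatever state exists */ }

    int main() {
        using clock = std::chrono::steady_clock;
        const float kSimStep = 1.0f / 120.0f;   // assumed fixed timestep
        float accumulator = 0.0f;
        auto last = clock::now();
        for (;;) {
            auto now = clock::now();
            accumulator += std::chrono::duration<float>(now - last).count();
            last = now;
            // Catch-up loop: when sim is cheap this runs 2+ updates per render.
            // But if one update stalls (say, a 10+ms deserialize on a loading
            // screen), Render() below still has to wait for it to finish.
            while (accumulator >= kSimStep) {
                Simulate(kSimStep);
                accumulator -= kSimStep;
            }
            Render();
        }
    }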