No, there isn't often a need for it -- it may be one way to solve the problem, but I assure you there's probably a single-threaded solution as well.
You can still benefit from more threads than cores; here is an extreme example: a single-core, single-lane machine.
Is there ever any need for multithreading on such a machine?
But it provides performance and behavior you can't achieve in a single-threaded model.
as in --------Please wait....scene loading---------...
For example, background loading of scenes definitely is possible in a single-threaded game...
Threading should not be used for I/O-bound tasks (only for processing-bound tasks). The OS is already designed to handle I/O-bound tasks asynchronously without extra threads -- use the OS's functionality instead of reinventing it with extra threads.
Sure; suppose you have a prep thread that is waiting on I/O or some other condition, like a FIFO being less than half full. It can yield to another thread while waiting. If you are single-threaded, then that single thread eats all the latency in your model, and latency can't be hidden at all.
If a single-threaded program wanted something to occur when a FIFO was half-full, it would likely use a callback that is triggered by the push function.
But if you're writing a high-performance real-time system (such as a modern game engine), then you want a small number of threads that hardly ever yield to get predictable performance. Yielding a thread on Windows is totally unpredictable in length, with your only guarantee being that it's unlikely to be longer than 5 seconds (although yes: that case shouldn't occur unless you're massively oversubscribed)...
If you have threads that never yield at all, then each thread will try to consume a complete core. (Depending on the O/S, it will still occasionally lose bandwidth and be parked, but it will always be crying for attention.) Sometimes that is a necessity, but if such tight polls or freewheeling threads can be minimized, a modern O/S can manage lots of well-behaved, yielding threads far in excess of the number of available cores/lanes...
What's this "two-step allocate and release model"? Is that specific to your OpenGL resource FIFO, or are you talking about thread-shared FIFOs in general?
Heads-up with the design of the threadsafe FIFO: it must use a two-step allocate and release model, because there is finite execution time between when a handle is pulled and when it is prepped or consumed. But that is easily done.
A FIFO object is basically tracking a head and a tail in a circular fashion, with some maximum FIFO size. The FIFO should provide booleans for IsFull, IsEmpty, etc.
Functions like IsFull and IsEmpty are nonsensical in the context of a shared structure like a multiple-producer/multiple-consumer FIFO -- they can never be correct. It makes sense for Push and Pop to be able to fail (if the queue was full or empty), but simply querying some state of the structure, such as IsFull, is useless: by the time you've obtained the return value it may well be incorrect, and any branch you make on that value is very dubious.