I understand that the idea of OIT is to not worry so much about ordering. But does it still matter in some cases? And is there a way to preserve it if a certain ordering is desired?
"Order independent" refers to submission only, you still need to sort the stored fragments either afterwards or while inserting them in per pixel lists to get correct results.
Weighted blending is an exception, but i guess you need luck and a lot of tuning time to make it 'good enough'.
I see 3 methods of implementing per pixel lists:
1. max N fragments per pixel - because N is small, it should work best to sort at insertion to keep the closest N fragments.
2. max N fragments per pixel block, e.g. an 8x8 block of pixels sharing max 256 samples. If you are sure 256 is enough, this method may be faster, requiring less memory for more fragments.
3. max N fragments per frame, implemented with a 2-pass method: the 1st pass just increases a counter per pixel, then a prefix sum gives each pixel exactly the number of fragment slots it needs; the 2nd pass renders to those lists, and sorting can be done afterwards. Best memory utilization and less pressure on atomics, but 2 passes.
All of this is still very demanding, i'd consider raytracing as well - particles mean you could do it with a ray-sphere test.
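The ray-sphere test is cheap; a minimal sketch (my own naming, assuming a unit-length ray direction):

```cpp
#include <cmath>

// Returns true and the nearest hit distance t if the ray (origin ro, unit direction rd)
// intersects a sphere at center c with radius r.
bool RaySphere(const float ro[3], const float rd[3],
               const float c[3], float r, float& t)
{
    float oc[3] = { ro[0]-c[0], ro[1]-c[1], ro[2]-c[2] };
    float b = oc[0]*rd[0] + oc[1]*rd[1] + oc[2]*rd[2];          // dot(oc, rd)
    float cc = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - r*r;   // dot(oc, oc) - r^2
    float disc = b*b - cc;                                      // discriminant
    if (disc < 0.0f) return false;                              // ray misses the sphere
    t = -b - std::sqrt(disc);                                   // nearest intersection
    return true;
}
```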
I'm constantly moving away from OOP over the years. The idea of inheritance never made much sense to me - it just complicates things and forces you to make decisions about software design. To me that's just blah blah, and i prefer to spend this time on solving real problems.
So i ended up using a 'C with classes' style, but i moved away from that too, mainly because of this:
Class member functions hide some of the data they use, because you don't know which member variables they access without looking at the implementation.
This way it's hard to see data complexity, which is important for optimizing / refactoring.
Often i ended up making member functions static, forcing me to add all data to the function parameters - just to see how many there are (ALWAYS more than you would expect).
Next i realized that static member functions can be used from anywhere, how practical.
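A toy example of the point (names made up): making the function static forces the data dependencies into the signature:

```cpp
// Before: which members does UpdateHidden() touch? You have to read the body to know.
struct Particle
{
    float pos, vel, drag;

    void UpdateHidden(float dt) { vel *= drag; pos += vel * dt; }

    // After: the same logic as a static function - every input is now visible
    // in the parameter list, and the function can be called from anywhere.
    static void Update(float& pos, float& vel, float drag, float dt)
    {
        vel *= drag;
        pos += vel * dt;
    }
};
```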
So why was i still using classes?
My answer was simply: To group related functions together by 'topic', so i can find them somehow.
But there is something better to do this: Namespaces.
With namespaces it's possible to group stuff in hierarchies without any of the restrictions or problems known from inheritance.
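E.g. (hypothetical names):

```cpp
#include <cmath>

// Grouping by topic with namespaces - a hierarchy without inheritance.
namespace Geometry
{
    namespace Distance
    {
        inline float PointPoint(float ax, float ay, float bx, float by)
        {
            float dx = bx - ax, dy = by - ay;
            return std::sqrt(dx*dx + dy*dy);
        }
    }
}
// The call site reads like a path: Geometry::Distance::PointPoint(...)
```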
Today i create classes very rarely, using them only as an interface to a large system which is implemented mostly procedurally under the hood.
But i still use a lot of small structs with member functions for trivial functionality like indexing arrays or un/packing.
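E.g. a tiny struct for un/packing a color to 32 bits (a sketch, my own naming):

```cpp
#include <cstdint>

// Small struct with member functions for trivial un/packing - no inheritance,
// just a convenient wrapper around the bit twiddling.
struct ColorRGBA8
{
    uint32_t bits;

    static ColorRGBA8 Pack(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
    {
        return { r | (g << 8) | (b << 16) | (a << 24) };
    }
    uint32_t R() const { return bits & 0xFF; }
    uint32_t G() const { return (bits >> 8) & 0xFF; }
    uint32_t B() const { return (bits >> 16) & 0xFF; }
    uint32_t A() const { return (bits >> 24) & 0xFF; }
};
```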
Software occlusion culling (front to back dependency -> serial algorithm -> not good to parallelize)
Animating 100 characters
Physics simulation (100 islands of rigid bodies in contact)
The easy route would be to use one thread per task - maybe suboptimal, but good enough if your speedup is about the number of cores.
The hard and error prone route would be trying to parallelize the occlusion culling - ending up with a small speedup for a lot of work and debugging time.
The practical route would be: One thread for occlusion culling, while the others are free to run a job system processing all characters, and after that all physics islands.
If a single character is very fast to process, we would choose to process e.g. 4 characters per job to hide the synchronization costs.
std::async and other high level functionality can be used to achieve this; my approach using atomics is more the low level kind.
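A minimal sketch of the atomic approach (my own code, batching 4 characters per job as described above):

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Workers grab batches of 4 characters from a shared atomic counter,
// so the synchronization cost is paid once per batch, not per character.
void ProcessCharacters(std::vector<int>& characters, int numThreads)
{
    std::atomic<int> next{0};
    const int batch = 4;
    auto worker = [&]()
    {
        for (;;)
        {
            int begin = next.fetch_add(batch);               // grab the next batch
            if (begin >= (int)characters.size()) break;      // all work taken
            int end = std::min(begin + batch, (int)characters.size());
            for (int i = begin; i < end; i++)
                characters[i] += 1;                          // stand-in for the animation work
        }
    };
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; t++) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```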
Looking at http://en.cppreference.com/w/cpp/thread/async i tend to think: there is no control over the creation of threads (which is expensive), and there is no guarantee multi threading is used at all. So i'll probably never use it, but probably it's just a matter of personal preference.
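To illustrate what i mean (a small sketch, my own naming): with the default policy, the implementation is free to pick deferred execution, so the work may run lazily on the calling thread and no thread is ever created.

```cpp
#include <future>

int RunWithAsync()
{
    // Default policy (async | deferred): the implementation may choose deferred,
    // i.e. the lambda runs lazily on the calling thread at .get() - no new thread at all.
    auto lazy = std::async([]{ return 21 * 2; });

    // std::launch::async forces execution "as if on a new thread", but when and how
    // that thread is created (or reused) is still out of your hands.
    auto eager = std::async(std::launch::async, []{ return 21 * 2; });

    return lazy.get() + eager.get();
}
```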