The problem gets more complicated every time
> a lot of computation using their own memory space
No problem.
Nobody else cares what a thread does with data that isn't shared (and even if it's shared, it only matters if at least one thread writes to the area).
> the result of the computation needs to be stored back into a very large array in memory.
So basically, 4 threads do some lengthy operation and output 4 result values once. When they are available (presumably you block on an event?) you read them. No problem.
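Actually, you don't even need atomicity for that to work reliably. Since you must synchronize with a semaphore, an event, or a condition variable anyway in order to know that the results are ready, that synchronization already guarantees the reader sees the completed writes. That is more than good enough.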
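A minimal sketch of that pattern, assuming C++11 `std::thread` and using `join()` as the synchronization point instead of a semaphore or event. The names `lengthy_computation` and `run_workers` are made up for illustration and the arithmetic is just a stand-in for the real work. Each worker writes only to its own slot, so plain (non-atomic) variables are enough:

```cpp
#include <array>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the lengthy per-thread computation.
// It only touches local variables, so no other thread cares about it.
std::uint64_t lengthy_computation(unsigned id) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= 1000; ++i) {
        sum += i * (id + 1);
    }
    return sum;
}

// Starts four workers and waits for all of them before reading.
std::array<std::uint64_t, 4> run_workers() {
    std::array<std::uint64_t, 4> results{};  // plain values, no atomics
    std::vector<std::thread> workers;

    for (unsigned id = 0; id < 4; ++id) {
        // Each thread writes exactly one slot that no other thread touches.
        workers.emplace_back([&results, id] {
            results[id] = lengthy_computation(id);
        });
    }

    // join() synchronizes with the end of each thread, so every write
    // made by a worker is visible here once its join() has returned.
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

If you block on an event or a condition variable instead of joining, the same reasoning applies: the wait is what makes the results safe to read, not atomicity of the stores.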
> random locations and can partially overlap (it depends from what the user do)
Problem. This isn't so much a threading or atomicity thing, however. It's a correctness thing.
If you add those results to some "random ranges" and these ranges overlap, the elements in the overlap will receive two results instead of one, atomic or not. If that is allowed (or even desired), fine. If it is not, then your code will not work correctly, no matter how you synchronize.
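To make that concrete, here is a small sketch, assuming the results are added into the array with `std::atomic` increments. The array size, the ranges, and the function names are invented for the example. Every single addition is atomic and nothing is lost, yet the overlapping elements still end up holding both results:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Atomically adds `value` to every element in [begin, end).
void add_to_range(std::vector<std::atomic<int>>& data,
                  std::size_t begin, std::size_t end, int value) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i].fetch_add(value, std::memory_order_relaxed);
    }
}

// Two threads add their results to ranges that overlap at indices 3 and 4.
std::vector<int> overlapping_adds() {
    std::vector<std::atomic<int>> data(8);
    for (auto& x : data) {
        x.store(0);
    }

    std::thread a([&data] { add_to_range(data, 0, 5, 10); });   // [0, 5)
    std::thread b([&data] { add_to_range(data, 3, 8, 100); });  // [3, 8)
    a.join();
    b.join();

    // Copy into plain ints for inspection after both threads are done.
    std::vector<int> out;
    for (auto& x : data) {
        out.push_back(x.load());
    }
    return out;
}
```

Indices 0 to 2 hold 10, indices 5 to 7 hold 100, and indices 3 and 4 hold 110. Whether 110 is the right answer there is a question about what the program is supposed to compute, not about threading.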