Out of curiosity, how do you turn the all-to-many comparisons into an O(n) problem? By definition, all-to-all or all-to-many problems have complexity O(n[sup]x[/sup]) for some x > 1.
Barnes-Hut (or a variation of tree code) or the Fast Multipole Method (or whatever multipole representation you choose). Both are approximations by definition, but they have been shown to work well in practice, even for actual astrophysical simulations. There is also a rant from Barnes about how CS experts could never have come up with such a clever optimization...
BH is O(n log n). FMM is O(n) with a large constant factor, roughly O(28n). Solve n log n = 28n and you find that, for practical purposes (CS academia would cry foul), you can treat both as O(n), but below 2^28 elements (~270 million particles, ~4-8GB of memory, vastly beyond reasonable non-clustered problems) BH is the cheaper of the two. Tree construction for BH can be done with what is essentially a radix sort, which is O(n) and benefits greatly from Morton ordering; FMM uses an implicit grid.
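To make the Morton-ordering point concrete, here is a minimal sketch of 3D Morton (Z-order) encoding, the bit-interleaving step that turns tree construction into a radix sort over integer keys (the function names are mine; the magic constants are the well-known 10-bits-per-axis expansion):

[code]
#include <cmath>
#include <cstdint>

// Insert two zero bits after each of the low 10 bits of v
// (standard bit-interleaving via multiply-and-mask).
uint32_t expand_bits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates normalized to [0,1].
uint32_t morton3d(float x, float y, float z) {
    uint32_t xi = (uint32_t)std::fmin(std::fmax(x * 1024.0f, 0.0f), 1023.0f);
    uint32_t yi = (uint32_t)std::fmin(std::fmax(y * 1024.0f, 0.0f), 1023.0f);
    uint32_t zi = (uint32_t)std::fmin(std::fmax(z * 1024.0f, 0.0f), 1023.0f);
    return (expand_bits(xi) << 2) | (expand_bits(yi) << 1) | expand_bits(zi);
}
[/code]

Radix-sort the particles by this key and spatially nearby particles land next to each other in memory; tree nodes then correspond to contiguous ranges of the sorted array.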
BH is branch and bound: it runs well on a CPU, is easy to parallelize, and benefits from uneven distributions. FMM prefers even distributions but is better suited to the GPU. BH variations using a kd-tree instead of a quad- or oct-tree exist, but they do not benefit much from the GPU.
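For a feel of the branch-and-bound structure, here is a rough 2D sketch of the classic BH force walk with the opening-angle criterion (the Node layout, field names, and theta handling are illustrative, not from any particular code):

[code]
#include <cmath>

struct Node {
    float cx, cy, mass;   // center of mass and total mass of the subtree
    float size;           // side length of this node's square
    Node* child[4];       // quadrant children; all null for a leaf
};

// Accumulate the acceleration on a particle at (px, py).
// The bound: if a node looks small from here (size/dist < theta),
// use its center of mass instead of descending into it.
void accumulate(const Node* n, float px, float py,
                float theta, float& ax, float& ay) {
    if (!n || n->mass == 0.0f) return;
    float dx = n->cx - px, dy = n->cy - py;
    float d2 = dx * dx + dy * dy + 1e-9f;  // softened; also guards self-interaction
    float d  = std::sqrt(d2);
    bool leaf = !n->child[0] && !n->child[1] && !n->child[2] && !n->child[3];
    if (leaf || n->size / d < theta) {
        float f = n->mass / (d2 * d);      // G folded into the mass units
        ax += f * dx; ay += f * dy;
    } else {
        for (const Node* c : n->child)
            accumulate(c, px, py, theta, ax, ay);
    }
}
[/code]

Larger theta means more pruning and less accuracy; something around 0.5-1.0 is the usual toy-simulation range.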
Rendering is a matter of available hardware.
As said, it's doable. But either method will use all the computing power you throw at it, so an iPad or iPhone is not an ideal target, though small n is likely feasible even there.
For a trivial cosmological simulation, create n particles (of the same or slightly different mass), vary the initial velocities a bit, and let it run. Use the information the methods above already provide (meta nodes in BH, the coarse grid in FMM) to resolve collisions. Tweak the algorithm a bit to compensate for very large objects.
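A minimal sketch of that setup, with a naive O(n^2) reference force standing in for the BH/FMM evaluation (every name and constant here is illustrative):

[code]
#include <cmath>
#include <random>
#include <vector>

struct Particle { float x, y, vx, vy, m; };

// Naive softened O(n^2) reference force; in a real run you would
// swap this for the BH tree walk or FMM evaluation discussed above.
void forces(const std::vector<Particle>& p,
            std::vector<float>& ax, std::vector<float>& ay) {
    for (size_t i = 0; i < p.size(); ++i) {
        ax[i] = ay[i] = 0.0f;
        for (size_t j = 0; j < p.size(); ++j) {
            if (j == i) continue;
            float dx = p[j].x - p[i].x, dy = p[j].y - p[i].y;
            float d2 = dx * dx + dy * dy + 1e-9f;     // softening
            float f = p[j].m / (d2 * std::sqrt(d2));  // G folded into units
            ax[i] += f * dx; ay[i] += f * dy;
        }
    }
}

int main() {
    const int n = 2000;        // small, since the reference force is O(n^2)
    const float dt = 0.001f;
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> pos(-1.0f, 1.0f);
    std::normal_distribution<float> vel(0.0f, 0.05f);

    std::vector<Particle> p(n);
    for (auto& q : p)          // same mass, slightly varied velocities
        q = { pos(rng), pos(rng), vel(rng), vel(rng), 1.0f };

    std::vector<float> ax(n), ay(n);
    for (int step = 0; step < 100; ++step) {
        forces(p, ax, ay);
        for (int i = 0; i < n; ++i) {        // kick-drift update
            p[i].vx += ax[i] * dt; p[i].vy += ay[i] * dt;
            p[i].x  += p[i].vx * dt; p[i].y += p[i].vy * dt;
        }
    }
}
[/code]

Swap forces() for the tree walk and n can grow by a few orders of magnitude.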
And you get a rudimentary simulation of solar-system evolution. n = 1,000,000 is perfectly doable on a mid-range PC, especially if you take the numeric ranges into account and implement everything with integers (initial particle mass = 1, so sum(particles) < 2^64). Hence the "specialized programming knowledge": you need to optimize aggressively at the expense of accuracy (fine for toy problems). Simulating in 2D only reduces the constant factors further.
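The integer trick might look roughly like this: fixed-point coordinates plus unit integer masses, which make clump mergers exact (the scale factor and field names are my assumptions, and overflow guards are omitted):

[code]
#include <cstdint>

// Fixed-point 2D state: positions/velocities in units of 2^-20 of the
// box size, masses as plain counts. Every particle starts at m = 1,
// so the total mass stays below 2^64 no matter how things merge.
struct IParticle {
    int64_t  x, y, vx, vy;  // fixed-point coordinates and velocities
    uint64_t m;             // integer mass
};

// Merging two clumped particles is exact in integer arithmetic:
// masses add, position and velocity are mass-weighted averages
// (i.e. momentum is conserved up to the division's rounding).
// Assumes coordinates and masses stay small enough that the
// products below fit in 64 bits; a real version would widen to 128.
IParticle merge(const IParticle& a, const IParticle& b) {
    IParticle r;
    r.m  = a.m + b.m;
    r.x  = (a.x  * (int64_t)a.m + b.x  * (int64_t)b.m) / (int64_t)r.m;
    r.y  = (a.y  * (int64_t)a.m + b.y  * (int64_t)b.m) / (int64_t)r.m;
    r.vx = (a.vx * (int64_t)a.m + b.vx * (int64_t)b.m) / (int64_t)r.m;
    r.vy = (a.vy * (int64_t)a.m + b.vy * (int64_t)b.m) / (int64_t)r.m;
    return r;
}
[/code]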
A possible tweak would be to run a multi-level simulation: "gas" particles are handled by one simulation (perhaps even a fluid solver) while "things that clumped" are handled by another, making collisions between tangible objects simpler.
For accurate simulations, the CPU does become a problem, since individual interactions need to be resolved within a certain error bound, but those runs happen on clusters for days, so it's a different problem.