One of the problems when you get a lot of CPUs, each with its own cache and possibly its own RAM pool, is that memory access is no longer uniform. When a thread starts working on a set of data on one CPU and then moves to another CPU, you take a serious performance penalty while the data is moved over to it. On the CELL, the use of private address spaces on the SPEs ameliorates this, but brings its own management complexities. One proposed solution is to maintain a global address space, with everything accessible to all processors, but with segments of that space associated with specific processors and RAM pools. This can still be complemented with private spaces, but the point is that you keep direct access to a global space, which simplifies code and can improve performance.
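To make the "global space with per-processor segments" idea concrete, here is a rough Python sketch. It is only a toy model, not how any real language implements it: the class name `PartitionedGlobalArray`, the block layout, and the `remote_accesses` counter are all made up for illustration. Every index is reachable from every place, but each index has one owning place, and touching someone else's segment is counted as a (slow) remote access.

```python
class PartitionedGlobalArray:
    """Toy model: one global index space split into per-place segments.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, size, n_places):
        self.size = size
        self.n_places = n_places
        # Ceiling division so every global index falls into some segment.
        self.block = (size + n_places - 1) // n_places
        # Each place owns one contiguous segment of local storage.
        self.segments = [
            [0] * max(0, min(self.block, size - p * self.block))
            for p in range(n_places)
        ]
        # Counts accesses made from a place that does not own the data.
        self.remote_accesses = 0

    def owner(self, index):
        """Return the place whose segment holds this global index."""
        return index // self.block

    def _locate(self, index, from_place):
        if not 0 <= index < self.size:
            raise IndexError(index)
        place = self.owner(index)
        if place != from_place:
            self.remote_accesses += 1  # would be a slow access in practice
        return place, index - place * self.block

    def read(self, index, from_place):
        place, offset = self._locate(index, from_place)
        return self.segments[place][offset]

    def write(self, index, value, from_place):
        place, offset = self._locate(index, from_place)
        self.segments[place][offset] = value
```

The code using the array never has to copy data around by hand: it just indexes the global space, and the owner lookup tells you (or a runtime) whether that access was cheap or expensive.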
To do this, these new languages (and language extensions) map data/memory to places/domains (representing these pools of cache, RAM and even remote computers) using distribution objects and annotations (e.g. shared/private data). The point of having that extra information is that you can intelligently manage what is executed where: you can make sure a thread does not end up somewhere its data would be slow to access. Now, these particular languages are focused on supercomputers, but once we hit 8 cores per CPU this sort of system will be necessary, imho. I expect mainstream languages to pick up on it in a few years.
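As a rough sketch of what a distribution object buys you, here is a small Python example. It does not use the syntax of any of these languages; `BlockDistribution`, `place_of` and `schedule_by_owner` are hypothetical names invented here. The distribution records which place owns each index, and the scheduler uses that to send each piece of work to the place that already holds its data.

```python
from collections import defaultdict


class BlockDistribution:
    """Hypothetical distribution object: maps indices to places in blocks."""

    def __init__(self, size, places):
        self.size = size
        self.places = list(places)
        # Ceiling division so every index maps to some place.
        self.block = (size + len(self.places) - 1) // len(self.places)

    def place_of(self, index):
        """Return the place (cache/RAM pool/node) that owns this index."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.places[index // self.block]


def schedule_by_owner(indices, dist):
    """Group work items by the place that owns their data.

    A real runtime would then start one task per place; here we only
    build the plan so the placement decision is visible.
    """
    plan = defaultdict(list)
    for i in indices:
        plan[dist.place_of(i)].append(i)
    return dict(plan)
```

For example, with `BlockDistribution(8, ["node0", "node1"])`, calling `schedule_by_owner(range(8), dist)` puts indices 0 to 3 on `node0` and 4 to 7 on `node1`, so no task has to reach across to the other pool for its data.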