Parallel For Loops
One of the big promises of the Epoch language is the ability to automatically move code around to various bits of host hardware. For example, suppose I write an app that relies on certain GPGPU logic, such as image filtering. Then, someone runs the app on a machine that lacks capable GPU hardware. Epoch programs should transparently relocate the GPGPU code over to the primary CPU, and attempt optimizations like vectorizing the loops involved.
In other words, "write once, run everywhere. No, seriously - everywhere." Except Epoch doesn't suck like Java does [grin]
To provide this degree of flexibility, the Epoch VM obviously needs a serious arsenal of CPU-side parallelization tricks. It's no good to write solid, parallel code and then have it run in a single CPU thread.
A prime example is the "parallel for" concept, where a given set of calculations can be performed in parallel. In a traditional setting, you might see these calculations simply run in serial, in a single thread. The parallel-for construct allows you to split up that loop into chunks, and then feed each chunk to a worker thread to do the actual computations.
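To make the idea concrete, here's a minimal sketch in C++ (not Epoch syntax, and not the VM's actual scheduler; the parallel_for helper and the worker count are purely illustrative) of what "split the range into chunks and feed each chunk to a worker thread" looks like:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Chop the index range [0, count) into roughly equal chunks and hand each
// chunk to its own worker thread; "body" receives a half-open sub-range.
void parallel_for(std::size_t count, std::size_t num_workers,
                  const std::function<void(std::size_t, std::size_t)>& body)
{
    std::vector<std::thread> workers;
    const std::size_t chunk = (count + num_workers - 1) / num_workers;

    for (std::size_t start = 0; start < count; start += chunk)
        workers.emplace_back(body, start, std::min(start + chunk, count));

    for (std::thread& worker : workers)
        worker.join();
}

int main()
{
    std::vector<int> data(1000);

    // Each worker fills its own chunk; the chunks never overlap, so no locking.
    parallel_for(data.size(), 4, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            data[i] = static_cast<int>(i * 2);
    });
}
```

The serial version is just the inner loop run over the whole range; the parallel-for construct's job is to own the chunking and the worker hand-off so the programmer only writes that inner body.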
As I write this, I'm putting the finishing touches on Epoch's very own parallelfor loop. It's taken a couple of hours to really get all the semantics right, but the actual process of adding the control structure was surprisingly easy, albeit time-consuming. This gives me a lot of hope for future expansions to the Epoch parallelization repertoire.
Of course, with Epoch, the big news right now is Release 9; as I've mentioned before, I plan to debut R9 at GDC'10 this year. (Don't worry, I'll post the release package on the project site the same day [smile])
That leaves me with precious few hours to finish up the release package. I'm down to evenings and potentially a small chunk of time on Saturday, and then Sunday afternoon I leave for San Francisco. Nothing like a little bit of pressure to keep you on your toes...
The only really significant chunk of work left is to add the CPU failover logic, so that when a suitable GPU is not present, the CUDA extension defers to standard CPU execution. This is slightly important because my demo machine (a.k.a. my notebook) doesn't have a CUDA-ready GPU. It'd kind of look bad to present the project and show it failing to work correctly [grin]
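The failover pattern itself is simple; this is a hedged C++ sketch (again, not the actual extension code, just the general shape of the check) that asks the CUDA runtime how many devices are present and falls back to a CPU path when there are none:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Returns true only if the CUDA runtime reports at least one usable device.
bool cuda_device_available()
{
    int device_count = 0;
    const cudaError_t status = cudaGetDeviceCount(&device_count);
    return status == cudaSuccess && device_count > 0;
}

int main()
{
    if (cuda_device_available())
    {
        std::printf("CUDA device found; running the GPU path.\n");
        // ...launch the real kernels here...
    }
    else
    {
        std::printf("No CUDA device; deferring to the CPU implementation.\n");
        // ...run the equivalent CPU work here...
    }
}
```

The interesting part is making the CPU path produce the same results as the kernels it replaces, so the program behaves identically whether or not the GPU is there.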
After that, it's down to lots of small detail work; getting the release ready is a fairly involved process, as I'm doing my best not to release totally broken code. Unfortunately, many of these tasks are hard to predict and plan around, so I have no idea at this point if I'll be able to hit my desired R9 deadline.
But, hey, you can sleep when you're dead, right?