A couple of changes had to be made, the most important being that I could no longer directly map and write to video memory when calculating the normals. This is because the pointer to the VBO only exists in the main thread, so when one of the others tries to write to it... BANG! It dies.
This was fixed by using a chunk of memory allocated on startup to hold the results and then doing a straight upload to the gfx card.
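A minimal sketch of that staging approach, under some loud assumptions: std::thread stands in for whatever threading library is really in use, Vec3 and the function names are made up for illustration, and uploadToGpu is a hypothetical stand-in that just copies into a plain array. In a real OpenGL program that is the spot where the main thread would do something like a glBufferSubData call.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

struct Vec3 { float x, y, z; };  // illustrative type, not the original vector3

// Hypothetical stand-in for the upload. Only the main thread calls this,
// so the worker threads never touch the (pretend) VBO.
void uploadToGpu(std::vector<Vec3>& fakeVbo, const std::vector<Vec3>& staging) {
    std::memcpy(fakeVbo.data(), staging.data(), staging.size() * sizeof(Vec3));
}

// Worker: writes only its own slice [begin, end) of the staging buffer.
void fillNormals(std::vector<Vec3>& staging, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        staging[i] = Vec3{0.0f, 1.0f, 0.0f};  // placeholder for the real normal
    }
}

std::vector<Vec3> computeAndUpload(std::size_t count) {
    std::vector<Vec3> staging(count);  // in the real thing, allocated once at startup
    std::vector<Vec3> fakeVbo(count);  // pretend video memory

    const std::size_t mid = count / 2;
    std::thread a(fillNormals, std::ref(staging), std::size_t{0}, mid);
    std::thread b(fillNormals, std::ref(staging), mid, count);
    a.join();
    b.join();

    uploadToGpu(fakeVbo, staging);  // one straight upload after the threads finish
    return fakeVbo;
}
```

Each worker owns a disjoint slice of the buffer, so no locking is needed, and the upload only happens after both joins.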
The other "issue" was that I'd preallocated a bunch of vector3 objects outside of the loop. In single-threaded mode this saves me the minor construction cost on each iteration; in threaded mode, however, it seems the threads end up sharing those objects and kinda collide, so things go a bit wonky. I had to move them inside the loop which was being threaded.
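Roughly what moving the temporaries inside the loop looks like, as a sketch only: the Vec3 type, the helper functions and the triangle-list layout are all assumptions for illustration, and std::thread again stands in for the real threading setup. The point is that e1 and e2 are declared in the loop body, so each thread gets its own copies on its own stack.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Vec3 { float x, y, z; };  // illustrative type, not the original vector3

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// No guard for zero-length input: a degenerate triangle would give NaNs.
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// verts holds 3 vertices per triangle; out gets one normal per triangle.
void faceNormals(const std::vector<Vec3>& verts, std::vector<Vec3>& out,
                 std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        // Declared inside the loop: private to this thread and this iteration.
        const Vec3 e1 = sub(verts[3 * i + 1], verts[3 * i]);
        const Vec3 e2 = sub(verts[3 * i + 2], verts[3 * i]);
        out[i] = normalize(cross(e1, e2));
    }
}

std::vector<Vec3> faceNormalsThreaded(const std::vector<Vec3>& verts) {
    const std::size_t triCount = verts.size() / 3;
    std::vector<Vec3> out(triCount);
    const std::size_t mid = triCount / 2;
    std::thread a(faceNormals, std::cref(verts), std::ref(out), std::size_t{0}, mid);
    std::thread b(faceNormals, std::cref(verts), std::ref(out), mid, triCount);
    a.join();
    b.join();
    return out;
}
```

Had e1 and e2 been declared once outside and shared, both threads would be writing to the same two objects at the same time, which is a data race. For a small struct like this the per-iteration construction cost is tiny anyway.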
Based on some visual assessment, it seems to have roughly doubled the speed of the CPU-only version. I kinda wish I had access to a quad-core system now to see if that increase holds [sad]
The CPU/GPU version did show some speed increase, but as the major slowdown was the normal calculation the difference wasn't really that important, certainly considering the lack of normal correction.
Still haven't fixed the normal issue; however, as that won't affect speed, only screenshots, I'm not that bothered right now (same goes for lighting).