Maybe he meant 680?
It's a real problem. I don't understand what local and global groups are, or how I should choose their values for the best performance. I tried setting the local group size to different values, but I got an error.
While you are telling OpenCL to launch 0x4000000 threads, you are also
telling it that each work group consists of a single thread. That
wastes a vast amount of GPU resources: it will launch 0x4000000 warps
or wavefronts, occupying 'preferred_work_group_size_multiple *
0x4000000' hardware lanes, but use only one thread in each warp.
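
To put rough numbers on that, here is a minimal sketch in plain C (no OpenCL calls, so it compiles anywhere). The helper names round_up_global_size and lanes_occupied are made up for illustration, and the SIMD width of 32 and local size of 64 used below are assumptions; on a real device you would query CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE with clGetKernelWorkGroupInfo instead of hard-coding them.

```c
#include <assert.h>
#include <stddef.h>

/* Round the work-item count up to the next multiple of the local
 * (work-group) size. OpenCL 1.x rejects a global size that is not
 * evenly divisible by the local size, which is a common cause of
 * errors when experimenting with local sizes. */
size_t round_up_global_size(size_t work_items, size_t local_size)
{
    assert(local_size > 0);
    size_t groups = (work_items + local_size - 1) / local_size;
    return groups * local_size;
}

/* Hardware lanes occupied when each work group is padded to a whole
 * number of warps/wavefronts of simd_width lanes (hypothetical model:
 * work groups never share a warp). */
size_t lanes_occupied(size_t work_items, size_t local_size, size_t simd_width)
{
    assert(local_size > 0 && simd_width > 0);
    size_t groups = (work_items + local_size - 1) / local_size;
    size_t warps_per_group = (local_size + simd_width - 1) / simd_width;
    return groups * warps_per_group * simd_width;
}
```

Under this model, lanes_occupied(0x4000000, 1, 32) comes to 32 times the number of useful threads, while lanes_occupied(0x4000000, 64, 32) equals the thread count exactly. The rounded global size and the chosen local size are what you would pass as global_work_size and local_work_size to clEnqueueNDRangeKernel; if you round up, the kernel should check get_global_id(0) against the real element count and return early for the padding threads.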