Normally, when you compute x = y + z, those three variables each represent a single value.
e.g. 2 + 2 results in 4.
With SIMD, those 3 variables represent arrays.
e.g. [2,7,1] + [2,1,1] results in [4,8,2].
Each instruction is executed over a large number of values simultaneously, so you get more work done per cycle.
You want to avoid branching on this kind of architecture, because branches waste a lot of your SIMD throughput.
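The idea above can be sketched in plain Python, treating each array index as a SIMD "lane" (the function name is illustrative; real hardware performs all lanes in a single instruction rather than a loop):

```python
def simd_add(y, z):
    # Each index is one lane; SIMD hardware computes all lanes in one step.
    return [yi + zi for yi, zi in zip(y, z)]

print(simd_add([2, 7, 1], [2, 1, 1]))  # → [4, 8, 2]
```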
e.g. take the code:

if( y > 5 ) x = y; else x = z;

If we execute that with our data of y=[2,7,1] and z=[2,1,1], this results in:

if( y > 5 )  [false, true, false]
    x = y;   [N/A,   7,     N/A]
else         [true,  false, true]
    x = z;   [2,     N/A,   1]
// finally x = [2, 7, 1]

The GPU has had to execute both the 'if' and the 'else', ignoring some lanes of its arrays during each branch, and merging the results at the end. This is wasteful -- e.g. say the GPU has the capability to work on 3 pieces of data at once; in this example it's only ever working on 1 or 2 pieces at a time.
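The masked execution described above can be sketched like this (again in plain Python over lists; names like simd_select and the threshold parameter are illustrative):

```python
def simd_select(y, z, threshold=5):
    # Per-lane condition: which lanes take the 'if' branch.
    mask = [yi > threshold for yi in y]
    # Both branches execute over every lane regardless of the mask...
    then_side = list(y)   # x = y
    else_side = list(z)   # x = z
    # ...then the mask merges the results, keeping one side per lane.
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]

print(simd_select([2, 7, 1], [2, 1, 1]))  # → [2, 7, 1]
```

Note that the total work done is two full passes over the data plus a merge, even though each lane only needed one of the branches.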
The more nested branches you add, the more wasteful this becomes... so those kinds of programs are better off running on regular CPUs (or being redesigned to better suit this style of hardware).
For interest's sake: GPUs can generate CPU interrupts and write to arbitrary addresses (which might be mapped to peripherals), but these abilities aren't exposed on PCs (outside of the driver).
In practice, GPUs don't have any way to communicate with peripherals, just with the CPU... they lack key features, such as the ability to handle interrupts, that are critical to implementing those kinds of programs.