By the way, the real reason they had problems was that they just pumped desktop shaders through a translator and expected that to work on mobile GPUs. It's no surprise that it performed badly.
If you want performance you have to hand-code the shader specifically for the mobile GPU, using low precision where you can and following the manufacturer's recommended practices.
Yes, it's true that some of their inefficiencies came from translation, same as in this thread. But in general you don't want to be burdened with removing redundant operations and temporaries, folding constants, and so on by hand. That's why we created optimizers. These things are easy to do mechanically and difficult and time-consuming for developers. Sitting down and hand-tuning shaders for each individual platform, particularly when it's stupid stuff like redundant copy removal, is ridiculous.
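To illustrate how mechanical these cleanups are, here is a minimal sketch of an optimizer pass over a made-up instruction list. Everything in it is an assumption for illustration: the `(dest, op, args)` tuple format, the op names `mov`, `add` and `mul`, and the requirement that every name is assigned exactly once. It is not how any particular shader compiler or translator works, just the general idea of copy propagation, constant folding and dead-code removal.

```python
import operator

# Hypothetical toy IR: each instruction is (dest, op, args). It is assumed
# that every name is assigned exactly once. Args are names (str) or floats.
FOLDABLE = {"add": operator.add, "mul": operator.mul}

def optimize(instrs, outputs):
    """Propagate copies, fold constants, then drop unused instructions."""
    alias = {}   # name -> the name or literal it can be replaced with
    result = []
    for dest, op, args in instrs:
        # Replace each argument with whatever it is known to be a copy of.
        args = tuple(alias.get(a, a) for a in args)
        if op == "mov":
            alias[dest] = args[0]                 # redundant copy: emit nothing
        elif op in FOLDABLE and all(isinstance(a, float) for a in args):
            alias[dest] = FOLDABLE[op](*args)     # constant fold
        else:
            result.append((dest, op, args))
    # Dead-code removal: walk backwards, keeping only what feeds the outputs.
    live = {alias.get(o, o) for o in outputs}
    kept = []
    for dest, op, args in reversed(result):
        if dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept, [alias.get(o, o) for o in outputs]

# The kind of output a naive translator might produce (made-up example).
shader = [
    ("t0", "mov", ("uv",)),        # redundant copy
    ("t1", "mul", (2.0, 0.5)),     # constant expression
    ("t2", "mul", ("t0", "t1")),
    ("t3", "mov", ("t2",)),        # another redundant copy
    ("t4", "add", ("t0", 1.0)),    # result never used
    ("color", "add", ("t3", "t1")),
]
optimized, outs = optimize(shader, ["color"])
for instr in optimized:
    print(instr)
```

Six instructions go in and two come out, with no platform-specific knowledge involved. A real optimizer has to handle far more (control flow, reassigned variables, precision rules), but the point stands that this class of cleanup is bookkeeping a machine does reliably.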