I wrote a short "article" about benchmarking as part of an answer on Stack Overflow; it can be found here:
http://stackoverflow.com/a/25027750
Only the first part of it is relevant to your question.
Otherwise, I'd say you can in principle predict performance by hand, using a "math" model of the machine, but it is so difficult and error-prone that you might as well consider it impossible.
To do a performance analysis on paper, you'd need to know the exact binary form of the program (the opcodes, or their disassembly). Then you'd need to know how your CPU is going to handle it: which instruction goes into which pipeline (which of the parallel out-of-order execution units), and the exact state of the caches, which is hard to know.
Please consider this article:
http://www.gamedev.net/page/resources/_/technical/general-programming/a-journey-through-the-cpu-pipeline-r3115
The not-crazy-way™ is to measure it. Profile the program in situ, or simply time it from start to end. A sampling profiler will show you where the hot spots are, which usually means the functions where your program spends most of its time (often, but not always, the ones called most frequently). Then try a different implementation of a hot spot and profile again: if it got slower, revert the code; if it got faster, keep it. I'd say that's basically it.
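
As a rough illustration of the "time it between start and end, then compare implementations" loop, here is a minimal Python sketch. The names `time_call`, `pick_faster`, `sum_squares_loop` and `sum_squares_builtin` are made up for this example, and the two candidate functions are only stand-ins for whatever hot spot your own profiling points at. It takes the best of several runs to reduce noise, which is one common choice, not the only one.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time (seconds) over `repeats` runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Two hypothetical implementations of the same task: sum of squares below n.
def sum_squares_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    return sum(i * i for i in range(n))


def pick_faster(candidates, *args):
    """Time each candidate; return the fastest one's name and all timings."""
    timings = {f.__name__: time_call(f, *args) for f in candidates}
    fastest = min(timings, key=timings.get)
    return fastest, timings


if __name__ == "__main__":
    fastest, timings = pick_faster([sum_squares_loop, sum_squares_builtin], 200_000)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f} s")
    print("keep:", fastest)
```

Which candidate wins depends on your machine and interpreter, so the point is the procedure, not the result: check that both versions return the same answer, measure, keep the faster one, and measure again after the next change.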