You should definitely profile. Optimizing without a profiler is guesswork for all but the simplest of programs. Which profiler to use depends on your system. I haven't used Visual Studio in quite some time, but if you're using it, you may be able to use its Performance Analyzer.
Another thing you could do is add timers to your code: wrap the pieces of code you want to measure so you can see how long each one takes. Be sure to record the total frame time as well, so you can see how much of the entire frame each piece accounts for.
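As a rough sketch of the timer idea, something like the following could work. It assumes C++11 or later; `ScopedTimer`, `FrameTimes`, `busy_work`, and `run_frame` are hypothetical names made up for illustration, and `busy_work` just stands in for whatever your frame actually does.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: on destruction, adds the elapsed time in
// milliseconds to a variable owned by the caller.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals

public:
    explicit ScopedTimer(double& out_ms) : out_ms_(out_ms), start_(Clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        out_ms_ += elapsed.count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& out_ms_;
    Clock::time_point start_;
};

// Hypothetical per-frame record: one timed portion plus the whole frame.
struct FrameTimes {
    double physics_ms = 0.0;
    double total_ms = 0.0;
};

// Stand-in for real work; volatile keeps the loop from being optimized away.
double busy_work(int iterations) {
    volatile double sum = 0.0;
    for (int i = 0; i < iterations; ++i) {
        sum = sum + i * 0.5;
    }
    return sum;
}

FrameTimes run_frame() {
    FrameTimes t;
    {
        ScopedTimer frame_timer(t.total_ms);  // times the entire frame
        {
            ScopedTimer physics_timer(t.physics_ms);  // times one portion only
            busy_work(200000);
        }
        busy_work(100000);  // the rest of the frame
    }
    return t;
}

// Prints the timed portion relative to the whole frame.
void report(const FrameTimes& t) {
    double percent = t.total_ms > 0.0 ? 100.0 * t.physics_ms / t.total_ms : 0.0;
    std::printf("physics: %.3f ms of %.3f ms (%.1f%% of frame)\n",
                t.physics_ms, t.total_ms, percent);
}
```

Calling `report(run_frame())` once per frame prints the timed portion next to the total. Because the timers add to their target rather than overwrite it, you can also accumulate over many frames and average, which smooths out noise.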
If you want a poor man's sampling profiler:
http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024
The profiler will tell you which portion of the code to focus on, but it doesn't tell you why that code is slow. To work out the "why", you need at least some understanding of algorithms and computer architecture (and probably a little dash of operating systems). Typically, people inspect the algorithm first, since that usually yields the largest gains for relatively little effort. Replacing a terrible algorithm with a good one can yield huge gains, especially as problem sizes increase.

Once you have a fast algorithm, though, you may hit a wall set by your particular implementation. This is where computer architecture knowledge often becomes useful. People then replace slow instruction sequences with faster ones and rearrange data to allow faster access. Sometimes people flat out "cheat": because they know something specific about the problem, they can precompute things and start some computation further along using those precomputed results.
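To make the algorithm point concrete, here is a small sketch that is not tied to any particular program. It assumes the hot spot the profiler found is a check for duplicate values; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Quadratic version: compares every pair of elements.
bool has_duplicate_naive(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            if (values[i] == values[j]) {
                return true;
            }
        }
    }
    return false;
}

// O(n log n) version: sort a copy so that equal values end up adjacent,
// then scan once for a neighbouring pair that matches.
bool has_duplicate_sorted(std::vector<int> values) {  // by value: caller's data is untouched
    std::sort(values.begin(), values.end());
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```

Both functions return the same answer, but the second one scales far better as the input grows. For very small inputs the naive loop may still win, which is one more reason to measure before and after the change.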
The top answer to this question gives a pretty good account of how optimizations usually go: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort