This is a profiling question, so step 1 is always: take measurements.
For a target of 60Hz, you've got a budget of approximately 16ms (16.66 in reality, but you probably want to keep that extra two-thirds of a milli spare for OS spikes).
First measure how much time each part of your scene is taking to render, and map that out against your budget to see where you're currently at.
Then adjust those numbers to where you'd like them to be (for this target, they should sum to 16ms or less!)
I'm not too experienced with Unity, so I'm not sure how good its profiling tools are here... As an alternative, there are tools from NVIDIA/AMD/Intel/Microsoft that you may be able to use to gather this kind of data.
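If you're writing your own engine code (rather than working inside Unity), you can also instrument each pass yourself with GPU timer queries. Here's a minimal sketch of the idea, assuming an OpenGL 3.3+ renderer -- the pass functions (renderGBuffer, renderLighting) are just placeholders for your own passes, and D3D/Vulkan have equivalent timestamp queries:

```cpp
// Minimal sketch: per-pass GPU timing with OpenGL timer queries.
// Assumes a current OpenGL 3.3+ context; renderGBuffer/renderLighting are
// placeholders standing in for your own rendering passes.
#include <GL/glew.h>
#include <cstdio>

struct GpuTimer {
    GLuint query = 0;
    void begin() {
        if (!query) glGenQueries(1, &query);
        glBeginQuery(GL_TIME_ELAPSED, query);
    }
    void end() { glEndQuery(GL_TIME_ELAPSED); }
    // Reading the result immediately stalls the pipeline -- a real profiler
    // would buffer queries and read the results a frame or two later.
    double milliseconds() {
        GLuint64 ns = 0;
        glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
        return ns / 1.0e6;
    }
};

static void renderGBuffer()  { /* placeholder: your g-buffer pass  */ }
static void renderLighting() { /* placeholder: your lighting pass  */ }

void renderFrame() {
    GpuTimer gbufferTimer, lightingTimer;

    gbufferTimer.begin();
    renderGBuffer();
    gbufferTimer.end();

    lightingTimer.begin();
    renderLighting();
    lightingTimer.end();

    std::printf("GBuffer: %.3fms  Lighting: %.3fms\n",
                gbufferTimer.milliseconds(), lightingTimer.milliseconds());
}
```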
As an example, a frame-capture from my game currently looks like this:
The total frame time is 10.585ms in this capture (~94fps). That's already faster than 60Hz, so let's say I'm trying to get to 120Hz (a budget of ~8ms).
I can now look through each of the passes and objects and look for low-hanging fruit. The bigger something is on the graph, the more impact a micro-optimization to its shader will make. Even cutting out a single multiplication from a shader can have a big impact if it's being used to draw a million pixels.
The two rectangles right before FXAA are Hud at 0.061ms and Tonemap at 0.198ms -- so trying to micro-optimize the Tonemap shader probably isn't worth it.
Seeing that tonemapping is only taking ~0.2ms, a 10% improvement in that shader would bring me from 94.47fps (10.585ms) up to just 94.65fps (10.565ms).
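Just to spell that arithmetic out, here's the milliseconds-to-fps conversion as a tiny snippet (the numbers are the ones from the capture above):

```cpp
// Back-of-the-envelope frame-time arithmetic (numbers from the capture above).
#include <cstdio>

int main() {
    double frameMs   = 10.585;            // total GPU frame time
    double tonemapMs = 0.198;             // cost of the tonemap pass
    double savedMs   = tonemapMs * 0.10;  // a 10% win on that one pass

    std::printf("before: %.2f fps\n", 1000.0 / frameMs);             // ~94.47
    std::printf("after:  %.2f fps\n", 1000.0 / (frameMs - savedMs)); // ~94.65
}
```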
However, inside the GBuffer pass, I can see that there's an object called "Platforms_Beam_13|geometries_94|Substance_Library1.pewter_002" which alone is taking up 2.293ms -- anomalous compared to the other objects in the scene. By itself it's over half of the GBuffer pass' cost, and I know that it's just a tiny little ring at the bottom of a spotlight in the distance! This immediately stands out to me as a problem, so it would go to the top of my list of things to investigate. If I can bring that anomalous object's cost back into line with the others, I might be able to bump the frame-rate from ~94fps up to ~116fps.
After addressing that problem, I might decide that the sky is taking up too much of my budget -- I'm spending ~1.5ms on lighting, but ~3ms on drawing the background. That intuitively seems out of balance. So the next item on my list would be to go over the algorithms being used by the sky renderer -- are there any algorithmic improvements that can change the big-O cost of the pass? After that, can I simply change the amount of work being done as a quality trade-off -- can I render it at a reduced resolution and then up-scale the results (see the sketch below)? Rendering at half or quarter resolution shades 4x or 16x fewer pixels, so mixed resolution rendering for low-frequency-detail passes can deliver performance improvements approaching that.

After that, I'd look at micro-optimizing the shader code. Can the code be mathematically rearranged to do the same thing with fewer instructions? Can multiple shaders be joined into one, or can one big shader be split into multiple simpler ones so they run on the GPU more efficiently? Can I change the texture formats being used to reduce bandwidth? Can the data be packed better? For these micro-optimization tasks, a shader profiler from AMD/NVIDIA/Intel is a great tool to have, as it can tell you theoretical performance characteristics of a shader just by looking at the code, letting you experiment with these trade-offs quickly.
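As a rough illustration of that mixed-resolution idea (not the actual code from my engine), you'd render the sky into a half-resolution off-screen target and then stretch it back up over the full-resolution frame. Something like this, assuming OpenGL, with halfResFbo/drawSky as placeholders for your own resources and sky pass:

```cpp
// Rough sketch of half-resolution sky rendering with an upscale blit.
// Assumes halfResFbo was created with a colour attachment sized
// screenWidth/2 x screenHeight/2.
#include <GL/glew.h>

static void drawSky() { /* placeholder: your expensive sky shader */ }

void renderSkyMixedRes(GLuint halfResFbo, int screenWidth, int screenHeight) {
    // 1) Render the sky at a quarter of the pixel count (half width, half height).
    glBindFramebuffer(GL_FRAMEBUFFER, halfResFbo);
    glViewport(0, 0, screenWidth / 2, screenHeight / 2);
    drawSky();

    // 2) Stretch the result back up onto the main framebuffer.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, halfResFbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    glBlitFramebuffer(0, 0, screenWidth / 2, screenHeight / 2,
                      0, 0, screenWidth, screenHeight,
                      GL_COLOR_BUFFER_BIT, GL_LINEAR);
}
```

A plain bilinear blit like this can show artifacts around depth edges; a real implementation would usually do a depth-aware/bilateral upsample instead, but the cost saving comes from the same place -- shading far fewer pixels in the expensive pass.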
After that, maybe I've recovered 1.5ms from the sky pass and 2ms from the gbuffer pass, which would bring me down to ~7ms -- comfortably inside the 8ms / 120Hz budget.
I'd then have to repeat this work for many different scenes and view-points within the game, to ensure that the different passes remain within their budgets at all times. If a particular level causes the renderer to exceed its budgets, then you can either repeat this work of optimizing the renderer, or work alongside the content creators to help them optimize their level by removing/rearranging object/effect placements.
[edit] Everything I've mentioned above is focused on GPU time per frame.
It's important though to really understand the difference between GPU frametime and CPU frametime. You need tools that can measure both of them independently of each other. Whichever one is higher is your "bottleneck". e.g. maybe your GPU frametime is 8ms, meaning it could be running at 120Hz, but your CPU frametime is 50ms, meaning that you're stuck at 20Hz :(
Whichever processor is the bottleneck is the one that you should optimize for first. Usually if the GPU is the bottleneck, the CPU will spend a lot of time idling inside a function like SwapBuffers, Flip, or Present -- which is where it waits for the GPU to finish the previous frame's commands.
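A crude way to check this without a full profiler is to time how long the swap/present call blocks on the CPU. The sketch below assumes a GLFW/OpenGL app, where glfwSwapBuffers plays the role of Present (note that vsync will also make this call block, so disable it while measuring):

```cpp
// Crude CPU-side check for a GPU bottleneck: time how long the swap call blocks.
// Assumes a GLFW/OpenGL app; with D3D you'd wrap Present instead.
#include <GLFW/glfw3.h>
#include <chrono>
#include <cstdio>

void presentAndMeasure(GLFWwindow* window) {
    using clock = std::chrono::steady_clock;

    auto start = clock::now();
    glfwSwapBuffers(window);  // blocks here if the GPU hasn't finished the previous frame
    auto end = clock::now();

    double waitMs = std::chrono::duration<double, std::milli>(end - start).count();
    // With vsync off: a consistently large wait here means the GPU is the bottleneck;
    // a near-zero wait while the frame is still slow points at the CPU instead.
    std::printf("time blocked in swap: %.2fms\n", waitMs);
}
```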
e.g. another shot from my engine -- the rendering thread on the CPU submits a whole load of draw-calls (the rainbow of stripes), and then gets stuck inside the Present function, indicating that the GPU is running too slow, causing the CPU to wait for it to catch up.
If the CPU is actually your bottleneck, then micro-optimizing shaders is useless. Instead, as frob mentions below, you probably need to look into reducing batch counts to reduce the amount of work that the CPU rendering thread is required to perform.
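For example (my own sketch, not from frob's post), going from one draw call per object to a single instanced draw is one of the standard ways to cut the batch count -- assuming OpenGL again, with the per-instance transforms already living in a buffer that the bound VAO reads per instance:

```cpp
// Sketch of reducing batch count with instancing. The Rock type and
// setPerObjectUniforms are placeholders for your own per-object setup.
#include <GL/glew.h>
#include <vector>

struct Rock { /* per-object data: transform, material, etc. */ };
static void setPerObjectUniforms(const Rock&) { /* placeholder */ }

// Before: one draw call per rock -> the CPU rendering thread does N times the work.
void drawRocksOneByOne(const std::vector<Rock>& rocks, GLsizei rockIndexCount) {
    for (const Rock& rock : rocks) {
        setPerObjectUniforms(rock);
        glDrawElements(GL_TRIANGLES, rockIndexCount, GL_UNSIGNED_INT, nullptr);
    }
}

// After: a single instanced draw call covers all the rocks, so the CPU cost
// no longer scales with the object count.
void drawRocksInstanced(GLsizei rockCount, GLsizei rockIndexCount) {
    glDrawElementsInstanced(GL_TRIANGLES, rockIndexCount, GL_UNSIGNED_INT,
                            nullptr, rockCount);
}
```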