I also got major performance improvements from switching my broad-phase collision algorithm. I was previously using a modification of the sweep and prune implementation that came with the Farseer Physics Engine. In the way I was using sweep and prune, objects were being sorted in an array by their lower and upper boundaries along the X axis.
In my specific case I have a lot of objects, but most are static (such as the blocks and obstacles), so the array processing got expensive on more complex levels.
I switched to a grid-based collision algorithm, where all static objects are placed in a two-dimensional matrix where each position corresponds to a sub-square of the level, containing the list of static objects that are present in that sub-square. An object is placed in every sub-square from his lower (x,y) boundary to his upper (x,y) boundary, as depending on the object's dimensions it can belong to more than one sub-square. The non-static objects are kept in a list apart, and collisions are only checked between each non-static object and the static objects in the sub-squares the non-static object is currently located in.
Right now the collisions between each pair of non-static objects are all being checked, since I have a small number of them, but these can be optimized apart.