There's no attachment so I'll just list typical problems:
The DirectX SDK is now part of the Windows SDK, which comes with Visual Studio. This means you don't need to install the SDK separately; Visual Studio is all you need to use DirectX. If you have the old standalone SDK installed, you will have to remove it;
Microsoft got rid of D3DX, which had texture loading, meshes, and a lot of other stuff. Therefore you have to implement these yourself, and old samples don't work anymore. There's the DirectX Tool Kit, but I think you'll still have to make some changes to get D3DX code working;
DXUT is gone as well. It relied heavily on D3DX and was used in a lot of code samples;
The math API has been updated, so the old D3DX math functions don't work. See the DirectXMath.h header and MSDN.
If you pass a pointer, the function has to load the value from memory through that pointer. However, if you pass your vector by value there's a good chance it will be passed in a register and you won't need to access memory at all, which is much faster and more efficient.
Yes, the operator is a little slower, especially if you have exceptions etc. turned on; it checks whether the range is valid, and so on. However, your loop isn't big enough to make any noticeable difference.
std::clock() has horrible accuracy. Even if there is a performance difference, you won't notice it until it's huge, or the result will round the wrong way and give you invalid data, e.g. execution time can increase by 1% but rounding will change the result from 16ms to 32ms. Pointers aren't "faster" or "slower"; however, when misused they definitely can be slower because of cache misses.
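If you want something finer-grained than std::clock(), one option is std::chrono::steady_clock averaged over many iterations. This is only a rough sketch; averageNanoseconds is a made-up helper name, not something from your code, and you still need to make sure the compiler doesn't optimize the measured work away.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the original code): runs `work`
// `iterations` times and returns the average time per call in
// nanoseconds, measured with a monotonic clock.
template <typename Fn>
double averageNanoseconds(Fn&& work, std::uint32_t iterations)
{
    assert(iterations > 0 && "need at least one iteration to average");

    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = Clock::now();

    // Convert the elapsed interval to floating-point nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Call it with a lambda wrapping the code under test and a large iteration count, so that a single clock tick no longer dominates the result.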
I suspect that your handConflictsBoard() function takes its arguments by value instead of by reference, thus copying a lot of data when it's not necessary, which is the likely source of your slowdown.
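To illustrate the difference, here is a sketch of the two signatures. Since the actual code wasn't posted, the Hand and Board types and the function bodies below are pure guesses on my part; only the by-value versus by-const-reference distinction matters.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types: the real Hand/Board definitions were not shown.
using Hand  = std::vector<int>;
using Board = std::vector<int>;

// By value: both vectors are copied (allocation plus element copy)
// every time the function is called with existing objects.
bool handConflictsBoardByValue(Hand hand, Board board)
{
    for (int card : hand) {
        if (std::find(board.begin(), board.end(), card) != board.end()) {
            return true;  // the hand shares a card with the board
        }
    }
    return false;
}

// By const reference: same logic, but the caller's vectors are only
// read through references, so nothing is copied.
bool handConflictsBoardByRef(const Hand& hand, const Board& board)
{
    for (int card : hand) {
        if (std::find(board.begin(), board.end(), card) != board.end()) {
            return true;
        }
    }
    return false;
}
```

Both versions return the same result; in a hot loop the second one avoids two allocations and two copies per call.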
If you want to make your code faster, first you need to figure out which part is the slowest and then see how you can improve it. Optimizing random parts of the code isn't likely to make a significant difference.
Brute force: just iterate over the set of renderables and cull them against the view frustum. Even better, pack the bounding boxes, together with handles back to their owners, into an array and iterate over that. It's much more cache efficient, with minimal branching (in fact, you can delay branching till the end in most cases).
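A minimal sketch of that packed-array approach, under some assumptions of mine: boxes are stored as center plus half-extents, frustum planes have normals pointing inward (a point is inside when n.p + d >= 0), and the owner handle is just an integer. All the names here (Plane, CullItem, cullBruteForce) are made up for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed convention: a point p is on the inner side when n.p + d >= 0.
struct Plane { float nx, ny, nz, d; };

// One tightly packed entry per renderable.
struct CullItem {
    float cx, cy, cz;     // AABB center
    float ex, ey, ez;     // AABB half-extents
    std::uint32_t owner;  // hypothetical handle back to the renderable
};

using Frustum = std::array<Plane, 6>;

// Returns the owner handles of all boxes that are inside or
// intersecting the frustum (conservative test).
std::vector<std::uint32_t> cullBruteForce(const Frustum& frustum,
                                          const std::vector<CullItem>& items)
{
    std::vector<std::uint32_t> visible;
    visible.reserve(items.size());

    for (const CullItem& it : items) {
        bool inside = true;
        for (const Plane& p : frustum) {
            // Signed distance from the box center to the plane.
            const float dist = p.nx * it.cx + p.ny * it.cy + p.nz * it.cz + p.d;
            // Projected "radius" of the box along the plane normal.
            const float radius = it.ex * std::fabs(p.nx)
                               + it.ey * std::fabs(p.ny)
                               + it.ez * std::fabs(p.nz);
            // Accumulate without branching; the only branch is below.
            inside &= (dist >= -radius);
        }
        if (inside) {
            visible.push_back(it.owner);
        }
    }
    return visible;
}
```

The inner loop always tests all six planes and only combines the results, so the single branch per box happens at the end.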
To back up Washu, I can say I've done a thesis on view frustum culling, and the brute-force method culls 340,000 AABBs in 1 millisecond. If this isn't sufficient, you might have different problems.
You'd need 2 transformation matrices per situation. Even with an accuracy of 1 degree you get 360 * 360 * 360 = 46,656,000 combinations. Multiply that by 16 * 4 * 2 = 128 bytes of matrix data per combination and it's suddenly about 6 gigabytes of data, and you still haven't taken translation or scaling into account.
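The arithmetic can be checked at compile time. This sketch assumes three rotation angles at 1-degree steps and two 4x4 matrices of 4-byte floats per entry; the constant names are mine.

```cpp
#include <cstdint>

// Assumption: matrices are stored as 32-bit floats.
static_assert(sizeof(float) == 4, "this estimate assumes 4-byte floats");

constexpr std::uint64_t kStepsPerAxis = 360;  // 1-degree resolution
constexpr std::uint64_t kCombinations =
    kStepsPerAxis * kStepsPerAxis * kStepsPerAxis;

// Two 4x4 float matrices per combination.
constexpr std::uint64_t kBytesPerEntry = 16 * sizeof(float) * 2;

constexpr std::uint64_t kTotalBytes = kCombinations * kBytesPerEntry;

static_assert(kCombinations == 46'656'000ULL, "360^3 combinations");
static_assert(kBytesPerEntry == 128, "bytes of matrix data per combination");
static_assert(kTotalBytes == 5'971'968'000ULL, "just under 6 GB in total");
```

That is 5,971,968,000 bytes for rotations alone, before translation or scaling are considered.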