If you really want to use those vector instructions and be guaranteed that they are being used, you need to use intrinsics.
The auto-vectorization provided by compilers, as Ravyne mentioned, should be viewed as a bonus if it occurs. If you must get vector instructions it's far simpler to just write the intrinsics yourself than to go through strange hoops to write C/C++ code that the auto-vectorizer will accept, which is a totally backwards (and unreliable) way of getting these instructions to be emitted. Additionally, I've found that it's just annoying to try to get the compiler to realize things that I know is true for my code (such as alignment or aliasing). Intrinsics are the way to go here.
I tried compiling your code on g++ 4.8.4 with the options you give (except c++14, since my compiler is not new enough) and I get identical assembly for both functions and the assembly is in the aligned form with no prologue/epilogue to deal with unaligned values.
I'm generating all of my mesh data procedurally and I'm by necessity dealing with either islands or holes. The reasons are a bit complicated, but in a nutshell this is what I'm up against:
just to save a few nerve cells I gave up my own triangulation code and adopted the Triangle library, mainly because since it supports CDT (constrained Deulaunay triangulation) and is notably faster than my own ear clipping code. However, it dies when I feed it concave polygons with interior edges that wrap around islands. As a result I'll have to purge the interior edges, convert the islands into separate holes and rely on Triangle to feed me back the concave hull after it's done. The problem is that so far I haven't been able to gouge a proper set of hull segments from it after refinement. That is, simple triangulation, which produces fewer but poorly graded triangles works fine, since I already have my hull and the initial triangulation doesn't compromise it. But refinement can add vertices to the interior and the edges, which means I will no longer have a valid nav mesh to work off of. As if interior vertices weren't hard enough to deal with... Of course I could shrink the nav mesh for all agent sizes and then triangulate it, but that seems like an awful waste of resources, especially since I want the nav mesh generation code to be viable in near real-time (on the order of ~10-20 ms per cell).
Can you show me an image to describe what's going on? Something doesn't vibe with me very well with what you've described, but I'm not sure I'm fully understanding your situation. For instance, I don't understand what you mean by "concave polygons with interior edges that wrap around islands". If my recollection of CDT is correct, it shouldn't make a difference if you have a concave polygon or not.
PS - reading around on the net seems to suggest that shrinking the nav mesh for each agent size is the simplest way to go, but Demyen and this guy seem to try it the harder way. Does anyone have a comment on what the performance impact is like in practice?
I tried to implement Demyen's approach but I gave up after trying for a really long time to get all the details just right. Although I haven't tried it myself, I suspect that shrinking the nav mesh would indeed be a much simpler approach, and if you can afford it, it might be best to have a few different nav meshes tuned for particular sizes.
As for the other person (http://www.koffeebird.com/2014/05/towards-modified-simple-stupid-funnel.html), I do exactly this approach on my game and if you can operate under the same assumptions, namely that the portals you have are guaranteed for the agent of radius R, it works great (at least I think so)! Performance is absolutely a non-issue for me. On very long paths, my entire string pulling + tangent to circle computation takes on the order of ~50000 cycles which is around 0.02 ms on my machine. The A* search takes far, far longer. The A* search alone to generate that path could take me 0.25 to 0.5 ms.
You have far bigger problems to worry about when it comes to performance in pathfinding. The A* search (if you're using it) will be the lion's share of the time spent.
Ah ha! Yes, I worked through your example and I think what you were referring to is when you get to this case:
And it had me stumped for a second until I remembered that you need to treat the destination point as a final portal to process. It's not obvious in Mikko's explanation and in most other explanations, but it's mentioned in Jeff Erickson's description of the funnel algorithm:
When the funnel has reached the last edge in X ̄ , we compute the shortest path from π(0) to π(1) in the sleeve by treating π(1) as a triangle vertex and extending the funnel one last time. Thus, assuming the polygon is already triangulated, the overall time to compute the shortest path is O(x + x ̄ ) = O(x) = O(nk).
In my own code, I create dummy portals for the start and end positions where the left and right endpoints of the portals are the same as the positions themselves. This should fall into place nicely with the rest of the algorithm as long as your cross product comparisons are using >= or <= (not the strictly < or >).