I suppose next time I should read your code :)
So, I see you're not explicitly deleting individual elements in a loop, however, you still need to understand that the destructor is being called for each object when you delete the list. For these particular objects, the destructor is simple since you've just got a POD type. But the problem is that you're hitting the built-in memory manager's deallocator 65k times -- since memory is a shared resource, taking or returning memory to the pool can be non-trivial.
So, I take it back. This probably is the kind of problem that a pooled memory manager would benefit you. Conceptually, think of it as allocating a raw array of 65k objects, and then creating shared-pointers to those objects within it. Now, if the destructor for those items does work you still need to call them but you don't need to deallocate the memory -- but if they do no work, then you can just reuse those element-spaces that are dis-used, and deallocate that huge array when you're done with all of them. If you abstract that, and throw in a little book-keeping you end up with a pooled memory manager, though usually you will use smaller chunks.
All that being said, understand that you're allocating 64 MB of data here, in small chunks, and this is what causes you to hit the standard deallocator so much-- and its probably taking locks and crossing into kernel-space every time. That's not cheap.