I don't know which platform, so I'll talk about Win32.
On a larger scale, a contiguous block in virtual memory might not be contiguous in physical memory: the OS can remap and defragment the physical pages behind your back.
Reallocations cause fragmentation. I have personally used a custom static_vector class that has dynamic size but static capacity: a hybrid of std::vector and std::array, with no allocations at all when you know the maximum size up front.
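To make the idea concrete, here is a minimal sketch of such a class (my reconstruction, not my original code): elements live in an in-place aligned byte buffer, so nothing ever touches the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Dynamic size, static capacity, zero heap allocations (C++17).
template <typename T, std::size_t Capacity>
class static_vector {
public:
    ~static_vector() { clear(); }

    template <typename... Args>
    T& emplace_back(Args&&... args) {
        assert(size_ < Capacity);
        // Construct in place inside the preallocated buffer.
        T* p = ::new (storage_.data() + size_ * sizeof(T))
                   T(std::forward<Args>(args)...);
        ++size_;
        return *p;
    }

    void pop_back() { data()[--size_].~T(); }
    void clear()    { while (size_ != 0) pop_back(); }

    T* data() { return std::launder(reinterpret_cast<T*>(storage_.data())); }
    T& operator[](std::size_t i) { return data()[i]; }

    std::size_t size() const { return size_; }
    static constexpr std::size_t capacity() { return Capacity; }

private:
    alignas(T) std::array<std::byte, sizeof(T) * Capacity> storage_;
    std::size_t size_ = 0;
};
```

Boost.Container ships a production-quality version of this (`boost::container::static_vector`) if you would rather not maintain your own.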
I did some experiments years ago to optimize memory management for multithreading. The goal was thread-specific memory management without locking, falling back to locked shared blocks only when necessary.
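The shape of that fallback scheme looks roughly like this (a hypothetical sketch, all names are mine, and it hands out fixed 64-byte blocks only): each thread serves allocations from its own thread-local free list with no locking, and only hits a mutex-protected shared pool when the local list is empty.

```cpp
#include <cstdlib>
#include <mutex>
#include <vector>

// Shared, lock-protected pool: the slow path.
struct SharedPool {
    std::mutex m;
    std::vector<void*> blocks;

    void* take() {
        std::lock_guard<std::mutex> lock(m);
        if (blocks.empty()) return std::malloc(64);  // grow on demand
        void* p = blocks.back();
        blocks.pop_back();
        return p;
    }
};

SharedPool g_pool;

// Per-thread cache of free blocks: the fast path, no lock needed.
thread_local std::vector<void*> t_free_list;

void* alloc_block() {
    if (!t_free_list.empty()) {        // fast path: thread-local, lock-free
        void* p = t_free_list.back();
        t_free_list.pop_back();
        return p;
    }
    return g_pool.take();              // slow path: locked shared pool
}

void free_block(void* p) {
    t_free_list.push_back(p);          // recycle into the local cache
}
```

A real implementation would also periodically return surplus blocks from the local list to the shared pool, so memory freed on one thread can be reused by another; the sketch omits that.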
My approach was to create a few power-of-two sized page classes (8 KB, 64 KB, 512 KB, plus a dynamically sized one). Pages were kept in lists of full, used, and free, and the memory manager would search the appropriate list. I wrote many custom statically sized containers to manage block reservations within a page: for example a memory_block_64 class, 64 bits in size (one bit per index), indicating which 4-byte blocks were allocated. A lot of these fit into a 32 KB L1 cache at once, and they were the structure the memory manager used to find the exact location of an allocation.
I benchmarked it against standard new/delete. The end result was a major boost in allocation speed with some std containers, but deallocations were slower than the standard ones. My conclusion was that the effort required to beat standard new/delete on performance was too high, even more so for a team project: it would introduce very hard-to-find bugs, even in a small program, and in a big project I dread the thought of tracking a bug from a deallocation back to its origin.
I don't want to discourage you, because I too find it fun to optimize cache usage and allocations, but be warned that it might not be productive in the grand scheme of things.