Some popular ones that haven't been mentioned are Page Allocators, Slab Allocators, and Paged Pool Allocators. Another really simple one that isn't a heap replacement but is handy on the stack is Alloca. However, it should never be used in a loop, recursive function, or with an allocation request that will blow up the stack. Some temporary memory allocators will use a statically sized array first, then alloca, and then go to the heap if needed for "scratch" memory. One way of going about this is described here.
If the end goal is memory performance (and it should be if you're writing allocators) I cannot agree enough with what Hodgman is saying here. malloc is a general purpose allocator that has to fulfill allocations of all different sizes and alignments. It also has to do something called a context switch and these are ridiculously expensive. However, it's still primarily about cache misses at the end of the day.
The goal should be to make fewer allocations, with tightly packed data that is accessed contiguously. Find out what a struct of arrays is, what prefetching does and how to use it responsibly, why casting a float to an int is expensive, what the restrict keyword is, and how this also applies to instructions. Custom memory allocators are a small slice of a much bigger problem.