Something else that may help is to trade space for speed: allocate pre-sized blocks of ints, say 32 or 64, and keep all blocks the same size. If objects request a number of blocks of memory and handle them internally, it becomes a simple problem. It would use more memory of course, because there'd be unused slots, but it's a thought.
(if I understand correctly) this is basically what I had meant by a slab allocator.
a slab allocator can be fast, and can also reduce memory use by allowing multiple objects to share the same object-header (all of them will implicitly have the same logical type and share other details).
for 4 byte items, you would allocate space for 256 items (1KiB), and then add them to a free-list of 4 byte items.
for 8 or 16 byte items, it would be similar, just allocating 2KiB or 4KiB instead.
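a minimal sketch of carving a slab into a free-list (the names and layout here are my own, and it assumes items of at least 4 bytes so each free cell can hold a 32-bit next-index):

```c
#include <stdint.h>
#include <stdlib.h>

#define SLAB_ITEMS 256

// hypothetical slab of fixed-size items; item_size must be >= 4
// so each free cell can hold a 32-bit "next" index.
typedef struct Slab {
    uint8_t *mem;       // item_size * SLAB_ITEMS bytes
    uint32_t item_size;
    int32_t  free_head; // index of the first free cell, -1 if empty
} Slab;

static int slab_init(Slab *s, uint32_t item_size) {
    s->mem = malloc((size_t)item_size * SLAB_ITEMS);
    if (!s->mem) return -1;
    s->item_size = item_size;
    // thread every cell onto the free-list using indices
    for (uint32_t i = 0; i < SLAB_ITEMS; i++)
        *(int32_t *)(s->mem + (size_t)i * item_size) =
            (i + 1 < SLAB_ITEMS) ? (int32_t)(i + 1) : -1;
    s->free_head = 0;
    return 0;
}

static void *slab_alloc(Slab *s) {
    if (s->free_head < 0) return NULL;            // slab exhausted
    void *p = s->mem + (size_t)s->free_head * s->item_size;
    s->free_head = *(int32_t *)p;                 // pop from free-list
    return p;
}

static void slab_free(Slab *s, void *p) {
    int32_t idx = (int32_t)(((uint8_t *)p - s->mem) / s->item_size);
    *(int32_t *)p = s->free_head;                 // push onto free-list
    s->free_head = idx;
}
```

for 4-byte items this is the 1KiB case from above; freeing and re-allocating just push/pop indices, so both are a few loads and stores.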
then, for a range of item sizes, you would use a number of free-lists of fixed-size members, with some small function to map the size to a specific free-list.
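a mapping like that could be sketched as follows (the 4-byte step and the list count here are made-up values):

```c
#include <stddef.h>

// hypothetical size-to-free-list mapping: sizes are rounded up to the
// nearest 4-byte step, so list 0 holds 1..4 byte items, list 1 holds
// 5..8 byte items, and so on.
#define NUM_FREELISTS 64   // covers sizes up to 256 bytes here

static int size_to_list(size_t size) {
    if (size == 0 || size > NUM_FREELISTS * 4)
        return -1;                   // out of range: general-case allocator
    return (int)((size - 1) >> 2);   // round up to a 4-byte step
}
```

the shift makes the lookup branch-free for the common case, which matters when this sits on the hot path of every allocation.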
while this isn't perfect, allocations/frees can generally be made pretty fast (in the case where the free-lists aren't empty).
allocation in this case basically looks like:
... fall back to general-case allocation ...
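expanded into a sketch (single-threaded, with made-up sizes and names; the real code would differ):

```c
#include <stddef.h>
#include <stdlib.h>

// hypothetical fast-path allocator: an array of free-lists indexed by
// rounded-up size, falling back to the general-case allocator (plain
// malloc here) when the request is too big or the list is empty.
#define NUM_LISTS 16
#define STEP 16                       // 16-byte granularity

static void *free_lists[NUM_LISTS];   // each free cell's first word = next

static void *fast_alloc(size_t size) {
    if (size == 0 || size > NUM_LISTS * STEP)
        return malloc(size);          // fall back to general-case allocation
    int idx = (int)((size - 1) / STEP);
    void *p = free_lists[idx];
    if (p) {                          // fast path: pop from the free-list
        free_lists[idx] = *(void **)p;
        return p;
    }
    return malloc((size_t)(idx + 1) * STEP);  // keep the cell size fixed
}

static void fast_free(void *p, size_t size) {
    if (size == 0 || size > NUM_LISTS * STEP) { free(p); return; }
    int idx = (int)((size - 1) / STEP);
    *(void **)p = free_lists[idx];    // push onto the free-list
    free_lists[idx] = p;
}
```

in the common case this is an index computation, one load, and one store; only empty lists or oversized requests hit the slow path.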
going faster is possible via a few tricks, namely having functions hard-coded for a specific allocation/free size, and using separate per-thread contexts (to avoid the need for locking during free-list allocations). for a general-case allocator, it is usually necessary to use a lock to prevent threads from stomping on each other (and for operations which work with or update the heap).
another trick here is to use pre-made "allocation handles", where one first creates a handle-object describing the type of allocation (size, type, flags, ...), and then does allocations/frees using this handle. internally, the handle may contain function-pointers for the specific allocation/free logic, and some data members that may be used for special free-lists or other things.
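a rough sketch of what such a handle might look like (the names and the simple pool strategy here are illustrative, not the actual implementation):

```c
#include <stddef.h>
#include <stdlib.h>

// hypothetical "allocation handle": created once for a given kind of
// allocation, then used for all allocs/frees of that kind. the function
// pointers let a handle route to specialized logic (slab, pool,
// general-case) without a size lookup on every call.
typedef struct AllocHandle AllocHandle;
struct AllocHandle {
    size_t size;                            // fixed allocation size
    void  *free_list;                       // dedicated free-list head
    void *(*alloc)(AllocHandle *h);
    void  (*release)(AllocHandle *h, void *p);
};

static void *pool_alloc(AllocHandle *h) {
    void *p = h->free_list;
    if (p) { h->free_list = *(void **)p; return p; }   // pop
    // cells must be able to hold a next-pointer when on the free-list
    return malloc(h->size < sizeof(void *) ? sizeof(void *) : h->size);
}

static void pool_release(AllocHandle *h, void *p) {
    *(void **)p = h->free_list;                        // push
    h->free_list = p;
}

static AllocHandle make_handle(size_t size) {
    AllocHandle h = { size, NULL, pool_alloc, pool_release };
    return h;
}
```

usage would be along the lines of: create the handle once up-front, then call `h.alloc(&h)` / `h.release(&h, p)` in the hot code; extra data members (type info, flags, ...) would hang off the handle as needed.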
these may or may not use slabs, but will typically use a pool-allocation strategy (where dedicated free-lists exist for specific types of object, ...).
this may not necessarily apply to "general" code, but in my case code sometimes pops up which requires high-speed allocation/freeing of typically-small objects (often with specific type-information and special semantics/behaviors).
likewise, these large numbers of small objects make up a big part of the heap (there are often millions of them in dumped memory-stats, with 16-64 byte objects comprising the vast majority), vs the relatively small number of significantly-larger allocations (thousands of kB-range objects, and tens of MB-range objects).
keeping the object-header small is generally also a goal.
currently, my MM/GC uses a 16-byte header, which basically holds:
the object type-ID (objects in my main MM each have a qualified type, externally represented as a string giving a type-name);
the object allocation size;
data about where the allocation-call was made (useful for debugging);
reference-counts (for ref-counted data);
some flags and other things;
a check-value (used to help detect if the header was corrupted, say by an overrun, *).
a lot of this is basically bit-packed, with a few special header sub-variants (such as for objects currently in the free-list, ...).
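for illustration, a 16-byte header along these general lines might be packed something like this (the field widths and the check-value mix are made-up; the real header's layout surely differs):

```c
#include <stdint.h>

// hypothetical 16-byte bit-packed object header, loosely following the
// fields described above.
typedef struct ObjHeader {
    uint32_t type_id;    // qualified type-ID (e.g. an interned type-name)
    uint32_t size;       // allocation size
    uint16_t source_id;  // where the allocation call was made (debugging)
    uint16_t ref_count;  // for ref-counted data
    uint16_t flags;      // flags, incl. bits for header sub-variants
    uint16_t check;      // 12-bit check-value (low bits) to catch corruption
} ObjHeader;

_Static_assert(sizeof(ObjHeader) == 16, "header should stay 16 bytes");

// a made-up check-value: mix the other fields, truncate to 12 bits.
// a corrupted header then only matches with probability ~1/4096.
static uint16_t header_check(const ObjHeader *h) {
    uint32_t x = h->type_id ^ (h->size * 2654435761u) ^ h->source_id;
    return (uint16_t)((x ^ (x >> 12)) & 0xFFF);
}
```

the check-value is recomputed on free (and during heap analysis) and compared against the stored one; a mismatch means something scribbled over the header.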
it was originally an 8-byte header, but was expanded to 16 partly because:
I wanted to support larger objects (the same header is used by large-objects as well, and would otherwise limit allocations in 64-bit apps to 256MB);
needed space for additional metadata;
many objects require 16-byte alignment anyways (for SSE/etc);
*: detecting corrupt headers is useful for things like heap forensics: the MM may detect that a header is corrupt, dump diagnostic information, try to detect which memory object overflowed or underflowed, and make a guess as to where in the codebase to start looking for the offending code, in addition to raising an exception. also, when an object is freed, this helps the MM detect whether it overran its bounds, without requiring additional memory for extra check-values.
though, with, say, 12 bits for a check-value, there is a 1/4096 chance for trashed headers to escape notice. this can be reduced some by use of additional "sanity checks" (example: does the size in the header fall within the range of what could fit within the given memory-cells, and is it appropriate for this part of the heap? are the object type-ID and source-ID valid?), but this needs to be balanced against performance (leaving it more as something for heap-analysis).
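a standalone sketch of the kind of sanity checks meant here (the ID limits and cell geometry are made-up values):

```c
#include <stdint.h>

// hypothetical range checks applied on top of the check-value during
// heap analysis: a trashed header that happens to pass the 12-bit check
// will still often fail one of these plausibility tests.
#define MAX_TYPE_ID   4096
#define MAX_SOURCE_ID 1024

static int header_sane(uint32_t type_id, uint32_t source_id,
                       uint32_t size, uint32_t cell_size, uint32_t num_cells) {
    if (type_id >= MAX_TYPE_ID || source_id >= MAX_SOURCE_ID)
        return 0;   // IDs out of range: likely trashed
    if (size == 0 || size > cell_size * num_cells)
        return 0;   // size can't fit within the given memory-cells
    if (size <= cell_size * (num_cells - 1))
        return 0;   // fewer cells would have sufficed: inconsistent
    return 1;
}
```

since each test is a couple of compares, they are cheap individually, but running all of them on every alloc/free is where the performance trade-off shows up.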