I just wanted to share what I came up with after carefully examining your input:
Messages are not of fixed size, so I cannot pre-allocate fixed-size objects.
I went with a system that works a bit like std::vector: memory is allocated in chunks, starting with a well-sized initial allocation. The reasoning is that if the initial allocation size is chosen with care, additional chunks will only rarely have to be allocated; memory does not have to be freed (unless additionally allocated memory has gone unused for some timeout); the allocated bytes can be used directly for sending (no byte copy); and memory is mostly contiguous.
Each client maintains a byte buffer manager. The byte buffer manager merely holds a list of buffer arrays (char[]) which have a fixed size. Initially, there is only one pre-allocated buffer array in the buffer manager.
Should the first buffer array fill up, a new one is allocated and appended to the list.
The byte buffer manager maintains a read and write cursor and exposes functions to move either:
char* BufferManager::requestBytes(unsigned int numBytes)
Checks whether the requested number of bytes is available in the current buffer array, counting from the write cursor position. A pointer into the buffer array at the write cursor is returned for writing, and the caller must make sure not to write more bytes than were requested. The write cursor is then moved past the last requested byte. If the request does not fit and a new buffer array is needed, a new one is allocated and the returned pointer points into this new buffer array.
This is transparent to the caller.
unsigned int BufferManager::nextDataSize()
Returns the number of bytes that can be sent in one go.
char* BufferManager::getReadCursor()
Returns a pointer for sending the bytes in the buffer.
char* BufferManager::shiftReadCursor(int numBytes)
Shifts the read cursor by the given number of bytes, transparently moving on to the next buffer arrays in the list if necessary. Returns a pointer for sending the bytes at the new read cursor position. When no unsent bytes remain after the cursor, the write and read cursors are both reset to the first byte of the first buffer array.
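The API above can be sketched in C++ roughly as follows. This is a minimal sketch under my own assumptions, not the exact code: chunks are stored as std::unique_ptr<char[]> in a std::vector, all chunks have the same size, a request larger than one chunk returns nullptr, and chunkCount() is a hypothetical helper added for inspection. Chunks allocated during a burst are kept after the cursors reset, so later bursts reuse them instead of allocating again.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch of the chunked byte buffer manager (assumed layout, see above).
class BufferManager {
public:
    explicit BufferManager(std::size_t chunkSize) : chunkSize_(chunkSize) {
        assert(chunkSize_ > 0);
        addChunk();  // one pre-allocated buffer array up front
    }

    // Reserves numBytes at the write cursor and returns where to write them.
    // Returns nullptr if the request can never fit into a single chunk.
    char* requestBytes(unsigned int numBytes) {
        if (numBytes == 0 || numBytes > chunkSize_) return nullptr;
        if (writeOffset_ + numBytes > chunkSize_) {
            // Not enough room at the end of this chunk: the tail is left
            // unused and writing continues at the start of the next chunk.
            ++writeChunk_;
            writeOffset_ = 0;
            if (writeChunk_ == chunks_.size()) addChunk();
        }
        char* p = chunks_[writeChunk_].get() + writeOffset_;
        writeOffset_ += numBytes;
        used_[writeChunk_] = writeOffset_;
        return p;
    }

    // Contiguous bytes available for one send() call: from the read cursor
    // to the end of the written data in the read cursor's chunk.
    unsigned int nextDataSize() const {
        return static_cast<unsigned int>(used_[readChunk_] - readOffset_);
    }

    char* getReadCursor() { return chunks_[readChunk_].get() + readOffset_; }

    // Advances the read cursor past numBytes sent bytes, crossing into the
    // next chunk(s) as needed. Once everything has been sent, both cursors
    // are reset to the first byte of the first chunk.
    char* shiftReadCursor(int numBytes) {
        std::size_t remaining = numBytes > 0 ? static_cast<std::size_t>(numBytes) : 0;
        while (remaining > 0) {
            std::size_t step = used_[readChunk_] - readOffset_;
            if (remaining < step) step = remaining;
            readOffset_ += step;
            remaining -= step;
            if (readOffset_ == used_[readChunk_]) {
                if (readChunk_ == writeChunk_) break;  // caught up with the writer
                ++readChunk_;
                readOffset_ = 0;
            }
        }
        if (readChunk_ == writeChunk_ && readOffset_ == writeOffset_) {
            readChunk_ = writeChunk_ = 0;
            readOffset_ = writeOffset_ = 0;
            used_[0] = 0;
        }
        return getReadCursor();
    }

    std::size_t chunkCount() const { return chunks_.size(); }  // hypothetical helper

private:
    void addChunk() {
        chunks_.push_back(std::make_unique<char[]>(chunkSize_));
        used_.push_back(0);
    }

    std::size_t chunkSize_;
    std::vector<std::unique_ptr<char[]>> chunks_;
    std::vector<std::size_t> used_;  // bytes written into each chunk
    std::size_t writeChunk_ = 0, writeOffset_ = 0;
    std::size_t readChunk_ = 0, readOffset_ = 0;
};
```

A send loop would pass getReadCursor() and nextDataSize() to send() and then call shiftReadCursor() with the number of bytes actually sent, so partial sends are handled naturally.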
Some space may be wasted when a requested byte chunk does not fit at the end of a buffer array, but if the buffer array sizes are chosen carefully, I assume this to be negligible.
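To get a feeling for how much space is lost, the placement rule can be replayed offline. The helper below is hypothetical (it is not part of the buffer manager) and assumes all chunks have the same size and no request exceeds it. Since a request only moves to a new chunk when it is larger than the remaining tail, each chunk wastes fewer bytes than the largest request, so chunks much larger than typical messages keep the loss small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replays request sizes against fixed-size chunks using
// the requestBytes placement rule and returns the bytes skipped at chunk ends.
std::size_t wastedBytes(std::size_t chunkSize, const std::vector<std::size_t>& requests) {
    std::size_t offset = 0;  // write position inside the current chunk
    std::size_t wasted = 0;
    for (std::size_t n : requests) {
        assert(n <= chunkSize);  // assumption: no request is larger than a chunk
        if (offset + n > chunkSize) {
            wasted += chunkSize - offset;  // tail of this chunk stays unused
            offset = 0;                    // continue in the next chunk
        }
        offset += n;
    }
    return wasted;
}
```

For example, with 4096-byte chunks and five 1000-byte messages, four messages fill the first chunk and the remaining 96 bytes are skipped.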
Another problem is that the parts of the buffer which have already been sent (the read cursor has moved away from position 0) cannot be reused: writing to those free bytes at the beginning of the buffer is not possible, only writing from the current write cursor position up to the end.
Thus, if the maximum number of buffer arrays is filled up but some of the buffer has already been read (read cursor pos > 0), the buffer manager will not "wrap" around to write to the beginning of the buffer up to (read cursor pos - 1) the way a circular buffer would. I figured that when this condition occurs, sending to the client has been too slow for too long, so I would just disconnect.
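The resulting policy can be condensed into a small decision function. This is a sketch under assumptions the post does not spell out: maxChunks is a hypothetical fixed limit on the number of buffer arrays per client, and the names decideWrite and WriteDecision are made up for illustration. Note that the read cursor does not appear at all, which is exactly the "no wrap-around" limitation described above.

```cpp
#include <cassert>
#include <cstddef>

enum class WriteDecision { Fits, NextChunk, Disconnect };

// Hypothetical policy check run before reserving numBytes. There is no
// wrap-around: bytes already sent at the start of the buffer are never
// reused while data is pending, so the read cursor is not consulted.
WriteDecision decideWrite(std::size_t writeChunk,   // 0-based index of the chunk being written
                          std::size_t writeOffset,  // write cursor inside that chunk
                          std::size_t numBytes,
                          std::size_t chunkSize,
                          std::size_t maxChunks) {
    assert(maxChunks > 0 && writeChunk < maxChunks);
    if (writeOffset + numBytes <= chunkSize) return WriteDecision::Fits;
    if (writeChunk + 1 < maxChunks) return WriteDecision::NextChunk;
    return WriteDecision::Disconnect;  // client has fallen too far behind
}
```

A caller would run this check before requestBytes and close the connection when it returns Disconnect.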
This was mostly done as an exercise to collect some profiling results and experiment with pre-allocation. I hope this can serve as inspiration for others.