Outgoing message buffer, avoiding frequent heap allocations

Hi, I would like to adjust the server loop so that outgoing messages are queued in the respective client object, so that I can do the sending at specific intervals and send up to a defined limit. To build the message queue, at some point in the server logic the message is created and then put into the client's message queue. That could be done in a function where the bytes are allocated on the heap and returned from that function:

unsigned char* createMessageXYZ() {
    unsigned char* msg = new unsigned char[32];
    // assign data to msg
    // ...
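    // caller takes ownership of msg and is responsible for delete[]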
    return msg;
}
I am worried about expensive memory allocations, since sending outgoing messages is going to be a frequent operation. How can I best avoid this, or am I worrying too much about it? Possible solutions I see:
- pre-allocate a given number (the queue size) of empty byte arrays per client and only fill those up
- use a custom memory manager with pre-allocated heap space
Any ideas? Thanks!
It depends on how many messages you send, your overall available memory, the type of networking, the expected traffic...

A profiler will tell you.

If you intend to keep a per-connection message queue, consider using smart pointers, or making your messages reference counted. That way, when multicasting (which should be fairly common), you send one instance to all connections, and the reference-counted pointers take care of memory management. This doesn't necessarily play well with pre-allocation, since messages may be of different sizes and will generally not be destroyed in the same order they were created.
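A minimal sketch of that reference-counting idea, assuming std::shared_ptr and a made-up Client type with its own queue:

#include <deque>
#include <memory>
#include <vector>

struct Message {
    std::vector<unsigned char> bytes;     // payload of any size
};

using MessagePtr = std::shared_ptr<const Message>;

struct Client {
    std::deque<MessagePtr> outgoing;      // per-connection queue
};

// Multicast: every client queues the SAME instance; the message is
// freed automatically once the last queue drops its reference.
void multicast(std::vector<Client>& clients, const MessagePtr& msg) {
    for (Client& c : clients)
        c.outgoing.push_back(msg);
}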

If you're using UDP and construct packets, pre-allocate the packets to the maximum size (the MTU, roughly 500-1500 bytes). Pre-allocate enough for all on-wire packets; the reasoning is that there can never be more packets in existence at once. 200 clients, 1500-byte packets and a 10-slot on-wire buffer per client comes to about 2.9 MB of storage in total.

Ideally, you will need little to no memory to store messages (data gets sent instantly). But if the network has a hiccup, the worst case means you will need to store them until they can be sent. On a 100 Mbit network, this can mean 10 MB or more for every second you can't send. That can fill your memory quickly.

While that situation can be considered fatal, it may be a good idea to evaluate how fast your logic is generating messages, and perhaps control that. In lock-step, you can evaluate the size of the changed state after each iteration. If you notice that this state is getting too big, either the changed state or the pending network traffic, you can throttle the AI, or perhaps suspend the simulation for a few steps to prevent things from blowing up.
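As a rough illustration of that kind of check (the threshold and names are made up; in practice they come from measuring your own game):

#include <cstddef>

// Hypothetical budget for data that is still waiting to go out.
const std::size_t kMaxPendingBytes = 256 * 1024;

// Called once per lock-step iteration: returns true if the simulation
// should be throttled because too much state is already queued.
bool shouldThrottle(std::size_t pendingNetworkBytes,
                    std::size_t changedStateBytes) {
    return pendingNetworkBytes + changedStateBytes > kMaxPendingBytes;
}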

But again, without more details on the design, or an upfront analysis of the possible costs, a profiler will be by far your best bet.
For my data sending, what I do is first construct the data in a BitStream object, which is then passed to my connection. To avoid creating a new BitStream for every message, I just have a pool of them that allocates as many as it needs (rarely needs more than one).

This BitStream containing the data to send is forwarded to the connection's Send(), which then reads the data from the BitStream into an internal BitStream in the connection. This internal BitStream is used to combine as many messages as possible before sending. It always has the internal size of whatever my constant MaxSendSize is.

When it is time to send, I copy the data from the BitStream to a byte array, which is used as the buffer for the data currently being sent. This, too, is allocated to MaxSendSize. As new messages come in while a send is in progress, they just end up going into the BitStream.

The only problem this leaves is: what if the BitStream overflows? To handle this, I check whether I can fit the new message into the internal BitStream before packing the message into it. If I can fit it, no problem - just go on as above. If I cannot fit it, I have a separate Queue<byte[]> just in case. The BitStream's buffer is copied to a new byte array, allocated just large enough to fit all the data (no need to make it MaxSendSize). Before grabbing the data from the BitStream to send, I just check whether the queue has any items. If it does, I dequeue it and use that as the buffer.

The only time memory is allocated in any of this is when messages have to be queued, which is only when there is a lot of traffic.
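A rough C++ analogue of that combine-then-overflow approach (the BitStream is reduced to a plain byte vector here, and all names are illustrative rather than the actual NetGore API):

#include <cstddef>
#include <queue>
#include <vector>

const std::size_t kMaxSendSize = 1400;          // illustrative send limit

struct Connection {
    std::vector<unsigned char> combineBuffer;   // the "internal BitStream"
    std::queue<std::vector<unsigned char>> overflow;

    // Pack a message into the combine buffer, spilling to the overflow
    // queue only when the new message would not fit.
    void send(const unsigned char* data, std::size_t len) {
        if (combineBuffer.size() + len <= kMaxSendSize) {
            combineBuffer.insert(combineBuffer.end(), data, data + len);
        } else {
            overflow.push(combineBuffer);       // park the full buffer
            combineBuffer.assign(data, data + len);
        }
    }

    // Returns the next chunk that should actually go on the wire.
    std::vector<unsigned char> nextOutgoing() {
        std::vector<unsigned char> out;
        if (!overflow.empty()) {
            out = overflow.front();
            overflow.pop();
        } else {
            out.swap(combineBuffer);            // combineBuffer is now empty
        }
        return out;
    }
};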

Hope that helps!
NetGore - Open source multiplayer RPG engine
First, measure the cost of the allocations. Chances are, they won't show up on a profiler. The available outgoing bandwidth will be a very small fraction of available main RAM bandwidth for most servers.

Second, if you actually need to optimize memory allocation for network packets/messages, there are two approaches.

One is to keep a pool of allocated messages. When you want to allocate a new message of some size, see if there's a block of that size in the pool. If there isn't, allocate a new block and return it. If there is, return that existing block. When a message is going to be freed, put it into the pool to be re-used rather than passing it to free/delete. You can also play games with pre-allocating the pool, limiting the total number of outstanding blocks, and even using fixed-size pools, if you can determine your maximum capacity needs beforehand. With the right implementation, this will be somewhat faster than malloc/free, because it avoids one layer of indirection (however, malloc/free are generally quite fast these days, and aren't THAT different from the pool implementation above).
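A minimal sketch of such a pool, assuming messages are plain byte buffers and the pool is keyed by block size (names are illustrative):

#include <cstddef>
#include <map>
#include <vector>

// Very small free-list pool: one stack of recycled buffers per size.
class MessagePool {
public:
    unsigned char* allocate(std::size_t size) {
        std::vector<unsigned char*>& freeList = pool_[size];
        if (freeList.empty())
            return new unsigned char[size];     // nothing recycled yet
        unsigned char* block = freeList.back();
        freeList.pop_back();
        return block;
    }

    void release(unsigned char* block, std::size_t size) {
        pool_[size].push_back(block);           // recycle instead of delete[]
    }

private:
    std::map<std::size_t, std::vector<unsigned char*>> pool_;
};

A real implementation would also delete[] the recycled blocks in the pool's destructor, and could cap how many blocks it keeps per size.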

The second is to keep a ring buffer of data waiting to be sent to a user. When you want to allocate a new message, you allocate out of this buffer. If you run out of space in this buffer, then you know that the receiving end isn't receiving fast enough, and you typically want to either scale back how fast you're queuing new data, or just kick the client.
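A bare-bones version of that per-connection ring buffer, assuming a fixed capacity chosen up front (all names are made up):

#include <cstddef>
#include <vector>

// Fixed-capacity byte ring: messages are written at the head and
// drained from the tail when the socket is ready to send.
class SendRing {
public:
    explicit SendRing(std::size_t capacity) : buf_(capacity) {}

    // Returns false when the client can't keep up (throttle or kick).
    bool push(const unsigned char* data, std::size_t len) {
        if (len > buf_.size() - used_)
            return false;                        // ring is full
        for (std::size_t i = 0; i < len; ++i)
            buf_[(head_ + i) % buf_.size()] = data[i];
        head_ = (head_ + len) % buf_.size();
        used_ += len;
        return true;
    }

    // Copies up to maxLen bytes into out and removes them from the ring.
    std::size_t pop(unsigned char* out, std::size_t maxLen) {
        std::size_t n = used_ < maxLen ? used_ : maxLen;
        std::size_t tail = (head_ + buf_.size() - used_) % buf_.size();
        for (std::size_t i = 0; i < n; ++i)
            out[i] = buf_[(tail + i) % buf_.size()];
        used_ -= n;
        return n;
    }

    std::size_t pendingBytes() const { return used_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t head_ = 0;   // next write position
    std::size_t used_ = 0;   // bytes waiting to be sent
};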
enum Bool { True, False, FileNotFound };
hplus0603 is right on the money. Use object pools for fixed-size rapid allocations; custom memory managers - with techniques like a small block allocator - are really only useful for allocations that are not a fixed size but are small and within a common average allocation size. Ring buffers are cool, but really only if you have a good idea of what the ring buffer size should be. If your network traffic grows significantly each time a new client connects, then I would avoid it. Ring buffers can be very useful for reducing thread lock contention, however (single directional).

Another nice thing about a fixed-size allocator + object pools is when you know the main initial pool size up front. Knowing this ahead of time allows you to allocate all memory related to the initial pooled objects in one large contiguous block, which of course greatly helps with fragmentation (in certain instances).
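Sketch of that up-front contiguous allocation, assuming a slot size and slot count known at startup (names are made up):

#include <cstddef>
#include <vector>

// All slots live in ONE contiguous allocation made at startup,
// so pooled messages don't fragment the heap.
class FixedSlotPool {
public:
    FixedSlotPool(std::size_t slotSize, std::size_t slotCount)
        : slotSize_(slotSize), storage_(slotSize * slotCount) {
        for (std::size_t i = 0; i < slotCount; ++i)
            freeSlots_.push_back(&storage_[i * slotSize]);
    }

    unsigned char* acquire() {
        if (freeSlots_.empty())
            return nullptr;                 // pool exhausted; caller decides
        unsigned char* slot = freeSlots_.back();
        freeSlots_.pop_back();
        return slot;
    }

    void release(unsigned char* slot) { freeSlots_.push_back(slot); }

private:
    std::size_t slotSize_;
    std::vector<unsigned char> storage_;    // the single contiguous block
    std::vector<unsigned char*> freeSlots_;
};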

~Graham
Thank you very much for the directions! I am going to experiment and see how it all plays out.
Quote: Original post by Antheus
Ideally, you will need little to no memory to store messages (data gets sent instantly).


Does that mean you wouldn't normally queue data for collective sending, but instead issue a send() directly whenever possible (and only queue when sending isn't possible or the application's bandwidth limit has been reached)?
I think you will always generate messages before calling send(). If nothing else, then because you want to put multiple messages into a single datagram. However, it is sometimes beneficial to queue a pointer to an object, rather than a message from an object, and when it's time to actually send a packet, call the object back to generate the message into the outgoing buffer. That way, the data you send is the freshest possible.
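For instance, queuing a source object instead of a finished message might look roughly like this (a hypothetical interface, not taken from any particular library):

#include <cstddef>
#include <deque>

// Queue the thing that can produce the bytes rather than the bytes
// themselves; serialization happens at send time, so the data written
// into the outgoing buffer is as fresh as possible.
struct MessageSource {
    virtual ~MessageSource() {}
    // Writes the message into out, returning the number of bytes used
    // (0 if it does not fit in maxLen).
    virtual std::size_t writeMessage(unsigned char* out, std::size_t maxLen) = 0;
};

std::size_t buildPacket(std::deque<MessageSource*>& pending,
                        unsigned char* packet, std::size_t maxLen) {
    std::size_t used = 0;
    while (!pending.empty()) {
        std::size_t n = pending.front()->writeMessage(packet + used, maxLen - used);
        if (n == 0)
            break;                 // no room left; remaining sources wait
        used += n;
        pending.pop_front();
    }
    return used;
}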
enum Bool { True, False, FileNotFound };
Ok, got that. Thank you!
In our system there was one area where it was possible to move unreliable message creation onto the stack instead of the heap. The end result led to more copying but less memory allocation, which was a big gain.
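In code, that kind of stack-based creation for an unreliable message can be as simple as the following (the message layout and id are purely illustrative):

#include <cstddef>
#include <cstring>

// Build the message in a stack buffer and hand the bytes straight to
// the send path; the extra copy replaces a heap allocation.
void sendPositionUpdate(float x, float y, float z,
                        void (*sendBytes)(const unsigned char*, std::size_t)) {
    unsigned char msg[1 + 3 * sizeof(float)];   // lives on the stack
    msg[0] = 0x07;                              // made-up message id
    std::memcpy(msg + 1, &x, sizeof(float));
    std::memcpy(msg + 1 + sizeof(float), &y, sizeof(float));
    std::memcpy(msg + 1 + 2 * sizeof(float), &z, sizeof(float));
    sendBytes(msg, sizeof(msg));
}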
