C# and Large Object Heap lack of compaction workaround

Started by azherdev; 15 comments, last by Antheus 15 years, 3 months ago
I am creating a server using TcpClient sockets. It needs to handle anywhere from 1 byte to about 10MB of incoming data, which then needs to be re-processed in memory. So I have a buffer that async calls fill up. But once the array hits 85K or so, it gets allocated on the LOH. I can only imagine how fragmented memory will become when you have hundreds of clients sending 100K, 213K, 1MB, etc. of data.

My idea was this: an array of arrays, i.e. an array of byte arrays, each 65K in size. I tried to google this and couldn't figure out whether it would work. Say I have a List<byte[]> and create 65K list entries of byte[65535]; that would give me 4GB (theoretically). When I free that memory (clear the list and remove all references to the arrays), will the GC release it all and compact it, since the list itself is no larger than 65K entries and each byte array is under 65K? Or does the GC do something weird and promote it to the LOH even though each individual item is less than 85K? I know I won't actually need 4GB, but clients sending me a few megs will be common.
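Roughly what I have in mind, as a sketch (ChunkedBuffer and its members are made-up names; the 64K segment size keeps each array under the ~85,000-byte LOH threshold):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the array-of-arrays idea. Each 64K segment stays under the
// ~85,000-byte LOH threshold, so it lives on the normal generational heap.
class ChunkedBuffer
{
    private const int SegmentSize = 64 * 1024;
    private readonly List<byte[]> segments = new List<byte[]>();
    private int length; // total bytes written so far

    public void Append(byte[] data, int offset, int count)
    {
        while (count > 0)
        {
            int posInSegment = length % SegmentSize;
            if (posInSegment == 0)
                segments.Add(new byte[SegmentSize]); // grow by one small array

            int copy = Math.Min(count, SegmentSize - posInSegment);
            Buffer.BlockCopy(data, offset, segments[length / SegmentSize],
                             posInSegment, copy);
            offset += copy;
            count -= copy;
            length += copy;
        }
    }

    public void Clear()
    {
        // Makes the segments eligible for collection; the GC frees them
        // whenever it decides to run, not immediately.
        segments.Clear();
        length = 0;
    }
}
```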
Have you actually profiled and determined there is a real performance impact here? Trying to outsmart the GC usually results in serious performance problems unless you're extremely careful (not to mention that the various flavors of the .NET GC alone will react massively differently to these sorts of shenanigans).

If you don't have documented performance problems you can trace to this sort of thing, don't worry about it. Doing things prematurely here will bite you hard.

That said, objects go on the LOH based on their size, not their size plus the size of their children. Your method will probably cause the objects to avoid the LOH, but of course the price you pay is that those objects aren't on the LOH -- which is to say, instead of fragmentation, now you have the GC spending massive amounts of time moving all that crap around. That is just as likely to cause performance problems as fragmentation in the LOH, which is often not so big a deal since the LOH may be otherwise relatively empty.

Also note that clearing lists and discarding all references to a given object will not cause the GC to collect it; it only (potentially) makes it eligible for collection. The GC will not collect it until it decides it wants to. For example, even if the object is not on the LOH, it may still be in gen2, meaning it won't be collected for a while, since gen2 collections are rare. Furthermore, the LOH is collected at the same time gen2 is. (Assuming the workstation .NET GC.)
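You can see the generation behavior directly with a throwaway check (sizes chosen around the commonly cited ~85,000-byte threshold):

```csharp
using System;

class LohCheck
{
    static void Main()
    {
        byte[] small = new byte[80 * 1024]; // below the ~85,000-byte threshold
        byte[] large = new byte[90 * 1024]; // above it

        // The LOH is logically part of gen2, so a freshly allocated
        // large array already reports generation 2.
        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2
    }
}
```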
I understand the idea of premature optimization. This is more to protect myself from the worst-case scenario. Most of my data transfers will be a few hundred bytes; 80/20 rule. But I don't want egg on my face when customers call in to say my service can't run 4 days without throwing out-of-memory errors on their 8GB systems.

I've already encountered the LOH issue on services with less demand. I'm not an expert, but it seems my proposed workaround *might* work as intended, even if it's ugly.

I'm not worried about speed (GC collection and compaction causing a stutter here and there) as much as I'm concerned about the service crashing due to OOM exceptions.
If all you want to do is avoid running out of memory, just preallocate the buffer up front and fill/"flush" it as needed (never actually changing its size), perhaps in a rotary fashion, and never discard the reference. That way it can't be collected. Whether or not that buffer ends up on the LOH should be irrelevant.
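For instance, a minimal sketch of such a fixed, rotary buffer (the names and the 256K size are arbitrary; error handling kept minimal):

```csharp
using System;

// One fixed buffer per connection, allocated once and never released,
// used in a rotary (ring) fashion so its size never changes.
class RingBuffer
{
    private readonly byte[] buffer = new byte[256 * 1024]; // allocated up front
    private int head;  // index of the next byte to read
    private int count; // bytes currently stored

    public int Free { get { return buffer.Length - count; } }

    public void Write(byte[] src, int offset, int n)
    {
        if (n > Free)
            throw new InvalidOperationException("buffer full");
        for (int i = 0; i < n; i++)
            buffer[(head + count + i) % buffer.Length] = src[offset + i];
        count += n;
    }

    public int Read(byte[] dest, int offset, int n)
    {
        int take = Math.Min(n, count);
        for (int i = 0; i < take; i++)
            dest[offset + i] = buffer[(head + i) % buffer.Length];
        head = (head + take) % buffer.Length;
        count -= take;
        return take; // byte-wise copies kept simple for clarity
    }
}
```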
But I have many clients. Say there are 80 clients connected, and each of them at some point decides to upload a 30MB file. There are 80 buffers, one per client. Now I'm allocating a 30MB buffer per person, times 80 people: 2.4GB of space.

The example I'm worried about is 400K, 3MB, 890K, 200K, 6MB, etc., with these buffers being freed at various stages, clients disconnecting, new ones connecting. Soon enough, you won't have a contiguous 400K of LOH space left.

I'm using async methods so multiple clients can upload large files simultaneously.

So far, I have come up with a solution that takes into account 3 scenarios:

1. Header + 16 bytes of data: used when you want to send a date, integer, or GUID to the server, and that's it. The server reads 32 bytes: 16 for the header, 16 for the data.

2. Header + up to 65K of data: used when you need to send an object or list of objects comprising anywhere from 17 bytes to 65K of data. The first 32 bytes are read as a header, then up to 65K is filled by subsequent calls to BeginRead/EndRead.

3. Header + more than 65K: this would use my method of keeping a list of up to 65K entries of 65K-byte arrays. Here is where I need to think through how to do it. I might even consider parsing out 65K worth of data on the fly and keeping only one 65K buffer.

Say I send an array of 300 Foo objects, each taking about 1K of space; that's 300K of data. But I can parse out about 65 of them at a time, deserialize the data, and flush the buffer. I just need to watch out for when the last object sits only partially in the buffer at the end: I have to copy the partial data to the beginning of the buffer and continue filling it from there (sketched below).

This way there's no need for the array of buffers, but it is more work. Also, what if a single object takes up more than 65K? That's another edge case to worry about.
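A sketch of that parse-and-slide idea (TryParseObject is a stand-in for whatever deserializer I end up using):

```csharp
using System;

class StreamingParser
{
    private readonly byte[] buffer = new byte[65 * 1024];
    private int filled; // bytes currently held in the buffer

    // Called after each EndRead with the number of bytes just appended.
    public void OnDataRead(int bytesRead)
    {
        filled += bytesRead;
        int consumed = 0;

        // Deserialize as many complete objects as the buffer holds.
        int size;
        while (TryParseObject(buffer, consumed, filled - consumed, out size))
            consumed += size;

        // Slide the partial object at the end back to the front, then
        // keep filling from 'filled' onwards on the next read.
        int leftover = filled - consumed;
        Buffer.BlockCopy(buffer, consumed, buffer, 0, leftover);
        filled = leftover;
    }

    // Stand-in for the real deserializer: reports true and the object's
    // size when a complete object starts at 'offset', false otherwise.
    private bool TryParseObject(byte[] buf, int offset, int available, out int size)
    {
        size = 0;
        return false; // placeholder
    }
}
```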

I need this to be fast and not crash. If an edge case of 1GB of data comes along, I'll be fine with some performance decrease, but I won't be fine with a crash.
Quote:Original post by azherdev
But I have many clients. Say there are 80 clients connected, and each of them at some point decides to upload a 30MB file. There are 80 buffers, one per client. Now I'm allocating a 30MB buffer per person, times 80 people: 2.4GB of space.

The example I'm worried about is 400K, 3MB, 890K, 200K, 6MB, etc., with these buffers being freed at various stages, clients disconnecting, new ones connecting. Soon enough, you won't have a contiguous 400K of LOH space left.


When you receive a file, you read from the socket buffer and write to disk. To do this you need 64K per client. To add some extra room, say 256K per client to compensate for slow disk writes, and for 80 clients you end up with 20MB of RAM.

There are only two scenarios here:
1) You can process the data as it arrives, in a streaming manner, in which case you can skip the "write to disk" part.
2) You need to receive the entire file before you can process it. In this case, you write it to disk. Since per-client bandwidth will be considerably less than that of the disk, there is no point in keeping the data in memory for 30 minutes while the client trickles it to you.


Look at torrent clients. They serve gigabytes to thousands of clients, yet use about 20MB of total memory.
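A minimal sketch of that receive loop (synchronous for clarity; the real server would use BeginRead/EndRead, and the path and expected size would come from the message header):

```csharp
using System;
using System.IO;
using System.Net.Sockets;

class FileReceiver
{
    static void ReceiveToDisk(TcpClient client, string path, long expectedBytes)
    {
        byte[] buffer = new byte[64 * 1024]; // small, fixed, never on the LOH
        NetworkStream net = client.GetStream();

        using (FileStream file = new FileStream(path, FileMode.Create))
        {
            long remaining = expectedBytes;
            while (remaining > 0)
            {
                int read = net.Read(buffer, 0,
                    (int)Math.Min(buffer.Length, remaining));
                if (read == 0)
                    throw new IOException("connection closed early");
                file.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```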
I agree with Antheus. Write to disk if you have such wildly varying size requirements and speed is not a priority.

Also, if you're worried about performance, test it! Write a random allocator that allocates amounts in the ranges you expect, and profile it to see how bad performance gets or how long it can run before you hit OOM. If speed is not an issue, call GC.Collect() when you free up one of the memory blocks.

Oh, if you expect to run out of memory, handle the exception! Send a "Please retry later" message.
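Something along these lines, as a sketch (SendToClient is a hypothetical helper, and the size would come from your header):

```csharp
using System;

class UploadHandler
{
    // 'requestedSize' would come from the message header.
    public byte[] TryAllocateUploadBuffer(int requestedSize)
    {
        try
        {
            return new byte[requestedSize];
        }
        catch (OutOfMemoryException)
        {
            SendToClient("Please retry later"); // refuse instead of crashing
            return null;
        }
    }

    private void SendToClient(string message)
    {
        // protocol-specific reply, omitted here
    }
}
```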
Have you considered pooling the buffers as a workaround? This way, discarded buffers will be reused as needed and you'll avoid holes in the LOH.

You can also take it a step further and split up the data in smaller buffers (e.g. 16KB, where 100KB of data are stored in 7 buffers). This way you can avoid the LOH completely and the buffer pool becomes easier to implement.
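A minimal sketch of such a pool (names are made up; a production version would cap how many segments it retains):

```csharp
using System.Collections.Generic;

// Pool of fixed-size 16K segments: discarded buffers are kept and
// reused instead of being collected, so no LOH holes can form.
class BufferPool
{
    private const int SegmentSize = 16 * 1024;
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly object sync = new object();

    public byte[] Rent()
    {
        lock (sync)
        {
            return free.Count > 0 ? free.Pop() : new byte[SegmentSize];
        }
    }

    public void Return(byte[] segment)
    {
        lock (sync)
        {
            free.Push(segment); // stays alive for the next Rent
        }
    }
}
```

With 16K segments, 100KB of data becomes seven rented segments; later .NET versions ship a built-in version of this pattern as System.Buffers.ArrayPool<byte>.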

[OpenTK: C# OpenGL 4.4, OpenGL ES 3.0 and OpenAL 1.1. Now with Linux/KMS support!]

Quote:
But I have many clients. Say there are 80 clients connected, and each of them at some point decides to upload a 30MB file. There are 80 buffers, one per client. Now I'm allocating a 30MB buffer per person, times 80 people: 2.4GB of space.

The example I'm worried about is 400K, 3MB, 890K, 200K, 6MB, etc., with these buffers being freed at various stages, clients disconnecting, new ones connecting. Soon enough, you won't have a contiguous 400K of LOH space left.

The LOH is not compacted like the generational portions of the heap, but it is collected.

You seem to be worried about the case where enough clients spam enough data that you'd exhaust the memory available to your process. Antheus's "page to disk" suggestion is a great way to deal with this: my follow-up point was that LOH or not, exhausting memory is exhausting memory. Various buffering schemes (pre-allocate-up-front-and-never-release, reuse-small-circular-buffers, et cetera) can be used to alter the conditions under which you are exhausted, but can't prevent it -- in other words, you should be worrying about the scheme and in-code usage pattern of your buffers, not necessarily about keeping them off the LOH.

But paging out to disk is a pretty good solution, especially since clients running the program can always buy more hard drives on the off chance that ever becomes a problem.
Quote:Original post by jpetrie
Quote:
But I have many clients. Say there are 80 clients connected, and each of them at some point decides to upload a 30MB file. There are 80 buffers, one per client. Now I'm allocating a 30MB buffer per person, times 80 people: 2.4GB of space.

The example I'm worried about is 400K, 3MB, 890K, 200K, 6MB, etc., with these buffers being freed at various stages, clients disconnecting, new ones connecting. Soon enough, you won't have a contiguous 400K of LOH space left.

The LOH is not compacted like the generational portions of the heap, but it is collected.

You seem to be worried about the case where enough clients spam enough data that you'd exhaust the memory available to your process. [...]


The problem here is not memory exhaustion, but memory fragmentation. The indicated usage scenario will sooner or later cause enough fragmentation that the LOH won't be able to allocate a contiguous region of 400K, and this will result in an out-of-memory exception even if enough total memory *is* available.

The solution is pooling or pre-allocating the buffers.

[OpenTK: C# OpenGL 4.4, OpenGL ES 3.0 and OpenAL 1.1. Now with Linux/KMS support!]

