# using erlang as the middleman for an ultra-low latency distributed system

## Recommended Posts

phsan    100
I am about to write a distributed system using C++/Linux (ultra-low latency is one of the mandates). I worked on a distributed system before. It does its own data marshalling by the sender app, send it over the wire to the receiver app on a different machine, and de-marshalled by the receiver app. The data are C++ objects.

I can do the same thing here. However, I am asking for opinions about using erlang as the middleman to send data between the sender and receiver. I am new to erlang. The idea is that the sender will call the erlang API to send data to an erlang instance on the same machine. That erlang instance then sends the data to the erlang instance on the receiver app's machine, and the receiver app gets the data from that instance through the erlang API as well.

I am wondering:
1) does this design make sense in an ultra-low latency environment?
2) more importantly, would introducing erlang as the middleman, as opposed to having the sender and receiver do the marshalling and de-marshalling themselves, introduce significant latency (>1 millisecond)?
3) is there a better idea?

Thanks

Antheus    2409
[quote name='phsan' timestamp='1307218726' post='4819509']
1) does this design make sense in an ultra-low latency environment?[/quote]

How "ultra-low" and with what characteristics?

The lowest I've seen is microsecond latency running on hard real-time systems with dedicated hardware. Most such systems are PLCs or VHDL or similar. On commodity hardware, 1000 Hz resolution is next to impossible to guarantee, but can be achieved fairly easily on average.

phsan    100
[quote name='Antheus' timestamp='1307223626' post='4819536']
How "ultra-low" and with what characteristics?
[/quote]

I am hoping the data transfer from one machine to another on the same LAN can be done within 0.1 millisecond.

hplus0603    11356
[quote name='phsan' timestamp='1307224003' post='4819538']
I am hoping the data transfer from one machine to another on the same LAN can be done within 0.1 millisecond.
[/quote]

Unless you pay a lot for some very custom hardware, that's not really a reasonable hope. You should try to code up a prototype of what you're trying to do, and actually measure it.

Of course, how you measure timing at that resolution is another interesting problem to solve. And if you can't even measure it, does it really matter? :-)

Note that Linux, out of the box, will typically give you a scheduling jitter of at least 3 milliseconds, with up to 10 milliseconds possible, because of the 100 Hz pre-emption tick rate. Windows Server is even worse. If you really want 0.1 millisecond worst-case latencies, you will need some quite custom software (and perhaps hardware, too).

Why do you need these particular latencies? What is the actual application you're trying to build or support?

ApochPiQ    23061
Think about it logically.

Anything Erlang does to serialize and deserialize its data across the network is going to have overhead on the same order of magnitude as a properly done custom implementation. However, Erlang also has to support what is likely a much richer set of features and capabilities than your custom system; therefore, it is likely to have [i]more[/i] overhead on average than a carefully tuned custom implementation. Unless you want to rewrite portions of the Erlang infrastructure, you won't be able to customize it to remove unnecessary overhead or optimize for common conditions in your particular situation.

For extreme performance, custom implementations are virtually always necessary. Now, if prototyping and getting a running version for measurement ASAP is your true goal, then by all means use any off-the-shelf intermediate components you can. But if your true demand is for incredibly low latency and high throughput, you'll have to resort to doing a lot of things yourself - possibly including custom hardware and/or a custom RTOS, as has been mentioned.

Linux and Windows (especially Windows) are not designed for realtime, minimal-latency, high-throughput systems. They can do some incredible feats if you handle them carefully, but for the most part anything that truly needs to exhibit numbers like you're talking about is going to need some heftier firepower.

phsan    100
[quote name='ApochPiQ' timestamp='1307313999' post='4819887']
For extreme performance, custom implementations are virtually always necessary.
[/quote]

Agreed. This is what I thought. I am just trying to get a measure, from whoever has done this and measured it, of how much overhead erlang might incur, so that I know what to expect.

frob    44971
[quote name='phsan' timestamp='1307224003' post='4819538']
I am hoping the data transfer from one machine to another on the same LAN can be done within 0.1 millisecond.
[/quote]

The equipment exists. 100 microsecond latency is reasonable for a (very modern) high-capacity, high-volume data center.

It generally falls into the category of "If you have to ask, you can't afford it."

In a game development forum, the big reason to have such a high speed system is for an MMO system or a fairly large not-quite-MMO multiplayer hub.

The first two big problems are that the cards themselves add a lot of time, and the Ethernet protocol itself adds a lot of time. Rather than Ethernet you'll want to use RDMA or a similar protocol designed for higher networking speed. A network doing serious communications will require proper cards and switches to handle it. You'll need to be in the 10Gbit or 20Gbit Ethernet range if you want to handle collisions and internal latency fast enough.

After that you have the latency inherent in transfer time across the wire, transfer time across the computer's bus, and other wiring. Since this is generally kept in a rack mount or two, wiring distance shouldn't be too bad.

When you potentially need to go through the local NIC with 10-microsecond latency, two or three or five switches with 5-microsecond latency each, and then the destination NIC, plus time through the wire, you're quickly approaching the 50-microsecond border. By the time you add in the local computer's bus and the OS overhead you're very near your time limit. These machines are generally built for enterprise data warehouses.

The next big problem is keeping the equipment fed. You'll generally need some serious processing power and high speed storage if you want to keep it fed. The hardware is similar to what some people use in high-end gaming machines, except for the video card and NIC.

You can buy pre-built clusters that support that kind of speed. Oracle sells them pre-built if you are building a data center, and Cisco does as well. Be prepared for a quarter million or so for your "small" server system, as they require serious cooling and a reasonable power backup system.

If all you are looking for is two or three machines and not a pre-built cluster you can do that too.

A low latency (roughly 5 microseconds per kilobit) 20Gbit switch from Cisco will cost about $6000 for a small one. If you have any serious needs you are looking in the $10K+ range. Then you'll need various server machines that can handle it; low latency NIC cards alone are going to run about $600 each for the cheap refurbished cards, and appropriate motherboards, memory, and other equipment that can handle it will also put each machine into the $5K+ range. You'll want them rack mounted and cooled, so factor in that cost as well.

Finally, I'm assuming you'll want to connect it to the Internet at some point. You'll need to figure out your hosting needs, but expect a five-digit figure every month for your optical connection, plus whatever installation costs.

There is a very good reason that data systems for MMOs require big budgets, and it isn't just for development time. Simply running the datacenter from day to day costs a small fortune.

glawton    100
I don't think that erlang is at all an unreasonable choice, because it is widely used in high-performance databases and trading platforms. If the bigger goal is to create an ultra-low-latency distributed database, you need to consider several layers of abstraction, many of which have been brought up, but I can summarize thus:

Disk access will always be a big problem -- any time you have to leave the 800+ MB/s bus to get data, you are going to add serious latency. SSD is not much help either; the high-end ones now hit 250 MB/s, and that is just for reads.

TCP offloading -- any ultra-low latency system is going to need to move the TCP stack outside of the OS; there are just too many dependencies in both Windows and Linux to get the kind of microsecond performance you want.

For that matter, you will need to invest in 10 gigabit Ethernet on both sides in order to get the window size down low enough.

But it seems like the very first step is setting up the testing environment to simulate the conditions you want to achieve both on the server and the client so you can see how the pieces fit together for you.

hplus0603    11356
[quote name='glawton' timestamp='1307332562' post='4819970']
I don't think that erlang is at all an unreasonable choice
[/quote]

I think all the answerers more or less agree:

1) The problem is not necessarily the software, it's the hardware.
2) Assuming the hardware can do it, Erlang, C++ or assembly *likely* won't matter.
3) For the cases where you run the hardware close to the edge, C++ will likely let you get a small constant factor more users per host, assuming you are CPU bound rather than I/O bound.
4) Note that most of the cost comes from the kernel and the networking hardware, rather than the application, so the cost of the application is unlikely to be the gating factor.

I still need to understand what the use case is, to know whether this is even something I would consider at all. There are lots of different cases:

1) You're writing a voice-over-IP switching fabric, and need to guarantee end-to-end latencies for user experience. Generally, transmission latencies will be so big that if you can guarantee 3 milliseconds end-to-end, you'll be fine!
2) You're writing a distributed simulation, and you want objects to interact within the same simulation step, even if they live on different servers. Thus, you want RPC to be "fast enough" to support a lot of objects at, say, 60 Hz simulation rate. With 0.1 ms latency, assuming you can pipeline, this means about 160 objects per server, give or take. (16.7 milliseconds per frame).

In general, most large networked systems look more like 1), and thus don't need 0.1 millisecond latency. And if you're trying to distribute simulation across multiple servers by object, rather than by zone/locale, then RPC or same-tick dependencies probably aren't the right solution -- much better to allow a one-tick delay for object/object interactions, which allows you to put thousands of objects per server and just send all of the interactions across once per tick -- latency doesn't matter so much at that point.

For comparison: We serve on the order of 100,000 simultaneously connected users through an Erlang cluster of dual-4-core Xeon CPUs, and the total load for the 5 front-end machines at peak is something like 3%. This is an any-to-any messaging fabric. Average round trip is about a millisecond, but we accept much longer worst-case round-trips (design limit: 100 ms, actual limit: more like 10 milliseconds), which means we don't need to pay for the super-heavy-duty hardware or software -- plain Ubuntu and Dell rack servers work for us!
And the $600 that frob talks about for low-latency network cards is a very nice price! I think we were quoted about $1,500 each for 10 Gbps Ethernet cards with acceptable latencies (mostly determined by the drivers).
