Communication between languages

Started by
19 comments, last by wood_brian 11 years, 8 months ago
What gotchas are there when it comes to network communication between programs written in different languages? I'm interested in finding out what the possible problems are in this multi-language scenario, and in particular where one of the languages used is C++. Are there some open-source systems to look at where one of the components is C++-based?
For example, let's say the people who wrote this online Java code generator wanted to switch to a binary distribution model. They might be interested in adapting the middle and front tiers of the C++ Middleware Writer to work with their (possibly) Java-based back tier.
Low-level network communication between different languages is pretty much the same as file I/O between different languages. You've got a series of bytes in a particular format, and you just need to implement reading/writing for that format.

If their programs are meant to interop over the network with other programs, they'll probably serialize data in a very straightforward manner: either a widely used format like XML or JSON, or a very simple language-agnostic binary format, backed by libraries in various languages or by documentation good enough for other developers to implement their own.

If they aren't expecting to interop, they may use their preferred language or framework's built-in serialization features, which would be a lot harder to deal with from a language without an implementation of those features. For example, in .Net, someone might use the built-in BinaryFormatter. If you want to try reading that data using native C++ (without using C++/CLI to get access to .Net), you'd need to implement a complex deserializer, which is a LOT more work.



At the network connection level, most languages provide an implementation of 'sockets' to communicate over TCP, UDP, etc. Generally the sockets don't care what language is using them, so they should be able to talk to each other no matter what language is on each end. Each end should be aware of the expected endianness, if applicable.
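As a minimal sketch of what "being aware of the expected endianness" can look like on the C++ side, the functions below write and read a 32-bit integer in big-endian order using shifts, so the result does not depend on the host CPU. The function names are hypothetical, and big-endian is only an assumed wire order (it happens to match what Java's DataOutputStream.writeInt produces).

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value as 4 bytes in big-endian ("network") order.
// Shifts operate on the value, not on memory, so the output is the same
// on little-endian and big-endian hosts.
std::array<std::uint8_t, 4> encode_u32_be(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value),
    };
}

// Decode 4 big-endian bytes back into a host-order 32-bit value.
std::uint32_t decode_u32_be(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}
```

Whichever order is chosen, the point is that it is a property of the wire format that both ends agree on, not of either machine.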

There are some gotchas: Some developers, particularly for games, write helper code to deal with common things like sending 'messages' over a TCP stream. That code often adds extra data to the TCP stream (such as message lengths, message IDs, etc). You need to be sure that you know what the other side's code sends and expects to receive in order to communicate with it correctly.
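To make the framing gotcha concrete, here is a rough sketch of the kind of helper code described above. The frame layout (2-byte message ID, 4-byte payload length, both big-endian) and all names are assumptions for illustration only; a real peer may use a different layout, which is exactly why it has to be known in advance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame layout (an assumption for this sketch):
//   bytes 0-1: message ID, big-endian
//   bytes 2-5: payload length in bytes, big-endian
//   bytes 6..: payload
struct Message {
    std::uint16_t id;
    std::vector<std::uint8_t> payload;
};

constexpr std::size_t kHeaderSize = 6;

// Append one framed message to an outgoing byte buffer.
void append_frame(std::vector<std::uint8_t>& out, const Message& msg) {
    const std::uint32_t len = static_cast<std::uint32_t>(msg.payload.size());
    out.push_back(static_cast<std::uint8_t>(msg.id >> 8));
    out.push_back(static_cast<std::uint8_t>(msg.id));
    out.push_back(static_cast<std::uint8_t>(len >> 24));
    out.push_back(static_cast<std::uint8_t>(len >> 16));
    out.push_back(static_cast<std::uint8_t>(len >> 8));
    out.push_back(static_cast<std::uint8_t>(len));
    out.insert(out.end(), msg.payload.begin(), msg.payload.end());
}

// Try to remove one complete message from the front of `in`, which holds
// whatever bytes have arrived so far. Returns false, leaving `in` untouched,
// when the header or the payload is still incomplete.
bool try_extract_frame(std::vector<std::uint8_t>& in, Message& out) {
    if (in.size() < kHeaderSize) return false;
    const std::uint16_t id = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
    const std::uint32_t len = (static_cast<std::uint32_t>(in[2]) << 24) |
                              (static_cast<std::uint32_t>(in[3]) << 16) |
                              (static_cast<std::uint32_t>(in[4]) << 8) |
                               static_cast<std::uint32_t>(in[5]);
    if (in.size() - kHeaderSize < len) return false;
    out.id = id;
    out.payload.assign(in.begin() + kHeaderSize, in.begin() + kHeaderSize + len);
    in.erase(in.begin(), in.begin() + kHeaderSize + len);
    return true;
}
```

Because TCP delivers a byte stream rather than discrete messages, the receiver keeps appending received bytes to `in` and calls `try_extract_frame` until it returns false. Production code would also reject implausibly large lengths before waiting for the payload.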

Most web services will use HTTP and expect you to send and receive JSON or XML data. These are typically very easy to deal with no matter what language you're using. If they decide to use something else, you might have a lot more work to do.
There are also cross-platform issues: byte ordering (endianness), floating-point precision and the standards used, bitfield ordering, byte alignment, and string formats (UTF-8, multibyte, single-byte). Always be careful when you pack your objects or structures into a contiguous memory block. Well, basically, don't do it.
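One way to follow the "don't do it" advice is to write each field explicitly instead of copying the struct's memory. The sketch below assumes a hypothetical `Sample` record, a little-endian wire order, and 32-bit IEEE-754 floats on both ends (checked at compile time on this end); none of these choices come from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical record. The compiler may insert padding after `kind`, so
// sizeof(Sample) is typically 12 rather than 9; that padding must not
// leak onto the wire.
struct Sample {
    std::uint8_t kind;
    std::uint32_t sequence;
    float reading;
};

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes 32-bit IEEE-754 floats");

constexpr std::size_t kSampleWireSize = 9;  // 1 + 4 + 4 bytes, no padding

void put_u32_le(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

std::uint32_t get_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Write the fields one by one in a fixed order and byte order.
std::vector<std::uint8_t> serialize(const Sample& s) {
    std::vector<std::uint8_t> out;
    out.push_back(s.kind);
    put_u32_le(out, s.sequence);
    std::uint32_t bits;
    std::memcpy(&bits, &s.reading, sizeof bits);  // float -> raw bit pattern
    put_u32_le(out, bits);
    return out;
}

// `p` must point at kSampleWireSize readable bytes.
Sample deserialize(const std::uint8_t* p) {
    Sample s;
    s.kind = p[0];
    s.sequence = get_u32_le(p + 1);
    const std::uint32_t bits = get_u32_le(p + 5);
    std::memcpy(&s.reading, &bits, sizeof bits);
    return s;
}
```

The wire size is fixed by the format rather than by `sizeof(Sample)`, so a Java or C# peer only needs the field list and byte order, not the C++ compiler's layout rules.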

Everything is better with Metal.


I'm interested in a language-agnostic binary format. Is there work being done in this area? Just for strings there are the encodings papalazaru mentioned, plus the matter of how the string length is stored -- a fixed-width prefix (2, 3, or 4 bytes) or a variable-length one.
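For the variable-length option, one common choice is a base-128 "varint" length prefix, the same integer encoding Protocol Buffers uses. The sketch below is an illustration under that assumption, with hypothetical function names and the string treated as raw UTF-8 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append `value` as a base-128 varint: 7 payload bits per byte, least
// significant group first, high bit set on every byte except the last.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Read a varint starting at `pos` and advance `pos` past it.
// Returns false on truncated or over-long input.
bool get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// A string is its byte length as a varint, followed by the UTF-8 bytes.
void put_string(std::vector<std::uint8_t>& out, const std::string& utf8) {
    put_varint(out, utf8.size());
    out.insert(out.end(), utf8.begin(), utf8.end());
}

bool get_string(const std::vector<std::uint8_t>& in, std::size_t& pos,
                std::string& utf8) {
    std::uint64_t len = 0;
    if (!get_varint(in, pos, len) || len > in.size() - pos) return false;
    utf8.assign(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    return true;
}
```

Short strings cost one length byte, and long ones are not capped at 64 KiB the way a fixed 2-byte prefix would cap them. The trade-off is that the reader cannot know the header size without parsing it.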



I think endianness is hardware related rather than language related. (The code I linked to in the OP is able to deal with either big or little endian machines.)
"I'm interested in a language-agnostic binary format. Is there work being done in this area?"

It's been done, and re-done, since the 1960s. Notable examples that offer compact, binary, cross-language marshaling include:

ASN.1
XDR
Protobuf
Thrift
DCOM
CORBA
HLA / IEEE-1516

These are all very widely adopted, with literally thousands of users each. There are more in that class -- and there are several orders of magnitude more company-specific, system-specific, vendor-specific, or project-specific re-solutions of this problem.
enum Bool { True, False, FileNotFound };
I've taken this from the Protobuf page you mentioned.

message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}


I dislike the "repeated" semantics. CORBA has this weakness too. I'm aiming for specifying the protocol and then letting anyone who is interested work with that. Given how many C++ containers there are, something like repeated doesn't help much.

If you specify only the protocol, both ends can use whatever containers they want.
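A rough sketch of the "specify only the protocol" idea: the wire layout is fixed, while the encoder takes any iterator range and the decoder writes through any output iterator, so each end keeps its own container. The layout (a 32-bit count followed by x/y pairs, all big-endian) and every name here are assumptions for illustration, not part of Protocol Buffers or the C++ Middleware Writer.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iterator>
#include <list>
#include <vector>

struct Point { std::int32_t x; std::int32_t y; };

inline void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

inline std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    return (static_cast<std::uint32_t>(in[pos]) << 24) |
           (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
           (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
            static_cast<std::uint32_t>(in[pos + 3]);
}

// Hypothetical wire layout: u32 count, then count * (i32 x, i32 y), big-endian.
// Works with any forward-iterator range of Point (vector, list, deque, ...).
template <typename ForwardIt>
std::vector<std::uint8_t> encode_points(ForwardIt first, ForwardIt last) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(std::distance(first, last)));
    for (; first != last; ++first) {
        put_u32(out, static_cast<std::uint32_t>(first->x));
        put_u32(out, static_cast<std::uint32_t>(first->y));
    }
    return out;
}

// Decodes straight into the receiver's container via an output iterator.
// Returns false if the buffer is shorter than its count field claims.
template <typename OutputIt>
bool decode_points(const std::vector<std::uint8_t>& in, OutputIt dest) {
    if (in.size() < 4) return false;
    const std::uint32_t count = get_u32(in, 0);
    if ((in.size() - 4) / 8 < count) return false;
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::size_t pos = 4 + 8 * static_cast<std::size_t>(i);
        Point p;
        p.x = static_cast<std::int32_t>(get_u32(in, pos));
        p.y = static_cast<std::int32_t>(get_u32(in, pos + 4));
        *dest++ = p;
    }
    return true;
}

// Usage: the sender holds a std::list, the receiver fills a std::deque.
std::deque<Point> round_trip(const std::list<Point>& sent) {
    std::deque<Point> received;
    decode_points(encode_points(sent.begin(), sent.end()),
                  std::back_inserter(received));
    return received;
}
```

Nothing is copied into an intermediate generic container on either side; the bytes are the only shared representation.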
I'm not following you here. Protocol buffers are container-agnostic. That's why they say "repeated" instead of "use a vector" or "use a list." First you seem to be against this, then you seem to be advocating it. I'm confused?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]




I haven't used Protocol buffers so perhaps it's me. How do you populate Polyline's container of Points? It seems you either have to use only Protobuf's general container or copy/move everything from the container of your choice to their generic container.

If the answer is the latter, nothing is gained functionally by the copying/moving -- it just slows things down.
Try reading the C++ protocol buffers tutorial. It should hopefully clarify things a bit.





I've read that more now and downloaded a copy of Protocol Buffers. From what I can tell they implement repeated messages with a RepeatedPtrField.
RepeatedPtrField is derived from RepeatedPtrFieldBase, which has this member:

    void* initial_space_[kInitialSize];

Anyway, it looks like users have to write code to populate Protobuf's generic container at run-time.
