Communication between languages


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

20 replies to this topic

#1 wood_brian   Banned   -  Reputation: 197

Posted 21 July 2012 - 09:12 PM

What gotchas are there when it comes to network communication between programs written in different languages? I'm interested in finding out what the possible problems are in this multi-language scenario, and in particular where one of the languages used is C++. Are there some open-source systems to look at where one of the components is C++-based?
For example, let's say the people who wrote this online Java code generator wanted to switch to a binary distribution model. They might be interested in adapting the middle and front tiers of the C++ Middleware Writer to work with their (possibly) Java-based back tier.

#2 Nypyren   Crossbones+   -  Reputation: 4032

Posted 21 July 2012 - 11:24 PM

Low-level network communication between different languages is pretty much the same as file I/O between different languages. You've got a series of bytes in a particular format, and you just need to implement reading/writing for that format.

If their programs are expecting to interop over the network with other programs, they'll probably serialize data out in a very straightforward manner (either they'll use a widely-used format like XML or JSON, or will use a very simple language-agnostic binary format and provide libraries in various languages or good documentation for other developers to implement their own libraries).

If they aren't expecting to interop, they may use their preferred language or framework's built-in serialization features, which would be a lot harder to deal with from a language without an implementation of those features. For example, in .NET, someone might use the built-in BinaryFormatter. If you want to read that data using native C++ (without using C++/CLI to get access to .NET), you'd need to implement a complex deserializer, which is a LOT more work.



At the network connection level, most languages provide an implementation of 'sockets' to communicate over TCP, UDP, etc. Generally the sockets don't care what language is using them, so they should be able to talk to each other no matter what language is on each end. Each end should be aware of the expected endianness, if applicable.
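A minimal sketch of what "being aware of the expected endianness" means in practice, using Python's standard `struct` module. The value 0x01020304 is only an illustrative choice; the point is that both ends must agree on the byte order of the wire format, whatever language each is written in.

```python
import struct

value = 0x01020304

# '>' forces big-endian ("network order"); '<' forces little-endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# A reader that assumes the wrong byte order silently gets a different number.
(misread,) = struct.unpack("<I", big)
print(hex(misread))  # 0x4030201
```

Stating the byte order explicitly in the format string, instead of relying on the machine's native order, keeps the bytes identical on every platform.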

There are some gotchas: Some developers, particularly for games, write helper code to deal with common things like sending 'messages' over a TCP stream. That code often adds extra data to the TCP stream (such as message lengths, message IDs, etc). You need to be sure that you know what the other side's code sends and expects to receive in order to communicate with it correctly.
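The kind of helper code described above can be sketched as follows. The header layout here (a 2-byte message ID followed by a 4-byte payload length, both big-endian) is a hypothetical choice for illustration; a real peer defines its own layout, and you have to match it exactly.

```python
import struct

# Hypothetical header: 2-byte message ID, 4-byte payload length, big-endian.
HEADER = struct.Struct(">HI")

def frame(msg_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its message ID and length."""
    return HEADER.pack(msg_id, len(payload)) + payload

def unframe(buffer: bytes):
    """Split complete messages off the front of buffer.

    Returns (messages, leftover), where leftover holds any partial message.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        messages.append((msg_id, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]

stream = frame(1, b"hello") + frame(2, b"world!")

# TCP is a byte stream, so a read may end in the middle of a message.
first, leftover = unframe(stream[:14])
second, remaining = unframe(leftover + stream[14:])
print(first, second)  # [(1, b'hello')] [(2, b'world!')]
```

The receiver keeps the leftover bytes and prepends them to the next read, which is why the length field is needed at all: TCP itself does not preserve message boundaries.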

Most web services will use HTTP and expect you to send and receive JSON or XML data. These are typically very easy to deal with no matter what language you're using. If they decide to use something else, you might have a lot more work to do.

Edited by Nypyren, 21 July 2012 - 11:38 PM.


#3 0BZEN   Crossbones+   -  Reputation: 2011

Posted 22 July 2012 - 01:36 PM

Cross-platform issues: byte ordering (endianness), floating-point precision and the standards used, bitfield ordering, byte alignment, and string formats (UTF-8, multibyte, single-byte). Always be careful when you pack your objects' structures as a contiguous memory block. Well, basically, don't do it.
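A small sketch of why dumping a struct as a contiguous memory block is fragile, using Python's `struct` module to mimic a C or C++ layout of one byte followed by a 32-bit integer. The field names and values are illustrative assumptions.

```python
import struct

# Layout similar to `struct { char tag; int32_t value; }` in C or C++.
native_size = struct.calcsize("@bi")  # native alignment: padding may be inserted
packed_size = struct.calcsize("<bi")  # explicit layout: no padding, little-endian

# packed_size is 5; native_size depends on the platform and its alignment rules.
print(native_size, packed_size)

# Writing each field explicitly gives the same bytes on every platform.
wire = struct.pack("<bi", 7, 1000)
tag, value = struct.unpack("<bi", wire)
print(wire.hex(), tag, value)  # 07e8030000 7 1000
```

If the sender copies the native in-memory layout and the receiver assumes the packed one (or the reverse), the fields no longer line up, which is the practical reason behind "don't do it".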

Edited by papalazaru, 22 July 2012 - 01:38 PM.

Everything is better with Metal.


#4 wood_brian   Banned   -  Reputation: 197

Posted 22 July 2012 - 09:00 PM

Low-level network communication between different languages is pretty much the same as file I/O between different languages. You've got a series of bytes in a particular format, and you just need to implement reading/writing for that format.

If their programs are expecting to interop over the network with other programs, they'll probably serialize data out in a very straightforward manner (either they'll use a widely-used format like XML or JSON, or will use a very simple language-agnostic binary format and provide libraries in various languages or good documentation for other developers to implement their own libraries).

I'm interested in a language-agnostic binary format. Is there work being done in this area? Just for strings there are the formats papalazaru mentioned and the matter of the string length -- fixed (2,3,4 bytes) or variable-length.
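To make the string-length question concrete, here is a sketch comparing a fixed 2-byte length prefix with a variable-length (base-128 "varint") prefix for UTF-8 strings. The function names are hypothetical, and the 2-byte width is just one of the fixed sizes mentioned above.

```python
import struct

def encode_fixed(text: str) -> bytes:
    """UTF-8 bytes preceded by a fixed 2-byte big-endian length (max 65535 bytes)."""
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative lengths")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_variable(text: str) -> bytes:
    """UTF-8 bytes preceded by a varint length."""
    data = text.encode("utf-8")
    return encode_varint(len(data)) + data

short = "héllo"        # 5 characters, but 6 bytes in UTF-8
long_text = "x" * 300

print(len(encode_fixed(short)), len(encode_variable(short)))          # 8 7
print(len(encode_fixed(long_text)), len(encode_variable(long_text)))  # 302 302
```

Note that the prefix counts encoded bytes, not characters; a format that leaves this ambiguous is a common source of cross-language bugs.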


If they aren't expecting to interop, they may use their preferred language or framework's built-in serialization features, which would be a lot harder to deal with from a language without an implementation of those features. For example, in .NET, someone might use the built-in BinaryFormatter. If you want to read that data using native C++ (without using C++/CLI to get access to .NET), you'd need to implement a complex deserializer, which is a LOT more work.



At the network connection level, most languages provide an implementation of 'sockets' to communicate over TCP, UDP, etc. Generally the sockets don't care what language is using them, so they should be able to talk to each other no matter what language is on each end. Each end should be aware of the expected endianness, if applicable.

I think endianness is hardware related rather than language related. (The code I linked to in the OP is able to deal with either big or little endian machines.)

#5 hplus0603   Moderators   -  Reputation: 5109

Posted 23 July 2012 - 02:53 AM

I'm interested in a language-agnostic binary format. Is there work being done in this area?


It's been done, and re-done, since the 1960s. Notable examples that offer compact, binary, cross-language marshaling include:

ASN.1
XDR
Protobuf
Thrift
DCOM
CORBA
HLA / IEEE-1516

These are all very widely adopted, with literally thousands of users each. There are more in that class -- and there are several orders of magnitude more company-specific, system-specific, vendor-specific, or project-specific re-solutions of this problem.

enum Bool { True, False, FileNotFound };

#6 wood_brian   Banned   -  Reputation: 197

Posted 24 July 2012 - 01:55 PM

I've taken this from the Protobuf page you mentioned.
message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}

I dislike the "repeated" semantics. CORBA has this weakness too. I'm aiming for specifying the protocol and then letting anyone who is interested work with that. Given how many C++ containers there are, something like repeated doesn't help much.

If you specify only the protocol, both ends can use whatever containers they want.
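As a sketch of what "specifying only the protocol" looks like for the messages above, the following hand-encodes Point and Polyline according to Protocol Buffers' documented wire format (a varint key of `(field_number << 3) | wire_type`, wire type 0 for varints, wire type 2 for length-delimited data). The helper names are hypothetical and are not part of any protobuf library, and the sketch only handles non-negative integers.

```python
def varint(n: int) -> bytes:
    """Base-128 varint encoding of a non-negative integer."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def key(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)

def length_delimited(field_number: int, payload: bytes) -> bytes:
    return key(field_number, 2) + varint(len(payload)) + payload

def encode_point(x: int, y: int, label=None) -> bytes:
    out = key(1, 0) + varint(x) + key(2, 0) + varint(y)
    if label is not None:
        out += length_delimited(3, label.encode("utf-8"))
    return out

def encode_polyline(points, label=None) -> bytes:
    # `points` can be any iterable of (x, y) pairs: list, tuple, generator, ...
    out = b"".join(length_delimited(1, encode_point(x, y)) for x, y in points)
    if label is not None:
        out += length_delimited(2, label.encode("utf-8"))
    return out

print(encode_point(1, 2, "a").hex())              # 080110021a0161
print(encode_polyline([(1, 2), (3, 4)]).hex())    # 0a04080110020a0408031004
```

On the wire, a `repeated` field is simply the same field number appearing several times, so the encoder above produces identical bytes whichever container supplies the points.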

Edited by wood_brian, 24 July 2012 - 01:57 PM.


#7 ApochPiQ   Moderators   -  Reputation: 14674

Posted 24 July 2012 - 02:32 PM

I'm not following you here. Protocol buffers are container-agnostic. That's why they say "repeated" instead of "use a vector" or "use a list." First you seem to be against this, then you seem to be advocating it. I'm confused?

#8 wood_brian   Banned   -  Reputation: 197

Posted 24 July 2012 - 04:23 PM

I'm not following you here. Protocol buffers are container-agnostic. That's why they say "repeated" instead of "use a vector" or "use a list." First you seem to be against this, then you seem to be advocating it. I'm confused?


I haven't used Protocol buffers so perhaps it's me. How do you populate Polyline's container of Points? It seems you either have to use only Protobuf's general container or copy/move everything from the container of your choice to their generic container.

If the answer is the latter, nothing is gained functionally by the copying/moving -- it just slows things down.

Edited by wood_brian, 24 July 2012 - 04:33 PM.


#9 ApochPiQ   Moderators   -  Reputation: 14674

Posted 24 July 2012 - 04:48 PM

Try reading the C++ protocol buffers tutorial. It should hopefully clarify things a bit.

#10 wood_brian   Banned   -  Reputation: 197

Posted 25 July 2012 - 02:34 PM

Try reading the C++ protocol buffers tutorial. It should hopefully clarify things a bit.


I've read that more now and downloaded a copy of Protocol Buffers. From what I can tell they implement repeated messages with a RepeatedPtrField.
RepeatedPtrField is derived from RepeatedPtrFieldBase, which has this member:
void*  initial_space_[kInitialSize];

Anyway, it looks like users have to write code to populate Protobuf's generic container at run-time.

#11 hplus0603   Moderators   -  Reputation: 5109

Posted 26 July 2012 - 09:42 AM

Anyway, it looks like users have to write code to populate Protobuf's generic container at run-time.


It sounds like you are confusing the three layers of Protobuf. These are:

- The IDL description language, which describes the semantics of the data.
- The wire layout, which describes what the actual bits are on the wire for a particular data structure.
- The tools and libraries that implement protobuf in a particular language.

Note that there can (and do) exist more than one implementation of bindings for a particular language, and part of implementing bindings is to decide how the user uses the library, and what code gets generated. We've developed bindings for protobuf to Erlang, PHP and Javascript, and they each make different decisions on how to expose the data structures to the native language.

#12 wood_brian   Banned   -  Reputation: 197

Posted 26 July 2012 - 01:20 PM

IIUC, you're saying language bindings provide the code to populate protobuf's generic container and users don't have to write it themselves. That's better than what I was thinking, but what about the run-time aspect?

Edited by wood_brian, 26 July 2012 - 01:36 PM.


#13 ApochPiQ   Moderators   -  Reputation: 14674

Posted 26 July 2012 - 02:49 PM

You have two options, basically: use the provided protobuf bindings for your language of choice (which means using their provided container code) or write the bindings yourself to comply with the IDL and use whatever containers you want.

So if you're genuinely in a situation where you need the speed, you can directly store data transmitted via protobufs into whatever representation you want. In 90% of situations, where getting the job done is more important, you just use the provided bindings and wrap them in whatever way makes sense.
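A rough sketch of the second option: a hand-written decoder that reads the wire bytes of a Polyline straight into a representation of your own choosing (here a plain list of `(x, y)` tuples), with no intermediate generic container. It assumes the Point and Polyline definitions quoted earlier in the thread, handles only non-negative varints and length-delimited fields, and uses hypothetical function names.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def decode_point(buf: bytes):
    x = y = 0
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
            if field == 1:
                x = value
            elif field == 2:
                y = value
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length  # skip the label (and any other length-delimited field)
        else:
            raise ValueError("wire type not handled in this sketch")
    return (x, y)

def decode_polyline(buf: bytes):
    points = []  # swap in whatever container you prefer
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 7
        if wire_type != 2:
            raise ValueError("this sketch only expects length-delimited fields")
        length, pos = read_varint(buf, pos)
        if field == 1:
            points.append(decode_point(buf[pos:pos + length]))
        pos += length
    return points

data = bytes.fromhex("0a04080110020a0408031004")
print(decode_polyline(data))  # [(1, 2), (3, 4)]
```

The trade-off is the one described above: this avoids a copy, but every change to the IDL now has to be mirrored by hand in code like this.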


I'm not sure what's unclear about this?

#14 wood_brian   Banned   -  Reputation: 197

Posted 26 July 2012 - 08:52 PM

You have two options, basically: use the provided protobuf bindings for your language of choice (which means using their provided container code) or write the bindings yourself to comply with the IDL and use whatever containers you want.

So if you're genuinely in a situation where you need the speed, you can directly store data transmitted via protobufs into whatever representation you want. In 90% of situations, where getting the job done is more important, you just use the provided bindings and wrap them in whatever way makes sense.


I'm not sure that getting it done the faster way development-time-wise offers a good foundation for reworking it later if you decide to shift gears.

I'm not sure what's unclear about this?


It's clear with your explanation here. I guess I think the extra code that has to be generated, built, loaded, and run is a bigger deal than some people do.

#15 hplus0603   Moderators   -  Reputation: 5109

Posted 27 July 2012 - 03:44 AM

I'm not sure that getting it done the faster way development-time-wise offers a good foundation for reworking it later if you decide to shift gears.


The fact that you manage all your external (protocol) data using a well-defined IDL, a well-defined wire format, and a tool chain that supports many languages, is a FANTASTIC start on being able to optimize/improve the parts that matter, once you have an actual system where you can measure what matters.

I think the extra code that has to be generated, built, loaded, and run is a bigger deal than some people.


Two things:

1. In many languages, you can parse Protobuf descriptions at runtime, and interpret them rather than compile them. Python, for example, can easily do this, as can Javascript and most other truly dynamic languages. The Protobuf in-memory representation even supports transparent version up/down-shifting, by storing unrecognized fields in a "copy-forward" format. This means you can upgrade your protobuf IDL files (which means the network protocol) at runtime. Sometimes, that's quite valuable, and other times, that's probably a bad idea. Network wires are typically slower than the CPUs interpreting the data, so the overhead of interpretation is often not important in a profile of the running system.

2. When you know that CPU cost is important, then you want dedicated marshaling/demarshaling code that is compiled. You could write this code manually. That code would have to be built, debugged, loaded and run, to be able to understand the protocol. Unfortunately, manual marshaling is a very bug-prone way of development. Thus, you can instead generate the code from the IDL files. This doesn't generate appreciably more or worse code than the hand-coded version, but you can be certain that there are no marshaling bugs. Additionally, implementing changes is as simple as re-running "make," rather than, for example, having to update half a dozen different source files that touch the data.
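The "copy-forward" idea from point 1 can be sketched as follows: a reader that only knows some fields keeps the raw bytes of the fields it does not recognize and writes them back out unchanged. This is an illustration of the concept, not how any particular protobuf library implements it; the field numbers and function names are assumptions, and only wire types 0 and 2 are handled.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def split_fields(buf: bytes):
    """Yield (field_number, raw_bytes) for each field in the message."""
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        wire_type = key & 7
        if wire_type == 0:
            _, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        yield key >> 3, buf[start:pos]

KNOWN_FIELDS = {1, 2}  # an older reader that only knows Point.x and Point.y

def parse(buf: bytes):
    known, unknown = {}, b""
    for field, raw in split_fields(buf):
        if field in KNOWN_FIELDS:
            known[field] = raw
        else:
            unknown += raw  # kept verbatim so it can be forwarded later
    return known, unknown

def reserialize(known, unknown: bytes) -> bytes:
    return b"".join(known[f] for f in sorted(known)) + unknown

# A newer sender adds a hypothetical field 4 (varint value 9) after x=1, y=2.
newer = bytes.fromhex("080110022009")
known, unknown = parse(newer)
print(unknown.hex())                      # 2009
print(reserialize(known, unknown).hex())  # 080110022009
```

Because each field carries its own number and wire type, the older reader can skip over field 4 without knowing what it means, and a newer peer further down the line still receives it.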

#16 wood_brian   Banned   -  Reputation: 197

Posted 27 July 2012 - 03:31 PM

a well-defined wire format,


I don't think there's anything preventing the development of a well-defined wire format for this.

and a tool chain that supports many languages, is a FANTASTIC start on being able to optimize/improve the parts that matter, once you have an actual system where you can measure what matters.


Perhaps it would be possible to use languages' built-in serialization support more in distributed systems development rather than having this functionality duplicated in CORBA implementations, Protocol Buffers, Thrift, etc. A different approach might be helpful in terms of making it easier for more languages to be used in distributed systems.

#17 ApochPiQ   Moderators   -  Reputation: 14674

Posted 27 July 2012 - 06:38 PM

That's a great theory, until you want to write one half of your distributed system in Erlang and the other in Stackless Python.

#18 hplus0603   Moderators   -  Reputation: 5109

Posted 29 July 2012 - 03:41 AM

Perhaps it would be possible to use languages' built-in serialization support


Then again, perhaps it wouldn't. Each language serializes differently, and almost all languages are too verbose with their "native" serialization for games purposes.
If you want to try this, I suggest trying to find a way to load a Python Pickle into a Java Serialization stream, or perhaps make a DCOM RPC into an Erlang port. I'd be very interested in hearing how that goes!

#19 wood_brian   Banned   -  Reputation: 197

Posted 30 July 2012 - 01:11 PM

Then again, perhaps it wouldn't. Each language serializes differently, and almost all languages are too verbose with their "native" serialization for games purposes.


When you say each language serializes differently that goes back to my initial question.
How is built-in serialization verbose?

If you want to try this, I suggest trying to find a way to load a Python Pickle into a Java Serialization stream, or perhaps make a DCOM RPC into an Erlang port. I'd be very interested in hearing how that goes!


From a practical perspective I think it makes sense to focus on scenarios where one of the languages is C++... thus the idea to adapt my middle tier to work with whatever language that Java code generator is written in. I found getting away from a web-based front end to be liberating.

#20 hplus0603   Moderators   -  Reputation: 5109

Posted 30 July 2012 - 02:18 PM

How is built-in serialization verbose?


By including too much information compared to what you can get away with when you have lots of domain specific knowledge.
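A small illustration of that point, comparing two general-purpose serializers from Python's standard library with a format that relies on domain knowledge. The "domain knowledge" here is a made-up assumption for the example: a point is always two unsigned 16-bit integers, x then y.

```python
import json
import pickle
import struct

point = {"x": 3, "y": 4}

# General-purpose formats carry field names and type information in every message.
generic_pickle = pickle.dumps(point)
generic_json = json.dumps(point).encode("utf-8")

# With domain knowledge, only the values need to go on the wire.
compact = struct.pack(">HH", point["x"], point["y"])

# The pickle size varies by Python version; the JSON is 16 bytes, the compact form 4.
print(len(generic_pickle), len(generic_json), len(compact))
```

The receiver of the compact form has to know the layout in advance, which is exactly the information the general-purpose formats spend bytes on.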

I found getting away from a web-based front end to be liberating.


Is this the same web-based front end that you last year suggested would be much better than everything that had come before it?
You may want to go back and re-review the discussion that was had about that a year or two ago. If I remember right, a lot of the same information that's in this thread was provided at that time, too.




