Sending variable size arrays from client to server

24 comments, last by wood_brian 14 years, 2 months ago
I'm in a peculiar situation, maybe somebody could offer some advice. I'm using ENet as my main mode of client/server communications at the moment, and have been able to successfully send basic structs back and forth for quite some time. I am now in the position, however, in which I need to send variable-sized arrays back and forth between these two machines. I'd like to store this information in a struct, as sending and receiving structs has been pretty easy so far. Sending structs with variable-sized arrays in them appears to be a lot different, though. Do I need to use some kind of object serialization or is there a simpler way to do this that I haven't discovered yet? The basic idea of the struct I need to send is as follows:

typedef struct _StructName
{
    int arr_size;
    int arr[];
} StructName;
Thanks in advance for any help or suggestions.
Structs are fixed-size, and you want to send a variable sized object, so no, you can't do it that way.

The typical solution is simply this:

Sender: send number of elements, then each element in turn.
Receiver: read number of elements, then read that many elements in turn.

I'm assuming you already have a method for determining which object type you're about to receive.
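To make the "count first, then the elements" idea concrete, here is a minimal sketch that flattens an int array into one byte buffer and reads it back. The helper names (put_u32, get_u32, encode_int_array, decode_int_array) are made up for illustration, and the choice of a 32-bit big-endian count is an assumption, not something ENet requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append a 32-bit value in big-endian (network) byte order.
void put_u32(std::vector<std::uint8_t> &out, std::uint32_t v)
{
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a 32-bit big-endian value at `pos` and advance `pos` past it.
std::uint32_t get_u32(const std::vector<std::uint8_t> &in, std::size_t &pos)
{
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated message");
    std::uint32_t v = (static_cast<std::uint32_t>(in[pos]) << 24) |
                      (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
                      (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
                      static_cast<std::uint32_t>(in[pos + 3]);
    pos += 4;
    return v;
}

// Sender side: element count first, then each element in turn.
std::vector<std::uint8_t> encode_int_array(const std::vector<std::int32_t> &values)
{
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(values.size()));
    for (std::size_t i = 0; i < values.size(); ++i)
        put_u32(out, static_cast<std::uint32_t>(values[i]));
    return out;
}

// Receiver side: read the count, then read that many elements.
std::vector<std::int32_t> decode_int_array(const std::vector<std::uint8_t> &in)
{
    std::size_t pos = 0;
    std::uint32_t count = get_u32(in, pos);
    // Never trust the count blindly: it must fit in the bytes that remain.
    if (count > (in.size() - pos) / 4)
        throw std::runtime_error("element count exceeds message size");
    std::vector<std::int32_t> values;
    values.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i)
        values.push_back(static_cast<std::int32_t>(get_u32(in, pos)));
    return values;
}
```

The whole buffer goes out as a single message, so there is no per-element packet overhead. With ENet that would presumably mean passing the buffer's data pointer and size to enet_packet_create and then calling enet_peer_send once.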
Yes, you need to serialize your array somehow. A length prefix followed by the data is common. (You don't need to go full stateless object network serialization, though; just serializing this particular type would be fine.)
enum Bool { True, False, FileNotFound };
Kylotan, sending each individual object seems like a waste of network traffic. Or were you implying that I send "each object in turn" as a batch?

hplus, I think I agree (mostly, although I'd like to implement object serialization eventually, but right now I'm not sure I'm quite up to snuff for it). Is there any way in particular you've heard/known of to do this via ENet?

Clearly I'm capable of sending basic objects back and forth as it is, or I wouldn't have posted asking about variable-size objects. Currently I'm (sort of) abusing the "channels" feature in ENet, and simply designating different channels for different structs. I've posted similar "how should I send/identify different data" messages on the mailing lists, and while I've received some incredibly helpful responses, most of them involve a lot of prepending binary data and writing packet wrapper classes, or very simple (or complex) implementations of object serialization (hence the original reference, no pun intended).

In any case, is there a way to set up some sort of struct hierarchy and use it to identify incoming packets? (I've heard about unions and joins on structs; is this similar to the typical database concepts of union and join, only with structs?) Is this common practice, and what are the good and bad points of this technique?

Sorry to seem like I'm asking you guys for a handbook. I'm still fairly new to the network programming scene, and I'm trying to discover/soak up as much as I can.
C++ as a language has no reflection capabilities built in, so there is really no way to "automatically" serialize objects. It's worked for you so far just by treating structs as raw byte arrays, but unfortunately you're going to find that it's not a workable solution going forward. In particular, it makes versioning of your network protocol very difficult, and in a more general sense, you won't be able to build your program on different machine architectures and expect them all to work together (for example, if you ever did an Xbox 360 port of your game, you'd find that the Xbox 360 is big-endian and all your bytes would be swapped around).

I've seen your posts to the mailing list (I'm more of a lurker on there... I prefer to answer questions in forums where they're more easily searched [wink]) and most of the solutions given there were pretty good. For myself, I've written a fairly simple "buffer" class that my "packet" classes know how to serialize themselves to. Here's a quick sample (off the top of my head: I don't have the code in front of me right now):

#include <cstdint>
#include <sstream>
#include <string>

class packet_buffer
{
private:
    std::stringstream str_;
    std::string value_;
public:
    void write(uint8_t const *bytes, int count);
    void read(uint8_t *bytes, int count);
};

inline packet_buffer &operator << (packet_buffer &buf, int32_t val)
{
    // note: this writes the host's byte order; swap here if the peers differ in endianness
    buf.write(reinterpret_cast<uint8_t const *>(&val), 4);
    return buf;
}

inline packet_buffer &operator << (packet_buffer &buf, std::string const &val)
{
    // strings are length-prefixed (cast so the int32_t overload is picked)
    buf << static_cast<int32_t>(val.size());
    buf.write(reinterpret_cast<uint8_t const *>(val.c_str()), static_cast<int>(val.size()));
    return buf;
}

// etc... I've got additional overloads for some of my "built-in" classes like 3D vectors, etc
// also, similar methods for reading from the buffer via operator >>

void packet_buffer::write(uint8_t const *bytes, int count)
{
    // stringstream works in chars, so cast the byte pointer
    str_.write(reinterpret_cast<char const *>(bytes), count);
}

void packet_buffer::read(uint8_t *bytes, int count)
{
    str_.read(reinterpret_cast<char *>(bytes), count);
}

Then, to use the code, I've got a packet class which each sub-class inherits from and overrides the "serialize" and "deserialize" methods:

class login_request_packet
{
private:
    std::string username_;
    std::string password_;
public:
    void serialize(packet_buffer &buf);
    void deserialize(packet_buffer &buf);
};

void login_request_packet::serialize(packet_buffer &buf)
{
    buf << username_ << password_;
}

void login_request_packet::deserialize(packet_buffer &buf)
{
    // fields must be read back in the same order they were written
    buf >> username_ >> password_;
}

To differentiate packets, I just use the "identifier-prefixed" approach. That is, each packet class has a unique integer identifier that all packets are prefixed with, so on the other end I know which one to construct and pass the packet_buffer to for deserialization.
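A rough sketch of what the identifier-prefixed approach can look like on the wire is below. The IDs, the one-byte prefix, and the with_id/dispatch helpers are all invented for the example. In a real system each case would construct the matching packet class and call its deserialize method rather than return a string.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical packet identifiers; the values only need to be unique.
enum packet_id : std::uint8_t { LOGIN_REQUEST = 1, CHAT = 2 };

// Sender: put the identifier in front of the serialized body.
std::vector<std::uint8_t> with_id(packet_id id, const std::string &body)
{
    std::vector<std::uint8_t> msg;
    msg.push_back(static_cast<std::uint8_t>(id));
    msg.insert(msg.end(), body.begin(), body.end());
    return msg;
}

// Receiver: look at the first byte to decide how to decode the rest.
std::string dispatch(const std::vector<std::uint8_t> &msg)
{
    if (msg.empty())
        throw std::runtime_error("empty message");
    std::string body(msg.begin() + 1, msg.end());
    switch (msg[0]) {
    case LOGIN_REQUEST: return "login:" + body; // would deserialize a login packet here
    case CHAT:          return "chat:" + body;  // would deserialize a chat packet here
    default:            throw std::runtime_error("unknown packet id");
    }
}
```

Throwing on an unknown ID matters: anything arriving from the network can be garbage or hostile, so the default case should never fall through silently.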
I used to do serialization using all kinds of fancy templates and macros. You can create pretty elegant systems that way. However, at some point, simplicity should win out. Here's a system that might work just fine for you:

A simple packet class, which really is all you need:
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

class packet {
public:
    packet() : pos_(0) {}
    // append raw bytes to the end of the packet
    // (named write() so the read(P)/write(P) helpers below can call it)
    void write(void const *data, size_t size) {
        data_.insert(data_.end(), (char const *)data, (char const *)data + size);
    }
    // copy out the next `size` bytes and advance the read position
    void read(void *data, size_t size) {
        if (size > data_.size() - pos_) throw std::invalid_argument("bad size");
        if (size) memcpy(data, &data_[pos_], size);
        pos_ += size;
    }
    void seek(size_t pos) {
        if (pos > data_.size()) throw std::invalid_argument("bad pos");
        pos_ = pos;
    }
    size_t size() const { return data_.size(); }
    void const *data() const { return data_.size() ? &data_[0] : 0; }
private:
    std::vector<char> data_;
    size_t pos_;
};


You probably want something simple to deal with big-endian and little-endian data:

#include <string>
#include <vector>

// 16-bit value stored big-endian (network order), whatever the host order is
struct net16 {
    unsigned char data_[2];
    net16(int i = 0) { *this = i; }
    operator int() const { return ((int)data_[0] << 8) | (int)data_[1]; }
    net16 &operator=(int i) { data_[0] = (i >> 8) & 0xff; data_[1] = i & 0xff; return *this; }
    template<typename P> void write(P &p) { p.write(data_, 2); }
    template<typename P> void read(P &p) { p.read(data_, 2); }
};

// length-prefixed string; the net16 prefix limits it to 65535 bytes
struct netString : public std::string {
    template<typename P> void write(P &p) {
        net16 len = size();
        len.write(p);
        p.write(c_str(), size());
    }
    template<typename P> void read(P &p) {
        net16 len;
        len.read(p);
        resize(len);
        if (len) p.read(&(*this)[0], len);
    }
};

// length-prefixed vector; T must provide read(P) and write(P) itself
template<typename T> struct netVector : public std::vector<T> {
    typedef typename std::vector<T>::iterator iterator;
    template<typename P> void read(P &p) {
        net16 len;
        len.read(p);
        this->resize(len);
        for (iterator i(this->begin()), n(this->end()); i != n; ++i) {
            (*i).read(p);
        }
    }
    template<typename P> void write(P &p) {
        net16 len = this->size();
        len.write(p);
        for (iterator i(this->begin()), n(this->end()); i != n; ++i) {
            (*i).write(p);
        }
    }
};


Here, I just say that anything you want to serialize has a "read(P)" and "write(P)" function, and that those functions will call read(data, size) and write(data, size) on the argument (and/or delegate to other members that in turn do that).

Finally, define your messages:

struct ChatMessage {
    net16 channel_id;
    netString message;
    template<typename P> void write(P &p) {
        channel_id.write(p);
        message.write(p);
    }
    template<typename P> void read(P &p) {
        channel_id.read(p);
        message.read(p);
    }
};

struct ItemMessage {
    netVector<Item> items;
    template<typename P> void write(P &p) {
        items.write(p);
    }
    template<typename P> void read(P &p) {
        items.read(p);
    }
};


Note that I'm assuming that you know what the packet is through some data that comes before the packet. And, if you're on TCP, I'm assuming you know how big the data is, again through some data before the packet. A typical such "framing header" might look like:

struct FramingHeader {
    net16 type;
    net16 size;
    template<typename P> void write(P &p) { type.write(p); size.write(p); }
    template<typename P> void read(P &p) { type.read(p); size.read(p); }
};


I'll stop now before I write an entire message sending and receiving/dispatching system here, but if you build it up like this, it should be pretty straightforward. On the socket data side, you want to pack all the outgoing data into one big vector, and at a regular interval (each frame, 10 times a second, or whatever) you want to enqueue all the available data. Even if you use UDP, that's how you do it; combining many messages into a single packet to cut down on overhead.

Same thing for incoming; when the socket is readable, you receive as much as you can into the end of some big array that has currently pending data. Then, if there's at least sizeof(FramingHeader) available, decode the framing header and check for how much data it needs. If there's that much additional data available, then decode it (based on the type in the header), remove the data from the incoming array, and repeat. Typically you'll want to use a cyclic buffer of some sort rather than vector::erase() to remove the consumed data, for performance reasons.
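The receive loop described above might be sketched like this. FrameReader and Frame are hypothetical names, and the header layout (2 bytes type, 2 bytes payload size, both big-endian) is an assumption that mirrors the framing header. For brevity it uses vector::erase where a cyclic buffer would be the better choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {
    std::uint16_t type;
    std::vector<std::uint8_t> payload;
};

class FrameReader {
public:
    // Call whenever the socket hands you bytes; they are appended to the pending data.
    void feed(const std::uint8_t *data, std::size_t n)
    {
        pending_.insert(pending_.end(), data, data + n);
    }

    // Extract one complete frame if available; returns false if more data is needed.
    bool next(Frame &out)
    {
        const std::size_t kHeader = 4; // 2 bytes type + 2 bytes payload size
        if (pending_.size() < kHeader)
            return false;
        std::uint16_t type = static_cast<std::uint16_t>((pending_[0] << 8) | pending_[1]);
        std::size_t size = static_cast<std::size_t>((pending_[2] << 8) | pending_[3]);
        if (pending_.size() - kHeader < size)
            return false; // header is here, payload has not fully arrived yet
        out.type = type;
        out.payload.assign(pending_.begin() + kHeader, pending_.begin() + kHeader + size);
        // Remove the consumed bytes (simple but O(n); see the cyclic buffer remark).
        pending_.erase(pending_.begin(), pending_.begin() + kHeader + size);
        return true;
    }

private:
    std::vector<std::uint8_t> pending_;
};
```

The caller would feed() after every successful receive and then call next() in a loop until it returns false, dispatching each frame on its type.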
Quote:Original post by amilstead3
Kylotan, sending each individual object seems like a waste of network traffic. Or were you implying that I send "each object in turn" as a batch?

I wasn't addressing that issue. Your message overhead is none of my concern. :) I was listing what you had to send, not the method in which you sent it. Since I am not too familiar with ENet I can't give you much more detail. I believe it works on a message-based system so hopefully you could write the whole lot into one message.
Just to add one thing, which is probably just a rehash of what has already been said, but usually with any networking system there's a sort of invisible boundary between fully-serialised types and primitive types. The fully-serialised type usually starts with a unique binary ID and that tells the receiver what type they have. The contents of that type might be further fully-serialised types, described in a recursive fashion. Or, they might be primitive types, read one by one in order. They aren't annotated in the data stream to tell you what they are, you just know what to expect because you know what struct you're reading.

The simplest system in common use often comes down to one collection of message types (which are fully-serialised, ie. annotated with a type), and each message type has its own read/write methods that read or write a collection of primitive types. In a sense, if you use the ENet channel system you have done away with the message type and instead inferred that value from the channel. In the long term it's probably better to think about moving that into your application code instead.

As for unions, I don't think that's anything to do with the database concept. It's more to do with the C++ feature that allows you to define several structs that share some members in common. eg. You might define numerous message structs as part of a union which share a message_type value, and maybe a message_length value or some other generic flags too. Personally I prefer not to take this route and to write things out explicitly rather than rely on fixed struct layouts.
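For reference, the union technique looks roughly like the sketch below. The message structs and the MSG_ constants are invented for illustration. It relies on the C++ rule that standard-layout structs in a union may be inspected through their common initial sequence, which is why every struct starts with the same message_type member.

```cpp
#include <cstdint>
#include <string>

const std::uint8_t MSG_MOVE = 1; // hypothetical type tags
const std::uint8_t MSG_CHAT = 2;

struct MsgHeader { std::uint8_t message_type; };
struct MoveMsg   { std::uint8_t message_type; std::int16_t dx; std::int16_t dy; };
struct ChatMsg   { std::uint8_t message_type; char text[32]; };

// All members begin with message_type, so it can be read through `header`
// regardless of which struct was actually stored.
union AnyMsg {
    MsgHeader header;
    MoveMsg   move;
    ChatMsg   chat;
};

std::string describe(const AnyMsg &m)
{
    switch (m.header.message_type) {
    case MSG_MOVE: return "move";
    case MSG_CHAT: return "chat";
    default:       return "unknown";
    }
}
```

The drawbacks are the ones already raised in this thread: the layout depends on padding and byte order, every message costs as much as the largest struct, and none of it helps with variable-sized data.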
Rather than being terribly specific at this hour, I thought I'd give a handful of random advice --

Firstly, be aware of struct padding and packing; become familiar with your compiler's "packing" directive. Using Microsoft's compiler, I believe it goes something like #pragma pack(push), #pragma pack(1), <struct definitions>, #pragma pack(pop). Eventually you'll have to implement proper serialization for versioning and portability, but until then you can at least avoid sending padding bytes. Also, order your data from the largest datatype down in your structs (and be aware that declaration order is also the order in which the members are initialized if you have constructors with member initializer lists). Keep in mind that such "unnatural" alignment makes the structs less performant to access, so it's probably worth having a packed version for network transmission, and an unpacked version to perform calculations against.
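Here is a small illustration of what packing changes. The struct names and fields are made up. The push/pop form shown is accepted by MSVC, GCC, and Clang as far as I know, but check your own compiler's documentation.

```cpp
#include <cstdint>

// Natural alignment: the compiler is free to insert padding after `kind`
// and after `hp`, so sizeof is typically 12 rather than 7.
struct Unpacked {
    std::uint8_t  kind;
    std::uint32_t id;
    std::uint16_t hp;
};

#pragma pack(push, 1) // save the current packing and switch to 1-byte alignment
struct Packed {
    std::uint8_t  kind;
    std::uint32_t id;
    std::uint16_t hp;
};
#pragma pack(pop)     // restore the previous packing for everything that follows

// With 1-byte packing the members sit back to back: 1 + 4 + 2 bytes.
static_assert(sizeof(Packed) == 7, "Packed should contain no padding bytes");
```

Packing only removes padding. Byte order is untouched, so this alone does not make the struct safe to exchange between machines of different endianness.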

When you do get to serialization, you can send data very efficiently -- say, packing a value of 1-50 in 6 bits, or a Boolean value in just 1. Or maybe by compressing the message payload. It takes some work to extract, but in general you're going to expect to get about 10 full network updates over the WWW per second, and the clients have what is effectively forever to decode messages.
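As a sketch of the bit-packing idea, the classes below write values using only as many bits as they need: 6 bits covers 0 to 63, which is enough for a value in 1-50, and a Boolean takes 1 bit. BitWriter and BitReader are hypothetical names, and the most-significant-bit-first layout is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

class BitWriter {
public:
    BitWriter() : bit_count_(0) {}

    // Append the low `bits` bits of `value`, most significant bit first.
    void write(std::uint32_t value, unsigned bits)
    {
        for (unsigned i = bits; i-- > 0;) {
            if (bit_count_ % 8 == 0)
                bytes_.push_back(0); // start a fresh byte
            if ((value >> i) & 1u)
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bit_count_ % 8));
            ++bit_count_;
        }
    }

    const std::vector<std::uint8_t> &bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bit_count_;
};

class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t> &bytes) : bytes_(bytes), pos_(0) {}

    // Read `bits` bits back in the same order they were written.
    std::uint32_t read(unsigned bits)
    {
        std::uint32_t value = 0;
        for (unsigned i = 0; i < bits; ++i) {
            if (pos_ / 8 >= bytes_.size())
                throw std::out_of_range("bit stream exhausted");
            unsigned bit = (bytes_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;
            ++pos_;
        }
        return value;
    }

private:
    const std::vector<std::uint8_t> &bytes_; // must outlive the reader
    std::size_t pos_;
};
```

Two 6-bit values plus a 1-bit flag come to 13 bits, so they fit in 2 bytes instead of the 9 that two ints and a bool would usually take. The reader has to know the widths in advance, just as it has to know the field order.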

The Decorator pattern is very useful in building up packets or file IO, look into it.

Message size is somewhat of a trade-off against latency -- that is, the longer you spend waiting for enough data to fill a message to the brim, it's that much longer the receiver has to wait for fresh data. It may be worth implementing some kind of timer that will cut off the current message if it's been waiting around too long.

Most systems use two primary streams, one UDP-based for fast-paced game data, and a second TCP-based for data which is not time-sensitive, such as chat text, score updates, etc.
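One way to sketch that cut-off timer: collect outgoing messages and flush either when the batch is big enough or when its oldest message has waited too long. The Batcher class and its thresholds are hypothetical, and the caller supplies the current time in milliseconds from whatever clock it already uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class Batcher {
public:
    Batcher(std::size_t max_bytes, std::uint32_t max_delay_ms)
        : max_bytes_(max_bytes), max_delay_ms_(max_delay_ms), first_ms_(0) {}

    // Queue one already-serialized message.
    void add(const std::vector<std::uint8_t> &msg, std::uint32_t now_ms)
    {
        if (pending_.empty())
            first_ms_ = now_ms; // remember when the oldest queued message arrived
        pending_.insert(pending_.end(), msg.begin(), msg.end());
    }

    // True when the batch is full or its oldest message has waited long enough.
    bool should_flush(std::uint32_t now_ms) const
    {
        if (pending_.empty())
            return false;
        return pending_.size() >= max_bytes_ || now_ms - first_ms_ >= max_delay_ms_;
    }

    // Hand back everything queued so far and start a new batch.
    std::vector<std::uint8_t> flush()
    {
        std::vector<std::uint8_t> out;
        out.swap(pending_);
        return out;
    }

private:
    std::size_t max_bytes_;
    std::uint32_t max_delay_ms_;
    std::uint32_t first_ms_;
    std::vector<std::uint8_t> pending_;
};
```

The game loop would call should_flush once per tick and send the result of flush as a single packet when it returns true.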

throw table_exception("(? ???)? ? ???");

Thanks so much for the incredible responses, guys!

It'll take some time to sort through and digest it all, but I really appreciate it all. Some great suggestions in here!
