Packet Auto Build/Decomposition Design

This topic is 3311 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Currently, I'm working on a packet build and decomposition system that will allow for easy integration of new packets into my server. I have the build part of the system down and fairly 'set in stone'. The overall design results in a class per packet, compiled into a class library. For the build side, I have it set up so that when you want to create a new packet, you just call the constructor of that particular packet and, through inheritance, it fills in the rest of the protocol, like so:

// Base class for all packets.
public class Packet
{
    // All of my special Write functions for the stream go here.
}

//-----------------------------------------------------------------

public class DataPacket : Packet
{
    public DataPacket(uint opcode)
    {
        // Write the opcode and sequence.
    }
}

//------------------------------------------------------------------

public class PingPacket : Packet
{
    public PingPacket() { }
}

//------------------------------------------------------------------

public class MovementPacket : DataPacket
{
    public MovementPacket(float speed, float direction) : base(0xDEADC0DE)
    {
        // Write the variables for movement, but skip writing the header;
        // the DataPacket base class already handles that.
    }
}




So, in the end, I just need to retrieve the MemoryStream as a byte[] and send the packet. IMO, very effective.

Now, what I'm stuck on is auto decomposition. Since my server is multithreaded and uses async socket calls, I looked at where some potential bottlenecks could be (other than a slow connection), and one I found (or have noticed in the past with other emulators I've worked on) is decomposition of packets in a multithreaded environment. In all of the previous projects I've worked on, the packet was decomposed entirely in the Player class or the Server class, which had to be locked, so multiple packets couldn't be processed at once.

So what I've come up with is an auto decomposition system that decomposes a packet in its own class and hierarchy. When I receive a packet, I stick it in a DataPacket constructor that takes a byte[] (these packets have dynamic lists in them, by the way, so I can't use static structs to read them). From there, it determines the next packet in the hierarchy of the protocol, while staying out of the Player class. Once done, it queues itself in the Player class for processing (the decomposition runs on a different thread than the player object, so it needs to be queued), all pre-parsed and ready.

Now, since I have never tried something like this, I'm not sure how well it would work. So my question is: does anyone know of any methods of packet decomposition that allow for a multithreaded environment? Do you think this would work, and how well?
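A minimal sketch of that hand-off, assuming hypothetical ParsedPacket/PacketInbox types (the real DataPacket hierarchy would do the actual parsing): the decode thread does all the byte-level work, and the player object only ever sees a lock-guarded queue of finished packets.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical parsed-packet result; a real DataPacket would parse itself.
struct ParsedPacket {
    uint32_t opcode = 0;
    std::vector<uint8_t> payload;
};

// Per-player inbox: the decode thread pushes, the player's thread drains.
class PacketInbox {
public:
    void Push(std::unique_ptr<ParsedPacket> p) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(p));
    }
    // Drain everything at once so the lock is held only briefly.
    std::queue<std::unique_ptr<ParsedPacket>> DrainAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::queue<std::unique_ptr<ParsedPacket>> out;
        std::swap(out, queue_);
        return out;
    }
private:
    std::mutex mutex_;
    std::queue<std::unique_ptr<ParsedPacket>> queue_;
};

// Runs on a decode thread: parse raw bytes outside the player lock,
// then hand the finished object to the player's inbox.
inline void DecodeAndQueue(const std::vector<uint8_t>& raw, PacketInbox& inbox) {
    auto pkt = std::make_unique<ParsedPacket>();
    if (raw.size() >= 4) {
        pkt->opcode = raw[0] | (raw[1] << 8) | (raw[2] << 16) |
                      (uint32_t(raw[3]) << 24);
        pkt->payload.assign(raw.begin() + 4, raw.end());
    }
    inbox.Push(std::move(pkt));
}
```

The point of DrainAll() is that the player thread swaps the whole queue out under the lock, then processes at leisure, so decode threads are never blocked by slow handlers.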

Hmmm... in our system, we receive a byte stream (on a thread that puts it into a memory block) and then have a loop in another thread which takes the current memory block and decomposes the data into the distinct classes; then an 'execute' function is called.

I don't think we could parallelize this, because the order in which the events are executed is extremely important (think of a create-entity packet followed by an update-entity-value packet; you really don't want to change the order, because you would update a non-existent entity).

We don't have any performance problems with this approach (running live with 16-player servers, and internally tested with 32-player servers).

Even if you decomposed them at the same time, you'd probably need a sync object and make sure they are handled in the correct order.
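One hedged sketch of that "sync object": decode in parallel, but gate execution behind a sequence-ordered reorder buffer (the names here are illustrative, and external locking around the buffer is assumed for brevity).

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reorder buffer: packets may finish decoding in any order on
// worker threads, but TakeReady() only releases them in sequence order.
class ReorderBuffer {
public:
    explicit ReorderBuffer(uint32_t first) : next_(first) {}

    // Called from any decode thread once a packet is fully decomposed.
    void Insert(uint32_t seq, std::vector<uint8_t> packet) {
        pending_[seq] = std::move(packet);
    }

    // Pops packets that are ready to execute, strictly in sequence order.
    std::vector<std::vector<uint8_t>> TakeReady() {
        std::vector<std::vector<uint8_t>> ready;
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }
private:
    uint32_t next_;
    std::map<uint32_t, std::vector<uint8_t>> pending_;
};
```

A packet decoded out of order simply sits in `pending_` until its predecessors arrive, which preserves the create-before-update ordering described above.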

Quote:
Original post by lordcorm
Now, what I'm stuck on is auto decomposition.

Since my server is multithreaded and uses async socket calls, I looked at where some potential bottlenecks could be (other than a slow connection), and one I found (or have noticed in the past with other emulators I've worked on) is decomposition of packets in a multithreaded environment.


The process, known as serialization, is rarely a bottleneck. There is also no need for the process itself to be thread-aware.

Say your network delivers data at 10 megabytes per second. Serialization needs to memcpy each byte from the buffer into your structure. RAM has a throughput of over 1 gigabyte per second. As such, at maximum network capacity, this will take 1% of total CPU time at most.
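As a back-of-envelope check, that claim works out like this (both rates are illustrative assumptions from the paragraph above, not measurements):

```cpp
// Assumed rates: a 10 MB/s network feed vs. a conservative 1 GB/s of
// memcpy bandwidth. The copy work is then a ~1% slice of the machine.
constexpr double network_bytes_per_sec = 10e6;  // 10 MB/s (assumption)
constexpr double memcpy_bytes_per_sec  = 1e9;   // 1 GB/s (conservative)
constexpr double cpu_fraction =
    network_bytes_per_sec / memcpy_bytes_per_sec;  // about 0.01, i.e. 1%
```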

I really like this topic, so here's a response of me 'thinking out loud' on the matter, with example code.

Lately, I've been considering weakly typed systems to handle network data. I look at the conventional means, such as declaring your packet formats in structs/classes and then packing/unpacking or marshaling/demarshaling them into different typed objects, and think about ways of doing this in a loose format that would allow for easy change and modification. The goal is a framework for prototyping out a project and then easily modifying it to see what works best before committing to a hard design. The immediate trade-off of such a design is additional overhead and the loss of the strongly typed nature of C++, but in return you get a lot of flexibility and ease of use, design-wise.

What follows now are 3 examples that sum up my line of thinking. The comments are the observations on the idea.

Example 1 - Traditional (The way this packet is built does not matter, this is shown as the simplest form possible)

// "Traditional" method. Each packet opcode has its own format. This approach offers the
// least amount of packet size overhead at the expense of more complicated parsing logic.
// Also, the downside is, if the packet format were to change, then a number of changes
// are required to keep the packing/unpacking in sync. Imagine if a DWORD were to be removed
// above and not from below.
{
    cStreamBuilder builder;
    builder.Append<WORD>(0); // size placeholder
    builder.Append<WORD>(1); // opcode
    builder.Append<DWORD>(10);
    builder.Append<DWORD>(100);
    builder.Append<DWORD>(1000);
    builder.Append<DWORD>(10000);
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();
    if(opcode == 1)
    {
        DWORD dw1 = reader.Read<DWORD>();
        DWORD dw2 = reader.Read<DWORD>();
        DWORD dw3 = reader.Read<DWORD>();
        DWORD dw4 = reader.Read<DWORD>();
    }
}



Example 2 - Auto Decomposition, but traditional build

// "Theoretical Method 1". One packet format for all opcodes. Trivial to process, but the
// expense is an extra byte for each field in the packet + 1 extra byte. If N fields exist
// in the packet to parse, packet size has an extra N bytes + 1 of overhead. However, any
// changes made in the packet format do not affect the parsing or packing logic.
{
    cStreamBuilder builder;
    builder.Append<WORD>(0); // size placeholder
    builder.Append<WORD>(1); // opcode
    builder.Append<BYTE>(4); builder.Append<DWORD>(100); // size -> data format
    builder.Append<BYTE>(2); builder.Append<WORD>(10);   // size -> data format
    builder.Append<BYTE>(1); builder.Append<BYTE>(1);    // size -> data format
    builder.Append<BYTE>(0); // no more data
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();
    BYTE data[255] = {0}; // Only allow 255 bytes of data in any field
    BYTE count = 0;
    int index = 0;
    do
    {
        count = reader.Read<BYTE>();
        reader.ReadArray(data, count);

        // Based on the opcode, process data / count in accordance
        // with the packet format. Perhaps a function that is called
        // passing the data along to an object to be stored. Here is
        // one example, not optimized or anything, just to show a
        // possibility.

        if(opcode == 1)
        {
            // void SomeClass::AddField(int index, LPBYTE data, BYTE count);
            AddField(index++, data, count);
        }
    }
    while(count);
}



Example 3 - Auto Decomposition, Auto Build

// "Theoretical Method 2". This method combines the two previous methods, but
// shifts where the overhead occurs as well as what changes during a format change.
// The packet format itself only has to change in one place. Fields of the format
// naturally have to be changed as needed where they are used. The overhead added
// is added in terms of overall processing done and memory consumption, but no
// additional packet overhead is added.

SetupPacketFields();
cStreamBuilder builder;
{
    cEntity sourcedata;
    sourcedata.SetDecimal("Gold", 1000.35);
    sourcedata.SetString("Name", "Drew");
    sourcedata.SetNumber("NameLength", 4);
    sourcedata.SetNumber("HP", 200);
    builder = Pack(1, sourcedata); // Build a packet from a data object

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();

    cEntity thisObj;
    Unpack(opcode, reader, thisObj);

    // Based on the opcode, process the 'packet' object. Can use the
    // GetNumber, GetString, and GetDecimal functions to retrieve data
    // and then Set_XX functions to add data and even pass the data on
    // to other functions as needed. Here is one example, not optimized
    // or anything, just to show a possibility.
    if(opcode == 1)
    {
        std::string name = thisObj.GetString("Name");
        DWORD hp = thisObj.GetNumber("HP");
        double gold = thisObj.GetDecimal("Gold");

        printf("[%s][%lu hp][%.0f gold]\n", name.c_str(), hp, gold);
    }
}



Example 3 (Continued) (Support code)

struct tPacketField
{
    BYTE type;
    BYTE size;
    std::string name;
};

std::map<WORD, std::vector<tPacketField> > packet_formats;

void SetupPacketFields()
{
    // Hard-coded here, but imagine loading from files/database.

    tPacketField format1;
    format1.type = 0; // 0 - number
    format1.size = 4; // number types can be 1-4 bytes
    format1.name = "HP";

    tPacketField format2;
    format2.type = 1; // 1 - decimal
    format2.size = 8; // all decimal types are doubles, 8 bytes
    format2.name = "Gold";

    tPacketField format3;
    format3.type = 0; // 0 - number
    format3.size = 1; // number types can be 1-4 bytes
    format3.name = "NameLength";

    tPacketField format4;
    format4.type = 2; // 2 - string
    format4.size = 0; // Size of 0 means use the previous entry's data, otherwise the # of bytes to read
    format4.name = "Name";

    packet_formats[1].push_back(format1);
    packet_formats[1].push_back(format2);
    packet_formats[1].push_back(format3);
    packet_formats[1].push_back(format4);
}

void Unpack(const WORD opcode, cStreamReader & reader, cEntity & object)
{
    // This code would be the same unpack code for all opcodes
    std::map<WORD, std::vector<tPacketField> >::iterator itr = packet_formats.find(opcode);
    if(itr != packet_formats.end())
    {
        std::vector<tPacketField> & formatVec = itr->second;
        const size_t length = formatVec.size();
        for(size_t x = 0; x < length; ++x)
        {
            tPacketField & field = formatVec[x];

            // Number
            if(field.type == 0)
            {
                long data = 0;
                reader.ReadArray<BYTE>((LPBYTE)&data, field.size); // By program convention a # is 1-4 bytes only
                object.SetNumber(field.name, data);
            }
            // Decimal
            else if(field.type == 1)
            {
                double data = reader.Read<double>(); // By program convention a decimal is only a double, never a float
                object.SetDecimal(field.name, data);
            }
            // String
            else if(field.type == 2)
            {
                char data[256] = {0};
                if(field.size)
                {
                    reader.ReadArray(data, field.size);
                }
                else
                {
                    long size = object.GetNumber(formatVec[x - 1].name);
                    reader.ReadArray(data, size);
                }
                object.SetString(field.name, data);
            }
        }
    }
}

cStreamBuilder Pack(const WORD opcode, cEntity & sourcedata)
{
    cStreamBuilder builder;

    builder.Append<WORD>(0); // size placeholder
    builder.Append<WORD>(opcode); // opcode

    std::map<WORD, std::vector<tPacketField> >::iterator itr = packet_formats.find(opcode);

    // This code would be the same pack code for all opcodes
    if(itr != packet_formats.end())
    {
        std::vector<tPacketField> & formatVec = itr->second;
        const size_t length = formatVec.size();
        for(size_t x = 0; x < length; ++x)
        {
            tPacketField & field = formatVec[x];

            // Number
            if(field.type == 0)
            {
                long data = sourcedata.GetNumber(field.name);
                builder.AppendArray<BYTE>((LPBYTE)&data, field.size);
            }
            // Decimal
            else if(field.type == 1)
            {
                double data = sourcedata.GetDecimal(field.name);
                builder.Append<double>(data);
            }
            // String
            else if(field.type == 2)
            {
                // Since we are mixing std::string with a C string format, we have a bit of extra work to take care of
                std::string data = sourcedata.GetString(field.name);
                if(field.size)
                {
                    char * strdata = new char[field.size];
                    memset(strdata, 0, field.size);
                    size_t count = 0;
                    if(field.size > data.size())
                        count = data.size(); // make sure we don't read more data than there is
                    else
                        count = field.size; // we have more data than we need, truncate
                    memcpy(strdata, data.c_str(), count); // Just copy the data, '\0' is handled by the other end
                    builder.AppendArray<const char>(strdata, field.size);
                    delete [] strdata; // don't leak the temporary buffer
                }
                else
                {
                    long size = sourcedata.GetNumber(formatVec[x - 1].name); // If everything is used correctly, this will be data.size()
                    builder.AppendArray<const char>(data.c_str(), size);
                }
            }
        }
    }
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size
    return builder;
}



Example 3 Continued (cEntity header)

class cEntity
{
private:
    CRITICAL_SECTION cs;
    std::map<std::string, std::string> classStrings;
    std::map<std::string, long> classNumbers;
    std::map<std::string, double> classDecimals;

public:
    cEntity();
    ~cEntity();

    // Clears all numbers associated with the class
    void ClearNumbers();
    // Clears all decimals associated with the class
    void ClearDecimals();
    // Clears all strings associated with the class
    void ClearStrings();

    // Sets a key/data pair. If the key exists, its previous contents are overwritten with the new data
    void SetNumber(const std::string & key, long data);

    // Removes the entry with the specified key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveNumber(const std::string & key);

    // Modifies an existing entry's data. If the key is not found, a const char * exception is thrown
    long ModifyNumber(const std::string & key, long data);

    // Returns the number data associated with the key. Throws a const char * exception if the
    // key does not exist.
    long GetNumber(const std::string & key);

    // Sets a key/data pair. If the key exists, its previous contents are overwritten with the new data
    void SetDecimal(const std::string & key, double data);

    // Removes the entry with the specified key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveDecimal(const std::string & key);

    // Modifies an existing entry's data. If the key is not found, a const char * exception is thrown
    double ModifyDecimal(const std::string & key, double data);

    // Returns the decimal data associated with the key. Throws a const char * exception if the
    // key does not exist.
    double GetDecimal(const std::string & key);

    // Sets a key/data pair. If the key exists, its previous contents are overwritten with the new data
    void SetString(const std::string & key, std::string data);

    // Removes the entry with the specified key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveString(const std::string & key);

    // Returns the string data associated with the key. Throws a std::string exception if the
    // key does not exist.
    std::string GetString(const std::string & key);

    // Saves the entity's data to a file. Any entries prefixed with TMP_ are not saved.
    // If the file cannot be saved, false is returned; otherwise true indicates success.
    bool Save(const std::string & filename);

    // Loads data from a file. Any entries prefixed with TMP_ are not loaded. Returns
    // false if the file could not be loaded or true on success.
    bool Load(const std::string & filename);
};



Ok, so the code is not perfect; I literally wrote it (everything but cEntity/cStreamBuilder/cStreamReader) while I was typing this post, but I think you can see the idea. If we accept the extra memory and processing, yet leave the network data that is sent untouched, I think an auto packet build/decomposition system could be realized through the last example. Adding new packets is trivial and does not affect anything else if you load the formats from a file.

Furthermore, the data that is in the packets is only in the packets and does not affect the objects that originally contained the data. This is important if you have an object that has X fields and you should only send some N <= X - 1 fields of that object.

I will play with this system some more now that I've written it for this post. I've been trying to come up with a simple-to-use system like this, and this thread really got my mind going. Comments?

Your system looks like a typical manual marshaling system. That starts becoming unwieldy when the number of packets grows, and the really bad problems start happening when the pack and unpack functions go out of sync. Finding those problems needs really good unit test coverage.

Meanwhile, the visit/visitor pattern used by my previously documented system can be used manually, if you don't want to use an interface description language of some sort. And because there is only one visit function per data structure, there is no chance of it getting out of sync.

In brief, you write a Visit() function for each data structure:

struct MyStruct {
    bool flag;
    double variable;
    std::string name;
    std::list<OtherStruct> list;
};

template<typename Visitor> bool Visit(MyStruct const &ms, Visitor &v) {
    return v.Begin(ms, "MyStruct") &&
        v.Visit(ms.flag, "flag") &&
        v.Visit(ms.variable, "variable", -10, 10, 1024) && // min, max and quanta
        v.Visit(ms.name, "name") &&
        v.Visit(ms.list, "list") &&
        v.End(ms);
}


To save the data out, you pass in a "SaveOutVisitor" instance. To read data in, you pass in a "SaveInVisitor" instance. To print to XML, you pass in a "PrintToXML" visitor. To build a property-sheet GUI, you pass in a "BuildPropertySheetGUI" visitor. The cool thing is that each of the visitors just needs to implement the Begin/Visit/End protocol, and then it can do what it wants with the data structure!

If you generate the Visit() function code from an IDL, then you can easily change your mind about the layout, and the rest of the project gets updated (that's what my perl script does). If you want to support by-name field access, similar to your named field container, then that can easily be done using a NamedFieldAccessor visitor, too:

template<typename FieldType>
class NamedFieldVisitor {
public:
    NamedFieldVisitor(FieldType &r, char const *name) : result_(r), name_(name) {
    }
    FieldType &result_;
    std::string name_;
    template<typename Struct> bool operator()(Struct const &s) {
        return !::Visit(s, *this); // free Visit() returns false once the field is found
    }
    template<typename Struct> bool Begin(Struct const &s, char const *n) { return true; }
    template<typename Type> bool Visit(Type const &t, char const *name) { return true; }
    template<typename Type> bool Visit(Type const &t, char const *name, Type a, Type b, int c) { return true; }
    bool Visit(FieldType const &t, char const *name) {
        if (name_ == name) { result_ = t; return false; }
        return true;
    }
    bool Visit(FieldType const &t, char const *name, FieldType a, FieldType b, int c) {
        if (name_ == name) { result_ = t; return false; }
        return true;
    }
    template<typename Struct> bool End(Struct const &) {
        return true;
    }
};

MyStruct ms;
double d;
NamedFieldVisitor<double> nfv(d, "variable");
if (nfv(ms)) { printf("The variable is: %f\n", d); }


With a modern optimizing compiler, the template specialization will make sure that most field accesses are optimized out, and this access is about as efficient as a map or hash table (more optimal for some cases, less for others).

Speaking of getting out of sync, and speed (this is a bit off topic): do you think it would be smart to set up my server so it uses an AsyncReceiveFrom, and then, when it receives a packet, find the corresponding session and execute a ThreadPool task for that session? (I'm using C#.)

Or should I go thread-per-client?

As for the 'getting out of sync', do you think it would be smart to process all my packets and push them into a queue, then, before the ThreadPool task is done, send all the packets at the same time?

Or should I link it to my EventClock, and have all queued packets in each Session sent every 5 or 10 ticks?


Thanks

Thread-per-client is never the right choice, IMO.

A thread pool plus asynchronous I/O is usually well performing on Windows.

Generally, you will defer packet sending and send at a fixed rate (say, every 50 ms or every 100 ms). Often, you will send a packet even if there's nothing to send, to keep things like acknowledgements and timer sync flowing.
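A minimal sketch of such a fixed-rate send loop; the flushAllSessions/running callbacks are placeholders for whatever the server actually provides:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical fixed-rate sender: flush every session's outgoing queue on a
// timer (50 ms here), sending a keepalive even when the queue is empty.
inline void RunSendLoop(std::function<void()> flushAllSessions,
                        std::function<bool()> running,
                        std::chrono::milliseconds interval =
                            std::chrono::milliseconds(50)) {
    auto next = std::chrono::steady_clock::now();
    while (running()) {
        flushAllSessions();   // sends queued data, or an empty keepalive
        next += interval;     // fixed-rate (not fixed-delay) scheduling
        std::this_thread::sleep_until(next);
    }
}
```

Note the `sleep_until` against an advancing deadline: if a flush takes a few milliseconds, the loop does not drift, which keeps acknowledgement and timer-sync packets evenly spaced.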

Ok, so I should make a method in my Server class, called something like "ProcessSessions()", that will iterate through my Session dictionary and call SendQueuedMessages().

Then, when my server starts up, I should add ProcessSessions to the EventClock?

That would be one way of doing it.

If you're sending entity updates on a timer, you might want to queue not the message itself, but the fact that you want to update entity X. Then when the process time comes, you generate the entity update at that point; this will make sure you send the freshest data you have available at the time, instead of having an update message sit in the queue and wait.
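That "queue the fact, not the message" idea can be sketched as a dirty-set (the entity-ID type and names here are illustrative):

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical dirty-set: instead of queuing update messages, record WHICH
// entities changed; at send time, build one fresh update per entity.
class DirtySet {
public:
    // Marking the same entity twice collapses into one pending update.
    void MarkDirty(uint32_t entityId) { dirty_.insert(entityId); }

    // Called on the send timer: hand back the changed IDs and reset.
    std::vector<uint32_t> TakeDirty() {
        std::vector<uint32_t> out(dirty_.begin(), dirty_.end());
        dirty_.clear();
        return out;
    }
private:
    std::set<uint32_t> dirty_;
};
```

Because duplicates collapse, an entity that moves ten times between ticks costs one update built from its latest state, instead of ten stale messages sitting in a queue.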

Quote:
Original post by hplus0603
A thread pool plus asynchronous I/O is usually well performing on Windows.

Could you elaborate on this? There are quite a few people that talk about using thread pools but there's rarely much detail. Is it primarily for pushing application-specific message decoding logic to a background thread so that the main I/O thread can continue?

The way that I/O completion ports work on Windows, a number of threads (a pool) sit waiting for an I/O completion to come in. When the completion comes in, the next ready thread picks it up and runs with it. The idea is that the blocking I/O is handled by the overlapped (async) I/O request, and the computation-bound processing of the I/O is handled by the thread pool. For high-load situations, you then spawn one thread-pool thread per physical core, plus one or a few threads for UI and management that don't normally need intensive processing.

In general, thread pools are a useful way of spreading load that you can compartmentalize across the available physical computation cores. If you have a "here's some work to do" abstraction, then you build a queue of those work items and have a set of threads pull work from that queue and perform it. That way, you don't need one thread per subsystem, but can scale up to the number of available cores (as long as there is work to do). The drawback is that some systems don't decompose well into right-sized computational chunks.
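A bare-bones version of that work-queue abstraction, sketched with standard C++ threads rather than I/O completion ports:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal work-queue thread pool: N threads pull work items from a shared
// queue. The destructor drains any remaining work before joining.
class WorkQueue {
public:
    explicit WorkQueue(unsigned threads) {
        for (unsigned i = 0; i < threads; ++i)
            workers_.emplace_back([this] { Run(); });
    }
    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void Post(std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(work));
        }
        cv_.notify_one();
    }
private:
    void Run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (stopping_ && queue_.empty()) return;
                work = std::move(queue_.front());
                queue_.pop();
            }
            work();  // run the item outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

The pool size maps to the "one thread per physical core" advice above; subsystems share the pool by posting work items rather than owning threads.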

I've used various systems in the past: auto-generated packets, hand-coded serialization functions, etc. Eventually I settled on a macro-based scheme where you define the serialization functions inline with the class/struct and it auto-generates the read/write function hooks. I also used macros to support versioning, since I've used this scheme for both networking and a general save/load scheme, but not in this version. In the same vein, packet registration is also self-registering, but you do have to define some additional macros in the implementation.

Looks something like this.


class netFileDownloadPacket : public netPacketBase
{
    NET_PACKET_SIMPLE(netFileDownloadPacket, eNetFileDownloadPacket);
public:

    netFileDownloadPacket() {}

    BIND_(1, SvcNet::NetBuffer, mData);
    BIND_(2, StringType, mPath);
    BIND_END_VIRTUAL(2, netFileDownloadPacket);
};

BIND_OBJECT(netFileDownloadPacket);
NET_PACKET_REGISTER(netFileDownloadPacket);




Some people don't like the macros being in the class header, but it's very convenient to keep the implementation and serialization in sync, as they are one and the same.

Here's a snippet of the macros:



//serialization macros create interface functions
//n = number of the param, will generate a read/write function for the given number
//t = type of the member variable
//m = member variable name itself
#define BIND_(n,t,m) t m; void _W##n(SvcNet::NetBuffer& in) const {in.Write(m);} void _R##n(SvcNet::NetBuffer& out){out.Read(m);}

//similar to BIND_, but defines a fixed sized array instead, param (s) is the size of the array
#define BIND_A(n,t,m,s) t m[s]; void _W##n(SvcNet::NetBuffer& in) const {in.WriteArray(m,s);} void _R##n(SvcNet::NetBuffer& out){out.ReadArray(m,s);}

//similar to BIND_, but defines custom read/write functions (r,w)
#define BIND_C(n,t,m,r,w) t m; void _W##n(SvcNet::NetBuffer& in) const {r(in,m);} void _R##n(SvcNet::NetBuffer& out){w(out,m);}





This scheme depends on global templated function hooks for read/write; that's what BIND_OBJECT does, as such:



//defines global binding function, the object must implement the binding interface of Write/Read
#define BIND_OBJECT(T) inline void Write (const T&t, SvcNet::NetBuffer& buff){t.Write(buff);} inline void Read (T&t, SvcNet::NetBuffer& buff){t.Read(buff);}

//similar to BIND_END but makes the function virtual, allowing for inheritance on the read/write
#define BIND_END_VIRTUAL(n,p) virtual void Write (SvcNet::NetBuffer& buff)const{BIND_WRITE(p,n);} virtual void Read (SvcNet::NetBuffer& buff){BIND_READ(p,n);}




Ultimately the hooks are called inside the NetBuffer's read/write functions to complete serialization/deserialization. The last bit of the puzzle is the auto-generation of the serialization functions with the proper number of calls to the auto-generated read/write functions. You'll have to use a macro unrolling technique; it's a common template metaprogramming technique, used in libraries such as Boost, Luabind, etc.

It looks like this:



#define BIND_WRITE1(type,n) type::_W1(buff);
#define BIND_WRITE2(type,n) BIND_WRITE1(type,n-1) type::_W2(buff);
#define BIND_WRITE3(type,n) BIND_WRITE2(type,n-1) type::_W3(buff);
#define BIND_WRITE4(type,n) BIND_WRITE3(type,n-1) type::_W4(buff);
#define BIND_WRITE5(type,n) BIND_WRITE4(type,n-1) type::_W5(buff);
#define BIND_WRITE(type,n) BIND_WRITE##n(type,n-1)





And there you go!

Enjoy!

-ddn

