// Base class for all packets.
public class Packet
{
    // All of my special Write functions for the stream go here.
}
//-----------------------------------------------------------------
public class DataPacket : Packet
{
    public DataPacket(uint Opcode)
    {
        // Write the opcode and sequence.
    }
}
//------------------------------------------------------------------
public class PingPacket : Packet
{
    public PingPacket()
    {
    }
}
//------------------------------------------------------------------
public class MovementPacket : DataPacket
{
    public MovementPacket(float Speed, float Direction) : base(0xDEADC0DE)
    {
        // Write the variables for movement, but skip writing the header for data packets.
        // The DataPacket base class is already handling that.
    }
}
Packet Auto Build/Decomposition Design
Currently, I'm working on a packet build and decomposition system that will allow for easy integration of new packets into my server.
I have the build part of the system down and fairly 'set in stone'. The overall design results in a class per packet, compiled into a class library.
For the build system, when you want to create a new packet you just call the constructor of that packet and, through inheritance, it fills in the rest of the protocol, as in the code above.
So, in the end, I just need to retrieve the MemoryStream as a byte[] and send the packet. IMO, very effective.
.....
Now, what I'm stuck on is auto decomposition.
Since my server is multithreaded and uses async socket calls, I've been looking at potential bottlenecks (other than a slow connection). One I've found (or have noticed in the past with other emulators I've worked on) is the decomposition of packets in a multithreaded environment.
In all of the previous projects I've worked on, the packet was decomposed in the Player class or the Server class, which had to be locked, so multiple packets couldn't be processed at once.
So what I've come up with is an auto decomposition system that decomposes a packet in its own class and hierarchy.
When I receive a packet, I stick it in the DataPacket constructor that takes a byte[] (these packets have dynamic lists in them, by the way, so I can't use static structs to read them).
From there, it determines the next packet in the hierarchy of the protocol, all while staying out of the Player class. Once done, it queues itself in the Player class for processing (the auto decomposition is done on a different thread than the Player object, so it needs to be queued), all pre-parsed and ready.
Now, personally, since I've never tried something like this, I'm not sure how well it would work. So, my question: does anyone know of any methods of packet decomposition that work well in a multithreaded environment? Do you think this would work, and how well?
Sorry for my horrible grammar and spelling.
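The decompose-off-thread-then-queue flow described above can be sketched with a small per-player inbox. This is a C++ illustration, not the poster's actual C# classes; `ParsedPacket` and `PacketInbox` are hypothetical names.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical pre-parsed packet: the raw bytes are decomposed on the
// receive thread, outside any Player lock.
struct ParsedPacket
{
    uint32_t opcode;
    std::vector<uint8_t> payload;
};

// Per-player inbox. Only this small queue is locked, never the whole
// Player object, so many connections can decompose packets concurrently.
class PacketInbox
{
public:
    // Called from the decomposition threads.
    void Enqueue(ParsedPacket p)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(p));
    }

    // Called from the player's update tick; returns false when empty.
    bool TryDequeue(ParsedPacket &out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<ParsedPacket> queue_;
};
```

The key point is that the lock only protects the hand-off, not the parsing or the Player state, so decomposition of many clients' packets can run in parallel.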
Hmm... in our system, we receive a byte stream (on a thread that puts it into a memory block), and then a loop in another thread gets the current memory block and decomposes the data into the distinct classes, and then an 'execute' function is called.
I don't think we could parallelize this, because the order in which the events are executed is extremely important (think of a create-entity packet followed by an update-entity-value packet; you really don't want to change the order, because you would be updating a non-existent entity).
We don't have any performance problems with this approach (running live with 16-player servers, and internally tested with 32-player servers).
Even if you decomposed them at the same time, you'd probably need a sync object and make sure they are handled in the correct order.
Quote: Original post by lordcorm
Now, what I'm stuck on is auto decomposition.
Since my server is multithreaded and uses async socket calls, I've been looking at potential bottlenecks (other than a slow connection), and one I've found (or have noticed in the past with other emulators I've worked on) is the decomposition of packets in a multithreaded environment.
The process, known as serialization, is rarely a bottleneck. There is also no need for the process itself to be thread-aware.
Suppose your network delivers data at 10 megabytes per second. Serialization needs to memcpy each byte from the buffer into your structure. RAM has a throughput of over 1 gigabyte per second, so even at maximum network capacity this will take at most 1% of total CPU time.
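The back-of-envelope arithmetic above can be made explicit: the CPU fraction spent copying is just network throughput divided by memory throughput.

```cpp
// If the NIC delivers `netBytesPerSec` and memcpy can move `ramBytesPerSec`,
// the fraction of wall-clock time spent copying is their ratio.
double CopyCpuFraction(double netBytesPerSec, double ramBytesPerSec)
{
    return netBytesPerSec / ramBytesPerSec;
}
```

With the figures quoted (10 MB/s network, 1 GB/s RAM), this gives 10e6 / 1e9 = 0.01, i.e. about 1% of CPU time.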
If you want a flexible system that allows you to define your various data structures, and then marshal/demarshal between structures and blocks of bytes, you could read my article on generated marshaling code.
I really like this topic, so here's a response of me 'thinking out loud' on the matter, with example code.
Lately, I've been considering weakly typed systems to handle network data. I look at the conventional means, such as declaring your packet formats in structs/classes and then pack/unpack or marshal/demarshal them into different typed objects, and think about ways of doing this in a loose format that would allow for easy change and modification. The reason for this is to offer a framework for prototyping out a project and then being able to easily modify it to see what works best before committing to a hard design. The immediate trade-off of such a design is additional overhead and the loss of the strongly typed nature of C++, but in return you get a lot of flexibility and ease of use, design-wise.
What follows now are 3 examples that sum up my line of thinking. The comments are the observations on the idea.
Example 1 - Traditional (The way this packet is built does not matter, this is shown as the simplest form possible)
// "Traditional" method. Each packet opcode has its own format. This approach offers the
// least amount of packet size overhead at the expense of more complicated parsing logic.
// Also, the downside is, if the packet format were to change, then a number of changes
// are required to keep the packing/unpacking in sync. Imagine if a DWORD were to be removed
// above and not from below.
{
    cStreamBuilder builder;
    builder.Append<WORD>(0); // size place holder
    builder.Append<WORD>(1); // opcode
    builder.Append<DWORD>(10);
    builder.Append<DWORD>(100);
    builder.Append<DWORD>(1000);
    builder.Append<DWORD>(10000);
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();
    if(opcode == 1)
    {
        DWORD dw1 = reader.Read<DWORD>();
        DWORD dw2 = reader.Read<DWORD>();
        DWORD dw3 = reader.Read<DWORD>();
        DWORD dw4 = reader.Read<DWORD>();
    }
}
Example 2 - Auto Decomposition, but traditional build
// "Theoretical Method 1". One packet format for all opcodes. Trivial to process, but the
// expense is an extra byte for each field in the packet + 1 extra byte. If N fields exist
// in the packet to parse, packet size has an extra N bytes + 1 of overhead. However, any
// changes made in the packet format do not affect the parsing or packing logic.
{
    cStreamBuilder builder;
    builder.Append<WORD>(0); // size place holder
    builder.Append<WORD>(1); // opcode
    builder.Append<BYTE>(4); builder.Append<DWORD>(100); // size -> data format
    builder.Append<BYTE>(2); builder.Append<WORD>(10);   // size -> data format
    builder.Append<BYTE>(1); builder.Append<BYTE>(1);    // size -> data format
    builder.Append<BYTE>(0); // no more data
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();
    BYTE data[255] = {0}; // Let's only allow 255 bytes of data to exist in any field
    BYTE count = 0;
    int index = 0;
    do
    {
        count = reader.Read<BYTE>();
        reader.ReadArray(data, count);
        // Based on the opcode, process data / count in accordance
        // to the packet format. Perhaps a function that is called
        // passing the data along to an object to be stored. Here is
        // one example, not optimized or anything just to show a
        // possibility.
        if(opcode == 1)
        {
            // void SomeClass::AddField(int index, LPBYTE data, BYTE count);
            AddField(index++, data, count);
        }
    } while(count);
}
Example 3 - Auto Decomposition, Auto Build
// "Theoretical Method 2". This method combines the two previous methods, but
// shifts where the overhead occurs as well as what changes during a format change.
// The packet format itself only has to change in one place. Fields of the format
// naturally have to be changed as needed where they are used. The overhead added
// is in terms of overall processing done and memory consumption, but no
// additional packet overhead is added.
SetupPacketFields();
cStreamBuilder builder;
{
    cEntity sourcedata;
    sourcedata.SetDecimal("Gold", 1000.35);
    sourcedata.SetString("Name", "Drew");
    sourcedata.SetNumber("NameLength", 4);
    sourcedata.SetNumber("HP", 200);
    builder = Pack(1, sourcedata); // Build a packet from a data object

    cStreamReader reader(builder.GetStreamData(), builder.GetStreamSize());
    WORD size = reader.Read<WORD>();
    WORD opcode = reader.Read<WORD>();
    cEntity thisObj;
    Unpack(opcode, reader, thisObj);
    // Based on the opcode, process the 'packet' object. Can use the
    // GetNumber, GetString, and GetDecimal functions to retrieve data
    // and then Set_XX functions to add data and even pass the data on
    // to other functions as needed. Here is one example, not optimized
    // or anything just to show a possibility.
    if(opcode == 1)
    {
        std::string name = thisObj.GetString("Name");
        DWORD hp = thisObj.GetNumber("HP");
        double gold = thisObj.GetDecimal("Gold");
        printf("[%s][%i hp][%.0f gold]\n", name.c_str(), hp, gold);
    }
}
Example 3 (Continued) (Support code)
struct tPacketField
{
    BYTE type;
    BYTE size;
    std::string name;
};

std::map<WORD, std::vector<tPacketField> > packet_formats;

void SetupPacketFields()
{
    // Hard coded here, but imagine loading from files/database
    tPacketField format1;
    format1.type = 0; // 0 - number
    format1.size = 4; // number types can be 1-4 bytes
    format1.name = "HP";

    tPacketField format2;
    format2.type = 1; // 1 - decimal
    format2.size = 8; // all decimal types are doubles, 8 bytes
    format2.name = "Gold";

    tPacketField format3;
    format3.type = 0; // 0 - number
    format3.size = 1; // number types can be 1-4 bytes
    format3.name = "NameLength";

    tPacketField format4;
    format4.type = 2; // 2 - string
    format4.size = 0; // Size of 0 means use the previous entry's data, otherwise the # of bytes to read
    format4.name = "Name";

    packet_formats[1].push_back(format1);
    packet_formats[1].push_back(format2);
    packet_formats[1].push_back(format3);
    packet_formats[1].push_back(format4);
}

void Unpack(const WORD opcode, cStreamReader & reader, cEntity & object)
{
    // This code would be the same unpack code for all opcodes
    std::map<WORD, std::vector<tPacketField> >::iterator itr = packet_formats.find(opcode);
    if(itr != packet_formats.end())
    {
        std::vector<tPacketField> & formatVec = itr->second;
        const size_t length = formatVec.size();
        for(size_t x = 0; x < length; ++x)
        {
            tPacketField & field = formatVec[x];
            // Number
            if(field.type == 0)
            {
                long data = 0;
                reader.ReadArray<BYTE>((LPBYTE)&data, field.size); // By program convention a # is 1-4 bytes only
                object.SetNumber(field.name, data);
            }
            // Decimal
            else if(field.type == 1)
            {
                double data = reader.Read<double>(); // By program convention a decimal is only a double, never a float
                object.SetDecimal(field.name, data);
            }
            // String
            else if(field.type == 2)
            {
                char data[256] = {0};
                if(field.size)
                {
                    reader.ReadArray(data, field.size);
                }
                else
                {
                    long size = object.GetNumber(formatVec[x - 1].name);
                    reader.ReadArray(data, size);
                }
                object.SetString(field.name, data);
            }
        }
    }
}

cStreamBuilder Pack(const WORD opcode, cEntity & sourcedata)
{
    cStreamBuilder builder;
    builder.Append<WORD>(0);      // size place holder
    builder.Append<WORD>(opcode); // opcode
    std::map<WORD, std::vector<tPacketField> >::iterator itr = packet_formats.find(opcode);
    // This code would be the same pack code for all opcodes
    if(itr != packet_formats.end())
    {
        std::vector<tPacketField> & formatVec = itr->second;
        const size_t length = formatVec.size();
        for(size_t x = 0; x < length; ++x)
        {
            tPacketField & field = formatVec[x];
            // Number
            if(field.type == 0)
            {
                long data = sourcedata.GetNumber(field.name);
                builder.AppendArray<BYTE>((LPBYTE)&data, field.size);
            }
            // Decimal
            else if(field.type == 1)
            {
                double data = sourcedata.GetDecimal(field.name);
                builder.Append<double>(data);
            }
            // String
            else if(field.type == 2)
            {
                // Since we are mixing std::string with a C string format, we have a bit of extra work to take care of
                std::string data = sourcedata.GetString(field.name);
                if(field.size)
                {
                    char * strdata = new char[field.size];
                    memset(strdata, 0, field.size);
                    int count = 0;
                    if(field.size > data.size())
                        count = data.size(); // make sure we don't read more data than there is
                    else
                        count = field.size;  // we have more data than we need, truncate
                    memcpy(strdata, data.c_str(), count); // Just copy the data, '\0' is handled by other end
                    builder.AppendArray<const char>(strdata, field.size);
                    delete [] strdata; // free the temporary buffer
                }
                else
                {
                    long size = sourcedata.GetNumber(formatVec[x - 1].name); // If everything is used correctly, this will be data.size()
                    builder.AppendArray<const char>(data.c_str(), size);
                }
            }
        }
    }
    builder.Overwrite<WORD>(0, builder.GetStreamSize() - 4); // Set the final packet size
    return builder;
}
Example 3 Continued (cEntity header)
class cEntity
{
private:
    CRITICAL_SECTION cs;
    std::map<std::string, std::string> classStrings;
    std::map<std::string, long> classNumbers;
    std::map<std::string, double> classDecimals;

public:
    cEntity();
    ~cEntity();

    // Clears all associated numbers with the class
    void ClearNumbers();
    // Clears all decimal numbers with the class
    void ClearDecimals();
    // Clears all associated strings with the class
    void ClearStrings();

    // Sets a key data pair. If the key exists, its previous contents are overwritten with the new data
    void SetNumber(const std::string & key, long data);
    // Removes the entry with the specific key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveNumber(const std::string & key);
    // Modifies an existing entry's data. If the key is not found a const char * std::exception is thrown
    long ModifyNumber(const std::string & key, long data);
    // Returns the number data associated with the key of name. Throws a const char * std::exception if the
    // name does not exist.
    long GetNumber(const std::string & key);

    // Sets a key data pair. If the key exists, its previous contents are overwritten with the new data
    void SetDecimal(const std::string & key, double data);
    // Removes the entry with the specific key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveDecimal(const std::string & key);
    // Modifies an existing entry's data. If the key is not found a const char * std::exception is thrown
    double ModifyDecimal(const std::string & key, double data);
    // Returns the decimal data associated with the key of name. Throws a const char * std::exception if the
    // name does not exist.
    double GetDecimal(const std::string & key);

    // Sets a key data pair. If the key exists, its previous contents are overwritten with the new data
    void SetString(const std::string & key, std::string data);
    // Removes the entry with the specific key. Returns true if the key was successfully
    // removed or false if the key was not found
    bool RemoveString(const std::string & key);
    // Returns the string data associated with the key of name. Throws a std::string std::exception if the
    // name does not exist.
    std::string GetString(const std::string & key);

    // Saves the entity's data to a file. Any entries prefixed with TMP_ are not saved.
    // If the file cannot be saved, false is returned, otherwise true indicates success.
    bool Save(const std::string & filename);
    // Loads data from a file. Any entries prefixed with TMP_ are not loaded. Returns
    // false if the file could not be loaded or true on success.
    bool Load(const std::string & filename);
};
OK, so the code is not perfect. I literally wrote everything but cEntity/cStreamBuilder/cStreamReader while typing this post, but I think you can see the idea. If we accept the extra memory and processing, yet leave the network data that is sent untouched, I think an auto packet build/decomposition system could be realized through the last example. Adding new packets is trivial and does not affect anything else if you load the formats from a file.
Furthermore, the data in the packets is only in the packets and does not affect the objects that originally contained it. This is important if you have an object with X fields and you should only send some N <= X - 1 of them.
I will play with this system some more after writing it up for this post. I've been trying to come up with a simple-to-use system like this, but this thread really got my mind going. Comments?
Your system looks like a typical manual marshaling system. That starts to become unwieldy as the number of packets grows, and the really bad problems start when the pack and unpack functions go out of sync. Finding those problems requires really good unit test coverage.
Meanwhile, the visit/visitor pattern used by my previously documented system can be used manually, if you don't want to use an interface description language of some sort. And because there is only one visit function per data structure, there is no chance of it getting out of sync.
In brief, you write a Visit() function for each data structure:
struct MyStruct
{
    bool flag;
    double variable;
    std::string name;
    std::list<OtherStruct> list;
};

template<typename Visitor>
bool Visit(MyStruct const &ms, Visitor &v)
{
    return v.Begin(ms, "MyStruct")
        && v.Visit(ms.flag, "flag")
        && v.Visit(ms.variable, "variable", -10.0, 10.0, 1024) // min, max and quanta
        && v.Visit(ms.name, "name")
        && v.Visit(ms.list, "list")
        && v.End(ms);
}
To save the data out, you pass in a "SaveOutVisitor" instance. To read data in, you pass in a "SaveInVisitor" instance. To print to XML, you pass in a "PrintToXML" visitor. To build a property sheet GUI, you pass in a "BuildPropertySheetGUI" visitor. The cool thing is that each of the visitors just need to implement the Begin/Visit/End protocol, and then they can do what they want with the data structure!
If you generate the Visit() function code from an IDL, then you can easily change your mind about the layout, and the rest of the project gets updated (that's what my perl script does). If you want to support by-name field access, similar to your named field container, then that can easily be done using a NamedFieldAccessor visitor, too:
template<typename FieldType>
class NamedFieldVisitor
{
public:
    NamedFieldVisitor(FieldType &r, char const *name) : result_(r), name_(name) { }

    FieldType &result_;
    std::string name_;

    template<typename Struct> bool operator()(Struct const &s) { return !Visit(s, *this); }

    template<typename Struct> bool Begin(Struct const &s, char const *n) { return true; }
    template<typename Type> bool Visit(Type const &t, char const *name) { return true; }
    template<typename Type> bool Visit(Type const &t, char const *name, Type a, Type b, int c) { return true; }
    template<> bool Visit(FieldType const &t, char const *name)
    {
        if (name_ == name) { result_ = t; return false; }
        return true;
    }
    template<> bool Visit(FieldType const &t, char const *name, FieldType a, FieldType b, int c)
    {
        if (name_ == name) { result_ = t; return false; }
        return true;
    }
    template<typename Struct> bool End(Struct const &s) { return true; }
};

MyStruct ms;
double d;
NamedFieldVisitor<double> nfv(d, "variable");
if (nfv(ms))
{
    printf("The variable is: %f\n", d);
}
With a modern optimizing compiler, the template specialization will make sure that most field accesses are optimized out, and this access is about as efficient as a map or hash table (more optimal for some cases, less for others).
Speaking of out of sync, and speed (this is a bit off topic): do you think it would be smart to set up my server so it uses AsyncReceiveFrom, then when it receives a packet, it finds the corresponding session and executes a ThreadPool task for that session? (I'm using C#.)
Or should I go thread-per-client?
As for the 'getting out of sync': do you think it would be smart to process all my packets and push them into a queue, then before the ThreadPool task is done, send all the packets at the same time?
Or should I link it to my EventClock and have each Session's queued packets sent every 5 or 10 ticks?
Thanks
Thread-per-client is never the right choice, IMO.
A thread pool plus asynchronous I/O usually performs well on Windows.
Generally, you will defer packet sending and send at a fixed rate (say, every 50 ms or every 100 ms). Often, you will send a packet even if there's nothing to send, to keep things like acknowledgements and timer sync flowing.
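The defer-and-flush idea can be sketched as a small send queue whose FlushTick() a timer calls every 50-100 ms; when nothing is queued, it still emits an empty keepalive. This is an illustrative C++ sketch, with names of my own choosing, not code from the thread:

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Outgoing messages are queued as they are produced; a fixed-rate timer
// calls FlushTick() and puts everything on the wire at once. If nothing
// is queued, an empty keepalive packet is emitted so acks and timer
// sync keep flowing.
class SendQueue
{
public:
    void Queue(std::vector<uint8_t> msg) { pending_.push_back(std::move(msg)); }

    // Returns the datagrams to send this tick and clears the queue.
    std::deque<std::vector<uint8_t> > FlushTick()
    {
        std::deque<std::vector<uint8_t> > out;
        if (pending_.empty())
            out.push_back(std::vector<uint8_t>()); // keepalive: header only
        else
            out.swap(pending_);
        return out;
    }

private:
    std::deque<std::vector<uint8_t> > pending_;
};
```

Batching on a timer also gives you a natural place to coalesce several small messages into one datagram, which cuts per-packet header overhead.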
OK, so I should make a method in my Server class called something like "ProcessSessions()" that iterates through my Session dictionary and calls SendQueuedMessages().
Then when my server starts up, I should add ProcessSessions to the EventClock?
That would be one way of doing it.
If you're sending entity updates on a timer, you might want to queue not the message itself, but the fact that you want to update entity X. Then when the process time comes, you generate the entity update at that point; this will make sure you send the freshest data you have available at the time, instead of having an update message sit in the queue and wait.
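The "queue the fact, not the message" idea amounts to keeping a dirty set of entity ids: mark entity X when it changes, and at send-tick time build one update per dirty entity from its current state. A minimal sketch, with illustrative names of my own:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Tracks which entities changed since the last send tick. Because a set
// is used, marking the same entity many times between ticks still yields
// a single update, built from the freshest state at send time.
class DirtyEntitySet
{
public:
    void MarkDirty(uint32_t entityId) { dirty_.insert(entityId); }

    // At send-tick time, hand back each dirty id exactly once and reset.
    std::vector<uint32_t> TakeDirty()
    {
        std::vector<uint32_t> ids(dirty_.begin(), dirty_.end());
        dirty_.clear();
        return ids;
    }

private:
    std::set<uint32_t> dirty_;
};
```

At the tick, the caller would look up each returned entity and serialize its current values, so no stale snapshot ever sits in a queue.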