Xanather

Using Unicode with a TCP Stream


I'm going to begin coding another C# game server for learning purposes soon (I've made servers before, and each time I try to re-implement it better). It will use TCP (it's not a first-person shooter, and I don't want to waste time adding reliability on top of UDP for no reason) and a delimiter-based messaging system. This means all data sent over the network will be Unicode text (using UTF-8 encoding).

There is a problem, though: this time I am switching from 1-byte-long ANSI to Unicode. A Unicode character in UTF-8 can be more than 1 byte long (up to 4 bytes!), and since TCP is really a stream, how will I know whether I have received every character in full? I might receive half a character. What happens if I try to decode bytes that make up only half of a Unicode character? Would I have to use ANSI, a different messaging system, or some sort of special strategy?

All replies are appreciated. Thanks, Xanather.

The UTF-8 format makes it possible to detect where a character starts, and once you have the first (lead) byte, you know how many bytes the character should be.
I would recommend using the length-data format rather than the data-delimiter format, though. It makes everything easier IMO.
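
To illustrate the point about lead bytes, here is a minimal Python sketch (the thread is about C#, but the idea carries over). The helper name `utf8_sequence_length` is made up for this example. `codecs.getincrementaldecoder` is the Python standard-library incremental decoder, which holds back an incomplete character until the rest of its bytes arrive.

```python
import codecs

def utf8_sequence_length(lead_byte: int) -> int:
    """Return the total byte length of a UTF-8 character, judged from its first byte.

    Hypothetical helper for illustration. Returns 0 for a continuation
    byte (10xxxxxx) or a byte that can never start a character.
    """
    if lead_byte < 0x80:             # 0xxxxxxx: plain ASCII
        return 1
    if 0xC2 <= lead_byte <= 0xDF:    # 110xxxxx: two-byte character
        return 2
    if 0xE0 <= lead_byte <= 0xEF:    # 1110xxxx: three-byte character
        return 3
    if 0xF0 <= lead_byte <= 0xF4:    # 11110xxx: four-byte character
        return 4
    return 0

# Simulate a TCP read that splits a two-byte character in half.
data = "h\u00e9llo \u20ac".encode("utf-8")
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(data[:2])     # 'h' plus the first byte of 'é'
second = decoder.decode(data[2:])    # the remaining bytes
```

The incremental decoder returns only the complete characters from the first chunk and emits the split character once the second chunk arrives, so a half-received character never has to be decoded on its own.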

Now that I think about it, a length-data format would probably be better for a TCP stream: I wouldn't have to reserve characters as delimiters or check whether all the bytes of a Unicode character have arrived. Thanks. Edited by Xanather

One quick question before I attempt another server. I think I have come up with a good idea for this length-data networking format: a 32-bit integer (4 bytes long) giving the length of the message that is about to arrive, followed by one byte giving the core type of the message (e.g. is it a Unicode string, a serialized object, or something lower-level).

So really it looks like this (what the bytes look like):

234:255:234:234:0
The first 4 bytes represent the 32-bit integer and the last byte (which is 0) represents the core message type. A 0 in the last byte in this case means a Unicode string message. What is your opinion on this strategy? I should probably be more confident in my own ideas, but sadly I have this thing where I have to learn things the RIGHT way, even though there is really no right way in coding xD.

Still, a reply with your opinion (or anyone's opinion) on this strategy would be appreciated.

Thanks, Xanather.
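
The header described above could be sketched roughly as follows, in Python for brevity. Several details are assumptions not stated in the post: the length is big-endian (network byte order), it counts only the payload and not the type byte, and the names `pack_message`, `MessageReader`, and `TYPE_TEXT` are invented for this example.

```python
import struct

HEADER = struct.Struct("!IB")   # assumed layout: 4-byte big-endian length + 1-byte type
TYPE_TEXT = 0                   # hypothetical type code for a UTF-8 string

def pack_message(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with its length and a message-type byte."""
    return HEADER.pack(len(payload), msg_type) + payload

class MessageReader:
    """Accumulates bytes from a stream and yields only complete messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        """Add received bytes; return a list of (type, payload) tuples."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            length, msg_type = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break               # body not fully received yet
            messages.append((msg_type, bytes(self._buffer[HEADER.size:end])))
            del self._buffer[:end]  # drop the consumed message
        return messages

# Simulate one message arriving in two separate reads.
wire = pack_message(TYPE_TEXT, "h\u00e9llo".encode("utf-8"))
reader = MessageReader()
part1 = reader.feed(wire[:3])   # header incomplete, nothing to return
part2 = reader.feed(wire[3:])   # message now complete
```

Because the payload is decoded only after all of its bytes have arrived, a split multi-byte character is never an issue with this kind of framing.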

Just one data point for you: good old 7-bit ASCII is a proper subset of the UTF-8 encoding of Unicode. In other words, an ASCII string is a UTF-8 string. You can save the last byte of your message header, since it will always have the same value. Edited by Bregma

Yes, I am aware of that, thanks anyway though. Isn't it the case that once you start using symbols other than English-language characters, the encoding moves from 1 byte to 2 or more bytes per character? Also, what if I want to send serialized objects? How would you do that without the core message type (the 1 byte I was talking about in my previous post)?
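
To put numbers on the byte-count question, here is a small Python check. The sample characters are chosen arbitrarily for illustration.

```python
# One character from each UTF-8 length class: 'A', 'é', '€', and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]

# Pure ASCII text produces identical bytes under both encodings.
ascii_is_utf8 = "hello".encode("ascii") == "hello".encode("utf-8")
```

Here `lengths` comes out as `[1, 2, 3, 4]`: ASCII stays at one byte, and other characters take two to four bytes depending on the code point.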

I don't think a 4-byte length prefix is needed for a game. Sending a four-gigabyte stream of data (the max representable by 4 bytes) over any network will take a long time, longer than you want to wait for an individual game message.

I find myself using single bytes when I'm talking about messages (individual units of information), and double bytes when talking about packets (the full data transmission unit, like a UDP packet or a "network tick" packet).

Another useful optimization is varint; values 0 through 127 are encoded as a single byte; higher values are encoded 7 bits at a time, with the highest bit set to 1. To decode, keep reading bytes, masking off the high bit, and shift by 7, until you get a byte with a clear high bit. You can also encode negative numbers this way by sending a "negate this" bit as the last value bit (thus packing 6 bits of numeric info into the last byte.)
For a stream over TCP (or file,) this is useful if you want to be able to go really large once in a blue moon, but don't want to spend the overhead for each and every little unit of data.
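
A minimal Python sketch of the unsigned varint described above. The post does not say which 7-bit group is sent first; this sketch assumes least-significant group first. The function names are made up for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first (assumed)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while value >= 0x80:
        out.append(0x80 | (value & 0x7F))  # high bit set: more bytes follow
        value >>= 7
    out.append(value)                      # high bit clear: last byte
    return bytes(out)

def decode_varint(data: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("incomplete varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # mask off the high bit, then shift into place
        if not byte & 0x80:                # clear high bit marks the final byte
            return result, pos - offset
        shift += 7

small = encode_varint(100)   # fits in a single byte
large = encode_varint(300)   # needs two bytes
```

Values up to 127 cost one byte, so a typical short game message pays a one-byte length prefix while rare large messages can still be represented.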

Wow, thank you for that, I never knew that.

So you're saying that with UTF-8 encoding, when you use English (well, ASCII characters), the last bit is always 1, since the byte is a full character by itself? That really makes sense if it's true. And other non-ASCII characters may have several 7-bit data bytes with the last bit (the 8th bit) set to 0 (unless, of course, it's the last byte of the character, indicating that you have all the data needed to decode)?

What I don't understand, though, is how you could pack 6 bits of numeric info into the last byte.

(I didn't type that too well... heh) Edited by Xanather
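
Note that the varint scheme from the earlier reply is a way to encode integers such as a length prefix; it is separate from how UTF-8 encodes characters. As for the 6-bit question, one possible reading of the "negate this" bit is sketched below in Python. The bit positions are an assumption: in the final byte the high bit is clear (no more bytes), the next bit is the sign flag, and the remaining 6 bits carry data. The function names are invented for this example.

```python
def encode_signed_varint(value: int) -> bytes:
    """Encode a signed integer; the final byte holds a sign flag plus 6 data bits (assumed layout)."""
    negative = value < 0
    magnitude = -value if negative else value
    out = bytearray()
    while magnitude >= 0x40:                   # too big for the 6 bits of a final byte
        out.append(0x80 | (magnitude & 0x7F))  # continuation byte: 7 data bits
        magnitude >>= 7
    out.append((0x40 if negative else 0x00) | magnitude)  # final byte: 0, sign, 6 bits
    return bytes(out)

def decode_signed_varint(data: bytes):
    """Decode one signed varint from the start of data; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for count, byte in enumerate(data, start=1):
        if byte & 0x80:                        # continuation byte
            result |= (byte & 0x7F) << shift
            shift += 7
        else:                                  # final byte: sign flag + 6 data bits
            result |= (byte & 0x3F) << shift
            return (-result if byte & 0x40 else result), count
    raise ValueError("incomplete varint")

minus_one = encode_signed_varint(-1)
sixty_four = encode_signed_varint(64)
```

In this layout the final byte spends one bit on "no more bytes" and one on the sign, which leaves 6 bits for the number itself.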

It sounds like you're making this too complicated. Send whatever you need and don't worry too much about payload size; you can optimize it later if needed. Optimization before profiling is usually wasted effort.
