tufflax

How Common are Data Errors in TCP?

10 posts in this topic

Hi!

How common is data corruption in TCP? I mean errors that slip past its error checking.

I had my game server running for 30 minutes, during which 2 clients sent about 150 KB of data to it. After about 30 minutes, my server crashed because it read an illegal value (my server does not yet deal well with illegal client data). The server ran on the same computer as the clients, and although it is possible that I have made a mistake, I find that a poor explanation, because for those 30 minutes basically the same messages were sent over and over. It has happened 3 times in the last 2 days. So it's kind of rare: too rare for me to think it is a coding mistake, yet common enough to make me wonder what it can be.

I was under the impression that TCP is very reliable, especially when just sending to 127.0.0.1.

Do you have any ideas about what might be wrong?
You say you don't handle illegal client data. Do you know whether anything else could be connecting to the server and sending data you're not expecting? What port are you using?
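
One way to rule out stray connections is to listen only on the loopback interface, so no other machine can connect at all, and to log the address of every peer that does connect. Binding to `new InetSocketAddress(port)`, which is what `(InetSocketAddress. port)` in the server code later in this thread does, listens on all interfaces instead. Below is a minimal Java sketch (the Clojure code calls these same NIO classes); it assumes Java 7 or later, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper: listen on loopback only and log each peer that connects.
// Requires Java 7+ (ServerSocketChannel.bind, SocketChannel.getRemoteAddress).
final class LoopbackServer {
    private LoopbackServer() {}

    /** Opens a listening channel bound to the loopback address only (port 0 = any free port). */
    static ServerSocketChannel openLoopbackOnly(int port) throws IOException {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        try {
            // new InetSocketAddress(port) alone would mean "all interfaces".
            ssc.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        } catch (IOException e) {
            ssc.close();
            throw e;
        }
        return ssc;
    }

    /** Accepts one connection (blocking mode), logs where it came from, and returns it. */
    static SocketChannel acceptAndLog(ServerSocketChannel ssc) throws IOException {
        SocketChannel sc = ssc.accept();
        System.out.println("client connected from " + sc.getRemoteAddress());
        return sc;
    }
}
```

With a loopback-only bind, any logged peer other than 127.0.0.1 or ::1 would point to a configuration problem rather than to corrupted data.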
Think of it this way.

Unaugmented TCP is basically what is used to download .EXE files and such from the web.

When was the last time you downloaded something and discovered it was corrupted? When was the last time you found a spelling mistake on a web page that magically went away when you refreshed, because it was actually a TCP transmission error?


TCP doesn't corrupt your data.
[quote name='jeff8j' timestamp='1308860314' post='4826979']
You say you don't handle illegal client data. Do you know whether anything else could be connecting to the server and sending data you're not expecting? What port are you using?
[/quote]

Good point! Maybe that's what's happening; I'll look into it. I was using port 2345 or 1300-something, just typing 4-digit numbers... although I'm behind a firewall. Where does the range of standard ports that might be open end?


[quote name='ApochPiQ' timestamp='1308861358' post='4826991']
Think of it this way.

Unaugmented TCP is basically what is used to download .EXE files and such from the web.

When was the last time you downloaded something and discovered it was corrupted? When was the last time you found a spelling mistake on a web page that magically went away when you refreshed, because it was actually a TCP transmission error?


TCP doesn't corrupt your data.
[/quote]

Yeah that's what I thought...
[quote name='tufflax' timestamp='1308858710' post='4826968']
Hi!

How common is data corruption in TCP? I mean errors that slip past its error checking.

I had my game server running for 30 minutes, during which 2 clients sent about 150 KB of data to it. After about 30 minutes, my server crashed because it read an illegal value (my server does not yet deal well with illegal client data). The server ran on the same computer as the clients, and although it is possible that I have made a mistake, I find that a poor explanation, because for those 30 minutes basically the same messages were sent over and over. It has happened 3 times in the last 2 days. So it's kind of rare: too rare for me to think it is a coding mistake, yet common enough to make me wonder what it can be.

I was under the impression that TCP was very reliable, especially when just sending to 127.0.0.1.

Do you have any ideas of what might be wrong?
[/quote]

TCP data basically doesn't get corrupted. Each segment carries its own 16-bit checksum, and the link layer usually adds its own check as well (Ethernet frames carry a CRC-32), depending on the specifics. Consider: when verifying the MD5 sums of downloaded files, I've never had a download fail, even for a 4 GB ISO file.

My guess is that your client or server code is wrong, or that someone else connected to your server and sent it garbage. For example, a memory corruption bug could cause seemingly random data corruption.
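
To make the per-segment check concrete: TCP's checksum is the 16-bit ones'-complement "Internet checksum" described in RFC 1071. The Java sketch below computes it over a plain byte array and leaves out the pseudo-header (addresses, protocol, length) that real TCP also covers. It catches any single flipped bit, but because the sum does not depend on word order, it cannot notice two 16-bit words trading places, which is one reason rare undetected corruption is possible at all.

```java
// Sketch of the RFC 1071 Internet checksum over a byte array.
// Simplification: real TCP also sums a pseudo-header, which is omitted here.
final class InternetChecksum {
    private InternetChecksum() {}

    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd length with zero
            sum += (hi << 8) | lo;                                      // big-endian 16-bit word
        }
        while ((sum >> 16) != 0) {              // fold carries back into the low 16 bits
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) (~sum & 0xFFFF);           // ones' complement of the folded sum
    }
}
```

Flipping one bit of the input changes the result, while swapping the first two 16-bit words leaves it unchanged.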
Hm, I just tried again; I ran the server for about 20 minutes before the same problem occurred. I was keeping track of all the connected clients, and no new one ever connected. The message is from one of my own clients. But I'm using asserts in the client to make sure it never sends illegal data. Moreover, I'm sending the exact same message about 30000 times or so before the problem occurs; I just connected the client and let it sit there sending and receiving updates. This is starting to annoy me. I find it totally believable that I have made a mistake; I just find it strange that it happens all of a sudden after I have done the exact same thing ten thousand times before.

Any ideas?

Some code (in Clojure, but using Java socket channels; if you know Java and are curious, you can probably figure out what I'm doing). The error occurs when I'm reading from the channel, in this piece (I get a negative value out of getShort on the buffer):

[code]
(zero? (.remaining rbuf))
  (let [len (.. rbuf (flip) (getShort))
        test (if (neg? len)
               (println (.getInetAddress (.socket chan))))
        new-netmap (assoc netmap :read-tmp (ByteBuffer/allocate len))]
[/code]

which is part of this next file

Shared networking code:
[code]
;;;; Remember the following while working with this code:
;;;;
;;;; * You must use channels and buffers when using select, not
;;;; the socket streams.
;;;;
;;;; * When using select, you have to call functions in this order:
;;;; select
;;;; selectedKeys
;;;; do stuff with keys in the set
;;;; remove keys from the key set
;;;;
;;;; You cannot rely only on the flags of selection keys.
;;;;
;;;; * When writing to buffers, first write, then flip, then read, then clear.
;;;;
;;;; * Newly created buffers have (.remaining buffer) = their length,
;;;; so if you want to create a buffer used for writing, it can appear to
;;;; have data in it from the beginning. Use (doto new-buff (.flip)).


(ns game.networking
  (:import java.nio.ByteBuffer
           (java.io ObjectOutputStream ByteArrayOutputStream
                    ObjectInputStream ByteArrayInputStream
                    IOException)))



(defn- to-byte-array
  "Turns obj into a byte[]."
  [obj]
  (with-open [baos (ByteArrayOutputStream.)
              oos (ObjectOutputStream. baos)]
    (.writeObject oos obj)
    (.toByteArray baos)))

(defn- from-byte-array
  "Turns the byte[] ba into an object."
  [ba]
  (with-open [bais (ByteArrayInputStream. ba)
              ois (ObjectInputStream. bais)]
    (.readObject ois)))



(defn- channel-operation
  "Calls op (a read/write fn) on chan and returns
  true if it filled/emptied the buffer."
  [op chan buffer]
  (op chan buffer)
  (zero? (.remaining buffer)))


(defn- read-into-buffer
  "Reads from chan into buffer. If the buffer becomes full,
  returns true, else false."
  [chan buffer]
  (channel-operation (memfn read buf) chan buffer))


(defn- write-from-buffer
  "Writes from buffer into chan. If the whole buffer was written,
  returns true, else false."
  [chan buffer]
  (channel-operation (memfn write buf) chan buffer))



(defn read-from-channel
  "Reads data from a channel found in netmap.
  The netmap needs to contain chan, rbuf and read-tmp.

  If netmap has been changed as a result of reading, calls
  new-netmap-f with the new netmap as argument.

  If a message has been completely read, calls new-msg-f
  on the message.

  Operates in the following manner:

  1) Checks if there is an incomplete message in read-tmp and,
     if so, tries to complete it.

  2) If not, checks if the length of the next message has
     been read, and if so, creates a new ByteBuffer of the
     correct length and assocs it with read-tmp in netmap.

  3) Otherwise, tries to read the length of the next
     message into rbuf."
  [{:keys [chan rbuf read-tmp] :as netmap} new-netmap-f new-msg-f]
  (cond
    ; incomplete msg: read more
    read-tmp
      (if (read-into-buffer chan read-tmp)
        (let [msg (from-byte-array (.array read-tmp))
              new-netmap (assoc netmap :read-tmp nil)]
          (new-netmap-f new-netmap)
          (new-msg-f msg new-netmap)
          (recur new-netmap new-netmap-f new-msg-f)))
    ; have read length: prepare buffer for msg
    (zero? (.remaining rbuf))
      (let [len (.. rbuf (flip) (getShort))
            test (if (neg? len)
                   (println (.getInetAddress (.socket chan))))
            new-netmap (assoc netmap :read-tmp (ByteBuffer/allocate len))]
        (.clear rbuf)
        (new-netmap-f new-netmap)
        (recur new-netmap new-netmap-f new-msg-f))
    ; new msg: read 2 bytes from chan into rbuf (the length of the next message)
    ; (tries to fill rbuf, so if it can only read 1 byte,
    ; will come back to here later)
    :else
      (if (read-into-buffer chan rbuf)
        (recur netmap new-netmap-f new-msg-f))))




(defn write-to-channel
  "Writes data to a channel, found in netmap. The
  netmap needs to contain chan, wbuf, write-tmp and write-q.

  If netmap has been changed as a result of writing, calls
  new-netmap-f with the new netmap as argument."
  [{:keys [chan wbuf write-tmp write-q] :as netmap} new-netmap-f]
  (cond
    ; length not written: write it
    (pos? (.remaining wbuf))
      (if (write-from-buffer chan wbuf)
        (recur netmap new-netmap-f))
    ; msg not written: write it
    write-tmp
      (if (write-from-buffer chan write-tmp)
        (let [new-netmap (assoc netmap :write-tmp nil)]
          (new-netmap-f new-netmap)
          (recur new-netmap new-netmap-f)))
    ; no incomplete msg: see if there's one in write-q
    write-q
      (let [test (assert (= (class "a") (class (last write-q))))
            msg (to-byte-array (last write-q))
            len (alength msg)]
        (.clear wbuf)
        (assert (pos? (short len)))
        (.putShort wbuf (short len))
        (.flip wbuf)
        (let [new-netmap (assoc netmap
                                :write-tmp (ByteBuffer/wrap msg)
                                :write-q (butlast write-q))]
          (new-netmap-f new-netmap)
          (recur new-netmap new-netmap-f)))))
[/code]
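
The three-step read in read-from-channel is the usual length-prefix framing. As a point of comparison, here is a Java sketch of the same framing (a 2-byte big-endian length followed by the payload) that keeps all of its partial state, the half-read length and the half-filled payload, inside one decoder object, so that state is only ever updated in place. The class name and API are invented for this example, and it reads the length as unsigned (0 to 65535), whereas the Clojure code uses a signed getShort.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical length-prefix decoder: [2-byte big-endian length][payload].
// Input may arrive in arbitrary pieces (1 byte of a length, half a payload,
// several messages at once); all partial state lives in this object.
final class FrameDecoder {
    private final ByteBuffer header = ByteBuffer.allocate(2);
    private ByteBuffer payload; // null while a length is still being read

    /** Consumes every byte of {@code in} and returns the messages it completed. */
    List<byte[]> feed(ByteBuffer in) {
        List<byte[]> done = new ArrayList<>();
        while (in.hasRemaining()) {
            if (payload == null) {
                transfer(in, header);                  // fill the 2-byte length
                if (header.hasRemaining()) break;      // only 1 byte so far
                header.flip();
                int len = header.getShort() & 0xFFFF;  // unsigned: 0..65535
                header.clear();
                payload = ByteBuffer.allocate(len);
            }
            transfer(in, payload);                     // fill the message body
            if (payload.hasRemaining()) break;         // wait for more bytes
            done.add(payload.array());
            payload = null;
        }
        return done;
    }

    /** Copies as many bytes as fit from src into dst, advancing both. */
    private static void transfer(ByteBuffer src, ByteBuffer dst) {
        int n = Math.min(src.remaining(), dst.remaining());
        ByteBuffer slice = src.duplicate();
        slice.limit(slice.position() + n);
        dst.put(slice);
        src.position(src.position() + n);
    }

    /** Encodes one frame, ready to be written (position 0, limit = frame size). */
    static ByteBuffer encode(byte[] msg) {
        if (msg.length > 0xFFFF) throw new IllegalArgumentException("message too long: " + msg.length);
        ByteBuffer buf = ByteBuffer.allocate(2 + msg.length);
        buf.putShort((short) msg.length).put(msg);
        buf.flip();
        return buf;
    }
}
```

Feeding the decoder the same bytes one at a time and all at once should yield the same messages, which is a cheap way to exercise the partial-read paths that localhost traffic rarely hits.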

Module on the client side. Everything is sent with send-msg, and every msg is a String.
[code]
(ns game.networking.client
  (:import java.net.InetSocketAddress
           java.nio.channels.SocketChannel
           java.nio.ByteBuffer)
  (:require [game.networking :as net])
  (:use game.utility))


(def- netmap (atom nil))


(def- msgs (atom nil))

(defn get-msgs []
  (let [ms (reverse @msgs)]
    (reset! msgs nil)
    ms))



(defn send-msg [msg]
  (let [new-netmap (update-in @netmap [:write-q] conj msg)]
    (reset! netmap new-netmap)))


; make private later!
(defn update-netmap [nm]
  (reset! netmap nm))

; make private later!
(defn handle-msg [msg _]
  (swap! msgs conj (read-string msg)))


; make private later!
(defn create-netmap [ip port]
  {:chan (let [sc (doto (SocketChannel/open (InetSocketAddress. ip port))
                    (.configureBlocking false))]
           (.. sc (socket) (setTcpNoDelay true))
           sc)
   :write-q nil
   :read-tmp nil
   :write-tmp nil
   :rbuf (ByteBuffer/allocate 2)
   :wbuf (doto (ByteBuffer/allocate 2) (.flip))})


(defn connect-to-server [ip port]
  (let [nm (create-netmap ip port)]
    (update-netmap nm)))


(defn update-network []
  (net/write-to-channel @netmap update-netmap)
  (net/read-from-channel @netmap update-netmap handle-msg))
[/code]


A networking module on the server side:
[code]
(ns game.networking.server
  (:import java.nio.ByteBuffer
           java.net.InetSocketAddress
           (java.nio.channels Selector ServerSocketChannel SocketChannel
                              SelectionKey)
           java.io.IOException)
  (:use game.utility
        clojure.contrib.repl-utils)
  (:require [game.networking :as net]))



; should be private later!
(def- clients (atom {}))
; should be private later!
(def- id-counter (atom 0))
; should be private later!
(def- msgs (atom nil))

(def- connected-addresses (atom nil))


(def- disconnected-clients (atom nil))

; debug
(def- bugg (atom []))



(defn get-msgs []
  (let [ms (reverse @msgs)]
    (reset! msgs nil)
    ms))



(defn send-msg [ids msg]
  (dorun (map (fn [id]
                (let [new-clients (update-in @clients [id :write-q] conj msg)]
                  (reset! clients new-clients)))
              ids)))



(defn create-server
  "Creates a non-blocking TCP server that listens for incoming
  connections and, when a client connects, calls f on the new
  client's SelectionKey.

  The server uses select to check for incoming connections and
  to check for incoming data on established connections. Returns
  a function that should be called periodically to call select.
  The returned function returns the set of selected SelectionKeys."
  [port f]
  (let [selector (Selector/open)
        ssc (doto (ServerSocketChannel/open) (.configureBlocking false))
        ss (doto (.socket ssc) (.bind (InetSocketAddress. port)))
        selection-key (.register ssc selector SelectionKey/OP_ACCEPT)]
    (fn []
      (swap! bugg conj (count (.keys selector)))
      (.select selector)
      (let [keys (.selectedKeys selector)]
        ; One has to check whether the selected-key set contains a key, not just the key's flags
        (if (and (.contains keys selection-key) (.isAcceptable selection-key))
          (let [sc (doto (.accept ssc) (.configureBlocking false))
                client-key (doto (.register sc selector
                                            (bit-or SelectionKey/OP_READ SelectionKey/OP_WRITE))
                             (.attach (swap! id-counter inc)))]
            (show (.socket sc))
            (flush)
            (swap! connected-addresses conj (.getInetAddress (.socket sc)))
            (.. sc (socket) (setTcpNoDelay true))
            (f client-key)
            (.remove keys selection-key)))
        keys))))


(defn- new-client
  "Takes a SelectionKey as input and returns a map
  of various player-related things."
  [selection-key]
  {:s-key selection-key
   :chan (.channel selection-key)
   :write-q nil
   :read-tmp nil
   :write-tmp nil
   :rbuf (ByteBuffer/allocate 2)
   :wbuf (doto (ByteBuffer/allocate 2) (.flip))
   :id (.attachment selection-key)})


(defn add-new-client
  "Adds a new client to clients."
  [selection-key]
  (let [client (new-client selection-key)]
    (swap! clients assoc (.attachment selection-key) client)))



(defn- collect-new-msg [msg netmap]
  (swap! msgs conj (conj (read-string msg) (netmap :id))))


(defn- update-client-netmap [{id :id :as netmap}]
  (swap! clients assoc id netmap))


(defn- disconnect-client [{:keys [id chan]}]
  (swap! disconnected-clients conj id)
  (.close chan)
  (swap! clients dissoc id))


(defn get-disconnected-clients []
  (let [dcs @disconnected-clients]
    (reset! disconnected-clients nil)
    dcs))

(defn update-network
  "Calls the fn server, which performs a select operation on the server
  and accepts new incoming connections. Also reads messages from the clients
  and collects them, in the form of lists, in msgs."
  [server]
  (let [keys (server)]
    (doseq [key keys]
      (let [id (.attachment key)
            client (@clients id)]
        (try
          (if (.isReadable key)
            (net/read-from-channel client update-client-netmap collect-new-msg))
          ; Look the client up again: read-from-channel may have stored a newer
          ; netmap (e.g. a half-filled :read-tmp). Writing from the stale `client`
          ; would store the old netmap back and lose that read state.
          (let [client (@clients id)]
            (if (and (client :write-q) (.isWritable key))
              (net/write-to-channel client update-client-netmap)))
          (catch IOException e (disconnect-client client)))))
    (.clear keys)))
[/code]
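
A pattern worth ruling out in update-network: if one lookup of a client map is passed to both read-from-channel and write-to-channel, both steps store their result through update-client-netmap, and the write step builds its result from the pre-read snapshot. A :read-tmp the read step just stored (a half-received message) would then be replaced by the old value, and since rbuf has already been cleared, the next read would take two payload bytes as a length. That could only happen when a message arrives in more than one read and a write happens in the same pass, which might explain a rare, intermittent failure. The Java sketch below shows the lost update in isolation; ClientState and the two step methods are hypothetical stand-ins for the netmap and the two Clojure functions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a client netmap: immutable, so each change makes a new value.
final class ClientState {
    final boolean readInProgress; // like a non-nil :read-tmp (half-received message)
    final int queuedWrites;       // like the length of :write-q

    ClientState(boolean readInProgress, int queuedWrites) {
        this.readInProgress = readInProgress;
        this.queuedWrites = queuedWrites;
    }
}

final class LostUpdateDemo {
    private LostUpdateDemo() {}

    // Read step: part of a message arrived, so record that a read is in progress.
    static void readStep(Map<Integer, ClientState> clients, int id, ClientState from) {
        clients.put(id, new ClientState(true, from.queuedWrites));
    }

    // Write step: one queued message was handed to the socket.
    static void writeStep(Map<Integer, ClientState> clients, int id, ClientState from) {
        clients.put(id, new ClientState(from.readInProgress, from.queuedWrites - 1));
    }

    /** Both steps start from one snapshot: the write overwrites the read's result. */
    static ClientState withStaleSnapshot(ClientState start) {
        Map<Integer, ClientState> clients = new HashMap<>();
        clients.put(1, start);
        ClientState client = clients.get(1);
        readStep(clients, 1, client);
        writeStep(clients, 1, client); // 'client' still has readInProgress == false
        return clients.get(1);
    }

    /** Each step starts from the current value in the map, so no update is lost. */
    static ClientState withFreshLookup(ClientState start) {
        Map<Integer, ClientState> clients = new HashMap<>();
        clients.put(1, start);
        readStep(clients, 1, clients.get(1));
        writeStep(clients, 1, clients.get(1));
        return clients.get(1);
    }
}
```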
[quote name='ApochPiQ' timestamp='1308861358' post='4826991']

TCP doesn't corrupt your data.
[/quote]

I believe Google published an article on errors in their data centers. Data corruption was on the order of 1 per petabyte (or maybe per some number of messages). It's absurdly low, but it's there, and only application-level checksums detected those errors.

It's also highly unlikely that this is what happened here; there are far more likely factors to consider.
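
An application-level check of the kind mentioned above can be as simple as appending a CRC-32 to each message before it is framed and verifying it once the whole message has been read. This Java sketch uses `java.util.zip.CRC32`; the 4-byte trailer layout and the method names are assumptions for illustration, not part of the protocol in this thread.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical message trailer: payload followed by its 4-byte big-endian CRC-32.
final class Crc32Framing {
    private Crc32Framing() {}

    private static int crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue(); // low 32 bits of the CRC
    }

    /** Returns payload followed by its CRC-32. */
    static byte[] seal(byte[] payload) {
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt(crcOf(payload))
                .array();
    }

    /** Verifies and strips the trailing CRC; throws if it does not match. */
    static byte[] unseal(byte[] sealed) {
        if (sealed.length < 4) throw new IllegalArgumentException("too short for a CRC trailer");
        ByteBuffer buf = ByteBuffer.wrap(sealed);
        byte[] payload = new byte[sealed.length - 4];
        buf.get(payload);
        int expected = buf.getInt();
        if (crcOf(payload) != expected) {
            throw new IllegalStateException("CRC mismatch: message corrupted");
        }
        return payload;
    }
}
```

A mismatch then surfaces as an explicit error instead of as a garbage value further down the line, and it flags framing bugs in your own code as well as transmission errors.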



However, NIO used to have some obscure bugs and unexpected or undefined behavior, some of it even contradicting the official documentation. My bet in this case would be that the networking errors are caused by those. They were never fixed, even in the standard library, and most ended up as WONTFIX. They also depend on the individual platform's network stack, so they are not universally reproducible. The problems come from inconsistent implementations of the Berkeley socket API and from incorrect interpretations of one thing or another; I forget the details.

Ideally, you would go rummaging through Sun's bug tracker for NIO-related issues and make sure they don't occur in the Clojure bindings, but that's a rather tedious task. The MINA project (IIRC) built a framework on top of NIO, which might give some more insight into that.

I also seem to remember a race condition or synchronization bug in the NIO selector.

Besides that, a lot could be wrong in the Clojure bindings. The selectors work fine most of the time, but I do recall some obscure edge cases that simply aren't handled. I think even most Java examples are broken in this way.
[quote name='tufflax' timestamp='1308871164' post='4827047']
I'm sending the exact same message about 30000 times or so
[code]
(zero? (.remaining rbuf)) (let [len (.. rbuf (flip) (getShort))
[/code]
[/quote]

Something to consider: The maximum number in a signed short is 32767. If you use a message sequence number or similar, and store it as a short, you would get an error after that value.
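
The same limit applies to a length prefix. In Java, casting an int to short keeps only the low 16 bits, and `ByteBuffer.getShort` reads them back as a signed value, so any length from 32768 to 65535 comes back negative, and lengths from 65536 up wrap around entirely. This sketch (helper names invented for the example) shows the signed round trip and the usual fix of masking with 0xFFFF, or equivalently reading with `getChar`, to get an unsigned length.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers: write a length as a 2-byte prefix, then read it back.
final class ShortLength {
    private ShortLength() {}

    /** Reads the prefix back the way getShort() does: signed, -32768..32767. */
    static int roundTripSigned(int len) {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.putShort((short) len).flip(); // (short) keeps only the low 16 bits
        return buf.getShort();
    }

    /** Reads the same prefix back as unsigned: 0..65535. */
    static int roundTripUnsigned(int len) {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.putShort((short) len).flip();
        return buf.getShort() & 0xFFFF;   // equivalently: buf.getChar()
    }
}
```

For example, `roundTripSigned(40000)` returns -25536, while `roundTripUnsigned(40000)` returns 40000.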

Sounds like you have a reproducible case, though. That's good! Print out the number of messages you have received every 100 messages or so, and check what the value is after the crash. Then set a breakpoint after that number of messages and re-run the case, so you can debug when the crash happens. Or, if you have access to VMware Workstation, try using Replay Debugging.

You can also log all the data to a big file after receiving it, for later analysis. You may be able to then pipe that file back into the server to repeat the behavior that clients already had, to reproduce the crash faster so you can debug it.
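
A minimal Java sketch of that capture-and-replay idea, assuming you can hook the point where bytes come off the socket: each received chunk is appended to a file, and the file is later fed back in randomly sized pieces. The random chunking is a deliberate choice, since it forces the partial-read code paths that localhost traffic rarely exercises. The class, the method names, and the file handling are illustrative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical capture/replay helper for debugging a byte-stream protocol.
final class Capture {
    private Capture() {}

    /** Appends the remaining bytes of chunk to file without consuming chunk. */
    static void append(Path file, ByteBuffer chunk) throws IOException {
        byte[] copy = new byte[chunk.remaining()];
        chunk.duplicate().get(copy);
        Files.write(file, copy, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Feeds the captured bytes to sink in chunks of 1..maxChunk bytes (seeded, so repeatable). */
    static void replay(Path file, int maxChunk, long seed, Consumer<ByteBuffer> sink) throws IOException {
        if (maxChunk < 1) throw new IllegalArgumentException("maxChunk must be >= 1");
        byte[] all = Files.readAllBytes(file);
        Random rnd = new Random(seed);
        int pos = 0;
        while (pos < all.length) {
            int n = Math.min(all.length - pos, 1 + rnd.nextInt(maxChunk));
            sink.accept(ByteBuffer.wrap(all, pos, n));
            pos += n;
        }
    }
}
```

Using a fixed seed makes a failing chunk pattern repeatable, so the same crash can be replayed under a debugger.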

I'm not that familiar with Clojure (been 20 years since I did Scheme :-) so I didn't take the time to read through all your code, sorry. Maybe someone else on the board?
[quote name='Antheus' timestamp='1308872554' post='4827056']
[quote name='ApochPiQ' timestamp='1308861358' post='4826991']
TCP doesn't corrupt your data.
[/quote]

I believe Google published an article on errors in their data centers. Data corruption was on the order of 1 per petabyte (or maybe per some number of messages). It's absurdly low, but it's there, and only application-level checksums detected those errors.

It's also highly unlikely that this is what happened here; there are far more likely factors to consider.

However, NIO used to have some obscure bugs and unexpected or undefined behavior, some of it even contradicting the official documentation. My bet in this case would be that the networking errors are caused by those. They were never fixed, even in the standard library, and most ended up as WONTFIX. They also depend on the individual platform's network stack, so they are not universally reproducible. The problems come from inconsistent implementations of the Berkeley socket API and from incorrect interpretations of one thing or another; I forget the details.

Ideally, you would go rummaging through Sun's bug tracker for NIO-related issues and make sure they don't occur in the Clojure bindings, but that's a rather tedious task. The MINA project (IIRC) built a framework on top of NIO, which might give some more insight into that.

I also seem to remember a race condition or synchronization bug in the NIO selector.

Besides that, a lot could be wrong in the Clojure bindings. The selectors work fine most of the time, but I do recall some obscure edge cases that simply aren't handled. I think even most Java examples are broken in this way.
[/quote]

Clojure doesn't use bindings; it calls the exact same Java classes directly.

But OK, that sounds bad...
[quote name='hplus0603' timestamp='1308872640' post='4827057']
[quote name='tufflax' timestamp='1308871164' post='4827047']
I'm sending the exact same message about 30000 times or so
[code]
(zero? (.remaining rbuf)) (let [len (.. rbuf (flip) (getShort))
[/code]
[/quote]

Something to consider: The maximum number in a signed short is 32767. If you use a message sequence number or similar, and store it as a short, you would get an error after that value.

Sounds like you have a reproducible case, though. That's good! Print out the number of messages you have received every 100 messages or so, and check what the value is after the crash. Then set a breakpoint after that number of messages and re-run the case, so you can debug when the crash happens. Or, if you have access to VMware Workstation, try using Replay Debugging.

You can also log all the data to a big file after receiving it, for later analysis. You may be able to then pipe that file back into the server to repeat the behavior that clients already had, to reproduce the crash faster so you can debug it.

I'm not that familiar with Clojure (been 20 years since I did Scheme :-) so I didn't take the time to read through all your code, sorry. Maybe someone else on the board?
[/quote]

The number is the length of the message, not a sequence number of any kind. If it were off, I would notice right away, because I read the messages back through an ObjectInputStream over a ByteArrayInputStream, and that would not work unless I gave it a byte array containing a whole object (the byte array is the next <length> bytes from the SocketChannel). When I said it was the exact same messages over and over, I wasn't lying. :P
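
For reference, here is a Java sketch of the round trip this relies on. ObjectOutputStream writes a stream header (the magic bytes 0xACED and a version) before the object, so an array taken from the wrong offset fails in the ObjectInputStream constructor with an IOException (a StreamCorruptedException). Note that in a length-prefixed protocol a misaligned read would usually surface earlier, as an implausible or negative length, before ObjectInputStream ever sees the bytes. The helper names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helpers mirroring to-byte-array / from-byte-array in the Clojure code.
final class SerializationCheck {
    private SerializationCheck() {}

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        } // closing flushes everything into baos
        return baos.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        // The constructor reads and checks the stream header first.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```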
Hm, now the server has run for an hour or so without hitting the problem, and the client has sent over 200,000 messages to it. Not so reproducible after all...

But maybe it's a good idea for me to use a framework such as MINA anyway. Do any of you have any experience with it? Any tips?

Btw, Antheus, I have not been able to find any info on bugs in NIO. It sounds strange that Sun (or Oracle?) would simply not fix them. And MINA is built on it, so it must be manageable somehow? You are making me paranoid. :P