The Internet, A Summary Introduction to TCP/IP, and Losing Underwear

By Jered Wierzbicki | Published Sep 14 1999 05:57 AM in Multiplayer and Network Programming


CONTENTS

Relevant Background on the Internet


The TCP/IP Suite
What's a protocol and how do I use it?
How are protocols maintained by the Global Internet?
The TCP Protocol
The UDP Protocol
The IP Protocol
IP Addresses
The Domain Name System
The ICMP Protocol
The GGP Protocol
The ARP and RARP Protocols
Physical Protocols and Why we Don't Care

TCP/IP Programming

Win32 TCP/IP Support: WinSock
Initializing the WinSock Library
Creating Sockets
Binding and Connecting
Addressing and Name Service Support
Network and Host Byte-order
Sending and Receiving Data
How to get your sockets to listen()
Blockage and Synchronicity
Errors in WinSock
Cleaning up after yourself
What I didn't cover
Coda

Disclaimers


Relevant Background on the Internet

The history of the Internet is vast and vibrant, filled with exciting people, stunning breakthroughs and unbelievable events, intrigue, imagination, and all of that great sort of thing. Tens of thousands of pages have been written on its development alone. Obviously, then, it is not my intention to sit and talk to you about the history of this marvel of technology, of this manifestation of our science. I'm just going to present you with enough trinkets to realize that all (well, most, anyway) of its arcane complexity is absolutely vital.

The United States first suggested a communications Internet as a tactical advantage in 1962. Communism was on the rise, tensions were high, and the increasing threat of nuclear warfare was widely perceived. The United States Air Force challenged a select group of R&D people to come up with a network that could withstand a nuclear attack. The network that they envisioned was called a Distributed Communications Network, or a "decentralized" network, which was a theretofore unheard of concept. The idea was simple. If one node was destroyed, the capacity to exchange data had to remain in place. Paul Baran, with the RAND Corporation at the time, conceptualized such a network in full detail (FULL detail). Unfortunately, due to a rather bureaucratic review of Baran's research, nothing happened until 1969. Finally the time was ripe: Research merged with funding and this with renewed government interest (on behalf of DARPA, Defense Advanced Research Projects Agency), resulting in the birth of the ARPANET.

The ARPANET gradually grew and expanded in amazing ways not very important to us. It started with four hosts. Four years later, it had 40. With the invention of e-mail in '72, a new surge of data relied on the network. Incredible though it may seem, most of the data sent and received via ARPANET was e-mail at the time. Now keep in mind, in the mid-70's, TCP/IP also came along and fundamentally redefined the way that communications over the network were carried out. By the first few years of the 80's, there were hundreds of hosts. In 1988, there were over 51,000. In 1990, there were well, well over 300,000 hosts. Another milestone in '90: The government gave control of the network to the National Science Foundation (NSF), who subsequently released it to the public domain due to excessive backbone operating and upkeep costs. Commercial business promptly moved in for the kill.

Well, to make a long story short, a group at the U of Minnesota came up with Gopher, an independent European innovator conceived and developed HTML, along with the web browser, and the rest is history. Public recognition of the Internet spread like wildfire. Don't ask me who counts, but the size of the Internet has increased by over 25,000% from 1969 to the present.

To clear up a few misconceptions: The Internet is not one network. It is a set of networks linked by gateways. The term "internet" means any combination of networks in general: In fact, to avoid confusion with "The Internet", as we call it, corporations denote their internal internets "intranets." Gateways are devices that connect networks in an internet, as was just implied. Routers are devices that decide which way data is going to travel over a network to get from point A to point B. Incidentally, a subnet is a network in an internet. Go figure.

The TCP/IP Suite

"So where does TCP/IP come in," you wonder? To set the record straight, TCP (Transmission Control Protocol) and IP (Internet [general, not specific to the Internet] Protocol) are just two internet data transfer protocols. IP is the basis of all Internet communications. TCP is the underlying protocol that makes communications on the global Internet reliable. The TCP/IP suite, on the other hand, is a group which contains most of the essential internet-related protocols that you'll ever worry about. It is comprised of protocols in two basic layers. The first and foremost layer in the suite is the Network Layer, which contains the basic, underlying protocols that make Internet communications possible. The layer above this is the Application Layer, which contains high-level protocols (those with which the user must interact) designed to transmit commands specific to a certain task.

What's a protocol and how do I use it?

Loosely stated, a protocol is an established format for data transactions over a network. Network-level protocols in the TCP/IP family, namely UDP, TCP, ICMP, IP, ARP, and RARP (those are the ones which you will ever even remotely have a chance of using as a game developer, in that order) are almost always implemented by the operating system, which is a blessing beyond all other blessings.

This means that when we're concerned with developing applications that use the TCP/IP protocol suite for Internet communications, we usually interface to its Network level protocols by calling upon the operating system or operating system extension API functions. Attempting to implement the entire Network-level TCP/IP suite is overkill, especially since it has already been done for just about every platform (including DOS) that is currently in widespread use. If you ever did decide to implement it, the information to do so is freely available...but it's a lot of work, and might I add a lot of unnecessary work.

How are protocols maintained by the Global Internet?

Protocols are only a small part of a large process of Internet standardization. Standardization and modernization is an entirely open, ongoing effort, carried out by way of memos called Requests for Comments (RFCs). When a proposed standard or effort that requires the attention of the entire professional Internet community is to be addressed, usually the ideas involved are published in the form of an RFC. Note that RFCs are a form of technical Internet intercommunication, and are usually not intended as documentation; generally, they either express respected and developed research that may or may not improve and update the network, or refine older standards. They are numbered in order of publication, starting with RFC 1, which was, incidentally enough, put out back in 1969. At the time of this writing, they approach 2500 in number [that should keep this "current" for long enough...]

In order to facilitate better understanding of the contents of Requests for Comments, they are classified into various groups. Informational RFCs are, obviously, informational, and Experimental RFCs are experimental (surprise surprise).

Standards Track RFCs are RFCs in various stages of standardization, ranging from Proposed Standard, to Draft Standard, to Internet Standard. In order to become a standard, an RFC must move along the full run of the standards "track". The "track" is a set of rigorous phases of scrutiny, testing, review, and refinement that any standard on the Internet must go through before it can really be called "standard." The Internet Standards Track Reference is a document containing the state of standardization of various standards.

Historic and Obsoleted RFCs are somewhat different, at least I like to think so; the historic RFC is one maintained for its historical interest, as if it ever had a practical one. Among these are included memos from the 60's or 70's discussing network meetings. Obsoleted RFCs contain information which has been outdated by newer developments. Usually, where an obsolete RFC is listed, the RFC which has obsoleted it is listed alongside it.

Incidentally, all RFCs of which I am aware are distributed on an unlimited basis, which means that you can pretty much find them anywhere. They're available freely. However, since there are absolutely no distribution restrictions whatsoever, you also have the right to rip people off for them (but only an ignoramus would buy what he can get in the same condition for free!) A good RFC archive is at http://info.internet...notes/rfc/files. You might want to refer to the RFCs for in-depth discussions on unusual standards. The information is all there.

By the way, I don't believe I mentioned it...RFC 2324 is my present favorite.

The TCP Protocol

TCP is crucial in reliable Internet communications. It does quite a bit. Foremost, it monitors what gets through to a remote host and what does not, and retransmits anything that doesn't. It also must perform other assorted tasks. Say that a chunk of data is too large to fit into one IP packet: TCP will split it up into smaller packets. The TCP protocol uses the IP protocol to actually transmit data. Basically, the goal of TCP is to make sure that data gets from one place to another in 100% intact condition. This "perfectness" is absolutely essential for the Internet to be worthwhile at all: If one op-code in a binary is misplaced, the whole thing falls apart. One misplaced digit in a military data file could spell disaster.

Multiple conversations can be going on between two machines at the same time between more than one process on these machines. The ability to handle this scenario is called multiplexing, and TCP provides a set of "ports", uniquely identified via "port numbers" on a given host, through which various data are sent and received, in order to pull it off. There are assigned TCP port numbers for various application protocols. Beyond this, a user process may independently select ports for a conversation. This port information, together with the information that the IP protocol uses to identify an individual machine on the Internet, forms what is called a TCP socket, or alternatively, just a socket. All sockets are completely unique on the Internet, that is, a socket is on one and only one host and one and only one port on that host.
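To make the host-plus-port idea concrete, here is a minimal sketch in C. The names `struct endpoint` and `endpoint_equal` are made up for illustration and are not part of any sockets API; the IP address is assumed to be held as a plain 32-bit integer.

```c
#include <stdint.h>

/* One end of a conversation: which host, and which port on that host.
   (Hypothetical type for illustration only.) */
struct endpoint {
    uint32_t ip;    /* 32-bit IP address, e.g. 0x7F000001 for 127.0.0.1 */
    uint16_t port;  /* 16-bit port number */
};

/* Two endpoints name the same socket only if both host and port match. */
int endpoint_equal(struct endpoint a, struct endpoint b)
{
    return a.ip == b.ip && a.port == b.port;
}
```

Two processes on the same host therefore end up with different sockets simply by using different ports, which is all that multiplexing really needs.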

To use TCP, you must connect a socket on the local machine to a socket on the remote machine. This occurs in three steps: First, TCP sends over a connect message from the local socket. The remote machine acknowledges, the remote socket sending a response to the local socket. Then the socket on the local host again acknowledges. A TCP connection is then said to have been opened. This process of negotiating a connection is called handshaking. Specifically, the handshaking method that TCP uses is called a three-way handshake. Once a connection has been formed, TCP can transmit data. TCP communication is bidirectional, that is, both parties in a connection can send and receive data. A single TCP socket can participate in more than one connection at a time, and can have more than one remote socket connected to it. Because of the significance of establishing connections in TCP, it is referred to as a connection-oriented protocol.
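The three steps can be sketched as a toy state machine. This is only a model under loud assumptions: the message names SYN, SYN-ACK and ACK are the conventional ones from the TCP specification rather than anything named in this article, and sequence numbers, timeouts and retransmission are all ignored. The function and enum names are hypothetical.

```c
/* Toy model of the three-way handshake; not a real TCP implementation. */
enum tcp_state { TCP_CLOSED, TCP_LISTEN, TCP_SYN_SENT,
                 TCP_SYN_RECEIVED, TCP_ESTABLISHED };
enum tcp_msg   { MSG_NONE, MSG_SYN, MSG_SYN_ACK, MSG_ACK };

/* Step 1: the local socket starts the handshake by sending a connect
   message (SYN). */
enum tcp_msg local_open(enum tcp_state *st)
{
    *st = TCP_SYN_SENT;
    return MSG_SYN;
}

/* Feed one incoming message to an endpoint; returns its reply, or
   MSG_NONE when there is nothing to send back. */
enum tcp_msg on_message(enum tcp_state *st, enum tcp_msg in)
{
    if (*st == TCP_LISTEN && in == MSG_SYN) {          /* step 2: remote replies */
        *st = TCP_SYN_RECEIVED;
        return MSG_SYN_ACK;
    }
    if (*st == TCP_SYN_SENT && in == MSG_SYN_ACK) {    /* step 3: local acknowledges */
        *st = TCP_ESTABLISHED;
        return MSG_ACK;
    }
    if (*st == TCP_SYN_RECEIVED && in == MSG_ACK) {    /* remote sees the final ack */
        *st = TCP_ESTABLISHED;
        return MSG_NONE;
    }
    return MSG_NONE;                                   /* anything else is ignored here */
}
```

Passing each reply from one side to the other takes both endpoints to the established state after exactly three messages.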

TCP takes basic data that needs to be transferred and wraps a header around it. The header is complicated, but then again, so is TCP. Here's generally what it looks like:

TCP Header

<center><table border="3" cellpadding="0" cellspacing="0" width="50%"><tbody><tr><th>Item</th><td class="tblhdr">Size (bits)</td></tr><tr><td>Source Port</td><td>16</td></tr><tr><td>Destination Port</td><td>16</td></tr><tr><td>Sequence #</td><td>32</td></tr><tr><td>Ack. #</td><td>32</td></tr><tr><td>Data Offset</td><td>4</td></tr><tr><td>[Reserved]</td><td>6</td></tr><tr><td>Control Bits</td><td>6</td></tr><tr><td>Window</td><td>16</td></tr><tr><td>Checksum</td><td>16</td></tr><tr><td>Urgent Pointer</td><td>16</td></tr><tr><td>Options</td><td>Not fixed (multiple of 8)</td></tr><tr><td>Padding</td><td>Optional, only used if header doesn't end on 32-bit boundary</td></tr></tbody></table></center>
The fixed fields alone add up to 20 octets (an octet is an 8-bit byte), and with options you can expect to see TCP headers getting to about 22 octets on average. Despite the meager sound of it, this is quite a large and cumbersome header size. The huge header, coupled with the fact that TCP is just fundamentally "the wrong thing" for a game, is why TCP is rarely used in games. What I mean by "the wrong thing" is explained best by way of example...if you shoot someone in a deathmatch and a TCP packet is sent to indicate it, what's the point of retransmitting it ten times until it's in perfect order? The fellow you shot is already cold by the time it finally gets through.
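To show how the fields in the table map onto bytes, here is a small sketch that decodes the fixed 20-octet part of a TCP header from a raw buffer. The struct and function names (`tcp_header`, `tcp_parse`, `rd16`, `rd32`) are invented for this example; in practice the operating system does this for you. Multi-octet fields are assumed to arrive most significant octet first, and options are not decoded.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a TCP header, decoded into ordinary integers. */
struct tcp_header {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  data_offset;   /* header length in 32-bit words */
    uint8_t  control_bits;  /* the 6 control bits */
    uint16_t window, checksum, urgent;
};

/* Read big-endian 16- and 32-bit values from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is shorter than 20 octets. */
int tcp_parse(const uint8_t *buf, size_t len, struct tcp_header *h)
{
    if (len < 20)
        return -1;
    h->src_port     = rd16(buf);
    h->dst_port     = rd16(buf + 2);
    h->seq          = rd32(buf + 4);
    h->ack          = rd32(buf + 8);
    h->data_offset  = (uint8_t)(buf[12] >> 4);    /* top 4 bits of octet 12 */
    h->control_bits = (uint8_t)(buf[13] & 0x3F);  /* low 6 bits of octet 13 */
    h->window       = rd16(buf + 14);
    h->checksum     = rd16(buf + 16);
    h->urgent       = rd16(buf + 18);
    return 0;
}
```

A `data_offset` of 5 means five 32-bit words, that is, a 20-octet header with no options; anything larger means options follow the fixed part.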

The UDP Protocol

The User Datagram Protocol is another common protocol in the TCP/IP suite. The protocol was designed to provide an ad-hoc direct transportation mechanism for information without oversized TCP headers and the complexity associated with them. However, what UDP gains in elegance, it loses in reliability. Unlike TCP, UDP does not implement "quality assurance" with regard to the data that it transmits, in any way, shape, or form. UDP packets may be duplicated, out of order, or not received at all. On the other hand, UDP does cut back on the mammoth TCP header overhead, which adds up over time, especially if your TCP packets are so large and ill-designed as to consistently be split up.

The aspect of UDP that stems largely from its intended design as a generic, low-key data transport mechanism is that it is entirely connectionless. That is to say, one does not have to establish a connection to a remote host in order to transmit packets to it or receive packets from it under UDP. Packets are simply transmitted over the network. Although it may seem odd, this model is basically what allows UDP to remain as compact as it is.

The User Datagram Protocol tacks the following header onto the top of the data:

UDP Header

<center><table border="3" cellpadding="0" cellspacing="0" width="50%"><tbody><tr><td class="tblhdr">Item</td><td class="tblhdr">Size (bits)</td></tr><tr><td>Source Port</td><td>16</td></tr><tr><td>Destination Port</td><td>16</td></tr><tr><td>Length</td><td>16</td></tr><tr><td>Checksum</td><td>16</td></tr></tbody></table></center>
8 octets as opposed to 22+...it's quite a simplification indeed. Always bear in mind, however, that UDP is unreliable, and you never know how unreliable. Design carefully.
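For comparison with TCP, here is a sketch that writes the whole UDP header into a buffer. `udp_write_header` is a hypothetical helper, not a library call. It assumes the Length field counts the header plus the data, and it leaves the checksum at zero, which in IPv4 means "no checksum computed"; a real stack would normally fill it in.

```c
#include <stdint.h>

#define UDP_HEADER_LEN 8

/* Writes the 8-octet UDP header into out[0..7], most significant octet
   first. Returns 0 on success, -1 if header plus payload would not fit
   in the 16-bit Length field. */
int udp_write_header(uint8_t out[UDP_HEADER_LEN],
                     uint16_t src_port, uint16_t dst_port,
                     uint16_t payload_len)
{
    uint16_t length;

    if (payload_len > 65535 - UDP_HEADER_LEN)
        return -1;
    length = (uint16_t)(UDP_HEADER_LEN + payload_len);  /* header + data */

    out[0] = (uint8_t)(src_port >> 8);  out[1] = (uint8_t)(src_port & 0xFF);
    out[2] = (uint8_t)(dst_port >> 8);  out[3] = (uint8_t)(dst_port & 0xFF);
    out[4] = (uint8_t)(length >> 8);    out[5] = (uint8_t)(length & 0xFF);
    out[6] = 0;                         out[7] = 0;     /* checksum not computed */
    return 0;
}
```

Four 16-bit fields and that's the entire protocol overhead, which is exactly why it is so attractive for small, frequent game updates.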

It should also be noted that in the context of UDP, port numbers have less significance on the Network level than do TCP port numbers. Rather, UDP port numbers are more significant towards an individual application. There are, however, established UDP protocol port numbers, as there are established TCP port numbers, for various application level protocols. Rule of thumb: Always, always check both the assigned port numbers and the commonly used port numbers before you attempt to use a port for personalized data transport, and even then only use a port towards the top of the accepted "user" port list. You will usually find links to assigned port information on the web alongside RFC listings.

The IP Protocol

IP is relied upon by just about everything. It's the cornerstone of all Internet communications. IP also is rather conceptually simple. It has exactly one goal and one dream in life, and that's to get packets from a source address to a destination address. IP transmits data in the form of packets called datagrams. Datagrams are composed of a header, which is 20 octets large, and a data area. Although IP doesn't give a rip about what's inside the data area, usually the data area will be composed of a TCP or UDP header followed by raw data when IP sees it. Thus, the IP header is just plopped on top of any other headers that might be present over the data. This is how Internet communication works in general; layers of protocols on the local host wrap their headers around the data and it is transmitted, then various layers peel the headers away from the data as it is processed by the remote host, and eventually a specific application gets the data.

IP Header

<center><table border="3" cellpadding="0" cellspacing="0" width="50%"><tbody><tr><td class="tblhdr">Item</td><td class="tblhdr">Size (bits)</td></tr><tr><td>Version Number</td><td>4</td></tr><tr><td>Header Length</td><td>4</td></tr><tr><td>Type of Service</td><td>8</td></tr><tr><td>Packet Length</td><td>16</td></tr><tr><td>Packet ID</td><td>16</td></tr><tr><td>Fragmentation</td><td>16</td></tr><tr><td>Time to Live</td><td>8</td></tr><tr><td>Protocol</td><td>8</td></tr><tr><td>Header Checksum</td><td>16</td></tr><tr><td>Source Address</td><td>32</td></tr><tr><td>Destination Address</td><td>32</td></tr></tbody></table></center>
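As with TCP, the table translates directly into byte offsets. The sketch below decodes the 20-octet IP header into a struct; `ip_header`, `ip_parse` and the `rd16`/`rd32` helpers are illustrative names only, and IP options are not handled.

```c
#include <stddef.h>
#include <stdint.h>

struct ip_header {
    uint8_t  version;        /* 4 for the IP described here */
    uint8_t  header_words;   /* header length in 32-bit words */
    uint8_t  tos;
    uint16_t packet_len, packet_id, fragmentation;
    uint8_t  ttl, protocol;
    uint16_t checksum;
    uint32_t src, dst;       /* 32-bit addresses */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if fewer than 20 octets are available. */
int ip_parse(const uint8_t *buf, size_t len, struct ip_header *h)
{
    if (len < 20)
        return -1;
    h->version       = (uint8_t)(buf[0] >> 4);    /* first 4 bits */
    h->header_words  = (uint8_t)(buf[0] & 0x0F);  /* next 4 bits */
    h->tos           = buf[1];
    h->packet_len    = rd16(buf + 2);
    h->packet_id     = rd16(buf + 4);
    h->fragmentation = rd16(buf + 6);
    h->ttl           = buf[8];
    h->protocol      = buf[9];
    h->checksum      = rd16(buf + 10);
    h->src           = rd32(buf + 12);
    h->dst           = rd32(buf + 16);
    return 0;
}
```

The `protocol` field is what tells the receiving host which header comes next in the data area (for example, 6 for TCP or 17 for UDP).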
IP Addresses

IP addresses are the standard means of identifying Internet hosts. An IP address is a 32-bit integer. By convention, in written form, each octet of an IP address is written in base 10, with the octets separated by periods and the most significant octet first (big-endian). Consider that you are on a little-endian machine (the least significant octet of a word comes first in memory). When an IP address, or any multi-octet value for that matter, is received from a big-endian remote host, you must convert it to little-endian before it is meaningful to the machine. Big-endian notation is always used to represent IP addresses. For this reason, big-endian is said to be the network byte order. TCP/IP implementations will usually provide functions to convert from the network byte order to the local host's byte order, and should, if competent, process the information in the header correctly regardless of byte order.
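Here is a minimal, portable sketch of what "network byte order" means in practice. The helper names (`put_be32`, `get_be32`, `ipv4_from_octets`) are made up for the example. Sockets libraries ship their own conversion routines, such as `htonl()` and `ntohl()`, and you should normally use those instead; this version just spells out the byte shuffling so it behaves the same on any host.

```c
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) order:
   most significant octet first. */
void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read a 32-bit value that was stored in network order. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Build the 32-bit integer for the written address a.b.c.d. */
uint32_t ipv4_from_octets(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

Because the functions work with shifts rather than by reinterpreting memory, the octets written by `put_be32` come out in the same order as the dotted notation on both little-endian and big-endian machines.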

For instance, the IP address 127.0.0.1 is the "local loop-back" address by convention in Linux and other TCP/IP implementations, which means that data sent to this address will be looped back to you. (Just an interesting tid-bit.)

In order to facilitate networks of different sizes, IP addresses are divided into a network address and a host address. The network address identifies the network in an internet to which a datagram is being sent, and the host address identifies the specific host on that network to give the datagram to. The way in which these portions of the address are allocated determines the "class" of an IP address. IP addresses are organized into four distinct classes.

IP Address Classes

<center><table border="3" cellpadding="0" cellspacing="0" width="90%"><tbody><tr valign="top"><td class="tblhdr">Class</td><td class="tblhdr">Number of Network Addresses</td><td class="tblhdr">Number of Host Addresses</td><td class="tblhdr">Generalization</td></tr><tr valign="top"><td>A</td><td>126</td><td>16,777,214</td><td>One of few mammoth networks</td></tr><tr valign="top"><td>B</td><td>16,383</td><td>65,534</td><td>One of many larger networks</td></tr><tr valign="top"><td>C</td><td>2,097,152</td><td>254</td><td>One of a lot of small networks</td></tr><tr valign="top"><td>Ext.</td><td>N/A</td><td>Indefinite</td><td>N/A</td></tr></tbody></table></center>
We never really need to worry about these classes, but if you're curious, the class of an IP address can be physically determined in two ways: First off, if there's a pink flamingo anywhere on its front lawn and it drives a sport utility vehicle, then it is probably of a higher class. The other common approach is to look at the range of its most significant octet (MSO). An MSO from 0-127 indicates class A, from 128-191 class B, from 192-223 class C, and from 224-255 an extension address.
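The second approach (the one without the flamingo) fits in a few lines of C. `ip_class` is a hypothetical helper; it assumes the address is held as a 32-bit integer with the most significant octet in the top 8 bits, and it returns 'X' for the extension range.

```c
#include <stdint.h>

/* Classify an address by the range of its most significant octet. */
char ip_class(uint32_t addr)
{
    uint8_t mso = (uint8_t)(addr >> 24);

    if (mso <= 127) return 'A';
    if (mso <= 191) return 'B';
    if (mso <= 223) return 'C';
    return 'X';   /* 224-255: extension address */
}
```

So 10.0.0.1 comes out as class A, 172.16.0.1 as class B, and 192.168.0.1 as class C.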

The extended IP address class, described in RFC 1112, is used for multicasting (sending a single packet to more than one host at a time), and we will not concern ourselves with it.

One would like to think that this is it. However, on a lower level, there's still the concern of how IP gets a packet to a given IP address. This process is called routing, and can become moderately complicated. Fortunately enough, for our purposes, it's simple, because we don't have to implement it.

The Domain Name System

IP addresses can get a bit cryptic, and there are certainly a lot of them, as you have seen (to be specific, there are 4,294,967,296 possible IP addresses--but this will one day be not-enough! Remember, there are already about six billion humans on Earth.) While it may be possible in class C networks and smaller internets to simply maintain a hosts file that contains the name of every host and an IP to go along with it, this is not even CLOSE to practical for the 10,000,000+ host Global Internet of today. A file containing the complete database for all IP addresses would be about (conservatively) 94,489,280,512 bytes in size: 90,112 megabytes, 88 gigabytes. Few hosts could sustain such a database, and fewer still could search it. This would limit and centralize network capacity, which would be a big no-no based on the decentralized Internet philosophy. Moreover, it would be a real pain to append and mend these records. Therefore, as a solution to the problems associated with maintaining IP databases, the Domain Name System (DNS) was developed.
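A quick check of that arithmetic, for the curious. The size quoted above works out exactly if each record takes 22 bytes; that per-record figure is an assumption inferred from the total (94,489,280,512 / 4,294,967,296 = 22), not something stated in the text. `hosts_db_bytes` is a made-up name.

```c
#include <stdint.h>

/* Number of distinct 32-bit IP addresses: 2^32. */
#define IP_ADDRESS_COUNT 4294967296ULL

/* Size in bytes of a flat table with one record per possible address. */
uint64_t hosts_db_bytes(uint64_t bytes_per_record)
{
    return IP_ADDRESS_COUNT * bytes_per_record;
}
```

With 22 bytes per record this gives 94,489,280,512 bytes; dividing by 1,048,576 gives 90,112 megabytes, and dividing by 1,073,741,824 gives 88 gigabytes, matching the figures above.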

DNS is a hierarchical IP address lookup system that uses text strings to classify IP addresses in increasing precision. In other words, it uses a tree system to decentralize the process of looking up hosts. Host addresses under the DNS take the form host.subdomain.domain. The domain at the highest level of the tree is called the root domain. The root domain contains all of the top level domains, among the most common of which are gov, edu, com, mil, and org, and ISO two-letter country abbreviations.
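To see the tree hiding inside a name like host.subdomain.domain, it helps to read the name from right to left. The sketch below pulls out one label at a time, starting from the top level domain. It is pure string handling and does no actual lookup; `dns_label_from_root` is a hypothetical helper, not a resolver API.

```c
#include <stddef.h>
#include <string.h>

/* Copies the label at the given depth into out, where depth 0 is the
   top level domain, depth 1 the level below it, and so on.
   Returns 0 on success, -1 if there is no such label or out is too small. */
int dns_label_from_root(const char *name, int depth, char *out, size_t out_size)
{
    const char *end = name + strlen(name);

    for (;;) {
        const char *start = end;

        /* Walk back to the character just after the previous dot. */
        while (start > name && start[-1] != '.')
            start--;

        if (depth == 0) {
            size_t n = (size_t)(end - start);
            if (n == 0 || n >= out_size)
                return -1;
            memcpy(out, start, n);
            out[n] = '\0';
            return 0;
        }
        if (start == name)
            return -1;        /* ran out of labels */
        end = start - 1;      /* step over the dot and keep going left */
        depth--;
    }
}
```

For "www.example.com", depth 0 yields "com", depth 1 yields "example", and depth 2 yields "www", which is the order in which a lookup narrows down through the hierarchy.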

A name server is a server program that maintains information about the domain structure and how it relates to IP addresses. Name servers do not contain information for all parts of the domain name hierarchy; rather, they contain pointers to other name servers that know about parts of the domain name hierarchy that they do not. Similarly, if a name server knows about the structure of a particular part of the domain name hierarchy, other servers may be linked to it. Other programs called resolvers then access name servers to map domain names to IP addresses, and vice versa. In this manner, host address information can be maintained without centralizing it, and as such, the network is able to sustain itself without relying too much on any particular host. Most ISPs maintain (a) name server(s).

RFCs 882, 1034, and 1035 provide a decent higher-level introduction to DNS. This is recommended reading for anyone curious as to how it actually works. Fortunately once again, we get DNS as a freebie with civilized operating systems.

The ICMP Protocol

An integral part of IP is ICMP. It is difficult to explain the role of the Internet Control Message Protocol from an application programming standpoint, because it is really a low-level sort of thing that's a direct part of IP. However, to state it simply, ICMP is used by gateways (devices which connect the various networks of an internet) to communicate with hosts or vice versa. Its application is in reporting or querying errors in the data transaction process. Because of the nature of ICMP, I will not discuss its specifics. The only real application-level use for ICMP that I have ever run across is the ping network utility, which sends ICMP packets to test whether or not a remote host is accessible and measure the transaction time between that host and another (this is sometimes called the "ping time"). While this might have applications in a game, excessive pinging can clog things up, something which we want to avoid doing in a multiplayer simulation. (In fact, there is a common [but stupid, primarily because you're basically giving away your address to the person you're cracking, and it's illegal] denial-of-service attack called flood pinging which is rather similar...)

The GGP Protocol

The Gateway to Gateway Protocol is another protocol that lives with IP. It is designed for communication about routing between two gateways. Suffice it to say that it does not concern us at all, because obviously we are not managing IP-level transactions.

The ARP and RARP Protocols

The Address Resolution Protocol and Reverse Address Resolution Protocol are designed to convert between Internet addresses and specific local hardware addresses: ARP takes an Internet address and finds the hardware address that goes with it, and RARP goes the other way. This task is obviously way beyond the scope of what we need to consider for application-level TCP/IP programming.

Physical Protocols and Why we Don't Care

Physical protocols are protocols which manage actually moving a piece of data across an internet. IP deals with working data through routers on a "high level", but it does absolutely nothing about actually getting data from one router to the next. This is where physical protocols come into play: Once a piece of data has reached a router via IP and the data needs to move on, the services of a physical protocol are called upon to do this moving on. The router slops another header on top of the IP header (and sometimes adds a footer to the end) and ships the packet away. When the destination router receives it, it peels the header off, and appends another one which will be needed to get it to the next router, which repeats the process, and yada yada yada. The only time that we even remotely care about physical protocols is when we're talking about the last leg in the data transaction process for the typical end-user/gamer: From the ISP to the modem.

The Point to Point Protocol (PPP) is the most commonly used physical protocol for transferring such data between modems. Let's say that a packet destined for a dial-up user has been sent from a remote host and has finally been routed to the appropriate place on the ISP. For the ISP to actually route the packet to the user's machine, a physical protocol needs to be called upon. This is probably going to be PPP. So the ISP attaches the PPP header and footer and ships the packet over to the user via the established PPP modem connection, it is received by the user's modem, and picked up by whatever local low-level telecommunications support is running. The PPP header is then peeled off by the local host's PPP support, and the IP header is seen. The IP header is processed, peeled off, and the TCP or UDP header is seen. The local host uses the information in the TCP or UDP header to direct the packet to the correct port on the local host, then peels the headers away, leaving just the data to be processed by an application running on that port.
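The peeling process can be sketched for the IP and UDP layers as follows. This is a simplified illustration under stated assumptions: the PPP framing is taken to have been stripped already, only UDP is handled, checksums are not verified, and `peel_udp` is an invented name rather than anything an operating system exposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Peels the IP header and then the UDP header off a raw datagram.
   On success, returns a pointer to the application data and stores the
   destination port and data length. Returns NULL if the packet is
   malformed or does not carry UDP. */
const uint8_t *peel_udp(const uint8_t *pkt, size_t len,
                        uint16_t *dst_port, size_t *payload_len)
{
    const uint8_t *udp;
    size_t ip_hlen, udp_len;

    if (len < 20)
        return NULL;
    ip_hlen = (size_t)(pkt[0] & 0x0F) * 4;   /* header length field is in 32-bit words */
    if (ip_hlen < 20 || len < ip_hlen + 8)
        return NULL;
    if (pkt[9] != 17)                        /* IP protocol number 17 means UDP */
        return NULL;

    udp = pkt + ip_hlen;                     /* IP header peeled off */
    udp_len = ((size_t)udp[4] << 8) | udp[5];
    if (udp_len < 8 || ip_hlen + udp_len > len)
        return NULL;

    *dst_port = (uint16_t)((udp[2] << 8) | udp[3]);
    *payload_len = udp_len - 8;
    return udp + 8;                          /* UDP header peeled off */
}
```

The destination port that comes back is what the host would use to decide which application gets the data.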

We care about that process for three reasons: First of all, it's interesting, second of all, it's easy to explain, and finally, because it sounds impressive. Other than that, we needn't concern ourselves with physical protocols. They are at a way lower level than we will ever touch as practical game programmers. The only time when we will even remotely care about PPP is when we're optimizing and investigating where overhead comes from.

If we were actually implementing TCP/IP support for a new platform, then yes, we'd care about physical protocols quite a bit, or if we were going into overkill mode and doing it all ourselves, we'd care about it. But as it is, if we've got support for TCP/IP on our target platform, then it's almost a given that there will be atomic support for PPP or SLIP or an equivalent physical protocol in place and active. My other articles on multiplayer technologies for Game Programming MegaSite have been primarily extremely low-level, concerned mainly with programming for the modem: If I were to continue that trend, I would go into the specifics of PPP and the hell of implementing it and all the layers of complexity above it. Unfortunately, I don't think that perplexed.com has a big enough disk for that, and I'd be wasting your time. So, we don't really care.

TCP/IP Programming

When we speak of developing TCP/IP applications, we rely on four layers of functionality:
  • An application protocol specific to what we're trying to do (SMTP, for instance)
  • TCP or UDP to oversee data transaction
  • IP, to actually transmit data packets to a given destination
  • Physical protocols like PPP to make data transfer happen.
Most of these layers are supported directly by the operating system. Our task usually comes into play when it boils down to implementing or supporting a specific application-level protocol, whether standard (as would be the case with a common system utility like finger or sendmail) or proprietary (as would be the case with Quake).

On any civilized platform, layers 2-4 will be provided for you. If you have to implement any of layers 2-4, you're not dealing with a civilized platform and you're probably wasting your time. Those operating systems which support development of TCP/IP applications with components that are part of the main operating system API are said to have "native" Internet support. Examples of such operating systems are 4.2BSD, Windows NT, and Windows 95. Operating systems which require the use of a proprietary API for applications which utilize TCP/IP are said to have no native support for TCP/IP. Examples of these operating systems include Windows 3.11 and DOS.

Win32 TCP/IP Support: WinSock

Win32 supports TCP/IP via WinSock, a port of the popular Berkeley Sockets technology. Socket, of course, refers to a TCP or UDP socket in a connection. WinSock suffered from some slight loss of elegance when it was ported from BSD, namely because of fundamental differences between BSD and Win32. The most visible change for experienced Berkeley Sockets developers would be that you have to (as in, must) initialize and de-initialize the WinSock library. On a deeper, uglier level, files and sockets were--before 2.0--completely different ball-parks in WinSock, because of course BSD != Windows 95. A recent effort in 2.0 has endeavored to cover the tracks of the Win32 platform by making this situation ad-hocable. (Thank you, come again.) WinSock, like Berkeley Sockets, is purely procedural.

What you should not use WinSock for is implementation of extremely common application-level TCP/IP protocols, if you can at all avoid it or if you don't need excessive control. Win32 already provides rather sufficient application-protocol-level Internet support for mail protocols, HTTP, FTP, and gopher. For most purposes, this support is enough. Low-level implementation of these protocols would be tremendously educational, but otherwise, a waste of time (for the most part; although you might consider writing an HTTP or FTP server for Windows 95).

On a personal note, I think WinSock is a stupid name, even if it is also a bad pun. I would've preferred that it be called "Lose Underwear." People can identify with this more. Frequently, I have observed that people lose underwear, whereas they very seldom win socks, unless they're playing a game of five card stud without a full deck or they're the 72nd shopper at K-Mart. That's just me, though.

Initializing the WinSock Library

To initialize WinSock, you call WSAStartup(). The first parameter of WSAStartup() gives the highest version of WinSock that your program can use. The library then uses a decision table based on the version given and the versions supported to decide whether or not it can effectively support your application. Parameter 1, the version requested, is a WORD. WinSock versions that you can request include 1.0, 1.1, and 2.0 (at present). Use the MAKEWORD macro to form the version for the first parameter, as follows: MAKEWORD(2,0) = version 2.0. For simplicity and maximal compatibility, it's a good idea to request the earliest version of WinSock that supports what you need to do.

The second parameter of WSAStartup() is a pointer to a WSADATA struct to get the specific initialization information. The WSADATA structure will contain information about the actual version of WinSock that you've gotten a hold of, as well as some limits that might come in handy for something and might not.

The return value of WSAStartup() is an error code. If the error code is 0, then obviously there was no error and we can proceed to use the WinSock library. The constants representing error codes usually take the form WSAEx, where x is the specific description of the error.

WSADATA wsaData;
//Request WinSock 1.0; a non-zero return value is the error code itself
if (WSAStartup(MAKEWORD(1, 0), &wsaData) != 0) {
   ErrorMessage("Error while attempting to initialize WinSock");
}
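To make the version word a little less mysterious, here is a minimal sketch of how the two bytes are packed. The names make_version, version_major and version_minor are hypothetical stand-ins (so that the sketch compiles without windows.h); in real code you would use MAKEWORD, LOBYTE and HIBYTE.

```c
/* Hypothetical stand-ins for MAKEWORD / LOBYTE / HIBYTE, assumed here
   only so that the sketch compiles without windows.h. */
typedef unsigned short word16;

/* The major version goes in the low byte and the minor version in the
   high byte, which is the layout WSAStartup() expects. */
static word16 make_version(unsigned char major, unsigned char minor)
{
    return (word16)(major | ((word16)minor << 8));
}

/* Pull the two halves back out of a packed version word. */
static unsigned version_major(word16 v) { return v & 0xFFu; }
static unsigned version_minor(word16 v) { return (v >> 8) & 0xFFu; }
```

After WSAStartup() succeeds, the same unpacking can be applied to the wVersion field of WSADATA to see which version you were actually given.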

Creating Sockets

To create a socket, you call either socket() (real shock, eh?) or WSASocket(). As is the case with many WinSock functions, the fancy WSA prefix functions are easily distinguished from their counterparts preserved for portability. In this tutorial, I will cover the preserved Berkeley forms of the functions in WinSock for the most part. Why? Because most of them don't have "extensions" built in, and are consequently easier to understand.

The first parameter of socket() specifies the address family that a socket will use. This is related to the way/place in which data transferred over the socket will be sent/will go. For the purpose of Internet-based TCP/IP programming, AF_INET will work. However, if you feel like having fun, check out the AF_x line of constants in winsock.h or winsock2.h. You might be surprised at the sheer number of networks that Winsock can be applied to.

The second parameter of socket() is the socket type. SOCK_STREAM will form a stream-based socket (surprise surprise) that supports bidirectional data transfer and an internally managed transfer buffer, and is generally more reliable than other socket types, such as SOCK_DGRAM. Check out the SOCK_x line of constants in winsock2.h for more of these socket types (if you have Winsock 2.0...1.0 and 1.1 support only the two mentioned above).

The third parameter of socket() is the protocol that the socket is to use. These take the form IPPROTO_x. IPPROTO_TCP, for instance, specifies the TCP protocol. IPPROTO_UDP specifies the UDP protocol. IPPROTO_IP specifies the IP protocol (right...). Once again, consult your handy header file for more information.

socket() returns a handle to the newly created socket, an unsigned integer (SOCKET is typedef'd as a u_int in winsock.h), or INVALID_SOCKET if the socket could not be created. In fancy-talk, this integer is called a socket descriptor.

SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
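The three parameters tend to travel together: a stream socket normally goes with TCP and a datagram socket with UDP. A rough sketch of that pairing follows. The helpers protocol_for_type and open_socket are hypothetical, and the non-Windows branch (with its SOCKET typedef) is only an assumption so that the sketch can also be compiled against plain Berkeley sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <netinet/in.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
#define INVALID_SOCKET (-1)
#endif

/* Hypothetical helper: the protocol that normally pairs with a socket
   type, or -1 for a type this sketch does not know about. */
static int protocol_for_type(int type)
{
    switch (type) {
    case SOCK_STREAM: return IPPROTO_TCP;
    case SOCK_DGRAM:  return IPPROTO_UDP;
    default:          return -1;
    }
}

/* Create an Internet socket of the given type and check the result.
   On Windows, WSAStartup() must already have succeeded. */
static SOCKET open_socket(int type)
{
    int protocol = protocol_for_type(type);
    if (protocol < 0)
        return INVALID_SOCKET;
    return socket(AF_INET, type, protocol);
}
```

A caller would compare the result against INVALID_SOCKET before doing anything else with it.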

Binding and Connecting

Immediately after its creation, a socket can do nothing. Before it can do anything, we must define which port number a socket is to pay attention to and which host that port is on: That is, we must actually associate our socket with a socket! To attach a socket to a port on the local machine, use the bind() function. To connect a socket to a socket on a remote host, use the connect() function.

There is a crucial distinction that needs to be made between bind() and connect(). First and foremost, as has already been stated, bind() is used to attach a socket to a specific port on the local machine, and this only before connection. If you attempt to connect() without binding a socket to a port on the local machine, connect() will bind it to an arbitrary free local port for you. If you attempt to bind() while connected, you will be doing something very strange. The second distinction is that bind() is usually used in conjunction with listen(), which listens for a connection on a given socket. This does not mean that bind() has to be used in this manner, but seldom does one have to bind to a specific local port when performing a transaction with a remote machine. The similarity between bind() and connect() is that they take effectively the same parameter list; the middle parameter simply holds the local host's address in bind() and the remote host's address in connect().

The first parameter of connect()/bind() is a socket descriptor. The second parameter is a pointer to a sockaddr struct. This parameter indicates what the socket is to be bound to/connect to, i.e., what machine we're dealing with. This is what a sockaddr looks like:

struct sockaddr {
    u_short sa_family;
    char sa_data[14];
};

sa_family is yet another instance of the family things that we've been seeing; the AF_x line of constants can be used to specify this. This family will more likely than not be the same one used in the creation of the socket. In fact, if it's not, give me a ring, because that would be extremely unusual. sa_data contains the address itself: the remote machine's for connect(), the local machine's for bind().

One might ask what char sa_data[14] really means, or indeed, what this whole structure really means; that is, how the address is really formatted. In TCP/IP networks, the above structure can be rewritten like this:

struct sockaddr_in {
    short sin_family;
    u_short sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};

(note short instead of u_short; this, I can only assume, is an error in the published WinSock documentation, upon which I have based these structure listings in order to avoid unintentionally introducing any further errata.)

For the overly curious, in_addr looks like this:

struct in_addr {
    union {
        struct { u_char s_b1,s_b2,s_b3,s_b4; } S_un_b;
        struct { u_short s_w1,s_w2; } S_un_w;
        u_long S_addr;
    } S_un;
};

And now you want to know why you care. Well, if you'll think back to our discussion on how IP datagrams are routed from one place to another, you'll recall the concept of an Internet address that uniquely identifies a host on a network. sockaddr_in specifies an address for a unique port on a unique host. in_addr is the actual Internet address of that host. You will note four bytes, referenced in the union through chars, shorts, and longs, just as I promised back in the IP discussion (four octets). But we don't need to worry about things on this level. That's just how it fits together.
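If the union still seems abstract, the following sketch shows the same four bytes seen two ways. The union ip_view and the function pack_octets are hypothetical mirrors of in_addr written for illustration, and the non-Windows include is an assumption so that the sketch builds on a Berkeley system too.

```c
#include <stdint.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>
#endif

/* Hypothetical mirror of the union inside in_addr: the same four bytes
   viewed either one octet at a time or as a single 32-bit value. */
union ip_view {
    unsigned char octet[4];
    uint32_t      whole;
};

/* Store the octets in the order they are written (a.b.c.d) and read
   them back through the 32-bit view. Because the bytes sit in memory
   in written order, the result is already in network byte order. */
static uint32_t pack_octets(unsigned char a, unsigned char b,
                            unsigned char c, unsigned char d)
{
    union ip_view v;
    v.octet[0] = a;
    v.octet[1] = b;
    v.octet[2] = c;
    v.octet[3] = d;
    return v.whole;
}
```

On any machine, pack_octets(192, 168, 1, 10) should compare equal to what inet_addr("192.168.1.10") returns, since both describe the same four bytes in the same order.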

The third and final parameter of bind()/connect() is the size of the address structure given as the second parameter. This is most commonly specified using the sizeof operator in conjunction with the actual address structure. So, for instance:

struct sockaddr_in addr;

...

SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

//assume that we fill addr somewhere in here

...

connect(s, (struct sockaddr *)&addr, sizeof(addr));

...

That's the general idea.
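As for the "assume that we fill addr" step, one plausible way to do it is sketched here. The helper name make_addr is hypothetical, the address is assumed to be known in dotted form already, and the non-Windows includes are only an assumption so that the sketch also compiles against Berkeley sockets.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#endif

/* Hypothetical helper: build a sockaddr_in for a dotted IP string and
   a port number given in host byte order. */
static struct sockaddr_in make_addr(const char *dotted_ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));              /* also clears sin_zero */
    addr.sin_family = AF_INET;                   /* same family as the socket */
    addr.sin_port = htons(port);                 /* port in network byte order */
    addr.sin_addr.s_addr = inet_addr(dotted_ip); /* already network byte order */
    return addr;
}
```

The result can then be handed to connect() or bind() with a cast to struct sockaddr * and sizeof() for the length.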

Addressing and Name Service Support

It should be noted that sockaddr_in is a wrapper that identifies an IP address and a port on the machine identified by that IP address for practical TCP/IP development purposes. The very important question of how we know this IP address comes into play. There are really only two ways to know the IP address of a host: You can either know it outright, or look it up based upon a domain name service. WinSock encapsulates the functionality of a resolver (discussed earlier when we talked about DNS) in the gethostbyname() function. This will look up a host-name and get the IP(s) of the host or hosts associated with it if possible.

gethostbyname() has only one parameter, namely, the name of the host to get the address of. The address itself is returned in the form of a pointer to a hostent structure. WinSock will allocate this for you, so all you really need to do is get a pointer to it. Straight from the headers (well, almost), this is what hostent looks like:

struct hostent {
    char FAR *h_name;
    char FAR *FAR *h_aliases;
    short h_addrtype;
    short h_length;
    char FAR *FAR *h_addr_list;
#define h_addr  h_addr_list[0]
};

Nasty structure, really, but fortunately you won't have to worry about much of it. h_name contains the name of the host. h_aliases is an array of strings which contain aliases (pseudonyms; nick-names; alternative identities indicating one and the same thing) of the host. h_addrtype is simply the type of the addresses in the h_addr_list array. This is unnecessary for our purposes, because we're already pretty certain what they are. h_length is the length of the addresses. And finally, where our interest comes in, the h_addr_list array. This will contain the IP address(es) of the host that we have looked up. It may come as a surprise, but there may be more than one IP associated with a host, just as there may be more than one host associated with an IP.

Usually, we only want the first address in this list of returned addresses to evade unnecessary complexity. That would be h_addr_list[0]. This, in past versions, was represented by h_addr; in order to maintain backward compatibility, a #define has been added to fudge this in Winsock 2.

struct hostent *host_addr;
host_addr = gethostbyname("gamedev.net");
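gethostbyname() returns NULL when the lookup fails, so the result deserves a check before anyone reaches into it. Here is a small sketch of pulling the first address out of a hostent. The function first_address is a hypothetical helper, and the non-Windows includes are an assumption made so the sketch builds on a Berkeley system as well.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#endif

/* Hypothetical helper: copy the first IPv4 address out of a hostent.
   Returns 1 on success and 0 if the lookup failed (h == NULL) or the
   entry does not hold a four-byte Internet address. */
static int first_address(const struct hostent *h, struct in_addr *out)
{
    if (h == NULL || h->h_addrtype != AF_INET ||
        h->h_length != (int)sizeof(*out) ||
        h->h_addr_list == NULL || h->h_addr_list[0] == NULL)
        return 0;

    /* The list holds raw bytes in network byte order, not strings. */
    memcpy(out, h->h_addr_list[0], sizeof(*out));
    return 1;
}
```

The in_addr that comes back can be assigned straight to the sin_addr field of a sockaddr_in.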

It is also possible to resolve an IP address into a host name. This can be done by passing the IP address, as a 32-bit integer in network byte order, to the gethostbyaddr() function. As we have already discussed, the standard written form of an IP address is dotted-decimal: four base-10 integers separated by periods. It is for this reason that WinSock provides the inet_addr() function to convert an IP-address string to an appropriate unsigned long integer, already in network byte order. gethostbyaddr() takes as its first parameter a pointer to that numeric IP (not the value itself). Its second parameter is the length of the address, which is four bytes for an IP address, and its third parameter is the type of the address, which is obviously AF_INET in this case. Then the hostent structure's h_name field will contain the symbolic name of the host. As a side-note, inet_ntoa() is the converse of inet_addr(): it turns an in_addr back into a dotted-decimal string.

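struct hostent *host_addr;
unsigned long ip = inet_addr("127.0.0.1");
//gethostbyaddr() wants a pointer to the address, not the address itself
host_addr = gethostbyaddr((const char *)&ip, sizeof(ip), AF_INET);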
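Since inet_addr() and inet_ntoa() are opposites, a round trip through both makes a handy sanity check on an address string. The helper normalize_dotted is hypothetical, and the non-Windows includes are an assumption for building the sketch outside Windows.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <netinet/in.h>
#include <arpa/inet.h>
#endif

/* Hypothetical helper: parse a dotted string with inet_addr() and print
   it back with inet_ntoa(). Returns 1 on success, 0 if the text is not
   a usable address or the output buffer is too small. */
static int normalize_dotted(const char *text, char *out, size_t cap)
{
    struct in_addr a;
    const char *printed;

    a.s_addr = inet_addr(text);
    if (a.s_addr == INADDR_NONE)    /* note: 255.255.255.255 looks the same */
        return 0;

    printed = inet_ntoa(a);         /* static buffer, so copy it right away */
    if (strlen(printed) + 1 > cap)
        return 0;
    strcpy(out, printed);
    return 1;
}
```

The copy matters because inet_ntoa() hands back a buffer that the next call will overwrite.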

Network and Host Byte-order

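This is not a very complicated subject. Network byte order is big-endian (most significant byte first), whereas host byte order is whatever the local CPU uses; on Intel machines that is little-endian. WinSock provides htons() and htonl() to convert short ints (WORDs) and long ints (DWORDs) from host to network byte order, and ntohs() and ntohl() to convert them back. You should perform an htons() or htonl() on values such as port numbers before placing them in an address structure or sending them, and an ntohl() or ntohs() when appropriate for received data.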
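A quick way to convince yourself of what htons() does is to look at the bytes it produces. In this sketch port_to_wire is a hypothetical helper, and the non-Windows include is an assumption so the code also builds against Berkeley sockets.

```c
#include <stdint.h>
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>
#endif

/* Hypothetical helper: write a port number into two bytes exactly as
   it would travel on the network, most significant byte first. */
static void port_to_wire(uint16_t port, unsigned char wire[2])
{
    uint16_t n = htons(port);     /* host order -> network order */
    memcpy(wire, &n, sizeof(n));  /* copy the raw bytes out */
}
```

For port 8080 (0x1F90) the two bytes come out as 0x1F followed by 0x90, whatever the byte order of the machine running the code.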

Sending and Receiving Data

One might expect this to be complicated and arcane. It's not. Let me give you a hint: The send() function sends data, and the recv() function receives it. Beyond this, there are a few specifics, but that's the general model that has been adopted.

First of all, it is important to make the distinction between connection-oriented and connectionless sockets. The connection-orientation of a socket is given by one of two things: Whether or not it wears baggy pants (ask Jim about that one), or what protocol it uses for communication. The UDP protocol obviously is not oriented at connections, as was already discussed. The question then arises as to how data is sent from a connectionless socket if there's no connection (because if there's no connection, we can't possibly know where to send the data TO).

To send data from a connectionless socket, we use the sendto() function, which allows us to specify this information. We can also use sendto() for connection-oriented sockets, but it's really not necessary to do so, because we already know where our data are supposed to go. Similarly, a recvfrom() function is provided. This function returns, in addition to whatever might be waiting for us to receive, relatively detailed information about the host sending the information to our socket.

So, without further ado, let's delve into these functions, which are really sort of the meat 'n potatoes of the entire WinSock development experience (...).

send() takes as its first parameter a socket descriptor. Its second parameter is a pointer to a buffer which will be sent. The third parameter is the length of the data in the buffer to be sent, in bytes. The fourth and final parameter is an arcane flag that specifies special conditions for how the data should be sent. It presently has two possible values, MSG_DONTROUTE and MSG_OOB. The "don't route" flag is self explanatory: It specifies that the sent information shouldn't be routed. Don't ask me how they manage that one. The other value relates to Out-Of-Band data, which is somewhat interesting but generally outside of our interests. (OOB data is any sort of data that has a higher priority or a separate meaning than the normal data transmitted in the datastream. In TCP, this is called urgent data. Because we're not really concerned with much except getting our data from one point to another on an internet as game programmers, we're not going to worry about it. If you'd really like to know about OOB data, for whatever reason, check out the section about "urgent data" in RFC 793 [the TCP RFC]).

SOCKET s;
const char *data = "Hello";
...

//We'd create socket s somewhere in here and connect it

...

//Pass the pointer itself (not its address); 5 is the number of bytes to send
send(s, data, 5, 0);
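One detail worth knowing: on a stream socket, send() returns the number of bytes it actually accepted, which can be fewer than you asked for. A minimal sketch of a loop that keeps going until everything is out follows. The name send_all is hypothetical, and the non-Windows branch (with its SOCKET typedef) is an assumption so the sketch also builds against Berkeley sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
#define SOCKET_ERROR (-1)
#endif

/* Hypothetical helper: call send() repeatedly until len bytes have been
   handed over. Returns len on success or SOCKET_ERROR on failure. */
static int send_all(SOCKET s, const char *buf, int len)
{
    int sent = 0;
    while (sent < len) {
        int n = (int)send(s, buf + sent, len - sent, 0);
        if (n == SOCKET_ERROR)
            return SOCKET_ERROR;
        sent += n;                /* may be less than what was requested */
    }
    return sent;
}
```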

Well, one down. Three to go.

recv() is very similar to send(), with the exception that the buffer parameter (the second) is now to contain a pointer to a buffer which will be used to store whatever incoming data there happens to be from a remote socket. The third parameter is then the length of that buffer. The fourth parameter is, again, an obscure flag, this time with rather more useful values. MSG_PEEK can be passed as the flag parameter in recv() to copy the data from the socket's receive buffer into the buffer specified, but to not remove that data from the socket's receive buffer. MSG_OOB is also a valid flag here, and can be used to take a look at the OOB data coming in on this socket. It should be noted that recv() can be used on either connectionless or connection-oriented sockets. The only downside of using it on connectionless sockets is that you won't know where in Hades (or should I say Hayes?) the data that you've received is coming from.

SOCKET s;
char data[256];
...

//We'd create socket s somewhere in here

...

//The array name already decays to a pointer to the buffer
recv(s, data, sizeof(data), 0);
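The return value of recv() is where the useful information lives: a positive count of bytes received, 0 when the other end has closed the connection, or SOCKET_ERROR. Note also that recv() does not terminate the buffer for you. A small sketch, in which recv_text is a hypothetical helper and the non-Windows branch is an assumption for building against Berkeley sockets:

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
#define SOCKET_ERROR (-1)
#endif

/* Hypothetical helper: receive up to cap - 1 bytes and NUL-terminate
   them so the buffer can be treated as a C string. Returns what recv()
   returned: >0 bytes read, 0 if the peer closed, SOCKET_ERROR on error. */
static int recv_text(SOCKET s, char *buf, int cap)
{
    int n;
    if (cap < 2)
        return SOCKET_ERROR;      /* no room for data plus terminator */
    n = (int)recv(s, buf, cap - 1, 0);
    buf[n > 0 ? n : 0] = '\0';    /* recv() never adds the terminator */
    return n;
}
```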

And then there were two...

The sendto() function is the next most intuitive in our little group, here. Its first four parameters have the same meaning as they do in the context of send(). The fifth parameter of sendto() gives the address of the remote host to send to. This is optional along with the last parameter when the socket that we're sending from is connected to another socket, but must be filled for anything meaningful to happen when our socket is connectionless. Note that this pointer is to a struct of type sockaddr, or in the case of TCP/IP, sockaddr_in.

SOCKET s;
struct sockaddr_in addr;
char data[256];
...

//We'd create socket s somewhere in here and fill in addr

...

//sendto() expects a generic sockaddr pointer, hence the cast
sendto(s, data, sizeof(data), 0, (struct sockaddr *)&addr, sizeof(addr));

Last but not least, recvfrom(): The hallmark of uselessness. The first four parameters of recvfrom() are the same as they were for recv(), and the last two are essentially the same as they were for sendto(), except that they're now "from" parameters that will hold the address of the remote host. The exception is that the last parameter of recvfrom(), for some ungodly reason, is a pointer to the size of the sockaddr_in structure which will hold the address of the remote host instead of a direct value for it. Don't ask me what genius thought that one up. Anyway, it should also be noted that the last two parameters are again optional.

SOCKET s;
struct sockaddr_in addr;
char data[256];
int x;
...

//We'd create socket s somewhere in here

...

//x goes in as the size of addr and comes back as the size actually filled in
x = sizeof(addr);
recvfrom(s, data, sizeof(data), 0, (struct sockaddr *)&addr, &x);
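Put together, the two datagram calls might be wrapped along these lines. The helpers send_datagram and recv_datagram and the addr_len_t typedef are hypothetical, and the non-Windows branch is an assumption so the sketch also compiles against Berkeley sockets, where the length parameter is a socklen_t rather than an int.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef int addr_len_t;           /* WinSock uses a plain int here */
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
typedef socklen_t addr_len_t;
#define SOCKET_ERROR (-1)
#endif

/* Hypothetical helper: send one datagram to an explicit destination. */
static int send_datagram(SOCKET s, const struct sockaddr_in *to,
                         const char *buf, int len)
{
    return (int)sendto(s, buf, len, 0,
                       (const struct sockaddr *)to, sizeof(*to));
}

/* Hypothetical helper: receive one datagram and report who sent it.
   The length is passed by pointer because it travels both ways: in as
   the room available in *from, out as the size actually used. */
static int recv_datagram(SOCKET s, char *buf, int cap,
                         struct sockaddr_in *from)
{
    addr_len_t from_len = sizeof(*from);
    return (int)recvfrom(s, buf, cap, 0,
                         (struct sockaddr *)from, &from_len);
}
```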

How to get your sockets to listen()

Aside from making them an offer that they can't refuse, the other way to get your sockets to listen is to use the listen() function (and you call this un-intuitive?). listen() has the main goal of getting a socket to listen for connections, and, if a connection is attempted, to add it to a queue so that it can be processed and accepted by the accept() function (I know that I'm starting to see a pattern, here...)

In order for listen to have any meaning, two conditions must be met. First, a socket must be bound to a port on the local host. This is, as has already been discussed, achieved via the bind() function. Second, the socket must not be connected to a remote host at the time that it is listening for a connection. If you stop and think about it for a moment, this rather makes sense.

The first parameter of listen() is a socket descriptor of the socket which is to listen for incoming connections. The second parameter is the size of the backlog, or the queue to hold incoming connection requests, for the socket. There are numerous and creative ways to describe the backlog size parameter, but let it suffice to say that its maximum value is given by a constant SOMAXCONN. For our purposes, we may as well use this constant, although I highly doubt that (based upon the value of SOMAXCONN in Winsock 2) we will ever receive such a load of connections. The return value of listen() is 0 if no error occurs, and otherwise is SOCKET_ERROR.

SOCKET s;

...

//create and bind socket s

...

listen(s, SOMAXCONN);

Before accept() can be used, it is of course logical that the socket we wish to accept a connection on must be paying attention, i.e., listening. accept() is almost exactly the same as connect(), except that the second parameter, the pointer to a sockaddr_in structure, will get the address of the remote host instead of hold it, and the third parameter, this time for a worthy reason, is a pointer to an int that will get the length of the address returned. Note that the third parameter, in accordance with correct WinSock practices, should contain the sizeof the sockaddr_in structure on calling accept(). The return value of accept() is a new socket descriptor for the accepted connection; the original socket goes on listening.

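SOCKET s, client;
struct sockaddr_in addr;
int x = sizeof(addr);
...

//create and bind socket s, then listen for connections

...

//client is a new socket for the accepted connection; s keeps listening
client = accept(s, (struct sockaddr *)&addr, &x);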
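Strung together, the server side goes socket(), bind(), listen(), accept(). A rough sketch of that sequence follows. The helpers make_listener and accept_one and the addr_len_t typedef are hypothetical, binding to INADDR_ANY is my choice rather than a requirement, and the non-Windows branch is an assumption so the sketch also builds against Berkeley sockets.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
typedef int addr_len_t;
#else
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-ins for the WinSock names */
typedef socklen_t addr_len_t;
#define INVALID_SOCKET (-1)
#define SOCKET_ERROR   (-1)
#define closesocket    close
#endif

/* Hypothetical helper: create a TCP socket, bind it to the given local
   port on any local address, and start listening. Port 0 lets the
   system pick a free port. Returns INVALID_SOCKET on any failure. */
static SOCKET make_listener(unsigned short port)
{
    struct sockaddr_in local;
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET)
        return INVALID_SOCKET;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(s, (struct sockaddr *)&local, sizeof(local)) == SOCKET_ERROR ||
        listen(s, SOMAXCONN) == SOCKET_ERROR) {
        closesocket(s);
        return INVALID_SOCKET;
    }
    return s;
}

/* Hypothetical helper: wait for one connection. Returns a new socket
   for talking to the peer; the listener itself stays open. */
static SOCKET accept_one(SOCKET listener, struct sockaddr_in *peer)
{
    addr_len_t len = sizeof(*peer);
    return accept(listener, (struct sockaddr *)peer, &len);
}
```

Remember that accept_one() blocks until somebody actually connects.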

Blockage and Synchronicity

While the title of this section sounds rather like a prospective Tom Clancy novel, it's really a somewhat important issue. Up until this point, we have rather candidly ignored a fundamental question associated with all of the functions that we have explored. Let's consider the example of the recv() function. What would happen if there were no data to be received and we called recv()? If the connection had been closed on the other end, recv() would return politely and give us no data. However, if the connection is still intact, and the problem is simply that no data exists to be received, recv() would sit and wait for data to come in. This behavior of recv() defines it as a blocking function, one that blocks other code in a program from taking effect until it finishes. This is better stated by saying that recv() is a synchronous function as opposed to an asynchronous function.

In Berkeley Sockets, this situation can be remedied by using the select() function. select() takes as its parameters sets of sockets (fd_set structures) to check for readability, writability, and error conditions, plus an optional timeout. It returns when something happens on one of these sockets, and the program then processes the event. This, while it worked fine and well for BSD (and is also available in WinSock), is not a very elegant solution under the Win32 development paradigm.
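For the curious, the select() approach looks roughly like this when all you want to know is whether a recv() would block. The helper readable_within is hypothetical, and the non-Windows branch is an assumption for building against Berkeley sockets (WinSock ignores the first parameter of select(), but BSD needs it).

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
#endif

/* Hypothetical helper: wait up to millis milliseconds for data on s.
   Returns 1 if recv() would not block, 0 on timeout, and a negative
   value if select() itself failed. */
static int readable_within(SOCKET s, long millis)
{
    fd_set readers;
    struct timeval tv;

    FD_ZERO(&readers);            /* start with an empty set...        */
    FD_SET(s, &readers);          /* ...and add the one socket we want */

    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;

    /* Only the "readable" set is used; write and error sets are NULL. */
    return select((int)s + 1, &readers, NULL, NULL, &tv);
}
```

A game loop could call this with a timeout of 0 once per frame and only call recv() when it reports 1.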

As a direct consequence of this problem, WinSock introduced asynchronous socket I/O functions that operate based upon the Windows messaging scheme. For once, I have to give credit to Win32 for introducing a better way of doing things.

The WSAAsyncSelect() function is the "control center" for the WinSock asynchronous socket support. It takes as its first parameter a socket descriptor. Its second parameter is the handle of a window whose WindowProc will receive control messages for the socket. The third parameter is very cool: It allows one to specify what message will be sent to one's WindowProc when an event does occur on a socket. The final parameter to WSAAsyncSelect() is a set of flags for events that a message will be sent on. Here's a table, straight from (well, slightly modified from) the documentation. I've taken the liberty of removing what is useless for the purposes of a game programmer.

WSAAsyncSelect Event Flags

Flag          Effect
FD_READ       Notifies the window proc that data are waiting for the socket
FD_WRITE      Notifies the window proc that data can be sent from the socket
FD_ACCEPT     Notifies the window proc on an incoming connection request
FD_CONNECT    Notifies the window proc on a connection having been completed
FD_CLOSE      Notifies the window proc when the socket closes

These can be combined as any bit-flags can. In the WindowProc, when the socket event message is received, wParam will contain the socket to which the event pertains, the low word of lParam will contain the event code (from the table above), and the high word of lParam will contain whatever errors if any have occurred.

#define SOCKET_EVENT_HAS_HAPPENED   	25252

SOCKET s;
HWND mywnd;

...

//create s and mywnd

...

WSAAsyncSelect(s, mywnd, SOCKET_EVENT_HAS_HAPPENED, FD_READ | FD_WRITE | FD_ACCEPT);
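On the receiving end, the WindowProc has to split lParam into its two halves. WinSock supplies the WSAGETSELECTEVENT and WSAGETSELECTERROR macros for this; the sketch below spells out the same arithmetic with hypothetical names (select_event, select_error and the DEMO_ constants, whose values I am assuming match winsock.h) so that it compiles without windows.h.

```c
/* Hypothetical copies of the event flags, assumed to carry the same
   values as the FD_x constants in winsock.h. */
#define DEMO_FD_READ    0x01u
#define DEMO_FD_WRITE   0x02u
#define DEMO_FD_ACCEPT  0x08u
#define DEMO_FD_CONNECT 0x10u
#define DEMO_FD_CLOSE   0x20u

/* Low word of lParam: which single event this message is about. */
static unsigned select_event(unsigned long lparam)
{
    return (unsigned)(lparam & 0xFFFFu);
}

/* High word of lParam: the error code, or 0 if nothing went wrong. */
static unsigned select_error(unsigned long lparam)
{
    return (unsigned)((lparam >> 16) & 0xFFFFu);
}
```

Inside the WindowProc, a switch on the event value (after checking that the error half is 0) is the usual next step.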

Note that there are a lot of details involved in the WSAAsyncSelect() function which I have neglected to cover. The basics discussed here are sufficient to attain decent use of WSAAsyncSelect() without too many problems. The rest is readily apparent from standard documentation.

In addition to WSAAsyncSelect(), asynchronous counterparts of the name service functions that we discussed a few sections back are also provided. These also require no elucidation, as they are a logical extension of their synchronous counterparts.

Errors in WinSock

The error-handling scheme in WinSock operates generally based upon return codes, but to figure out exactly what error has occurred, you need to call the WSAGetLastError() function when a function reports failure, typically by returning SOCKET_ERROR (or INVALID_SOCKET for functions that return a socket). WSAStartup() would be the exception; it will return literally the error that has occurred if anything goes wrong. WSAGetLastError() takes no parameters and returns an integer error code.
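In practice the pattern is "check the return value, then ask what went wrong". A minimal sketch, in which last_socket_error and checked_recv are hypothetical helpers; the non-Windows branch, which reads errno instead, is an assumption so that the sketch also builds against Berkeley sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
/* On Windows the error code comes from WSAGetLastError(). */
static int last_socket_error(void) { return WSAGetLastError(); }
#else
#include <errno.h>
#include <sys/socket.h>
typedef int SOCKET;               /* stand-in for the WinSock type */
#define SOCKET_ERROR (-1)
/* Berkeley sockets report the same information through errno. */
static int last_socket_error(void) { return errno; }
#endif

/* Hypothetical helper: recv() that also hands back the error code when
   the call fails. *error_out is 0 whenever the call succeeds. */
static int checked_recv(SOCKET s, char *buf, int cap, int *error_out)
{
    int n = (int)recv(s, buf, cap, 0);
    /* Fetch the code immediately, before another call can replace it. */
    *error_out = (n == SOCKET_ERROR) ? last_socket_error() : 0;
    return n;
}
```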

Cleaning up after yourself

Upon exiting a WinSock application, it is a good practice to call the WSACleanup() function to clear up whatever you may have messed with in the course of the application itself. You should also use the closesocket() function, whose only parameter is a socket descriptor, to close a socket when you're done with it. There is no disconnect() function in Berkeley Sockets; closing a socket is equivalent to disconnecting. However, it is possible to call shutdown() first to tell the other side that you have finished sending (WinSock 2 also offers WSASendDisconnect() for this), and it is generally polite to do so, but not required.
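The polite version of closing a socket can be sketched like this. The helper polite_close and the HALF_CLOSE_SEND macro are hypothetical names, and the non-Windows branch is an assumption so that the sketch also builds against Berkeley sockets.

```c
#ifdef _WIN32
#include <winsock2.h>
#define HALF_CLOSE_SEND SD_SEND
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;               /* stand-ins for the WinSock names */
#define SOCKET_ERROR    (-1)
#define closesocket     close
#define HALF_CLOSE_SEND SHUT_WR
#endif

/* Hypothetical helper: announce that we are done sending, then release
   the socket. Returns 0 on success or SOCKET_ERROR on failure. */
static int polite_close(SOCKET s)
{
    int result = 0;
    /* The peer will see this as recv() returning 0. */
    if (shutdown(s, HALF_CLOSE_SEND) == SOCKET_ERROR)
        result = SOCKET_ERROR;
    /* Always release the descriptor, even if shutdown() failed. */
    if (closesocket(s) == SOCKET_ERROR)
        result = SOCKET_ERROR;
    return result;
}
```

On Windows, WSACleanup() still has to be called once at the very end, after the last socket is closed.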

What I didn't cover

There is a lot more material regarding WinSock. What I have covered in this tutorial is what you need to know to use WinSock effectively for the purposes of a game programmer. If you want to know everything about every cubic centimeter of it, my only recommendation would be to go out and buy a reference on it. I cannot humanly be expected to accurately document the 84 functions in WinSock 2 and give detailed descriptions of their practical applications.

Coda

Well, I was going to present a demo with this article, but after it got to be over 50k, I decided that if it wasn't explicit enough, there's something wrong. I will use this space to send a message to the powers-that-be in article-writing: Help! I need assistance to complete this section. It took me a long time to put this documentation together, and it will take me even LONGER to cover the plethora of other standards out there. The present wish-list is for DirectPlay-related information and some demos. Anything you can throw together for this section would be greatly appreciated.

Disclaimers

[1] If you don't like the use of the pronoun "his" in the general case, either amend the English language or go back to the 13th century and bring Xena the Warrior Princess along.

[2] If you read this tutorial and find any errors within it, you should report them. It's no loss of face for me; I'd rather have errors exposed and corrected than ingrained in the minds of readers, only to be struck out with a blunt object when they get into an argument in a bad neighborhood over the role of GGP or ICMP or something of the sort ten years down the line.

[3] If you have any complaints or gripes with my style, my sense of humor, or anything relating to me or my existence, you can indulge yourself in fifty industrial-sized cans of "Beef-a-rooni", shove said complaints where the sun don't shine, and enrich the world with your beautiful music. Or, in short, write a better tutorial if you've got a problem with this one.

[4] Any copyright or trademark infringement in this document is probably intentional, and while it would be only ethical to sue me, the legal costs that you would incur in the process would far exceed my net worth.

Happy coding, finish that game, and send it over to GDW! ;)




