The-Moon

Coding a Multiuser Server in C/C++


Recommended Posts

I've been trying, off and on for the past year at least, to find the best way to program a TCP server, and I have a question. I asked it in my other topic, "Trouble with sockets", but no one responded. I want to know the best way to code a server with multi-client support; what I'm talking about is blocking versus non-blocking sockets.

Is there an easy way in Windows to poll a sockfd for whether it has data ready to be recv()'d or a connection waiting to be accept()ed? Or is it better for me to make a new thread for each client that connects, with a recv() inside each thread waiting for data from that client, plus a thread for the listening sockfd? Or is there a better way to do it without threads or select()? Or is it best to use a combination of select() and threads, where, say, one thread watches 64 different client sockfds? (I think 64 is the default maximum for an fd_set.) I'm still a n00b with sockets, and I'd really like to clear these questions up and get on with my programming, rather than trying to figure out the best way to do this.
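For concreteness, here is roughly the select()-based polling I have in mind (just a sketch: listen_fd and client_fd are placeholders, and error handling is omitted):

// Sketch of select()-based polling; needs <winsock2.h> on Windows
// or <sys/select.h> on POSIX. listen_fd/client_fd are placeholders.
void poll_once(int listen_fd, int *client_fd, int num_clients)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    int maxfd = listen_fd;
    for (int i = 0; i < num_clients; ++i) {
        FD_SET(client_fd[i], &readfds);
        if (client_fd[i] > maxfd) maxfd = client_fd[i];
    }

    struct timeval tv = { 0, 10000 };   // wait at most 10 ms
    if (select(maxfd + 1, &readfds, NULL, NULL, &tv) > 0) {
        if (FD_ISSET(listen_fd, &readfds)) {
            // a connection is pending: accept() will not block
        }
        for (int i = 0; i < num_clients; ++i) {
            if (FD_ISSET(client_fd[i], &readfds)) {
                // data is waiting: recv() will not block
            }
        }
    }
}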

The question is very platform specific. For Windows you probably want to use IOCP, for Linux you probably want epoll, for BSD kernels you probably want kqueue, and for Sun/Solaris I think they use /dev/poll. There are libraries that deal with this for you, such as libevent, which scale very well.
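For example, a minimal sketch of the classic libevent 1.x API (listen_fd is assumed to already be a bound, listening socket; error handling is omitted):

#include <event.h>   /* classic libevent 1.x header */

/* called by libevent whenever listen_fd becomes readable */
void on_accept(int fd, short ev, void *arg)
{
    /* accept() the pending connection here, then event_set()/event_add()
       a read event for the new socket */
}

int main(void)
{
    int listen_fd = /* a bound, listening socket */ -1;

    event_init();                 /* one global event base (legacy API) */

    struct event ev;
    event_set(&ev, listen_fd, EV_READ | EV_PERSIST, on_accept, NULL);
    event_add(&ev, NULL);         /* NULL timeout: fire whenever readable */

    return event_dispatch();      /* run the loop, dispatching callbacks */
}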

Quote:
Original post by asp_
The question is very platform specific. For Windows you probably want to use IOCP, for Linux you probably want epoll, for BSD kernels you probably want kqueue, and for Sun/Solaris I think they use /dev/poll. There are libraries that deal with this for you, such as libevent, which scale very well.


Would it be better to use libevent or IOCP, or would giving each client sockfd its own thread work just as well?

Or is it best that I learn one of those rather than using threads?

One thread per connection is the worst thing you can do unless you have very few, long-lived connections. It is the easiest thing to do, though.

I would learn libevent (C) or asio (C++), depending on whether you use C or C++. Asio has a few issues, IMO; I use it myself, and it does a large number of memory allocations, especially if you use its timers as well. It's well abstracted, though, and does allow you to get started very quickly. It's also part of Boost now, so it has been audited by a few good men.
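For example, the core of an asio accept loop looks roughly like this (a sketch only; port 4000 is an arbitrary choice, and the read handlers are left out):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>

using boost::asio::ip::tcp;

boost::asio::io_service io;                       // io_context in newer versions
tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 4000));

void handle_accept(boost::shared_ptr<tcp::socket> sock,
                   const boost::system::error_code& ec);

void start_accept()
{
    // the shared_ptr keeps the socket alive until the handler fires
    boost::shared_ptr<tcp::socket> sock(new tcp::socket(io));
    acceptor.async_accept(*sock,
        boost::bind(handle_accept, sock, boost::asio::placeholders::error));
}

void handle_accept(boost::shared_ptr<tcp::socket> sock,
                   const boost::system::error_code& ec)
{
    if (!ec) { /* start sock->async_read_some(...) here */ }
    start_accept();                               // keep accepting
}

int main()
{
    start_accept();
    io.run();                                     // dispatch completion handlers
}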

Quote:
Original post by asp_
One thread per connection is the worst thing you can do unless you have very few, long-lived connections. It is the easiest thing to do, though.

I would learn libevent (C) or asio (C++), depending on whether you use C or C++. Asio has a few issues, IMO; I use it myself, and it does a large number of memory allocations, especially if you use its timers as well. It's well abstracted, though, and does allow you to get started very quickly. It's also part of Boost now, so it has been audited by a few good men.


Alright, I think that clears things up for me: I need to learn/use asio.

Thank you very much, asp_.

Anyone else have anything to add?

Quote:
Original post by asp_

Asio has a few issues, IMO; I use it myself, and it does a large number of memory allocations, especially if you use its timers as well.


Use 3.9 (and it appears version 1.0 has been released in the meantime). It supports custom allocators. Also, if you don't use boost::function for callbacks but provide your own callback delegates, there won't be any extra allocations.

Timers, however, are a software resource, which means they should be used sparingly.

I experimented with timers to provide bandwidth throttling. With one timer per client, it was always the NIC that broke first, never the application. Thousands of clients managed this way don't even dent the CPU (literally, an echo server should run at idle CPU).
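As a sketch of what I mean by one timer per client (the Client type and the numbers are made up for illustration):

#include <cstddef>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

// Illustrative only: releases a send "budget" every 100 ms.
struct Client
{
    Client(boost::asio::io_service& io) : timer(io), bytes_allowed(0) {}

    void start_throttle()
    {
        timer.expires_from_now(boost::posix_time::milliseconds(100));
        timer.async_wait(boost::bind(&Client::on_tick, this,
                                     boost::asio::placeholders::error));
    }

    void on_tick(const boost::system::error_code& ec)
    {
        if (ec) return;              // timer was cancelled
        bytes_allowed += 1024;       // 1 KiB per 100 ms, roughly 10 KiB/s
        // flush queued data, up to bytes_allowed, here
        start_throttle();            // re-arm the timer
    }

    boost::asio::deadline_timer timer;
    std::size_t bytes_allowed;
};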

Quote:
Or is there a better way to do it without threads or select()?


For non-blocking sockets, have a single thread: poll for readable data for n milliseconds, process the data that arrived, then send. Optionally, you can just recv() whatever is in the network buffer until you get an EWOULDBLOCK error, then send. Repeat as needed.
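In code, draining the buffer looks roughly like this (POSIX names; on Windows you'd use ioctlsocket(FIONBIO) to set non-blocking mode and check WSAGetLastError() for WSAEWOULDBLOCK instead):

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

void drain_socket(int fd)
{
    // make the socket non-blocking (normally done once, after accept())
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            // process n bytes of received data
        } else if (n == 0) {
            break;   // peer closed the connection
        } else if (errno == EWOULDBLOCK || errno == EAGAIN) {
            break;   // buffer drained; now do the sends
        } else {
            break;   // real error: handle and close
        }
    }
}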


Quote:

there won't be any extra allocations

Extra being kind of relative here. Shared pointers alone are two allocations (I switched them out for intrusive pointers). An async_receive is one allocation, and expires_from_now plus async_wait together cause three memory allocations. Currently, memory allocations account for about 15-20% of the overall execution time, because the connections are extremely short-lived and they send extremely brief requests and get very brief responses. I've now eliminated all the large allocations, so asio is only requesting really small pieces of memory, and since most standard allocators are pooled in one form or another, that's acceptable. ZLIB is by far the worst culprit, allocating almost 300 KB of memory in varying sizes during one compression pass, and it ended up with a custom memory allocator.
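For reference, the intrusive-pointer swap looks roughly like this (Connection is a made-up name; the point is that the reference count lives inside the object, so there is no separate control-block allocation):

#include <boost/intrusive_ptr.hpp>

class Connection
{
public:
    Connection() : refs_(0) {}
    // ... socket, buffers, handlers ...
private:
    friend void intrusive_ptr_add_ref(Connection* c) { ++c->refs_; }
    friend void intrusive_ptr_release(Connection* c)
    {
        if (--c->refs_ == 0) delete c;
    }
    long refs_;   // use an atomic counter if handlers run on several threads
};

typedef boost::intrusive_ptr<Connection> ConnectionPtr;
// ConnectionPtr conn(new Connection);   // one allocation instead of two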

ASIO saved me a lot of time. In my opinion it's not perfect, but it's probably one of the best libraries out there for what it tries to achieve, and it's pretty customizable. It reminds me of Boost in general, to be honest, and it seems to me it was originally written to fit in with the suite. I've been very happy with it, and anything I wasn't happy with I was able to work around without rewriting the library, which is awesome.

Quote:
Original post by asp_

ZLIB is by far the worst culprit, allocating almost 300 KB of memory in varying sizes during one compression pass, and it ended up with a custom memory allocator.


Hmmm. You aren't creating and deallocating the stream (deflateInit/deflateEnd) on every call, are you?

I allocate one z_stream per active object. Typically this will be a single one per system; there may be others, but this avoids sharing it across threads. This approach has no reasonable problem coping with a 100 Mbit connection. The allocation is also fixed: typically 256 KB, plus dynamic overhead per compression, which apparently should be < 64 KB. This is where a multiplexed design comes in handy: rather than having one global z_stream, or one per connection, it's one per worker thread.

I won't vouch that there really aren't any extra allocations, but among everything else, I never noticed it being a problem.

Antheus, that's a good idea. I could use a thread-local-storage pointer to a z_stream object; that would significantly cut the memory requirements for deflation. I verified that deflate doesn't do any memory allocations at all beyond what deflateInit does. Under the assumption that deflate can be run multiple times on the same initialized z_stream, and that one thread runs one request from start to finish, this should be safe and a significant saving for me, both performance-wise and memory-wise. Off to do some testing.

TLS is slow, and with ASIO you don't need it.

// compression.hpp
#ifndef CAPDUMP_COMPRESSION_HPP
#define CAPDUMP_COMPRESSION_HPP

#include <zlib.h>

class MemoryBuffer;

// Holds one inflate stream and one deflate stream that are allocated
// once and reused across calls via inflateReset()/deflateReset().
class ZlibCodec
{
public:
    typedef unsigned int size_type;

    ZlibCodec(void);
    ~ZlibCodec(void);

    // Return the number of bytes produced, or <= 0 on failure.
    int encode(unsigned char *src, size_type srcLen, unsigned char *dest, size_type destLen);
    int decode(unsigned char *src, size_type srcLen, unsigned char *dest, size_type destLen);

    // Convenience overloads; defined elsewhere in the original utility.
    int encode(const MemoryBuffer &src, MemoryBuffer &dst);
    int decode(const MemoryBuffer &src, MemoryBuffer &dst);

private:
    z_stream streamR;   // read (inflate) stream
    z_stream streamW;   // write (deflate) stream

    void reportError(const z_stream &s) const;
    void initStream(z_stream &s) const;
    void setInput(z_stream &s, unsigned char *dest, size_type destLen, unsigned char *src, size_type srcLen) const;
};

#endif // CAPDUMP_COMPRESSION_HPP


// compression.cpp
#include <iostream>
#include "memorybuffer.hpp"

#include "compression.hpp"

ZlibCodec::ZlibCodec(void)
{
    // The costly zlib allocations happen once, here, not per call.
    initStream(streamR);

    if (inflateInit(&streamR) != Z_OK) reportError(streamR);

    initStream(streamW);

    if (deflateInit(&streamW, Z_BEST_SPEED) != Z_OK) reportError(streamW);
}

ZlibCodec::~ZlibCodec(void)
{
    (void) inflateEnd(&streamR);
    (void) deflateEnd(&streamW);
}

void ZlibCodec::initStream(z_stream &s) const
{
    // Z_NULL allocator fields select zlib's default allocator.
    s.zalloc = Z_NULL;
    s.zfree = Z_NULL;
    s.opaque = Z_NULL;
    s.data_type = Z_BINARY;
}

void ZlibCodec::reportError(const z_stream &s) const
{
    std::cout << std::endl << "ZLIB ERROR: ";
    std::cout << ((s.msg) ? s.msg : "Undefined error") << std::endl;
}

inline void ZlibCodec::setInput(z_stream &s, unsigned char *dest, size_type destLen, unsigned char *src, size_type srcLen) const
{
    s.next_in = src;
    s.avail_in = srcLen;
    s.next_out = dest;
    s.avail_out = destLen;
}

int ZlibCodec::decode(unsigned char *src, size_type srcLen, unsigned char *dest, size_type destLen)
{
    setInput(streamR, dest, destLen, src, srcLen);

    // Reset reuses the stream's existing state instead of reallocating it.
    if (inflateReset(&streamR) != Z_OK)
    {
        reportError(streamR);
        return 0;
    }

    if (inflate(&streamR, Z_SYNC_FLUSH) != Z_STREAM_END)
    {
        reportError(streamR);

        return -((long)streamR.total_out);
    }

    return streamR.total_out;
}

int ZlibCodec::encode(unsigned char *src, size_type srcLen, unsigned char *dest, size_type destLen)
{
    setInput(streamW, dest, destLen, src, srcLen);

    if (deflateReset(&streamW) != Z_OK)
    {
        reportError(streamW);
        return 0;
    }

    // Z_FINISH compresses the whole input in a single pass.
    if (deflate(&streamW, Z_FINISH) != Z_STREAM_END)
    {
        reportError(streamW);
        return 0;
    }

    return streamW.total_out;
}


Then you just allocate this on a per-handler basis. ASIO can (optionally, via strands) guarantee that handlers are never executed concurrently.

This way, you can reuse the z_stream, by far the most costly allocation, between calls. The above should cut the running time roughly in half for small buffers.

Note: the source is from a utility, so it doesn't cover all the cases.
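For example, using it from a worker thread would look something like this (buffer sizes here are arbitrary):

#include "compression.hpp"

// codec should be one instance per worker thread, constructed once.
void compress_request(ZlibCodec &codec, unsigned char *raw, unsigned int rawLen)
{
    unsigned char packed[4096];
    int packedLen = codec.encode(raw, rawLen, packed, sizeof(packed));
    if (packedLen > 0) {
        // send packedLen bytes; the next call reuses the same z_stream
    }
}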
