Jump to content

  • Log In with Google      Sign In   
  • Create Account

hplus0603

Member Since 03 Jun 2003
Offline Last Active Today, 03:36 PM

#5300990 Arcade Physics Movement W/ Latency

Posted by on 16 July 2016 - 11:55 AM

The problem is latency.


Welcome to networking! :-)


You have three options:

1) Delay response to commands until you hear back from the server. That way, you know that everyone runs all the commands in the same order/time, so you can assume that the world is consistent. This is the "deterministic lockstep" approach, and is most popular for RTS games, but works for other systems too. When you get it working, it solves pretty much all problems -- but you have to accept the command latency!

2) Trust the client. If my client says that I hit you with a shot, then send that information to the other clients. If the hit player saw something else, spawn a new shot right where they're at so they know what hit them. This can get somewhat confusing, and is super open to cheating, but can also work well.

3) Run physics on the server, let the server determine what "actually" happens, and keep correcting/compensating the clients. This is what most FPS games do, because it hides the latency, and removes confusion/cheating, but instead sometimes causes "I got shot behind a wall" problems, and requires physics simulation on a server. (Note: Server could be user hosted)


#5299920 How do you deal with shared data and work load balancing?

Posted by on 09 July 2016 - 05:48 PM

I'm creating a thread whenever I need some system updated periodically.


You'd likely want to create a few threads (as many as there are cores) up front, and then "post" work to do to those threads, rather than creating new threads, or thread-per-subsystem.
That way, "work to do" is always well described as a work record (you're forced to structure it this way) which means you can easily-ish scale the number of threads between one and N.

Separately, do you actually need multiple threads?
Have you written a networked game before?
Did it run out of CPU in a single thread?
If so, where did the profiler say that you were spending your time?

In general, before you can build a "correctly" threaded application, you need to have some idea of where the problems will be.
For example: Do you have a lot of collision pairs to test in physics? Those could easily be parallelized.
Do you have a lot of database/file data to look up for each action request? Then parallelizing isn't the solution; asynchronizing is.
Without knowing exactly what your problem is, you can't possibly hope to effectively address it!
(And you'll have even lower chance of getting good advice from the internet if you can't describe it ...)

So, typically, the main problems that need CPU on the server are related to physics/entity simulation. Let's assume that's what you're going for.
In general, to make a multi-threaded game update engine, you need to structure all computation to be double-buffered.
During a particular time step, all entities/objects/systems are allowed to ONLY READ from the "current" copy, and ONLY WRITE to the "next" copy.
As long as you won't have multiple threads wanting to write to the same output state/copy at the same time, you will then not need locks.
Once the update has completed, you will have a short time of single-threaded-ness where the "current" and "next" states are swapped.
This may be as short as flipping an "current index" global variable between the values 0 and 1!

Separately, you will be generating messages to send to other objects/nodes/whatever.
You could use a lock-free FIFO for those messages.
Make it big enough that you will never generate more messages in a single step than there is space in the FIFO.
That way, you won't end up dropping any messages in the lockless phase.


#5299407 how rsa works on texts and signals?

Posted by on 06 July 2016 - 05:39 PM

All data in computers is just bytes in memory. (Or on disk.)
Those bytes in memory can be text (in various encodings -- ASCII, Latin-1, UTF-8, UTF-16, ...) or audio (in various formats) or video or executables or whatever.
Because those bytes are numbers, any encryption that can turn numbers into other numbers, and then decrypt by turning those other numbers back into the original numbers, can be used to encrypt whatever is stored in memory (or on disk.)
This is pretty basic computer science, though, so I suggest you read up on computer architecture using a good textbook of some sort.
Like perhaps http://amzn.to/29PIM2M


#5299082 Hundreds of timers...

Posted by on 04 July 2016 - 11:06 PM

this is a narrow interface, so try the simple thing first, and if it doesn't scale, swap it out


That's exactly the kind of thinking that leads to slow performance everywhere and the profile doesn't show you any obvious place to improve.
An experienced programmer will pick the right option up front, and thus doesn't need any re-work later.
std::priority_queue<> is pretty easy to use; std::map<> is trivial to use. Both will perform very well. Pick one of those, rather than something obviously bad, if you already know the volume of data.
 
Here's the declaration for priority_queue<>:

    typedef std::pair<my_time_type, my_action_type *> queue_item;
    std::priority_queue<queue_item, std::vector<queue_item>, std::greater<queue_item> >  my_timer_queue;

    // inserting in queue
    my_timer_queue.push(queue_item(time, action));

    // checking which items should run
    while (!my_timer_queue.empty() && my_timer_queue.top().first <= current_time) {
        auto ptr = my_timer_queue.top().second;
        my_timer_queue.pop();
        ptr->timer_expired();
    }
This will manage a heap in a linear container (vector,) which additionally has the potential cache benefits suggested above.
It will use log N insertion and removal times, and finding the next time to run (the time of the earliest expiring timer) is constant time.
This is very likely as close to optimal as you can get, and it's very simple to use.

Some other languages do not have well-behaved heap data structures, though; there you may have to sort an array yourself.
Make sure to test it with large numbers so you can tell whether the sort function you're using is pessimal or not.

every time your computer crashes, it's often caused by someone who chose a higher-performing and harder to maintain system


You're saying that choosing a well performing algorithm will cause crashes, which on the face of it seems like an absurd statement.

I'd view that as someone biting off more than they can chew. Which is also very, very, common -- although schedule pressure is often as much a reason for that as programmer competence.

That's a very different discussion, though :-)


#5298921 Keep checking credentials on Rest API?

Posted by on 03 July 2016 - 02:29 PM

This is no different from logging in to a website. On a website, you get a session cookie, which typically is something like "userid:expirytime:unique-id:hmac(userid,expirytime,unique-id,key)."

You can issue a similar session token to the users of your REST API. In fact, because you already use HTTP, you can still use cookies. (For UDP based connections, you'll need to manage this cookie at your own protocol layer.)

It's important that you do verify the cookie hmac, and that you do verify the timeout/expiry time. You will also want to issue a new session token when the expiry time gets close, if you want the user to keep the session alive.

Another option is to use OAuth for your API calls. However, as maunovaha suggests, implementations of that typically require looking up the token in some database. You could make that database be memcached or Redis, to make it fast, but it's still slower than "verify the hash."


#5298918 Hundreds of timers...

Posted by on 03 July 2016 - 02:25 PM

premature optimization


"Selecting the right algorithm" is not premature optimization; it's designing for performance.
Performance is a feature. You have to plan and build for it, at the high level.
The famous quote about premature optimization was in the context of worrying about things like whether a particular if statement would emit one extra instruction or not, and dropping to assembly to avoid it.
Hoare and Knuth would ABSOLUTELY tell you to choose the right algorithm for the particular problem you're trying to solve.

Finally: "add the element at the end, then sort the new array" is also bad advice, unless your sorting algorithm has explicit optimizations for sorting already-mostly-sorted containers.
Curiously, QuickSort could go quadratic in this case; meanwhile, most BubbleSort implementations may actually perform the expected O(N) work.
If you use a sorted array, you should expect to use a binary search (or linear scan from backwards) to find the location, and then do O(N) work to insert into the vector (moving the rest of the elements out of the way.)
If your array is large (thousands of elements, as suggested above,) then touching the N-sized amount of elements is bad, because it will evict a bunch of other data from L1 cache.
std::map<> will work much better in this case, as will std::priority_queue<>. It's the well-defined data structure intended exactly for this use case, so you should use it!

Put another way: Computers, when treated right, are really fast.
Every time you have to wait for a laggy computer, it's probably caused by someone not understanding the difference between "premature optimziation" and "designing for performance."


#5298757 Hundreds of timers...

Posted by on 01 July 2016 - 08:08 PM

Might the above be called 'flat' fibers ??


I don't understand the description of your system.

A fiber has a stack. Switching fibers involves saving the CPU registers to some memory, restoring other CPU registers for another fiber, changing the stack pointer to the new fiber, and finally jumping/returning to wherever that fiber was when it yielded.

It sounds a bit like you're simply describing having a lot of objects that get "poked" or "stepped" by a main thread, pulling state as needed. That's just object structure, not fibers.


#5298405 Hundreds of timers...

Posted by on 28 June 2016 - 10:49 AM

Also If you can avoid Threads and their overhead, do so - use fibers/microfibers (a software threading mechanism versus system resource hard threads)


The main cost of a system thread is the memory for the stack. You don't avoid that cost for a fiber. There are some other costs in pre-emptive threads, mainly having to do with the cost of kernel entry for context switch, but the difference in cost between fibers and threads is not an order of magnitude (except perhaps in some very synthetic cases.)


#5298249 Hundreds of timers...

Posted by on 27 June 2016 - 09:49 AM

Read from main memory will take 100+ clocks


{writes} ... are around 1 clock


Ah! You're looking at thread-blocking stall time. I'm looking at overall system throughput.

What will actually happen when you write, is that a cache line will be allocated for the write (which needs bus arbitration on multi-CPU systems,) and the bytes will be put in the right part of the cache line (or, for unaligned writes, cache lines.) Then a read will be issued, to fill the rest of the cache line. Then, once the cache line gets evicted, the memory bus will be used again, for putting the data back out.
At a minimum, this will take twice as long as reads. If you have data structures (or a heap) that alias cache lines to different pointers that different CPUs want to update, they will end up ping-ponging the cache line, costing more than 10x the "base" cost.
Similarly, unaligned writes that span cache line boundaries may cause significant additional slowdowns, depending on the specific (micro-)architecture. (Some RISC architectures disallow unaligned access for this reason.)

Btw: When a CPU thread stalls, waiting for a read to complete, a hyper-threaded CPU will switch to its next thread. Hopefully, that thread was previously stalled waiting for a cache line that has now loaded, so it can still do useful work.

The recommendation is still: If you can do a compare and a branch to avoid a write, that will improve your overall throughput if you are memory bound (which almost all workloads are.)

I think we're somewhat below the useful level of abstraction for timers based on priority queues, though :-)
And, finally: If you "step" each entity in your world each game tick anyway, you might as well just have an array of current attached timers inline in each object, and compare the current game time to the time in those timers; they don't need a queue separate from the collection of all game entities.


#5298152 In depth articles about fast paced multiplayer network architecture

Posted by on 26 June 2016 - 04:25 PM

I'm not aware of any such comprehensive resources, unfortunately. Mainly, because all of the different solutions put together, make for the better part of a game engine, so where all the pieces exist together, they will make up an engine (Source or Unreal or whatever.)
And, the few resources that exist for game engines, typically do not put networking as their main illustration, because networking is only used by a subset of games, AND it's pretty complex (as you are finding!)

The meta-advice I can give is to separate concerns between different pieces of code. In Object Oriented code, this should be trying to achieve the "single responsibility principle." In a more functional approach, keeping each function shorter/smaller, and composing functions to achieve the end goal, would achieve the same results.


#5298133 20Hz Server Performance Dilemma

Posted by on 26 June 2016 - 01:17 PM

You don't need to use TCP just because "a request cannot be dropped."
Re-sending the request over UDP until you receive acknowledgement can be equally effective.
Just make sure that, if you stop receiving anything back from the server, you stop trying after a little bit, else you'll end up with a system that makes congestion worse once it happens.

I personally think the mini $5.00 VPS way is ideal here.


If your game depends on low latency, don't say I didn't warn you!


#5298132 Server/Client architecture for turn-based card game developed with Unity3D

Posted by on 26 June 2016 - 01:13 PM

If some client doesn't send some kind of request, the server can not change any data about the match.


There are two ways of soliving this, for turn-based games.

One is to have some "timeout clock" that knows about active games, and makes a web request that "the clock is ticking for game X" every so often.
This clock process needs to run more permanently than you can do with a simple PHP request handler, and you need some kind of management of when this process dies so it restarts.
You'd also need to tell it about games when they start, and when they stop.

The other is to say that the game only advances when at least one player wants it to advance. Then, make each player request updated state X times per minute/second.
Because at least one player makes requests, the game can advance. When both players close their browsers, or there is a network outage for your server, or whatever -- the game will simply be frozen!
This could be seen as a feature, somewhat depending on the specific rules and implementation of the game.
 
 

I understood that Redis allow me (like the most of nosql db) to store and write/read a lot faster than a SQL.



Yes, storing things in RAM without a guaranteed commit to disk is a lot faster! The draw-back is that, if you crash, well, you didn't have a guaranteed commit, so the data may be lost.
Thus, it's usually a good idea to keep permanent data (log of games played, user account information, etc) in a durable transactional store (such as most SQL databases.)
If your game pace is slow enough, that polling for the game state out of SQL does not add undue burden on the database infrastructure, you don't need to use a key/value store for that, at least while starting out.
If there's a single table "current_games" that contain a primary key of game ID and a value of "serialized game state" as JSON or binary data, then that's super easy to migrate out to a key/value store later.
The main thing you lose is the automatic purging of old data which comes from the TTL values on Redis keys.
To implement that, you can have a cron which runs every 10 minutes and deletes rows that have not been modified in the last day, or whatever the right value is for your game.


#5298042 20Hz Server Performance Dilemma

Posted by on 25 June 2016 - 05:17 PM

The messages themselves are not a lot.
However, $5 VPS slices are generally run on highly oversubscribed hosts, and generally have some pretty bad scheduling jitter.
Also, you will typically find them on ISPs that may promise you several terabytes of transfer per month free, but the actual achievable throughput might be something like 100 KB per second, which won't actually let you get to those amounts.

Speaking of Node, it's single threaded, so if you host on a multi-core box, you'll want to have multiple instances run in multiple separate processes. This means each of them needs to listen to a differetn port.

Finally, if you use Nginx for load balancing, you're kind-of stuck with HTTP; you can't do UDP over that. Also, if your $5 VPS provider doesn't give you a guarantee that the load balancer and all game instances are on hte same subnet, you may get significant additional latency from doing that.


#5298041 Hundreds of timers...

Posted by on 25 June 2016 - 05:12 PM

Just because a MMO emulator, generally written by students with a lot of time but not much experience, does something, doesn't mean that that's the right way :-)
A priority queue has O(log N) or better behavior for insert-new-element and find-next-element-to-run. Typically implemented as a heap.

Typically, the code to run your timer events looks like:

  timestamp_t now = current_timestamp();
  while (!queue.empty() && queue.leftmost_item().timestamp <= now) {
    timer t = queue.pop_leftmost();
    t.execute();
  }
It really is that simple. What will take time is the work actually done by each timer function.
If your timer drives everything (not necessarily a good idea) then you might want to make timer execute() queue something to a pool of worker threads, but that's really only a win if the locking overhead is much less than the CPU cost of running all the events, which means that most events cannot have many dependencies on other parts of the system.


#5298008 Server/Client architecture for turn-based card game developed with Unity3D

Posted by on 25 June 2016 - 11:29 AM

Any server technology can be the back-end for a turn-based card game. Web servers/services are very diverse and you should pick one that uses an environment you are familiar with.
(Nginx/PHP, Tomcat/Java, Nodejs/Javascript, Rails/Ruby, Flask/Pyton, Wai/Haskell, IIS/C#, ...)
Separately, any game server can probably also be used for a turn based game. Photon, Bolt, Lidgren, Smartfox, and a bunch of others. If you go that route, you should go with the server technology that has he best client libraries for your engine, and perhaps also pay attention to the server OS (do you prefer Windows or Linux for 24/7 uptime systems?)

Now, if you use Steamworks for match making, you might be better off also using Steamworks for the networking. That, in turn, means you should probably choose a server platform that has a ready-made Steamworks integration, and/or makes it easy to link in C++ code.
Steamworks is great if you want to tie into the existing Steam social community. The matchmaking is OK, but not necessarily a feature that's so special that you have to mutate the whole rest of your architecture just to support it.

You have to figure out whether you actually want a persistent process per "current game" that deals with everything from game rules to player chat, or whether you want the game state to be network-attached in some back end (anything from memcached through Redis to SQL) so that you don't need stateful sessions, and solve chat using some other method (or polling.)
This, too, determines your approach.

The solution that's probably the easiest to implement and easiest to scale, but isn't necessarily the cheapest per hosted game at small scale, would be to store each game instance in Redis, use a simple polling-based protocol over HTTP that polls maybe 1 or 2 times a second, and make "what each person has said" a part of the game state, returned through polling.
The two main operations would then be "get me game state later than X" and "this is my player action at time Y." The front-end web service servers would receive the request, read the game state from Redis, update according to requests/rules, write back to Redis, and then return to the players.
This is a 100% stateless application server, which scales really, really, well. Also, as all game state lives in a single key (could be Redis, or Cassandra, or Riak, or perhaps even MySQL) this will scale (shard) horizontally in a trivial way.
A back-end with time-to-live support (such as Redis EXPIRE keys) lets you forget about a game if neither player has made any move for some number of minutes/hours.




PARTNERS