hplus0603

Member Since 03 Jun 2003

#5302381 Best Client/server Architecture For A Mobile Management Game?

Posted by hplus0603 on 24 July 2016 - 04:54 PM

User A buys something in City 1, and so the price changes.
Users B, C, D, and E, whom the server knows are currently looking at that city, receive a pushed message from the server notifying them of the price change.


You can totally do that over long-polling over HTTP. Our in-RAM message queue / bus system can push messages to online users through TCP sockets, websocket sockets, or HTTP long-polling, through different gateways for each method. To the server, it all looks the same: "post this message on this topic."

Separately, if a user app is not in the foreground, then on iOS, you *have* to use polled communications and/or mobile push; you don't get a persistent connection while backgrounded. On Android, you could have a persistent connection from a Service, but doing so will likely drain the battery faster.

I cannot lose a user's buying/selling if the server goes down


I did suggest that important actions like trade should go to the database, so it sounds like we agree. As long as players don't trade "often" (dozens of times per second, or whatever,) then a database would be reasonable. They can always be horizontally sharded if necessary (at a cost, of course -- it's reducing cost that's the main goal here.)

Separately, if all players lose the same amount of time in a crash, and that amount of time is never more than 15 minutes, how bad is that? Only you can decide how much those kinds of trade-offs are worth.


#5302341 Best Client/server Architecture For A Mobile Management Game?

Posted by hplus0603 on 24 July 2016 - 02:34 PM

So, first, a dozen queries every 5 seconds isn't all that much. You can easily do 1000 parallel players on a single MySQL instance that way, making sure that you have SSDs and/or enough RAM for caching/indexing.
Once you run out of that, you can presumably horizontally shard the game.

Second, if the game is turn based, you can also look to in-RAM storage, like Redis, or even memcached, to get much faster querying. The draw-back is that you have to denormalize your data; ideally to the point where all state for a particular game instance is in a single data blob.

Third, why do you say HTTP won't work for the new Unity-based game? Unity has the WWW and UnityWebRequest classes, so you can reasonably easily make it work, especially if you serve on node.js or a similar server that makes it easy to "pend" incoming HTTP calls until there is actually data, or the request times out. (Long polling.)

Fourth, just embedding a webview may not be fast enough, but there might be other options -- anything from "compile JavaScript to Objective-C" to "make your core website responsive and just install a shortcut opened by the system browser on mobile." You may have pushed far enough along this path to tell for sure whether this will work; it's hard to say from the data you gave.

Fifth, and now we get to the question you're asking, assuming the previous bits are already dead ends:

When building games with "current" game state, you don't want to store "current" game state in a database at all; you want to store it in RAM.
If the state is trivial to marshal/de-marshal (as it would be for a game of chess, or poker, say,) then you can store it in network-attached RAM, such as memcached or Redis. Each time someone wants to mutate the state, they do a transaction on the state by slurping, mutating, and writing back. This is trivial to shard horizontally if you get very successful, both the front end servers, and the back-end storage.
Once a game has an outcome, user information should be updated in a persistent database of some sort; similar for if you have consumables in the game.
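The slurp/mutate/write-back transaction described above can be sketched as follows. This is a minimal illustration, not real Redis or memcached client code: an in-process map stands in for the networked RAM store, and a per-key version number stands in for memcached's CAS token (or Redis WATCH/MULTI), so a concurrent mutator causes a retry instead of a lost update.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a networked RAM store (memcached CAS / Redis WATCH+MULTI).
// Each key carries a version; a write succeeds only if the version is
// unchanged since the read, so concurrent mutators retry instead of
// clobbering each other's changes.
struct RamStore {
    struct Entry { std::string blob; uint64_t version = 0; };
    std::map<std::string, Entry> data;

    // Slurp: read the blob and remember its version.
    Entry get(const std::string& key) { return data[key]; }

    // Write back only if nobody else mutated the key in between (CAS).
    bool check_and_set(const std::string& key, const std::string& blob,
                       uint64_t expected_version) {
        Entry& e = data[key];
        if (e.version != expected_version) return false;  // lost the race; caller retries
        e.blob = blob;
        e.version++;
        return true;
    }
};

// One transaction on a game-state blob: slurp, mutate, write back, retry on conflict.
template <typename MutateFn>
void transact(RamStore& store, const std::string& key, MutateFn mutate) {
    for (;;) {
        RamStore::Entry e = store.get(key);
        std::string next = mutate(e.blob);
        if (store.check_and_set(key, next, e.version)) return;
    }
}
```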

If the rule set is more complex, or the simulation data is more complex, or you iterate the simulation in real time (think: physically simulated world,) then you don't want to demarshal/re-marshal for each simulation.
At that point, you inflate the game world/level into RAM in a game server, and route all player connections to the correct server process. The easiest way to do this is have users send their traffic/connect to a gateway, which knows where each game instance runs on the back end (again, assuming you're going to be successful and need to shard the game hosts.)
The hosts can periodically checkpoint state back to databases; they can also delegate to databases for very important stuff like player trade, commerce, etc, but most game updates happen in RAM, and if the server instance crashes, you lose whatever changes happened since the last save/checkpoint.

You can implement this on top of raw TCP sockets, or on top of websockets, or on top of HTTP long-poll, and it will work approximately the same (with some increased latency for each step to the right along that chain.)
Note that you can still implement a "message queue" of messages from server to client over HTTP. The client simply makes sure to always have an HTTP request outstanding to the server. When a request comes in, if the queue is empty, the server pends the request (doesn't send a response) until there is data; if there is data, the server immediately removes and returns it.
This doesn't mean the clients are "asking for it" -- it just means that you build an event transport on top of a double-buffered polled underlying network protocol.
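The server side of that long-poll queue can be sketched like this. It's an assumption-laden skeleton: a condition variable models "pending" a request, and the HTTP layer itself (which would call `poll()` per incoming request and translate a `nullopt` into an empty response the client re-polls on) is left out.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// One per-client outbound message queue for HTTP long-polling.
// poll() models one incoming HTTP request: if a message is queued, it
// returns immediately; otherwise the request "pends" (blocks) until a
// message arrives or the timeout fires, after which the client re-polls.
class LongPollQueue {
public:
    void push(const std::string& msg) {
        { std::lock_guard<std::mutex> lk(mu_); queue_.push_back(msg); }
        cv_.notify_one();  // wake a pended request, if any
    }

    std::optional<std::string> poll(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mu_);
        if (!cv_.wait_for(lk, timeout, [this] { return !queue_.empty(); }))
            return std::nullopt;  // timed out: client issues a new request
        std::string msg = queue_.front();
        queue_.pop_front();
        return msg;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
};
```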

Which option is the best? Depends on where you want to run clients, what client libraries you want to use, what server libraries you want to use, and how low latency your game really needs.

Finally, management games typically do NOT want to keep an instance per player running. Thus, they aren't really able to push messages to clients, except perhaps if you have a queue of alarms/timers where something happens on behalf of the player at a certain point in time.
Specifically, if the player starts action X, which will take 6 minutes, you don't actually keep that around for six minutes in a game server; instead you store "this thing will be done at time Y" and use the client to draw the progress bar. When the user asks for "what's the current state," you can wind time forward from the last stored state to the current time, return that, and store it back in the database. The wind-forward will be super fast for any typical management game.
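The "store the completion time, wind forward on demand" idea reduces to very little code for a single action. A hedged sketch (field names and the seconds-based game clock are assumptions, not from the original):

```cpp
#include <algorithm>
#include <cstdint>

// Wind-forward for a management game: nothing runs while the player is away.
// We store only "action started at X, takes D seconds"; when the client asks
// for current state, we advance to "now" in one computation.
struct PendingAction {
    int64_t started_at;  // game-clock seconds when the action began
    int64_t duration;    // how long the action takes, in seconds
};

// Fraction complete in [0, 1]; the client uses this to draw the progress bar.
double progress(const PendingAction& a, int64_t now) {
    if (a.duration <= 0) return 1.0;
    double f = double(now - a.started_at) / double(a.duration);
    return std::min(std::max(f, 0.0), 1.0);
}

bool is_done(const PendingAction& a, int64_t now) {
    return now >= a.started_at + a.duration;
}
```

A real game would wind forward a whole queue of such records (apply each completed action to stored state, persist, return), but each step is this same constant-time arithmetic, which is why the wind-forward is so fast.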


#5301092 Arcade Physics Movement W/ Latency

Posted by hplus0603 on 17 July 2016 - 11:14 AM

The easiest way to calculate the current time step you should be at, is to store the high-precision real time that the level started at.
Then, subtract start time from current high-precision time, divide by time step length, this gives you the time step number you should be at.
Now, if your step counter is less than the target, step physics repeatedly until you catch up. You may want to put an upper limit so you never step more than, say, 30 steps before rendering again.
You can do this on the server, as well as on the client. However, because client clocks aren't generally in sync with the server, it's typically better to just send the target step number to clients from the server as part of the update packets.
There's another bit of optimization you'll typically do where the client will send its actual step number (or clock) to the server in each packet, and the server will send its step number (or clock) to the client in each packet, as well as the timestep/clock that the client provided in the last-received packet.
This allows the client to calculate approximately how much latency it should compensate for, or, put another way, how much offset to add to its local clock.
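The catch-up loop described above (target step from elapsed time, step until caught up, capped per frame) can be sketched as follows; the microsecond units and the cap of 30 are illustrative choices.

```cpp
#include <algorithm>
#include <cstdint>

// Fixed-timestep catch-up: target step = elapsed time / step length, then
// simulate until the step counter reaches the target, capped so a long hitch
// never triggers an unbounded burst of simulation before rendering again.
struct Simulation {
    int64_t step_count = 0;
    void step() { ++step_count; }  // one fixed-length physics tick
};

// start_us / now_us are microseconds from any monotonic clock.
int64_t target_step(int64_t start_us, int64_t now_us, int64_t step_us) {
    return (now_us - start_us) / step_us;
}

void catch_up(Simulation& sim, int64_t start_us, int64_t now_us,
              int64_t step_us, int64_t max_steps_per_frame = 30) {
    int64_t target = target_step(start_us, now_us, step_us);
    int64_t to_run = std::min(target - sim.step_count, max_steps_per_frame);
    for (int64_t i = 0; i < to_run; ++i) sim.step();
}
```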

To read a high-resolution clock:
On Linux, use clock_gettime().
On Windows, use QueryPerformanceCounter().
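In portable C++, std::chrono::steady_clock gives you the same thing without platform #ifdefs; mainstream standard libraries implement it on top of clock_gettime(CLOCK_MONOTONIC) or QueryPerformanceCounter.

```cpp
#include <chrono>
#include <cstdint>

// Monotonic high-resolution time in microseconds; never jumps backwards,
// unlike wall-clock time, which makes it safe for timestep math.
int64_t now_microseconds() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}
```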


#5301037 Arcade Physics Movement W/ Latency

Posted by hplus0603 on 16 July 2016 - 10:21 PM

You need to fix your time step, and count all times in step numbers.
When the player takes too long to render, you must step simulation more than once.
If you can't step simulation at real time speed, the player's computer is too slow, and needs to be dropped from the game.


#5301031 Arcade Physics Movement W/ Latency

Posted by hplus0603 on 16 July 2016 - 07:32 PM

Can I still use client prediction with the first approach so I can continue to have smooth movement?


Well, kind-of. You are no longer deterministic. The player will move based on delayed information about the world.
If two players race for a ground pick-up, they will both think that they are way ahead of the other (by one RTT) even if they are neck-to-neck.
You will need to detect and correct any cases where the forward-simulated player is out of sync with the world, which is generally any time that player interacts with another player.
The deterministic model isn't that great if you don't go all in, IMO. If you can't take the latency, and if the players interact a lot, then model 1 kind-of doesn't fit.
That being said, model 1 is actually the model I have the most experience with, including forward-projecting players, and including doing crazy stuff like letting the player's upper body / aim / head move locally, ahead of time, while the lower body / feet / center of mass moves based on another player -- which is useful for sitting as a passenger in a vehicle.
If I were to build a new networked game from scratch, and I couldn't take the latency of "pure" model 1, I'd use model 2 if I didn't need to worry about cheating (which most hobby games don't, actually!) and #3 if it were a competitive FPS type game or something like it.

However, your game may be different. Only you can make the choice for how you want your game to play, and which of the options work best for you. The fact that you actually know you can't make it perfect is more than half the battle won already :-)


#5300990 Arcade Physics Movement W/ Latency

Posted by hplus0603 on 16 July 2016 - 11:55 AM

The problem is latency.


Welcome to networking! :-)


You have three options:

1) Delay response to commands until you hear back from the server. That way, you know that everyone runs all the commands in the same order/time, so you can assume that the world is consistent. This is the "deterministic lockstep" approach, and is most popular for RTS games, but works for other systems too. When you get it working, it solves pretty much all problems -- but you have to accept the command latency!

2) Trust the client. If my client says that I hit you with a shot, then send that information to the other clients. If the hit player saw something else, spawn a new shot right where they're at so they know what hit them. This can get somewhat confusing, and is super open to cheating, but can also work well.

3) Run physics on the server, let the server determine what "actually" happens, and keep correcting/compensating the clients. This is what most FPS games do, because it hides the latency, and removes confusion/cheating, but instead sometimes causes "I got shot behind a wall" problems, and requires physics simulation on a server. (Note: Server could be user hosted)


#5299920 How do you deal with shared data and work load balancing?

Posted by hplus0603 on 09 July 2016 - 05:48 PM

I'm creating a thread whenever I need some system updated periodically.


You'd likely want to create a few threads (as many as there are cores) up front, and then "post" work to do to those threads, rather than creating new threads, or thread-per-subsystem.
That way, "work to do" is always well described as a work record (you're forced to structure it this way) which means you can easily-ish scale the number of threads between one and N.
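A minimal version of that "fixed pool plus posted work records" shape might look like this (the class name and std::function-based work record are illustrative choices, not the only way to structure it):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Fixed-size worker pool: create one thread per core up front, then "post"
// work records to it, instead of spawning a thread per subsystem.
class WorkerPool {
public:
    explicit WorkerPool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < (n ? n : 1); ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~WorkerPool() {  // drains remaining work, then joins all workers
        { std::lock_guard<std::mutex> lk(mu_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void post(std::function<void()> work) {
        { std::lock_guard<std::mutex> lk(mu_); queue_.push(std::move(work)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // done_ set and queue drained
                work = std::move(queue_.front());
                queue_.pop();
            }
            work();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};
```

Because all work arrives as a self-describing record, scaling from one worker to N is just a constructor argument.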

Separately, do you actually need multiple threads?
Have you written a networked game before?
Did it run out of CPU in a single thread?
If so, where did the profiler say that you were spending your time?

In general, before you can build a "correctly" threaded application, you need to have some idea of where the problems will be.
For example: Do you have a lot of collision pairs to test in physics? Those could easily be parallelized.
Do you have a lot of database/file data to look up for each action request? Then parallelizing isn't the solution; asynchronizing is.
Without knowing exactly what your problem is, you can't possibly hope to effectively address it!
(And you'll have even lower chance of getting good advice from the internet if you can't describe it ...)

So, typically, the main problems that need CPU on the server are related to physics/entity simulation. Let's assume that's what you're going for.
In general, to make a multi-threaded game update engine, you need to structure all computation to be double-buffered.
During a particular time step, all entities/objects/systems are allowed to ONLY READ from the "current" copy, and ONLY WRITE to the "next" copy.
As long as you won't have multiple threads wanting to write to the same output state/copy at the same time, you will then not need locks.
Once the update has completed, you will have a short time of single-threaded-ness where the "current" and "next" states are swapped.
This may be as short as flipping a "current index" global variable between the values 0 and 1!
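The read-from-current / write-to-next / flip pattern can be sketched as below. This single-threaded sketch only demonstrates the data layout; the point is that because each entity writes only its own slot in "next", the update loop could be split across threads without locks.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Double-buffered simulation state: during a step, systems READ only from
// the "current" copy and WRITE only to the "next" copy; a single-threaded
// flip of `current` then publishes the new state.
struct World {
    std::array<std::vector<float>, 2> positions;  // [current] and [next]
    std::size_t current = 0;

    explicit World(std::size_t n) {
        positions[0].assign(n, 0.f);
        positions[1].assign(n, 0.f);
    }

    const std::vector<float>& read() const { return positions[current]; }
    std::vector<float>& write() { return positions[1 - current]; }

    void step(float velocity, float dt) {
        const auto& cur = read();
        auto& next = write();
        // Each entity writes only its own output slot, so this loop could be
        // partitioned across threads with no locking.
        for (std::size_t i = 0; i < cur.size(); ++i)
            next[i] = cur[i] + velocity * dt;
        current = 1 - current;  // the short single-threaded flip
    }
};
```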

Separately, you will be generating messages to send to other objects/nodes/whatever.
You could use a lock-free FIFO for those messages.
Make it big enough that you will never generate more messages in a single step than there is space in the FIFO.
That way, you won't end up dropping any messages in the lockless phase.
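A classic shape for that FIFO is a single-producer/single-consumer ring buffer; this sketch assumes exactly one producer thread and one consumer thread per queue (a common way to get lock-freedom cheaply), with a power-of-two capacity sized per the advice above.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer/single-consumer lock-free ring FIFO. Size N so that one
// simulation step can never generate more messages than fit; push() returns
// false (a dropped message) only if that sizing rule is violated.
template <typename T, std::size_t N>
class SpscFifo {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    bool push(const T& v) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full
        buf_[head & (N - 1)] = v;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> pop() {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T v = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return v;
    }
private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // advanced only by the producer
    std::atomic<std::size_t> tail_{0};  // advanced only by the consumer
};
```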


#5299407 how rsa works on texts and signals?

Posted by hplus0603 on 06 July 2016 - 05:39 PM

All data in computers is just bytes in memory. (Or on disk.)
Those bytes in memory can be text (in various encodings -- ASCII, Latin-1, UTF-8, UTF-16, ...) or audio (in various formats) or video or executables or whatever.
Because those bytes are numbers, any encryption that can turn numbers into other numbers, and then decrypt by turning those other numbers back into the original numbers, can be used to encrypt whatever is stored in memory (or on disk.)
This is pretty basic computer science, though, so I suggest you read up on computer architecture using a good textbook of some sort.
Like perhaps http://amzn.to/29PIM2M


#5299082 Hundreds of timers...

Posted by hplus0603 on 04 July 2016 - 11:06 PM

this is a narrow interface, so try the simple thing first, and if it doesn't scale, swap it out


That's exactly the kind of thinking that leads to slow performance everywhere, with no obvious place in the profile to improve.
An experienced programmer will pick the right option up front, and thus doesn't need any re-work later.
std::priority_queue<> is pretty easy to use; std::map<> is trivial to use. Both will perform very well. Pick one of those, rather than something obviously bad, if you already know the volume of data.
 
Here's the declaration for priority_queue<>:

    // Needs <queue>, <vector>, and <utility> (std::pair); my_time_type and
    // my_action_type stand for whatever your game uses.
    typedef std::pair<my_time_type, my_action_type *> queue_item;
    std::priority_queue<queue_item, std::vector<queue_item>, std::greater<queue_item> >  my_timer_queue;

    // inserting in queue
    my_timer_queue.push(queue_item(time, action));

    // checking which items should run
    while (!my_timer_queue.empty() && my_timer_queue.top().first <= current_time) {
        auto ptr = my_timer_queue.top().second;
        my_timer_queue.pop();
        ptr->timer_expired();
    }
This will manage a heap in a linear container (vector,) which additionally has the potential cache benefits suggested above.
Insertion and removal are O(log N), and finding the next time to run (the time of the earliest-expiring timer) is constant time.
This is very likely as close to optimal as you can get, and it's very simple to use.

Some other languages do not have well-behaved heap data structures, though; there you may have to sort an array yourself.
Make sure to test it with large numbers so you can tell whether the sort function you're using is pessimal or not.

every time your computer crashes, it's often caused by someone who chose a higher-performing and harder to maintain system


You're saying that choosing a well performing algorithm will cause crashes, which on the face of it seems like an absurd statement.

I'd view that as someone biting off more than they can chew. Which is also very, very, common -- although schedule pressure is often as much a reason for that as programmer competence.

That's a very different discussion, though :-)


#5298921 Keep checking credentials on Rest API?

Posted by hplus0603 on 03 July 2016 - 02:29 PM

This is no different from logging in to a website. On a website, you get a session cookie, which typically is something like "userid:expirytime:unique-id:hmac(userid,expirytime,unique-id,key)."

You can issue a similar session token to the users of your REST API. In fact, because you already use HTTP, you can still use cookies. (For UDP based connections, you'll need to manage this cookie at your own protocol layer.)

It's important that you do verify the cookie hmac, and that you do verify the timeout/expiry time. You will also want to issue a new session token when the expiry time gets close, if you want the user to keep the session alive.
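A sketch of that stateless token format ("userid:expirytime:unique-id:mac") and its verification follows. Loud assumption: std::hash is used here as a stand-in only so the sketch is self-contained; real code must use a keyed HMAC (e.g. HMAC-SHA256) and a constant-time comparison.

```cpp
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>

// NOT cryptographic -- stand-in for HMAC(payload, key) so the sketch compiles.
std::string mac(const std::string& payload, const std::string& key) {
    std::ostringstream out;
    out << std::hash<std::string>{}(payload + "|" + key);
    return out.str();
}

// Token format from the post: "userid:expirytime:unique-id:mac".
std::string make_token(const std::string& user, int64_t expiry,
                       const std::string& nonce, const std::string& key) {
    std::string payload = user + ":" + std::to_string(expiry) + ":" + nonce;
    return payload + ":" + mac(payload, key);
}

bool verify_token(const std::string& token, int64_t now, const std::string& key) {
    std::size_t pos = token.rfind(':');
    if (pos == std::string::npos) return false;
    std::string payload = token.substr(0, pos);
    if (mac(payload, key) != token.substr(pos + 1)) return false;  // tampered
    // Extract the expiry (second field) and reject stale tokens.
    std::size_t a = payload.find(':');
    std::size_t b = payload.find(':', a + 1);
    if (a == std::string::npos || b == std::string::npos) return false;
    int64_t expiry = std::stoll(payload.substr(a + 1, b - a - 1));
    return now < expiry;
}
```

Verification touches no database: the key check and the expiry check are pure computation, which is the whole advantage over token-lookup schemes.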

Another option is to use OAuth for your API calls. However, as maunovaha suggests, implementations of that typically require looking up the token in some database. You could make that database be memcached or Redis, to make it fast, but it's still slower than "verify the hash."


#5298918 Hundreds of timers...

Posted by hplus0603 on 03 July 2016 - 02:25 PM

premature optimization


"Selecting the right algorithm" is not premature optimization; it's designing for performance.
Performance is a feature. You have to plan and build for it, at the high level.
The famous quote about premature optimization was in the context of worrying about things like whether a particular if statement would emit one extra instruction or not, and dropping to assembly to avoid it.
Hoare and Knuth would ABSOLUTELY tell you to choose the right algorithm for the particular problem you're trying to solve.

Finally: "add the element at the end, then sort the new array" is also bad advice, unless your sorting algorithm has explicit optimizations for sorting already-mostly-sorted containers.
Curiously, QuickSort could go quadratic in this case; meanwhile, most BubbleSort implementations may actually perform the expected O(N) work.
If you use a sorted array, you should expect to use a binary search (or linear scan from backwards) to find the location, and then do O(N) work to insert into the vector (moving the rest of the elements out of the way.)
If your array is large (thousands of elements, as suggested above,) then touching the N-sized amount of elements is bad, because it will evict a bunch of other data from L1 cache.
std::map<> will work much better in this case, as will std::priority_queue<>. They are well-defined data structures intended exactly for this use case, so you should use one of them!
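For completeness, if you do keep a sorted vector, the right insertion is a binary search for the position followed by the single O(N) tail move, with no append-then-sort and no risk of a pathological full re-sort:

```cpp
#include <algorithm>
#include <vector>

// Insert into an already-sorted vector: O(log N) search via lower_bound,
// then one O(N) shift of the tail elements.
void sorted_insert(std::vector<int>& v, int value) {
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}
```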

Put another way: Computers, when treated right, are really fast.
Every time you have to wait for a laggy computer, it's probably caused by someone not understanding the difference between "premature optimization" and "designing for performance."


#5298757 Hundreds of timers...

Posted by hplus0603 on 01 July 2016 - 08:08 PM

Might the above be called 'flat' fibers ??


I don't understand the description of your system.

A fiber has a stack. Switching fibers involves saving the CPU registers to some memory, restoring other CPU registers for another fiber, changing the stack pointer to the new fiber, and finally jumping/returning to wherever that fiber was when it yielded.

It sounds a bit like you're simply describing having a lot of objects that get "poked" or "stepped" by a main thread, pulling state as needed. That's just object structure, not fibers.


#5298405 Hundreds of timers...

Posted by hplus0603 on 28 June 2016 - 10:49 AM

Also If you can avoid Threads and their overhead, do so - use fibers/microfibers (a software threading mechanism versus system resource hard threads)


The main cost of a system thread is the memory for the stack. You don't avoid that cost for a fiber. There are some other costs in pre-emptive threads, mainly having to do with the cost of kernel entry for context switch, but the difference in cost between fibers and threads is not an order of magnitude (except perhaps in some very synthetic cases.)


#5298249 Hundreds of timers...

Posted by hplus0603 on 27 June 2016 - 09:49 AM

Read from main memory will take 100+ clocks


{writes} ... are around 1 clock


Ah! You're looking at thread-blocking stall time. I'm looking at overall system throughput.

What will actually happen when you write, is that a cache line will be allocated for the write (which needs bus arbitration on multi-CPU systems,) and the bytes will be put in the right part of the cache line (or, for unaligned writes, cache lines.) Then a read will be issued, to fill the rest of the cache line. Then, once the cache line gets evicted, the memory bus will be used again, for putting the data back out.
At a minimum, this will take twice as long as reads. If you have data structures (or a heap) that alias cache lines to different pointers that different CPUs want to update, they will end up ping-ponging the cache line, costing more than 10x the "base" cost.
Similarly, unaligned writes that span cache line boundaries may cause significant additional slowdowns, depending on the specific (micro-)architecture. (Some RISC architectures disallow unaligned access for this reason.)

Btw: When a CPU thread stalls, waiting for a read to complete, a hyper-threaded CPU will switch to its next thread. Hopefully, that thread was previously stalled waiting for a cache line that has now loaded, so it can still do useful work.

The recommendation is still: If you can do a compare and a branch to avoid a write, that will improve your overall throughput if you are memory bound (which almost all workloads are.)

I think we're somewhat below the useful level of abstraction for timers based on priority queues, though :-)
And, finally: If you "step" each entity in your world each game tick anyway, you might as well just have an array of current attached timers inline in each object, and compare the current game time to the time in those timers; they don't need a queue separate from the collection of all game entities.
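Those inline per-entity timers might look like the sketch below; the fixed slot count and the one-shot semantics are illustrative assumptions.

```cpp
#include <array>
#include <cstdint>

// Inline timers: if every entity is stepped each game tick anyway, store a
// small fixed array of deadlines in the entity and compare against the
// current game time -- no separate timer queue required.
struct Entity {
    static constexpr int kMaxTimers = 4;
    std::array<int64_t, kMaxTimers> fires_at{};  // 0 = slot unused
    int fired_count = 0;

    void on_timer(int /*slot*/) { ++fired_count; }  // game-specific callback

    void step(int64_t game_time) {
        for (int i = 0; i < kMaxTimers; ++i) {
            if (fires_at[i] != 0 && game_time >= fires_at[i]) {
                fires_at[i] = 0;  // one-shot: clear before firing
                on_timer(i);
            }
        }
    }
};
```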


#5298152 In depth articles about fast paced multiplayer network architecture

Posted by hplus0603 on 26 June 2016 - 04:25 PM

I'm not aware of any such comprehensive resources, unfortunately. Mainly because all of the different solutions, put together, make up the better part of a game engine; where all the pieces exist together, they form an engine (Source or Unreal or whatever).
And the few resources that exist for game engines typically do not put networking front and center, because networking is only used by a subset of games, AND it's pretty complex (as you are finding!)

The meta-advice I can give is to separate concerns between different pieces of code. In Object Oriented code, this should be trying to achieve the "single responsibility principle." In a more functional approach, keeping each function shorter/smaller, and composing functions to achieve the end goal, would achieve the same results.



