
Hundreds of timers...



Hello.

Let's put a player in an MMO, fighting against something or somebody. He has maybe 2 or 3+ buffs, 2 or 3+ DOTs, 2 or 3+ actions waiting on a countdown, and maybe 2 or 3+ more things waiting too. That's more or less 10-15 timers. Now put 300 or 500 players in the game and you get 5000+ timers!

What are the usual ways to manage so many timers?

Thank you!

Thorna


I have to agree that 5000 isn't too bad. Besides the queue solution, you could store a 'tick count' variable wherever you store the buff data, indicating when the buff/DOT/etc. ends, and have a dedicated thread checking whether the duration has ended. I mean a single thread for all buff timers, btw, not a thread per buff. Something like:

struct BuffData
{
    // Other stuff
    unsigned long long duration_end; // tick count at which the buff expires
};

// Single dedicated thread walking all active buffs -- one thread total,
// not one per buff. (Sketch: current_ticks and buff_data come from
// wherever you store your game clock and buff list.)
int BuffTimerThread(void*)
{
    while (1)
    {
        // Get buff data and the current tick count
        if (current_ticks >= buff_data->duration_end)
            ; // Remove buff, or push a job to some queue if you don't want to spend time here
    }
    return 0;
}


This worked pretty decently in my case, but if you have a bunch of similar problems it might be a better idea to use a job queue, for simplicity and development speed.

Edited by AcarX


5000 integer compares? [...] you could probably reach about 6 per clock cycle, or under 300 nanoseconds. [...] It is about as close to a zero-cost as you can get with actual work.

Well... 300 ns is a lot cheaper than what I was afraid of!

Yes, a priority queue is absolutely the right way to do this. [...] store either "time at which I'm done," or perhaps "time when I started, and how many ticks from start it will go away"

Well, I asked the question after looking at quite a lot of code in many MMO emulators, and my feeling was that there were many different ways to manage those timers. My first idea was to ask the OS for a timer each time I needed one, but that didn't seem right (just a feeling...).

In a huge C# project (DOL), there is a very complicated system of many lists and caches, and a dedicated thread whose only job is to run through all (I said 5000...) "timer objects" and execute callbacks when needed.

In another C++ project (mangos), I didn't find any piece of code dedicated only to running the timers, just an "update" call propagated to all objects... But I don't feel comfortable with C++ and I may have missed something.

I'm going to spend some time on a priority queue.

Thank you.


5000 integer compares? If they're in a linear array, and on an x86 chip (with its out-of-order core and clock-speed multiplied integer comparisons) you could probably reach about 6 per clock cycle, or under 300 nanoseconds.

Most likely it will need not only compares but also conditional jumps, and each conditional jump is a potential pipeline stall, costing around 10-20 clocks each :-(.

Yes, a priority queue is absolutely the right way to do this.

Sure, the only question being "what kind of priority queue we're speaking about". If it is something like std::priority_queue<>, it will get a fair O(log(N)) share of pipeline stalls (and probably cache misses too) on inserts. Linear scan-based stuff will be technically O(N), but will be faster for small Ns due to better prefetching (for integers, below N=50 a linear scan wins almost for sure, above N=500 it's very unlikely, and in between It Depends). For the OP's N=5000, I'd probably bet on something along the lines of std::priority_queue<>.

Then, my Really Wild guesstimate (give or take an order of magnitude) on insert/extract time would be around 1000 clocks: 13 jumps on memory, with the last 4-5 of them being really close and fitting within the same cache line, plus 13 conditional jumps with (?) half of them causing stalls, plus some other unspecified stuff which always happens. If somebody tests it, please share results, as it is always interesting to see how far off such estimates are (though within an order of magnitude it should be fine, I hope) ;-).

Writes to memory cost more than just reads, by a pretty large factor.

While I do wholeheartedly support the idea of using "happening at" semantics for "when it is going to happen" (instead of "happening in" semantics), I have to disagree about write costs. As a rule of thumb (though not without its exceptions), writes are _faster_ than reads.

A read from main memory will take 100+ clocks (and L3/L2/L1 reads take around 40/12/4 clocks respectively); writes on modern CPUs usually (give or take, and NOT when you're writing 100% of the time) cost around 1 clock: the write is just stored in a hardware buffer, and most of the time you don't care when it really reaches the cache, or whether it causes a cache eviction. Sure, there are exceptions (after all, it is possible to overload the memory bus, and there are other considerations like inter-core cache invalidation too), but overall, for a normal program, it is reads which are expensive (though once again, I am NOT advocating unnecessary writes).

Edited by No-Bugs Hare


Read from main memory will take 100+ clocks

{writes} ... are around 1 clock

Ah! You're looking at thread-blocking stall time. I'm looking at overall system throughput.

What will actually happen when you write is that a cache line will be allocated for the write (which needs bus arbitration on multi-CPU systems), and the bytes will be put in the right part of the cache line (or, for unaligned writes, cache lines). Then a read will be issued, to fill the rest of the cache line. Then, once the cache line gets evicted, the memory bus will be used again, to put the data back out.
At a minimum, this will take twice as long as a read. If you have data structures (or a heap) that alias cache lines to different pointers that different CPUs want to update, they will end up ping-ponging the cache line, costing more than 10x the "base" cost.
Similarly, unaligned writes that span cache line boundaries may cause significant additional slowdowns, depending on the specific (micro-)architecture. (Some RISC architectures disallow unaligned access for this reason.)

Btw: When a CPU thread stalls, waiting for a read to complete, a hyper-threaded CPU will switch to its next thread. Hopefully, that thread was previously stalled waiting for a cache line that has now loaded, so it can still do useful work.

The recommendation is still: If you can do a compare and a branch to avoid a write, that will improve your overall throughput if you are memory bound (which almost all workloads are.)

I think we're somewhat below the useful level of abstraction for timers based on priority queues, though :-)
And, finally: If you "step" each entity in your world each game tick anyway, you might as well just have an array of current attached timers inline in each object, and compare the current game time to the time in those timers; they don't need a queue separate from the collection of all game entities.