The mystery of the spike

Started by
6 comments, last by Kaptein 10 years, 7 months ago

I'm coding a game client, which now and then has huuuuuuge spikes

I can run around for minutes, and suddenly there's a 180ms spike for no reason at all

I've found where it happens, but consider this:

The area where it happens is something that runs A LOT, compiling hundreds of meshes per second

I just call it the precompiler. It prepares meshes, which are then transferred to the rendering thread and "compiled" (uploaded) there

I generate the meshes in smaller chunks, which are then compiled on the rendering side into bigger ones based on certain criteria

This is done for obvious reasons: Time and the ability to re-use parts of the bigger structure later on enabling very fast modification of terrain

Anyways,

I'm sure everyone here has had their fair share of "interesting" problems with threadpools and mesh generation

Does anyone have any tips on what I should look for?

99.8% of the timings are what I'd expect them to be, within the time limit I set out to keep myself within

Most (99%) run within the 0.0125 second mark, while 0.8% are slightly above (which is FINE, I know I can do better!)

However, there's that odd giga-spike which is ... insanely long, in the 100-200ms area

The threadpool I'm using is (or seems) very professional:

http://www.hlnum.org/english/projects/tools/threadpool/doc.html

I schedule N jobs (mostly 1-4 depending on the preferences of the player/user), then I finish them immediately

I imagine it's that latter "finish immediately" part that people will raise eyebrows at

Unfortunately for me, it's downright impossible to wait for the jobs to finish by themselves.. It simply won't work

The rest of the engine needs to be able to modify that data at ANY time in ANY way, including removing it completely

(Or even moving the data around)

Any ideas?

If you haven't had any problems like this before, I understand... :S

What does it mean, exactly, to "finish" a job immediately instead of waiting for a job?

It means I wait for all the N threads to finish, and don't continue executing code on the "physics" thread until they're completely done

I schedule 4 threads, wait for 4 threads, continue. That's the gist of it
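
For readers following along, the "schedule N, wait for N, continue" pattern described here can be sketched roughly as below. This is only an illustration using std::async from the standard library, not the threadpool linked in the first post; `build_chunk_mesh` and `run_jobs_and_wait` are made-up names and the job body is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one mesh-building job: returns a vertex count.
int build_chunk_mesh(int chunk_id) {
    return chunk_id * 4;
}

// Schedule `job_count` jobs, then block until every one has finished.
std::vector<int> run_jobs_and_wait(int job_count) {
    assert(job_count >= 0);
    std::vector<std::future<int>> jobs;
    jobs.reserve(static_cast<std::size_t>(job_count));
    for (int i = 0; i < job_count; ++i) {
        // std::launch::async runs each job as if on a new thread.
        jobs.push_back(std::async(std::launch::async, build_chunk_mesh, i));
    }
    std::vector<int> results;
    results.reserve(jobs.size());
    for (auto& job : jobs) {
        results.push_back(job.get());  // get() blocks until that job is done
    }
    return results;
}
```

With this shape the calling thread is stalled for as long as the slowest job takes, so a single worker that gets descheduled by the OS delays everything that comes after the wait.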

I realize I probably shouldn't be doing that to begin with, but I haven't found a solution that allows me to not do that

The threads' jobs are completely self-contained, but the data they carry has meaning only if the world is the same when they're done AND during the job

Come to think of it, maybe I should "finish" the job after the threads are done by themselves... hmm

That way i could validate that the work they've been doing is still relevant
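
One possible way to do that validation is a version stamp: every modification bumps a counter on the chunk, the job remembers the counter it started from, and the result is thrown away if the two no longer match. The sketch below assumes the job works on a copy of the data; `Chunk`, `MeshResult`, `build_mesh` and `is_still_relevant` are hypothetical names, not anything from the engine discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk: `version` is bumped on every modification.
struct Chunk {
    std::uint64_t version = 0;
    std::vector<int> blocks;

    void modify(int value) {
        blocks.push_back(value);
        ++version;
    }
};

// What a finished job hands back, tagged with the version it was built from.
struct MeshResult {
    std::uint64_t built_from_version;
    std::size_t vertex_count;
};

// Worker side: operates on a snapshot, so the live chunk may change freely.
MeshResult build_mesh(const std::vector<int>& snapshot, std::uint64_t version) {
    return MeshResult{version, snapshot.size() * 4};
}

// Owner side: accept the result only if the chunk has not changed since.
bool is_still_relevant(const Chunk& chunk, const MeshResult& result) {
    return chunk.version == result.built_from_version;
}
```

A stale result is simply dropped and the chunk is queued again, which costs some wasted work but removes the need to block until the jobs are done.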

*sigh*

It still doesn't explain a 200ms spike :P

so you're generating meshes on the fly. i do this too. a terrain chunk is 4 ground meshes composed of 225 quads each, plus up to about 5000 instances of mesh primitives for rocks, plants, trees, etc.

and you split it into 4 threads to speed it up. understandable - i too experience a slight delay when generating a chunk. i was considering implementing multi-frame chunk generation - but perhaps not, the delay is not really that bad. too short for me to even display a quick "loading..." message! <g>.

and you get an intermittent weird spike in execution time when the 4 threads are running? all 4? always all 4? or just some? are you SURE the 4 are unrelated? no stalls possible? what about outside interference from the OS or another task?

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Well, the spikes are happening even from the chunk loader which uses a decompressor

The decompressor itself always uses absolutely no time at all, but the "worldbuilder" as a whole sometimes has a super spike too

And all it does is file I/O in a very simple manner: Read a linear segment of data from disk, decompress it, check if time to bail out is close etc.

I'm at that point where the entire engine has exits for when time runs out :)

I'm starting to think that it's a problem with my computer!

My testers are not having these gigaspikes, but they do have spikes.. regular spikes in the 20-40ms area

Precomp delta: 0.040641

The biggest one from my testers is 40ms.. that's noticeable during gameplay, and it's probably because threads are what they are :P scheduled and all that

but it's not something that has my jaw dropping and me sitting day & night looking at the code... :P

So, if the problem is my computer, then I wouldn't know where to start :P

I know enough about OSes to know that the next step would be to reinstall drivers and pray

Captain, 200ms iceberg ahead!

What about the step of trying to catch one of these 200ms spikes in a profiler to see what is going on?
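
If attaching a profiler at exactly the right moment is awkward, a small scoped timer that only records sections exceeding a threshold can help narrow down where a rare spike comes from. This is a minimal sketch, not a replacement for a real profiler; `SpikeLog` and `ScopedSpikeTimer` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Collects (label, milliseconds) for every section that ran too long.
struct SpikeLog {
    std::vector<std::pair<std::string, double>> entries;
};

// Times the enclosing scope and records it only if it reaches the threshold.
class ScopedSpikeTimer {
public:
    ScopedSpikeTimer(std::string label, double threshold_ms, SpikeLog& log)
        : label_(std::move(label)),
          threshold_ms_(threshold_ms),
          log_(log),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedSpikeTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const double ms =
            std::chrono::duration<double, std::milli>(elapsed).count();
        if (ms >= threshold_ms_) {
            log_.entries.emplace_back(label_, ms);
        }
    }

private:
    std::string label_;
    double threshold_ms_;
    SpikeLog& log_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping the precompiler, the upload step and the chunk loader each in their own timer with, say, a 20 ms threshold would show which of them is actually responsible when a spike hits.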

Have you tried reading all of the data in one go, or maybe in bigger chunks? I'm wondering if you are getting disk seeks from other processes while you are decompressing, causing your process to constantly seek back to the point in the file where it left off when it goes back for more. I've found that file I/O is a get in, do the job, and get out process with no lollygagging in between. I could be way off base though, and I wouldn't consider myself an expert on file systems.

Also, is your hard drive severely fragmented? This will cause poor performance with file I/O as well.
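
Reading a file in one go, as suggested above, might look like the following sketch. It uses only standard iostreams; `read_whole_file` is a made-up helper, and it returns an empty vector both for a missing file and for an empty one, which a real loader would probably want to distinguish.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the entire file with a single read call.
// Returns an empty vector if the file is missing, empty or unreadable.
std::vector<char> read_whole_file(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return {};
    }
    const std::streamoff size = in.tellg();
    if (size <= 0) {
        return {};
    }
    std::vector<char> data(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (!in.read(data.data(), static_cast<std::streamsize>(size))) {
        return {};
    }
    return data;
}
```

Decompression can then run over the in-memory buffer without touching the disk again.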

The profiling tip was helpful, and I learned a great deal using it

Especially about inlining empty constructors :) One would think the compiler could optimize them away, but not if it cannot see the implementation!
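
A rough illustration of the difference, with hypothetical types `Vec3Slow` and `Vec3Fast`: a constructor that is only declared in the header and defined in another source file is an opaque function call for every other translation unit (unless link-time optimization is enabled), while a defaulted or in-class constructor is fully visible to the compiler.

```cpp
#include <cassert>
#include <type_traits>

// Constructor declared here, defined "elsewhere". In a real project the
// definition would live in a .cpp file, hidden from the callers.
struct Vec3Slow {
    float x, y, z;
    Vec3Slow();
};

// Imagine this line sitting in vec3.cpp: an empty body that callers
// in other files still have to call, because they cannot see it.
Vec3Slow::Vec3Slow() {}

// Constructor visible in the header: nothing is left to call.
struct Vec3Fast {
    float x, y, z;
    Vec3Fast() = default;
};

// The type system reflects the difference as well.
static_assert(std::is_trivially_default_constructible<Vec3Fast>::value,
              "defaulted constructor is trivial");
static_assert(!std::is_trivially_default_constructible<Vec3Slow>::value,
              "user-provided constructor is not trivial");
```

When thousands of such objects are created per mesh, the out-of-line version can show up in a profiler even though the constructor body is empty.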

Anyways, the spikes were actually caused by my own OpenGL interface:

I was deleting and generating a lot of objects, which does tend to cost a lot of time

And after I "fixed" that part, I decided to not delete objects, but just set the size to 0.

Well, it turns out it didn't "null out" the data all the time, which caused GPU memory to grow, and eventually it would get laggy again

After I consistently set the size to 0, most of my problems went away.
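
The general idea of keeping objects around instead of deleting and regenerating them can be sketched as a small free-list pool. This is not the actual interface from the engine; `BufferPool` and `BufferId` are hypothetical, and the creation step is passed in as a callback so the example runs without an OpenGL context. In a real OpenGL program that callback would presumably wrap glGenBuffers, and releasing a buffer could shrink its storage with glBufferData and a size of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using BufferId = std::uint32_t;

// Hands out buffer ids, reusing released ones before creating new ones.
class BufferPool {
public:
    explicit BufferPool(std::function<BufferId()> create)
        : create_(std::move(create)) {}

    BufferId acquire() {
        if (!free_.empty()) {
            const BufferId id = free_.back();
            free_.pop_back();
            return id;  // reuse: no driver-side create/delete
        }
        ++created_;
        return create_();
    }

    // The caller is expected to shrink the buffer's storage before release.
    void release(BufferId id) { free_.push_back(id); }

    std::size_t created() const { return created_; }

private:
    std::function<BufferId()> create_;
    std::vector<BufferId> free_;
    std::size_t created_ = 0;
};
```

The number of create calls is then bounded by the peak number of buffers in use at once, rather than growing with every mesh rebuild.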

I finally fixed the empty constructor things, crossing out one thing after another on the profiler list

And I finally decided not to deallocate data I didn't use. It turns out the application's memory use didn't grow enough that I absolutely had to do something about it

That was the final nail in the coffin for my spikes!

Nightmare is over :)

