Member Since 16 Apr 2007
Offline Last Active Nov 06 2013 07:59 PM

#5104177 Direct3D 11 Present makes up 98% of FPS ?

Posted by on 24 October 2013 - 01:46 PM

Okay, but this is very slow, isn't it? I have the cascaded shadow demo from the sample browser and get about 400 fps there with shadows and light, a very big scene and textures. And here there is only a little mesh with 1 texture, which takes so long ....


No, because it's proportional. If you are just issuing a call to render a single mesh, that is very, very fast. Present may be slower, but it only seems so expensive because it's slower than doing a very fast thing.

For a real world comparison, this is similar to saying that a donut at 50 cents is expensive because a stick of gum only costs 5 cents. The donut is not expensive, the gum is just insanely cheap.

#5104174 Main loop timing

Posted by on 24 October 2013 - 01:36 PM

I used sleep() and on a high-end CPU it worked fine, but on another machine it produced weird effects. It was because it always woke up much later than expected. I also heard others say that sleep is unreliable. ....so I don't use it anymore, except when the game is paused.

Was the high-end machine running Win 8? Win 8 waits the exact amount of time (within certain limitations) that the next sleeping thread needs to wake up, so it isn't dependent on any timer resolution like previous versions.

("PerformanceCounter". I tried this: on one machine it returned negative values, on an other one it was going back and forth between two values. So I use it only for profiling and I use timeGetTime for the game time.)

This is pretty normal. QueryPerformanceCounter isn't reliable on all CPUs, so even if you do use it for loop timing you need to back it up with some other less precise timer (like timeGetTime()) that can let you know when QueryPerformanceCounter is grossly incorrect.
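A minimal sketch of that kind of backup check, written as plain portable C++ rather than against the Windows API. The function name sanitizeDelta, its parameters, and the 50 ms tolerance are all made up for illustration; the only idea taken from the post is that the delta from the precise timer gets cross-checked against a coarse one measured over the same interval.

```cpp
#include <cassert>
#include <cmath>

// Returns the frame delta (in milliseconds) to feed the game loop.
// preciseMs:   delta from the high-resolution timer (e.g. QueryPerformanceCounter)
// coarseMs:    delta over the same interval from a coarse timer (e.g. timeGetTime)
// toleranceMs: hypothetical threshold for "grossly incorrect"
inline double sanitizeDelta(double preciseMs, double coarseMs, double toleranceMs = 50.0)
{
    assert(toleranceMs >= 0.0);

    // A negative delta means the precise timer ran backwards.
    if (preciseMs < 0.0)
        return coarseMs;

    // A large disagreement means the precise timer jumped; trust the coarse one.
    if (std::fabs(preciseMs - coarseMs) > toleranceMs)
        return coarseMs;

    return preciseMs;
}
```

In a loop you would sample both timers every frame and pass both deltas through this before using the result as your game time step.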

Back to L. Spiro.

I am an R&D programmer at tri-Ace, where I work on this engine:

I know, I've seen you around. I wasn't trying to make an argument from authority, I was just starting to feel that maybe you thought I was a hobbyist spouting off. Which probably backfired because then you probably thought that that was what I thought. I also do heavy lifting engine work, btw, although not on a dedicated R&D team.

Keeping this in mind, I can hardly believe you even said the following:

After all my efforts, if I were to then discover that some idiot wasted a whole millisecond of a frame sleeping for no reason, I kid you not I would punch him or her in the face. Twice.

I didn't say that, I said in the case where you already had oodles of spare CPU time. That's not "no reason", and it's not happening when that spare ms would hurt your mesh skinning because in that case there's no oodles of spare CPU cycles.

Your linked mesh skinning example, btw, being one where I absolutely would never sleep, as you're on a console and know you're already sucking up every available cycle, and always will be. This though is apples and oranges, you're countering an argument I've made for games on Windows or phones with a console game. They are different environments, they have different engineering considerations.

 I'm not sure if I'm not explaining things very well or what, but please try to understand what I'm saying. I am not talking about consoles. I am not talking about systems which are always just squeaking by. I am not talking about sleeping when you have no time to spare. I'm saying that battery life/heat is really, really important for usability on platforms that are inherently backwards compatible and likely to be portable. As I have said before, many times, everything you have said is absolutely, one hundred percent true, *under certain circumstances*. Console development being one of them. But I think you do hobbyists and students a great disservice by dismissing concerns that actually are part of the platforms they are most likely to be developing on.

If FTL or the Duck Tales remake were to peg my CPU at 100% while I'm running in a window (no vsync), even if they only needed 25-30% of a core to be smooth, I would be very, very sad, and would not consider that to be a good engineering decision. Likewise, as cutting edge titles age, they will also take a smaller and smaller amount of CPU time. I don't want Quake2 to be pegging my modern laptop CPU at 100% when it now only needs 10% to be smooth. It's a no-brainer to get the best of both worlds by simply not calling sleep if you don't have cycles to spare. If you want to add a checkbox to toggle this, go for it, but as a usability thing, I'd leave it limited by default.

Anyway, I think I've explained my reasons about as well as I can at this point.

#5103923 Main loop timing

Posted by on 23 October 2013 - 04:54 PM

I don’t know what you mean by the limit of your updates. It may help if we all use a consistent set of terms: Render Update, Logical Update, and Game Loop. I assume you mean a logical update.


Yes, logical update. Rendering more frames than your logical update rate gives the user no additional information. Sure, you can make it smoother (to the limits of max(refresh rate, human eye perception), unless vsync is off and then you are racing tears), but if your logical update is 30 and you are rendering at 60, the user can't actually influence the world and the actors won't update themselves any faster than 30.


#1: I disagree with “strongly disagreeing to it”. I agree with “giving the player options”, and what you suggest is “taking away options”. In fact, the game engineers typically try to max out the CPU’s performance, using all available resources for whatever they can. And if you are playing a game, the performance of the rest of the applications on your machine doesn't really matter unless the game is minimized, in which case yes, I do wait 10 milliseconds between game loops to give back the CPU power. As far as real-time caps go there are 2 things to consider:
-> #a: Too much power can overheat the system and fry parts. So the motivation for a cap is not related to refresh rates or starving other applications etc., it is about not frying the system.
-> #b: Therefore any cap at all should be based on getting the maximum performance out of the CPU without physically killing it. Which is extremely rare these days, and there are often system-wide settings the user can enable to prevent this. Do not force a cap on the user unless it is in the multiple hundreds of FPS’s such that no human eye can detect the difference. There are plenty of things people can do themselves, if and only if necessary, without you forcing it on them.


All valid points, the problem is that I think you are forgetting that the OP's intent is to lower this CPU usage. He's already told us he has plenty of CPU time left over between updates, and he doesn't want to peg the CPU. For bleeding edge, AAA, barely running on a platform games, everything that you suggested applies in force. I don't think that's what we're dealing with here though. For smaller, less CPU intensive games of the usual hobbyist or indie flavor (which I assume is what we are looking at here), getting things looking good while not destroying the laptop batteries of your casual audience is very important to user experience. I also love choices, but a lot of users don't understand the implications of those choices. In this case, I see very little benefit to allowing hundreds of fps if the updates are fixed and the display has a cap of what it can show the user anyways. It doesn't matter how fast your eye is if the transmitting medium is only feeding them so fast. Tweaking your settings to get the best possible performance out of your latest big game is great and an important tool, but when performance of your game already fits nicely in modern machines with cycles to spare they are not nearly as important.

(I am myself a gameplay and systems engineer professionally, btw, not a hobbyist).

Your question is deceptively broad, so there are many things to say in reply.

My apologies, I have asked it poorly then. I did intend to ask "Is there a better way to sleep for a more accurate time?", as you hit on.

With that made clear, and then to restate your question as, “Is there a better way to sleep for a more accurate time?”, the answer is No. Which makes it easy to draw the wrong conclusion—it would be easy to misunderstand and decide, “Then I guess that’s that—increase the timer resolution and sleep.” To draw the correct conclusion, we need to keep deducing.

And here I must disagree again, because your conclusion is correct in the wrong circumstances. For contemporary AAA games, yes, you probably have the CPU saturated and sleeping is a moot point, waiting is much better. For smaller games, however, as I said above, this is not the case, and other factors become equally if not more important to consider than raw FPS.

However, I would argue this counterpoint even in the AAA case. Today's big budget CPU hogging games are tomorrow's throw-on-your-portable-and-go titles. I would always code a game to sleep when you have oodles of spare time and just wait otherwise. This way you get the best of both worlds. Today, you get performance, but tomorrow, you get to take Quake2 with you on the plane and catch a few games without destroying your battery at a rapid rate.
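Roughly, "sleep when you have spare time and just wait otherwise" could look like the sketch below. It uses std::chrono and std::this_thread so it isn't tied to Windows; waitUntil and the 2 ms spinMargin default are hypothetical names and numbers, not anything from a real engine, and the margin would need tuning to however badly the OS oversleeps on your target.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Blocks until `deadline`. Sleeps while more than `spinMargin` is left, then
// spins (yielding) for the remainder so the wakeup is not late.
// If the deadline has already passed, it returns without sleeping at all.
inline void waitUntil(Clock::time_point deadline,
                      std::chrono::microseconds spinMargin = std::chrono::milliseconds(2))
{
    assert(spinMargin.count() >= 0);

    for (;;) {
        const auto remaining = deadline - Clock::now();
        if (remaining <= Clock::duration::zero())
            return;  // no spare cycles this frame: never sleep

        if (remaining > spinMargin) {
            // Plenty of spare time: give the CPU back to the OS.
            std::this_thread::sleep_for(remaining - spinMargin);
        } else {
            // Close to the deadline: spin, but let other threads run.
            std::this_thread::yield();
        }
    }
}
```

Call it at the end of each frame with the time the next frame is due. A heavy frame that already ran past its deadline falls straight through, so the saturated case costs nothing.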

The game loop should require much finer granularity and reliability, thus waiting is the correct solution.

Current consoles do around 30FPS in most games overall, but frame variability is insane. Sometimes your frames come in under 16ms because nothing interesting is happening on screen, sometimes you need to calculate a path or init an expensive AI behavior and you'll spike 150ms+ for a frame. This is also true on PC, although the numbers tend to be tighter because the hardware is so much better. This is neither granular nor terribly reliable, but it is the reality. Really the only thing that matters is that any variability is imperceptible to the user, and a tick granularity of 1ms is well, well below that. If you get a few ms ahead here or a few ms behind there, unless you are running hundreds of FPS in a super twitchy game with a monitor that can actually display those frames, none of this is noticeable by the user.

#5103865 Main loop timing

Posted by on 23 October 2013 - 02:16 PM

All valid counter arguments for very good reasons. I don't pretend that my suggestions are one size fits all, but I do still strongly disagree with "render as fast as you can" in most cases, especially over the limit of your updates. I didn't really consider the possibility of having your physics running at a slower rate, which is not uncommon, but input, actor updates, etc, "usually" run faster. That's if they are fixed at all; Unreal3 for example is variable timestep for updates and works pretty well (not sure how UE4 does it).


However, could you expand on my comments to this?


kuroioranda, on 23 Oct 2013 - 2:18 PM, said:

Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program to minimize latency from Sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop.

Otherwise his timestep technique still works very well in general.

Once again I have to recommend this advice be regretfully ignored.
Firstly, timeBeginPeriod() is a Windows®-wide global function, and can interfere with other processes on your machine, as well as degrade overall performance.
Secondly, it consumes more PC resources and battery life, which goes exactly against what was mentioned before.
Thirdly, it has no effect on QueryPerformanceCounter() and friends, which is what you should be using for timing in your game. This is not related to Sleep(), but…
Fourthly, Sleep() is not even the proper way to give resources back to the machine, and if you increase the system resolution you are giving fewer resources back anyway, defeating the purpose. The proper way to give resources back is to actually wait for timed system events, such as v-sync. On iOS you have CADisplayLink, on all consoles you have a v-sync notification, and on desktops you have a simple setting that can be enabled to allow automatic v-sync waiting.


I gave that as an example of a way around it on Windows specifically. No, it doesn't affect QueryPerformanceCounter, but the global timer resolution and high performance timing are orthogonal concerns. If I had mentioned timeGetTime(), that would have been one thing, but he was specifically asking for ways to not spin in a loop and blow CPU cycles waiting for a set time to elapse. Lowering the scheduler period to 1 ms does consume slightly more resources than leaving it higher, but it does NOT increase resource usage over never letting the CPU go idle. If he wants to let it spin at 100% CPU time, then sure, put it in a loop and keep checking QueryPerformanceCounter, but in his original post he said that he wants to put the CPU to sleep when he doesn't have anything for the game to do. You can lower the impact by using a slightly higher value (timeBeginPeriod(3), for example) that is still low enough to keep hiccups from appearing, but as Windows before Win8 is a ticked kernel you have to make the trade of slightly more CPU time spent handling CPU wakeups if you want the much bigger power win of idling the CPU down when it's not used.

And yes, timeBeginPeriod is global, but it's designed that way. It's how Windows works. Every program tells the OS what resolution it needs in the kernel and then the OS uses the smallest requested resolution (for Windows before Win8). If your program isn't prepared to have the timer resolution lowered on it, you're in serious trouble, because even Microsoft uses it. So no, I disagree that you have nothing to gain and everything to lose with timeBeginPeriod(). It exists for very good reasons, and used for those reasons it is not incorrect.

Specifically, do you know of another way to sleep your threads on Windows that allows you the time granularity needed for games? (Serious question, I'd love to know because I'd rather not have to waste any extra CPU time with a lower timer resolution myself.)

#5103817 Main loop timing

Posted by on 23 October 2013 - 12:18 PM

I recommend you read Gaffer's Fix Your Timestep article. In the article, Gaffer recommends you render as many times as possible, but only update your simulation at a fixed rate. Interpolation can be used in between updates to give a smooth feel.


I would recommend against rendering as fast as you can, actually. I usually cap my loops to render at most as many frames as the monitor supports by refresh rate. This didn't use to be such a big deal, but in the modern world of laptops and smart devices it's crucial to not waste battery power rendering frames the user cannot physically ever see. Likewise interpolation is usually unneeded and I would advise against it. At best you'll be adding a lot of complexity to your renderer to give your game that slightly strange effect that 120 Hz interpolating TVs give to broadcast shows. At worst it does nothing, as most LCDs are still 60 Hz and running at the same rate as your update loop anyway.


Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program to minimize latency from Sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop.



Otherwise his timestep technique still works very well in general.
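For reference, the fixed-rate update part of that technique boils down to an accumulator. This is only a sketch of the general idea with the real work stubbed out: countFixedUpdates and its inputs are hypothetical, and it just counts how many logical updates would run for a given list of frame times instead of calling an actual update() or render().

```cpp
#include <cassert>
#include <vector>

// Counts how many fixed-size logical updates run for a sequence of frames.
// frameTimesMs: measured duration of each rendered frame (made-up input here)
// stepMs:       the fixed logical update step, e.g. 1000.0 / 30 for 30 Hz
inline int countFixedUpdates(const std::vector<double>& frameTimesMs, double stepMs)
{
    assert(stepMs > 0.0);

    double accumulator = 0.0;
    int updates = 0;

    for (double frameMs : frameTimesMs) {
        accumulator += frameMs;          // bank the real time that just elapsed

        while (accumulator >= stepMs) {  // spend it in fixed-size steps
            ++updates;                   // update(stepMs) would run here
            accumulator -= stepMs;
        }

        // render() would run here, once per frame
    }
    return updates;
}
```

Note that a long frame produces several updates back to back, while short frames often produce none, which is why rendering faster than the update rate shows nothing new unless you interpolate.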

#5101933 Renderer too slow

Posted by on 16 October 2013 - 02:43 PM

To read the call graph, open your .sleepy file in Very Sleepy and sort by %inclusive or %exclusive. %exclusive is the total amount of running time each function uses, not counting other functions it calls. %inclusive is the amount of time spent in the function and all child calls. I like to sort by %inclusive, normally, as scanning down the list will give you a pretty good top down idea of where you are slow.


WaitForSingleObject is a thread block, so you appear to be limited by something in the video driver.


The first suspicious entry you can do something about is al_d3d_create_bitmap, at 17%. How are you creating your bitmaps? You aren't recreating them every frame, are you? That alone would probably cause your perf issues. The other thing to check is whether you are using ALLEGRO_MEMORY_BITMAP. Memory bitmaps live in system RAM and are not hardware accelerated, so make sure the images are created as video bitmaps (ALLEGRO_VIDEO_BITMAP) so they sit in video RAM and can be accelerated. Also, how big are the images, and what kind of video card do you have?

#5101839 Renderer too slow

Posted by on 16 October 2013 - 09:21 AM

The first thing to do is to profile your code. Humans are notoriously bad at guessing where their performance issues are in code. Don't guess, let hard data be your guide!


Build the executable in optimized mode, and make sure you still have debugging info (in Visual Studio these are both under Project->Properties). I use Very Sleepy to profile. It's very easy to use and gives you pretty clear call graphs.



Run your program, then select it for profiling in Very Sleepy. Wait a few minutes while Very Sleepy samples what your program is doing, and then you can look at the results. It will tell you exactly where in your program you are spending CPU time, and how much. This should allow you to pinpoint the functions that are causing your performance loss. If you need help beyond that, post the call graph sorted by total time and I will help.


Just browsing your code, though, one thing that you absolutely should change is doing a sorted (binary search) insert every time you add a new request. This is extremely expensive, as you have to move (which is a memory copy) on average half of the existing vector elements every time you insert in the middle. It's generally much, much, MUCH faster to just keep appending the requests to the end of the list, and then do a single sort with std::sort just before you render them.
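Something along these lines, as a sketch. RenderRequest, its layer and spriteId fields, and the RenderQueue class are stand-ins I made up, since I'm only guessing at what your request struct holds; the point is just push_back on submit and one std::sort per frame.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical render request; only the sort key matters for this sketch.
struct RenderRequest {
    int layer;     // draw order key (assumed: lower layers draw first)
    int spriteId;  // whatever identifies the thing to draw
};

class RenderQueue {
public:
    // Amortized O(1): nothing already in the vector gets shifted.
    void submit(const RenderRequest& request) { requests_.push_back(request); }

    // Sort once per frame, right before drawing.
    const std::vector<RenderRequest>& sorted()
    {
        std::sort(requests_.begin(), requests_.end(),
                  [](const RenderRequest& a, const RenderRequest& b) {
                      return a.layer < b.layer;
                  });
        return requests_;
    }

    // Call after rendering; the vector keeps its capacity for the next frame.
    void clear() { requests_.clear(); }

private:
    std::vector<RenderRequest> requests_;
};
```

std::sort doesn't preserve submission order for requests on the same layer. If you depend on that, std::stable_sort is the drop-in replacement.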


To answer your other question, on modern hardware sprites are really just flat 3D objects. You can render *millions* of them if you can feed the card efficiently. As previously noted, though, allegro is software rendered, so your CPU will have to pick up that slack and your throughput will be much lower. You should still be able to do more than 1200 unless the sprites are absolutely massive.

#5101679 lightuser data in Lua

Posted by on 15 October 2013 - 05:58 PM


Is there a way to do this with lightuser data ?
Pretty sure, no. I ended up using the syntax that apatriarca suggests. Here's another example of the same syntax being used due to the choice to use light userdata:




With just light user data, no, but if you're not concerned about allocating a full userdata as a wrapper (edit: you're not, I do this in my games and if you ever get to the point where it's actually a performance problem, you're doing way too much in script), I like to embed the light user data in a normal user data. Any functions called in the user data can then pull the light user data out in native code and work on it.


This should be exactly what you are looking for, as you can use the hybrid user data just like a normal lua table as per your second code block in the original post. You can override values, call functions with colon syntax, and only the wrapper will be under lua's memory management. The object pointed to by the wrapper will still be safe after the wrapper is destroyed.


// to create the lua binding object (in C++)
    // push an empty table. This will be the object in lua we will "return" to lua
    lua_newtable(L);

    // put the C object in the "privatecobject" field of the lua object as a light user data
    lua_pushlightuserdata(L, user);
    lua_setfield(L, -1, "privatecobject");

    // do your normal metatable setting here

// then setName() in C++ would look like this (using your example function)
// this would be called from lua as user:setName("name")
int setName(lua_State* L)
{
    // grab the C object out of the wrapper (argument 1, the "self" of the colon call)
    lua_getfield(L, 1, "privatecobject");
    User* user = static_cast<User*>(lua_touserdata(L, -1));
    lua_pop(L, 1); // pop the userdata field so the stack is unchanged. Not really needed in this function, but a good habit

    // argument 2 is the name string passed from lua
    user->setName(lua_tostring(L, 2));
    return 0; // no values returned to lua
}

#4877692 Can't get a decent Collada file from Blender?

Posted by on 27 October 2011 - 03:58 PM

I wrote a converter that converts a Collada file to my own binary format. However since Blender's exporter is a useless piece of shit, i have to go through this everytime i need to load a model into my game:

Export .fbx file from Blender -> Convert to .dae using Autodesk Converter -> Use Collada Refinery on .dae file -> Finally convert .dae file to my format.

But then i realized that for some reason, the Autodesk Converter converts the animation keyframes into interpolated keyframes. So instead of reading 2 keyframe data and doing the interpolation in real-time, i have to read 10^10 keyframes. And i can't reverse it because the original keyframe data isn't there anymore.

So, FOR THE LOVE OF EVERYTHING THAT IS HOLY, does anyone have a working collada exporter for blender?! If not, could you at least tell me how do you convert your Collada files? Assuming that you do..

I need a drink.

Have you considered writing your own exporter directly from blender? The python API for blender is actually fairly easy to use, and you can do some really slick stuff with it like opening and dumping scenes directly from the command line. In my project, I don't manually export anything, I just run my build script and any changes in the game's blend files are automatically built into data for my engine and posted to the output directories for use.


#4864088 DXT compression vs downscaled textures

Posted by on 20 September 2011 - 10:04 PM

I'm fairly certain that's *not* the case, as that would make it 16:1 compression, not 4:1 compression as is documented everywhere.

It's actually 6:1 (for a 24-bit source) or 8:1 (for a 32-bit source), not 16:1 or 4:1 (DXT3 and 5 are 4:1). So no, it's not as good as 2048->512, but it's still pretty awesome.
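If you want to check the arithmetic: DXT1 stores each 4x4 pixel block in 8 bytes, DXT3 and DXT5 use 16 bytes per block. The helper names below are just made up for this sketch.

```cpp
#include <cassert>

// Compression ratio of a 4x4 block-compressed format versus an uncompressed source.
// sourceBitsPerPixel: 24 for RGB8, 32 for RGBA8
// blockBytes:         bytes per 4x4 block (8 for DXT1, 16 for DXT3/DXT5)
inline double compressionRatio(int sourceBitsPerPixel, int blockBytes)
{
    assert(sourceBitsPerPixel > 0 && blockBytes > 0);
    const double sourceBitsPerBlock = sourceBitsPerPixel * 16.0;  // 16 pixels per block
    const double compressedBitsPerBlock = blockBytes * 8.0;
    return sourceBitsPerBlock / compressedBitsPerBlock;
}

// Memory saved by downscaling a square texture, e.g. 2048 -> 512.
inline double downscaleRatio(int fromSize, int toSize)
{
    assert(fromSize > 0 && toSize > 0);
    const double scale = static_cast<double>(fromSize) / toSize;
    return scale * scale;  // both dimensions shrink
}
```

compressionRatio(24, 8) gives 6, compressionRatio(32, 8) gives 8, compressionRatio(32, 16) gives 4, and downscaleRatio(2048, 512) gives 16, which is where the 16:1 figure for the downscale comes from.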

I probably should have mentioned that our source art before compression is already at a higher resolution, and we are accustomed to appropriately creating individual mip levels to retain clarity. The problem with the quality reduction is that a lot of our details are entirely inside or straddling dxt pixel blocks - for all intents and purposes, similar to stippling patterns.

Out of curiosity, does your art use a lot of solid color fills or gradients (such as 2D cel shading)? This is an area where texture compression artifacts can actually be noticed fairly easily, and it's common to just bite the bullet on memory and use full res uncompressed textures if these textures are relatively static and viewport aligned.

A screenshot could be worth 1000 words here, so if you have some images of the problems you're encountering with texture compression we might be able to suggest solutions that don't require you to use 4-8 times more VRAM with uncompressed assets.

#4858949 Collision with terrain

Posted by on 08 September 2011 - 01:50 AM

Why keep searching until you hit terrain? You only need to keep searching until you are past the bottom of your player's feet.

Also, does your quad tree actually store all leaf nodes, even if they are empty? You describe them as being 33x33, but you'll find you can inject nitro into your graph by only dividing the tree up as far as you need to. One of the real strengths of quad trees is that the moment you have a cell with nothing in it, you stop dividing it up. This drastically reduces the number of nodes you need to traverse, as most of your world will be empty space. For purposes of collision you can consider areas beneath terrain and inside of world objects as "empty space" as well, since you only care about intersections with the surface of collision objects and terrain.

The wikipedia article has a good image to describe what I'm talking about. http://en.wikipedia.org/wiki/Quadtree. See all that empty space? They have an incredibly small cell size where there are objects, and yet throwing a ray straight from top to bottom only results in a few dozen tests at worst.
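Here's a rough sketch of what I mean by only dividing as far as you need to. It stores bare points instead of terrain patches or collision objects to keep it short, and QuadNode, Point, and minSize are all names I invented for the example, so treat it as an illustration of the splitting rule and not as a drop-in collision structure.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y; };

// One square cell covering [x, x + size) by [y, y + size).
struct QuadNode {
    float x, y, size;
    std::vector<Point> points;                       // contents of a leaf cell
    std::array<std::unique_ptr<QuadNode>, 4> child;  // all null while this is a leaf

    QuadNode(float x_, float y_, float size_) : x(x_), y(y_), size(size_) {}

    bool isLeaf() const { return child[0] == nullptr; }

    // minSize is the smallest cell we are willing to create (must be > 0).
    void insert(Point p, float minSize)
    {
        assert(minSize > 0.0f);
        if (isLeaf()) {
            // An empty cell just takes the point; it is never divided further.
            if (points.empty() || size <= minSize) {
                points.push_back(p);
                return;
            }
            split(minSize);
        }
        childFor(p).insert(p, minSize);
    }

    // Total number of cells in this subtree, including empty leaves.
    std::size_t countNodes() const
    {
        std::size_t n = 1;
        for (const auto& c : child)
            if (c) n += c->countNodes();
        return n;
    }

private:
    QuadNode& childFor(Point p)
    {
        const float half = size * 0.5f;
        const int index = (p.x >= x + half ? 1 : 0) + (p.y >= y + half ? 2 : 0);
        return *child[index];
    }

    void split(float minSize)
    {
        const float half = size * 0.5f;
        child[0] = std::make_unique<QuadNode>(x, y, half);
        child[1] = std::make_unique<QuadNode>(x + half, y, half);
        child[2] = std::make_unique<QuadNode>(x, y + half, half);
        child[3] = std::make_unique<QuadNode>(x + half, y + half, half);

        // Push whatever this cell was holding down into the new children.
        std::vector<Point> old;
        old.swap(points);
        for (Point q : old)
            childFor(q).insert(q, minSize);
    }
};
```

With a 1024-unit world and a minimum cell size of 1, two objects in opposite corners cost 5 nodes total. A tree that was fully subdivided down to size 1 up front would have about 1.4 million, nearly all of them empty.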

#4858861 Best way to display bitmaps in a HUD (OpenGL)

Posted by on 07 September 2011 - 06:35 PM

What do you mean by "merge images"? Putting one image on top of another in a color blend? Taking component images and laying them out into a GUI of some sort?

Edit: To be more specific, what is it you are trying to do? The best way to manipulate your images will depend on *why* you are manipulating them.

#4856564 Accessing elements efficiently

Posted by on 01 September 2011 - 08:34 PM

It would seem that a for/in loop on an array yields the index of each element and not a reference to each object in the array. Thus to get the code to work I had to do

for (var vertex in vertices) {
    // vertex is the index (as a string key), not the vertex object itself
    x = vertices[vertex].x;
    y = vertices[vertex].y;
    z = vertices[vertex].z;
}

Which is substantially slower as I believe the "in" keyword involves more complicated type checking than a simple less than comparison.

I don't use for/in very much, if I am messing something up please let me know =)

Ah, so it does, you are right. Thanks for trying it, though :).

#4856530 Accessing elements efficiently

Posted by on 01 September 2011 - 06:05 PM

This gets repeated a lot here but it's important. Premature optimization is the root of all evil. Write clean code and use good algorithms where appropriate. Let the compiler (or interpreter) take care of the micro optimizations. If profiling shows that the compiler isn't doing a good enough job, THEN look at tuning it by hand.

For example, do you know if your Javascript engine is compiled at all, or is it simply interpreted? Does it optimize? How exactly are objects stored in memory? What are the costs involved allocating and destroying a vertex object as opposed to extending your array? What sort of cache access patterns does each allocation method have? These are only a few things you will need to know to answer your question meaningfully, and the best thing is that they will be different for each browser you run this in, if you're even running in a browser at all.

So stop worrying about it. Rewrite that sucker to use vertex objects. You are dealing with vertices, so using a packed array with all that access math is just plain confusing, a potential source of bugs and will make maintenance much harder.

#4776079 I got beat up by a cop

Posted by on 18 February 2011 - 03:29 PM

You missed the 4th option: Man up and admit that I am at fault here.

Every one of your options makes it clear that you are only interested in painting yourself as the victim, even though you, by your own admission, both initiated and then escalated this scenario. The fact that you would even consider pressing any kind of action is disgusting. And he even called to apologize when it's clear even from your own telling of the story you were the one responsible for this going down, which was a grade A class act on his part.

Let's break this down:
You hide something in your pocket and act suspicious with the intention of forcing him to take notice of you. He does. Whether you heard him ask you to stop or not, you know you are being pursued, and that pursuit is YOUR FAULT. You intended it to happen, or you wouldn't have run.
At this point, you could stop, turn around, and tell the guy, "I'm sorry officer, I'm just a stupid kid trying to get a rise out of you". And he'd search you, he'd have to, it's his job. Then you'd be let go. But you don't. You escalate, choose to keep running.
Then you try to escape into your house. At this point, what is the cop thinking? For all he knows, your buddies are in there. They might have guns. Is it a safe house? Sure, you know you're harmless, but he doesn't. As far as he knows, he could be dead in the next few minutes if he chooses to continue the chase. But what's he to do? It's his job.
He catches you. He punches you. And he's completely justified in doing so. What would you have him do, try the peaceful route, just say "please stop running?". Oh wait, you had that chance. You didn't take it. You forced him to either subdue you the violent way, or else not do his duty as a cop.
He searched you, he didn't believe you were clean at first. Well no crap, I wouldn't have either. No sane person would.

You reaped what you sowed. Get over yourself.