kuroioranda

Members
  • Content count

    204

Community Reputation

304 Neutral

About kuroioranda

  • Rank
    Member
  1. SlimDX Matrix class broken in Windows 8.1?

    This is just a side effect of how Microsoft chose to package the D3DX libraries. Each release of the SDK has its own revision, and they aren't interchangeable. This is why Steam always reinstalls DirectX even if you already have the latest version: the version-specific libraries might not have been installed yet if no other program on your machine had been linked against that specific version. You pretty much have to either have the user install the D3DX version that SlimDX was linked against, or else remove the D3DX dependency. More info: http://forums.steampowered.com/forums/showpost.php?p=23759166&postcount=47
  2. No, because it's proportional. If you are just issuing a call to render a single mesh, that is very, very fast. Present may be slower, but it only seems so expensive because it's slower than doing a very fast thing. For a real-world comparison, this is like saying that a donut at 50 cents is expensive because a stick of gum only costs 5 cents. The donut is not expensive; the gum is just insanely cheap.
  3. Main loop timing

    Was the high-end machine running Win 8? Win 8 waits the exact amount of time (within certain limitations) that the next sleeping thread needs to wake up, so it isn't dependent on the timer resolution the way previous versions were. This is pretty normal.

    QueryPerformanceCounter isn't reliable on all CPUs, so even if you do use it for loop timing, you need to back it up with some other, less precise timer (like timeGetTime()) that can tell you when QueryPerformanceCounter is grossly incorrect.

    Back to L. Spiro. I know, I've seen you around. I wasn't trying to make an argument from authority; I was just starting to feel that maybe you thought I was a hobbyist spouting off. Which probably backfired, because then you probably thought that was what I thought. I also do heavy-lifting engine work, btw, although not on a dedicated R&D team.

    I didn't say that; I said in the case where you already had oodles of spare CPU time. That's not "no reason", and it doesn't apply when that spare millisecond would hurt your mesh skinning, because in that case there aren't oodles of spare CPU cycles. Your linked mesh skinning example, btw, is one where I absolutely would never sleep, as you're on a console and know you're already sucking up every available cycle, and always will be. This, though, is apples and oranges: you're countering an argument I've made for games on Windows or phones with a console game. They are different environments with different engineering considerations.

    I'm not sure if I'm not explaining things very well or what, but please try to understand what I'm saying. I am not talking about consoles. I am not talking about systems which are always just squeaking by. I am not talking about sleeping when you have no time to spare. I'm saying that battery life/heat is really, really important for usability on platforms that are inherently backwards compatible and likely to be portable.

    As I have said before, many times, everything you have said is absolutely, one hundred percent true, *under certain circumstances*. Console development is one of them. But I think you do hobbyists and students a great disservice by dismissing concerns that actually are part of the platforms they are most likely to be developing on. If FTL or the Duck Tales remake were to peg my CPU at 100% while I'm running in a window (no vsync), even if they only needed 25-30% of a core to be smooth, I would be very, very sad, and would not consider that a good engineering decision. Likewise, as cutting-edge titles age, they take a smaller and smaller share of CPU time. I don't want Quake2 pegging my modern laptop CPU at 100% when it now only needs 10% to be smooth. It's trivial engineering to get the best of both worlds: simply don't call sleep when you don't have cycles to spare. If you want to add a checkbox to toggle this, go for it, but as a usability thing, I'd leave it limited by default. Anyway, I think I've explained my reasons about as well as I can at this point.
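    The cross-check suggested above (trust the high-resolution counter, but fall back to a coarser timer when the two disagree wildly) can be sketched as a pure function. This is a minimal sketch, not a complete timing system: the name `choose_frame_delta` and the `tolerance` parameter are hypothetical, and both deltas are assumed to have already been read from `QueryPerformanceCounter()` and `timeGetTime()` and converted to seconds by the caller.

```cpp
#include <cmath>

// Picks the frame delta (in seconds) the game loop should trust.
// precise_dt: delta from the high-resolution counter (e.g. QueryPerformanceCounter),
//             already converted to seconds by the caller.
// coarse_dt:  delta from a low-resolution timer (e.g. timeGetTime),
//             already converted to seconds by the caller.
// tolerance:  hypothetical threshold; how far apart the two readings may be
//             before the precise one is treated as bogus.
double choose_frame_delta(double precise_dt, double coarse_dt, double tolerance) {
    // A negative delta, or one that disagrees with the coarse timer by more
    // than the tolerance, means the precise counter jumped: fall back.
    if (precise_dt < 0.0 || std::fabs(precise_dt - coarse_dt) > tolerance) {
        return coarse_dt;
    }
    // Otherwise keep the precise reading, since it has far better resolution.
    return precise_dt;
}
```

    The tolerance has to be larger than the coarse timer's own granularity; otherwise normal jitter in the coarse reading will trigger false fallbacks.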
  4. Main loop timing

      Yes, logical update. Rendering more frames than your logical update rate gives the user no additional information. Sure, you can make it smoother (up to min(refresh rate, human eye perception), unless vsync is off, in which case you are racing tears), but if your logical update is 30 Hz and you are rendering at 60 Hz, the user can't actually influence the world any faster, and the actors won't update themselves any faster than 30 Hz.

    All valid points; the problem is that I think you are forgetting that the OP's intent is to lower this CPU usage. He's already told us he has plenty of CPU time left over between updates, and he doesn't want to peg the CPU. For bleeding-edge, AAA, barely-running-on-the-platform games, everything that you suggested applies in force. I don't think that's what we're dealing with here, though. For smaller, less CPU-intensive games of the usual hobbyist or indie flavor (which I assume is what we are looking at here), getting things looking good while not destroying the laptop batteries of your casual audience is very important to user experience. I also love choices, but a lot of users don't understand the implications of those choices. In this case, I see very little benefit to allowing hundreds of fps if the updates are fixed and the display caps what it can show the user anyway. It doesn't matter how fast your eyes are if the display can only feed them frames so fast. Tweakable settings that squeeze the best possible performance out of your latest big game are great and an important tool, but when your game already fits comfortably on modern machines with cycles to spare, they are not nearly as important. (I am myself a gameplay and systems engineer professionally, btw, not a hobbyist.)

    My apologies, I asked it poorly then. I did intend to ask "Is there a better way to sleep for a more accurate time?", as you hit on.

    And here I must disagree again, because your conclusion is correct, but for different circumstances. For contemporary AAA games, yes, you probably have the CPU saturated and sleeping is a moot point; waiting is much better. For smaller games, however, as I said above, this is not the case, and other factors become as important as raw FPS, if not more. However, I would argue this counterpoint even in the AAA case. Today's big-budget, CPU-hogging games are tomorrow's throw-it-on-your-portable-and-go titles. I would always code a game to sleep when you have oodles of spare time and just wait otherwise. This way you get the best of both worlds. Today, you get performance, but tomorrow, you get to take Quake2 with you on the plane and catch a few games without draining your battery at a rapid rate.

    Current consoles do around 30 FPS in most games overall, but frame variability is insane. Sometimes your frames come in under 16 ms because nothing interesting is happening on screen; sometimes you need to calculate a path or init an expensive AI behavior and you'll spike 150 ms+ for a frame. This is also true on PC, although the numbers tend to be tighter because the hardware is so much better. This is neither granular nor terribly reliable, but it is the reality. Really, the only thing that matters is that any variability is imperceptible to the user, and a tick granularity of 1 ms is well, well below that. If you get a few ms ahead here or a few ms behind there, none of it is noticeable to the user, unless you are running hundreds of FPS in a super twitchy game with a monitor that can actually display those frames.
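    The "fixed logical update, render no faster than that" idea above can be sketched with a fixed-timestep accumulator. This is an illustrative sketch under assumptions, not code from the thread: `FixedStepper` is a hypothetical name, times are integer microseconds to avoid floating-point drift, and the caller is assumed to run its update function the returned number of times, and to skip rendering (and sleep instead) when it returns zero.

```cpp
#include <cstdint>

// Accumulates real elapsed time and reports how many fixed logic steps are due.
// All durations are in microseconds.
struct FixedStepper {
    std::int64_t step_us;             // length of one logical update (e.g. 33333 for ~30 Hz)
    std::int64_t accumulator_us = 0;  // real time not yet consumed by logic updates

    explicit FixedStepper(std::int64_t step) : step_us(step) {}

    // Adds the elapsed frame time and returns how many logic updates to run now.
    int advance(std::int64_t frame_us) {
        accumulator_us += frame_us;
        int steps = 0;
        while (accumulator_us >= step_us) {
            accumulator_us -= step_us;  // consume exactly one fixed step
            ++steps;
        }
        return steps;
    }
};
```

    With a 30 Hz step and a 60 Hz loop, every other call returns zero; under the argument above, those are the iterations where there is nothing new to show, so the loop can give the time back instead of drawing.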
  5. Main loop timing

      I hadn't heard of that being used for game timing (although I've seen it in call stacks before), but sadly WaitForSingleObject also appears to depend on the timer resolution set by timeBeginPeriod().
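    One way to check how a given wait primitive behaves on a particular machine is simply to measure it. The sketch below uses the portable `std::this_thread::sleep_for` as a stand-in (an assumption: its real duration is likewise bounded by the OS scheduler, but you would swap in `Sleep()` or `WaitForSingleObject()` to test those directly). `measure_sleep_overshoot` is a hypothetical helper name.

```cpp
#include <chrono>
#include <thread>

// Requests a sleep of `requested` and returns how much longer than that the
// thread actually stayed asleep. With a coarse scheduler tick the overshoot
// can be several milliseconds; with a fine tick it is usually small.
std::chrono::microseconds measure_sleep_overshoot(std::chrono::microseconds requested) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::this_thread::sleep_for(requested);
    const auto actual =
        std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start);
    return actual - requested;  // sleep_for blocks for at least `requested`, so this is >= 0
}
```

    Running this a few hundred times and looking at the worst case, rather than the average, shows the tick quantization most clearly.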
  6. Main loop timing

    All valid counterarguments, for very good reasons. I don't pretend that my suggestions are one size fits all, but I do still strongly disagree with "render as fast as you can" in most cases, especially beyond the rate of your updates. I didn't really consider the possibility of having your physics running at a slower rate, which is not uncommon, but input, actor updates, etc. "usually" run faster, if they are fixed at all: Unreal3, for example, uses a variable timestep for updates and works pretty well (not sure how UE4 does it).

    However, could you expand on my comments to this?

    [quote]Once again I have to recommend this advice be regretfully ignored. Firstly, timeBeginPeriod() is a Windows®-wide global function, and can interfere with other processes on your machine, as well as degrade overall performance. Secondly, it consumes more PC resources and battery life, which goes exactly against what was mentioned before. Thirdly, it has no effect on QueryPerformanceCounter() and friends, which is what you should be using for timing in your game. This is not related to Sleep(), but… Fourthly, Sleep() is not even the proper way to give resources back to the machine, and if you increase the system resolution you are giving fewer resources back anyway, defeating the purpose. The proper way to give resources back is to actually wait for timed system events, such as v-sync. On iOS you have CADisplayLink, on all consoles you have a v-sync notification, and on desktops you have a simple setting that can be enabled to allow automatic v-sync waiting.[/quote]

    I gave that as an example of a way around it on Windows specifically. No, it doesn't affect QueryPerformanceCounter, but the global timer resolution and high-performance timing are orthogonal concerns. If I had mentioned timeGetTime(), that would have been one thing, but he was specifically asking for ways to not spin in a loop and burn CPU cycles waiting for a set time to elapse.

    Lowering the scheduler resolution to 1 ms does consume slightly more resources than leaving it higher, but it does NOT increase resource usage over never letting the CPU go idle. If he wants to let it spin at 100% CPU, then sure, put it in a loop and keep checking QueryPerformanceCounter, but in his original post he said that he wants to put the CPU to sleep when he doesn't have anything for the game to do. You can lower the impact by using a slightly higher value (timeBeginPeriod(3), for example) that is still low enough to keep hiccups from appearing, but as Windows before Win8 is a ticked kernel, you have to make a trade: slightly more CPU time spent handling wakeups in exchange for the much bigger power win of idling the CPU when it's not used.

    And yes, timeBeginPeriod is global, but it's designed that way. It's how Windows works. Every program tells the OS what timer resolution it needs, and the OS uses the smallest requested resolution (for Windows before Win8). If your program isn't prepared to have the timer resolution lowered on it, you're in serious trouble, because even Microsoft uses it. So no, I disagree that you have nothing to gain and everything to lose with timeBeginPeriod(). It exists for very good reasons, and used for those reasons it is not incorrect.

    Specifically, do you know of another way to sleep your threads on Windows that gives you the time granularity needed for games? (Serious question: I'd love to know, because I'd rather not waste any extra CPU time on a lower timer resolution myself.)
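    A common compromise between spinning on the high-resolution counter and trusting a coarse sleep is to sleep for most of the remaining time and only spin for the last sliver. The sketch below does this with portable `std::chrono` and `std::this_thread::sleep_for` standing in for `QueryPerformanceCounter()` and `Sleep()`; the function name `wait_until_deadline` and the 2 ms spin margin are assumptions, and the margin should be tuned to at least the scheduler granularity you actually get.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Blocks until `deadline`, sleeping while there is plenty of time left and
// only busy-waiting (with yields) for the final `spin_margin`.
void wait_until_deadline(Clock::time_point deadline,
                         Clock::duration spin_margin = std::chrono::milliseconds(2)) {
    for (;;) {
        const auto remaining = deadline - Clock::now();
        if (remaining <= Clock::duration::zero()) {
            return;  // deadline reached or already missed: no waiting at all
        }
        if (remaining > spin_margin) {
            // Plenty of spare time: give the CPU back, but wake up short of the
            // deadline so scheduler overshoot does not make the frame late.
            std::this_thread::sleep_for(remaining - spin_margin);
        } else {
            // Too close to trust the scheduler: yield and re-check the clock.
            std::this_thread::yield();
        }
    }
}
```

    When a frame is already late, the function returns immediately, which matches the "sleep when you have oodles of spare time, just wait otherwise" behavior argued for in this thread.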
  7. Main loop timing

      I would recommend against rendering as fast as you can, actually. I usually cap my loops to render at most as many frames as the monitor's refresh rate allows. This didn't use to be such a big deal, but in the modern world of laptops and smart devices it's crucial not to waste battery power rendering frames the user can never physically see. Likewise, interpolation is usually unneeded and I would advise against it. At best you'll be adding a lot of complexity to your renderer to give your game that slightly strange effect that 120 Hz interpolating TVs give to broadcast shows. At worst it does nothing, as most LCDs are still 60 Hz and running at the same rate as your update loop anyway.

    Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program (with the matching timeEndPeriod(1) at exit) to minimize latency from Sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop. http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624%28v=vs.85%29.aspx

    Otherwise his timestep technique still works very well in general.
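    Capping the render rate at the monitor's refresh rate, as suggested above, amounts to scheduling each frame against a deadline one refresh period after the previous one. The sketch below only computes those deadlines (the waiting itself could be a sleep as discussed in this thread); `next_frame_deadline` is a hypothetical name, times are integer microseconds, and the 16667 us (60 Hz) period in the checks is an assumption rather than a queried refresh rate.

```cpp
#include <cstdint>

// Returns the time (in microseconds) at which the next frame may start.
// previous_deadline: when the frame that just finished was scheduled to start
// now:               the current time
// period:            one refresh interval, e.g. 16667 us for 60 Hz
std::int64_t next_frame_deadline(std::int64_t previous_deadline,
                                 std::int64_t now,
                                 std::int64_t period) {
    // Stepping from the previous deadline (not from `now`) keeps the average
    // rate locked to the refresh rate even if individual waits overshoot.
    const std::int64_t next = previous_deadline + period;
    if (next < now) {
        // We fell behind (a slow frame): resynchronize instead of rendering a
        // burst of catch-up frames the display cannot show anyway.
        return now;
    }
    return next;
}
```

    With v-sync enabled the driver already paces presentation this way; a manual cap like this mainly matters when v-sync is off, for example when running in a window.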
  8. Renderer too slow

      It's in the .sleepy file he posted. You can open saved captures in Very Sleepy to check it out. Most of the call time was in shader verification.   Paragon: 8000 still seems pretty low, but I think you're well on your way to getting your performance up. If you want deeper help you can PM me the project, but without the rest of the source and the ability to inspect things I'm afraid this is the limit of what I can do to help. Good luck!
  9. Renderer too slow

    To read the call graph, open your .sleepy file in Very Sleepy and sort by %inclusive or %exclusive. %exclusive is the share of running time spent in each function itself, not counting other functions it calls. %inclusive is the time spent in the function and all its child calls. I normally like to sort by %inclusive, as scanning down the list gives you a pretty good top-down idea of where you are slow.

    WaitForSingleObject is a thread block, so you appear to be limited by something in the video driver.

    The first suspicious entry you can do something about is al_d3d_create_bitmap, at 17%. How are you creating your bitmaps? You aren't recreating them every frame, are you? That alone would probably cause your perf issues. The other thing to check is whether you are using ALLEGRO_MEMORY_BITMAP. Memory bitmaps live in system RAM and are drawn in software, so you want video bitmaps (ALLEGRO_VIDEO_BITMAP) if the images are to be hardware accelerated. Also, how big are the images, and what kind of video card do you have?
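    To make the %inclusive/%exclusive distinction concrete: a function's exclusive time is the time spent in its own body, and its inclusive time adds the inclusive time of everything it calls. The sketch below computes inclusive time over a toy call tree; the `CallNode` type and the sample timings in the checks are hypothetical and do not come from the posted capture.

```cpp
#include <string>
#include <vector>

// One node of a toy call tree, as a sampling profiler might aggregate it.
struct CallNode {
    std::string name;
    double self_ms;                  // exclusive time: samples in this function's own code
    std::vector<CallNode> children;  // functions called from here
};

// Inclusive time = own (exclusive) time + inclusive time of every callee.
double inclusive_ms(const CallNode& node) {
    double total = node.self_ms;
    for (const CallNode& child : node.children) {
        total += inclusive_ms(child);
    }
    return total;
}
```

    For a `draw` node taking 2 ms itself and calling `create_bitmap` (17 ms) and `blit` (1 ms), sorting by inclusive time puts `draw` (20 ms) first, while sorting by exclusive time puts `create_bitmap` first, since that is where the cycles are actually burned.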
  10. Renderer too slow

    The first thing to do is to profile your code. Humans are notoriously bad at guessing where the performance issues in their code are. Don't guess; let hard data be your guide!

    Build the executable in optimized mode, and make sure you still have debugging info (in Visual Studio these are both under Project->Properties). I use this to profile. It's very easy to use and gives you pretty clear call graphs: http://www.codersnotes.com/sleepy

    Run your program, then select it for profiling in Very Sleepy. Wait a few minutes while Very Sleepy samples what your program is doing, and then you can look at the results. It will tell you exactly where in your program you are spending CPU time, and how much. This should allow you to pinpoint the functions that are causing your performance loss. If you need help beyond that, post the call graph sorted by total time and I will help.

    Just browsing your code, though, one thing that you absolutely should change is keeping the vector sorted by binary-searching the insertion point every time you add a new request. This is extremely expensive, as you have to move (which is a memory copy) on average half of the existing vector elements every time you insert in the middle. It's generally much, much, MUCH faster to just keep appending the requests to the end of the list, and then do a single sort with std::sort just before you render them.

    To answer your other question, on modern hardware sprites are really just flat 3D objects. You can render *millions* of them if you can feed the card efficiently. As previously noted, though, Allegro is software rendered, so your CPU will have to pick up that slack and your throughput will be much lower. You should still be able to do more than 1200 unless the sprites are absolutely massive.
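    The append-then-sort approach recommended above looks roughly like this. `RenderRequest`, its `layer` and `texture_id` fields, and the `RenderQueue` name are hypothetical stand-ins for whatever the original code queues; the point is that `submit()` is a cheap append and the ordering cost is paid once per frame by `std::sort`, instead of a mid-vector insert (shifting on average half the elements) for every request.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw request; sorting by (layer, texture) keeps the layering
// order and puts draws that share a texture next to each other.
struct RenderRequest {
    int layer;
    int texture_id;
};

class RenderQueue {
public:
    // Amortized O(1): just append, no searching or shifting.
    void submit(const RenderRequest& r) { requests_.push_back(r); }

    // Called once per frame, just before drawing.
    const std::vector<RenderRequest>& sorted() {
        std::sort(requests_.begin(), requests_.end(),
                  [](const RenderRequest& a, const RenderRequest& b) {
                      if (a.layer != b.layer) return a.layer < b.layer;
                      return a.texture_id < b.texture_id;
                  });
        return requests_;
    }

    // Empties the queue for the next frame; the vector keeps its capacity,
    // so steady-state frames do not reallocate.
    void clear() { requests_.clear(); }

private:
    std::vector<RenderRequest> requests_;
};
```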
  11. lightuser data in Lua

    Pretty sure, no. I ended up using the syntax that apatriarca suggests. Here's another example of the same syntax being used due to the choice to use light userdata: http://bitsquid.blogspot.com.au/2011/06/lightweight-lua-bindings.html

    With just light userdata, no, but if you're not concerned about allocating a full Lua object as a wrapper (edit: you're not; I do this in my games, and if you ever get to the point where it's actually a performance problem, you're doing way too much in script), I like to embed the light userdata in a regular Lua table. Any functions called on the wrapper can then pull the light userdata out in native code and work on it.

    This should be exactly what you are looking for, as you can use the hybrid object just like a normal Lua table, as per your second code block in the original post. You can override values, call functions with colon syntax, and only the wrapper will be under Lua's memory management. The object pointed to by the wrapper will still be safe after the wrapper is destroyed. Example:

```cpp
extern "C" {
#include <lua.h>
#include <lauxlib.h>
}

// User is the C++ class from your original post.

// To create the Lua binding object:
void pushUser(lua_State* L, User* user)
{
    // push an empty table. This will be the object we "return" to Lua
    lua_newtable(L);
    // put the C++ object in the "privatecobject" field of the Lua object as a light userdata
    lua_pushlightuserdata(L, user);
    lua_setfield(L, -2, "privatecobject"); // the table sits at -2 once the userdata is pushed
    // do your normal metatable setting here
}

// setName() in C++ would then look like this (using your example function).
// It would be called from Lua as user:setName("name")
int setName(lua_State* L)
{
    // grab the wrapped pointer from the table passed as self (argument 1)
    lua_getfield(L, 1, "privatecobject");
    User* user = static_cast<User*>(lua_touserdata(L, -1));
    lua_pop(L, 1); // pop the userdata field so the stack is unchanged. Not really needed here, but a good habit
    user->setName(luaL_checkstring(L, 2));
    return 0;
}
```
  12. [quote name='Icebone1000' timestamp='1320104870' post='4879058']I've been reading about triple buffering, and I'm kind of like "wtf? Triple buffering is just magically enabled by driver stuff?" I mean, as a programmer, I thought triple buffering (which I was holding for the future) was something I as a graphics programmer would have to manage myself: setting the swap chain to have 2 back buffers, and then choosing (in my algorithm) when to render to the first and when to render to the second back buffer, probably involving multithreading. Damn, how can triple buffering be that automatic? (That's what I'm guessing from a fast read of some articles.) I mean, in D3D you're the one who says which is your render target; how can it be modified from outside?[/quote]

    Usually that is only for the OpenGL driver settings, which are a little looser than DirectX about what you need to set up. Even in DirectX, though, the driver is free to do all kinds of things with your rendering options under the hood (and it does). My video card has options to force a bunch of bells and whistles (such as anisotropic texture filtering) even in games that were made before these features existed.

    [quote name='Icebone1000' timestamp='1320105202' post='4879059']Off topic: in Frank Luna's DX10 book:

    BufferCount: The number of back buffers to use in the swap chain; we usually only use one back buffer for double buffering, although you could use two for triple buffering.

    In the DX SDK:

    A value that describes the number of buffers in the swap chain,[b] including the front buffer.[/b]

    I always put 2 in this value, meaning front and back only... passing 1 to it should mean an error... but it works... makes me wonder if the SDK has a wrong description.[/quote]

    According to the SDK remarks: "In full-screen mode, there is a dedicated front buffer; in windowed mode, the desktop is the front buffer."
  13. [quote name='Icebone1000' timestamp='1320081331' post='4878938']So today I decided to investigate it, since I'm logging all frames and their delta times. I'm not really sure if I found anything, but I did find that from time to time I get a frame running at 0.03xx, which is a 30 fps value, while my vsynced app runs at 0.016 per frame... It's just one or two frames, so I'm not sure if those are indeed the ones that bump. Would one frame drop be so noticeable? <...snip...> What the hell can be causing this? It's weird: when I searched for something like this (bumps), people solved it by turning vsync ON, not vsync causing it...[/quote]

    Yes, it will most likely be noticeable. If you have constant values in your update loops, it will be significantly worse, as your objects will move/update at half speed for a frame. So if every frame you do something like object.position += 0.3, when one of these spikes occurs, you're effectively going to be moving half as fast.

    You might want to try enabling triple buffering in combination with vsync. If you enable vsync with a single back buffer in your swap chain, small hiccups like this are going to happen. If you add a second back buffer to the chain, you can completely remove those hiccups with very little effort on your part. AnandTech has a really good breakdown of why these hiccups happen and how triple buffering addresses the problem. [url="http://www.anandtech.com/show/2794/3"]http://www.anandtech.com/show/2794/3[/url]
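    The "half speed for a frame" effect described above comes from applying a fixed amount per frame. Scaling by the measured frame time removes the slowdown, as in the sketch below; `Object`, `speed_per_second`, and the 18 units/second figure (0.3 per frame at 60 Hz) are illustrative assumptions. Note that this hides the slowdown but not the hitch itself, which is what triple buffering addresses.

```cpp
// Hypothetical game object moving along one axis.
struct Object {
    double position = 0.0;
};

// Frame-rate dependent: moves 0.3 units per frame, so a 33 ms frame
// (a missed v-sync) covers the same distance as a 16 ms one.
void update_per_frame(Object& o) {
    o.position += 0.3;
}

// Frame-rate independent: speed is in units per second and is scaled by the
// measured frame time, so a doubled frame time moves the object twice as far.
void update_with_delta(Object& o, double speed_per_second, double dt_seconds) {
    o.position += speed_per_second * dt_seconds;
}
```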
  14. Can't get a decent Collada file from Blender?

    [quote name='Waaayoff' timestamp='1319742887' post='4877648']I wrote a converter that converts a Collada file to my own binary format. However, since Blender's exporter is a useless piece of shit, I have to go through this every time I need to load a model into my game:

    Export .fbx file from Blender -> Convert to .dae using Autodesk Converter -> Use Collada Refinery on .dae file -> Finally convert .dae file to my format.

    But then I realized that for some reason, the Autodesk Converter converts the animation keyframes into interpolated keyframes. So instead of reading 2 keyframes' worth of data and doing the interpolation in real time, I have to read 10^10 keyframes. And I can't reverse it because the original keyframe data isn't there anymore. So, FOR THE LOVE OF EVERYTHING THAT IS HOLY, does anyone have a working Collada exporter for Blender?! If not, could you at least tell me how you convert your Collada files? Assuming that you do... I need a drink.[/quote]

    Have you considered writing your own exporter directly from Blender? The Python API for Blender is actually fairly easy to use, and you can do some really slick stuff with it, like opening and dumping scenes directly from the command line. In my project, I don't manually export anything; I just run my build script, and any changes in the game's .blend files are automatically built into data for my engine and posted to the output directories for use. [url="http://www.blender.org/documentation/blender_python_api_2_59_0/info_quickstart.html"]http://www.blender.org/documentation/blender_python_api_2_59_0/info_quickstart.html[/url]
  15. [quote name='maxest' timestamp='1319667439' post='4877368'][quote]In the very end, all you can do is optimize your rendering for whichever API and system you are targeting.[/quote]

    I'd gladly do that, but how, if I run into problems at the very beginning? I just can't see what I should do to make OGL faster at plain geometry processing... I guess I will need to conduct some more tests.[/quote]

    How are you sending your data to GL? Immediate mode functions (glVertex()), vertex arrays, VBOs? There are a lot of different paths you can take in GL 2.0, and they can have significantly different performance profiles. VBOs are generally going to be fastest, as the other two methods require you to copy your geometry data from system RAM to GPU memory every frame. (As a note, VBOs can do that as well, but only if you ask them to and/or change the buffer contents.) So if you're using DrawPrimitive on D3D and glBegin()/glEnd() on OGL, that right there is a huge difference in the way you're actually getting your data to the card.