Community Reputation

1813 Excellent

About Geometrian

  1. Hi all,

     More than a decade ago, a problem came up on this forum: computing a fast transpose of a 3x3 matrix using SSE. The most sensible implementation stores the matrix internally as a 3x4 matrix (so each row stores 4 elements, aligned in one vector). A version, which I believe to be the fastest previously known, was presented there. I am pleased to report that I have been able to come up with a version which should be faster:

         inline void transpose(__m128& A, __m128& B, __m128& C) {
             //Input rows in __m128& A, B, and C. Output in same.
             __m128 T0 = _mm_unpacklo_ps(A,B);
             __m128 T1 = _mm_unpackhi_ps(A,B);
             A = _mm_movelh_ps(T0,C);
             B = _mm_shuffle_ps( T0,C, _MM_SHUFFLE(3,1,3,2) );
             C = _mm_shuffle_ps( T1,C, _MM_SHUFFLE(3,2,1,0) );
         }

     This should be 5 instructions instead of ajas95's 8 instructions.

     Of course, to get that level of performance with either version, you need to inline everything, or else you spend tons of time moving floating-point arguments to and from input registers.

     The other crucial thing is that the instructions be VEX-encoded. This allows the compiler to generate instructions that take three operands, like `vunpcklps`, instead of instructions like `unpcklps` that take only two and overwrite one of them. VEX is only available in AVX and higher (usually passing e.g. `-mavx` is sufficient to get the compiler to generate VEX instructions).

     -G
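     To make the lane bookkeeping easy to check, here is a minimal sketch that runs the `transpose` above on a row-padded 3x4 array and compares the result with a plain scalar transpose. The helpers `transpose3x4` and `transpose_matches_scalar` and the test values are hypothetical names introduced only for this illustration; they are not from the original thread.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_unpacklo_ps, _mm_shuffle_ps, ...

// The transpose from the post, unchanged apart from whitespace.
inline void transpose(__m128& A, __m128& B, __m128& C) {
    __m128 T0 = _mm_unpacklo_ps(A, B);  // (a0, b0, a1, b1)
    __m128 T1 = _mm_unpackhi_ps(A, B);  // (a2, b2, a3, b3)
    A = _mm_movelh_ps(T0, C);                            // (a0, b0, c0, c1)
    B = _mm_shuffle_ps(T0, C, _MM_SHUFFLE(3, 1, 3, 2));  // (a1, b1, c1, c3)
    C = _mm_shuffle_ps(T1, C, _MM_SHUFFLE(3, 2, 1, 0));  // (a2, b2, c2, c3)
}

// Hypothetical helper: transposes the 3x3 part of a row-padded 3x4 array in place.
// Unaligned loads/stores are used so the caller's array needs no special alignment.
inline void transpose3x4(float m[3][4]) {
    assert(m != nullptr);
    __m128 A = _mm_loadu_ps(m[0]);
    __m128 B = _mm_loadu_ps(m[1]);
    __m128 C = _mm_loadu_ps(m[2]);
    transpose(A, B, C);
    _mm_storeu_ps(m[0], A);
    _mm_storeu_ps(m[1], B);
    _mm_storeu_ps(m[2], C);
}

// Hypothetical check: returns true if the 3x3 block matches a scalar transpose.
inline bool transpose_matches_scalar() {
    float m[3][4] = {{1, 2, 3, 0}, {4, 5, 6, 0}, {7, 8, 9, 0}};
    float ref[3][3];
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            ref[c][r] = m[r][c];
    transpose3x4(m);
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            if (m[r][c] != ref[r][c]) return false;
    return true;
}
```

     One thing the lane comments make visible: the fourth (padding) lane of each output row is filled with leftover elements of `C`, not with the original padding. That should be harmless as long as the padding lane is treated as "don't care", which is the usual convention for a 3x4-stored 3x3 matrix.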
  2. @_Silence_: The GLX version on this system is 1.4. The FB config loading is pretty standard, using glXChooseFBConfig, and the result is checked for validity. The code is open-source, although the latest version is not online. If you (or someone else) would like to see it, I can update the repo. The reason I didn't lead with that is that the code is quite lengthy, since much control and additional functionality needs to be exposed. This is also the reason I moved away from existing context/windowing libraries (I've worked extensively with wx, Qt, SDL, and GLUT previously, and am somewhat familiar with GLFW).

     @NumberXaero: Basically, there is a "Context" object which sets the context at the beginning of its constructor and unsets it at the end. This, along with some other logic in there, ensures that the bound context is the same before and after making the "Context" object. The constructors are (lines 1-5) and (lines 7-13). Each context loads its own pointers automatically, which is why the second context gets its own pointer, even though it didn't have to.

     I thought it best to present the problem as simply as possible, with just the raw API commands. Perhaps more commands would be helpful? Or perhaps someone wants to dig through the source (it's actually very readable; just long)? Suggestions?
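     For readers who want the "same context bound before and after" idea in code, here is a minimal sketch under stated assumptions. It is not the actual source. The current-context state is mocked with a plain variable so the example stands alone; a real build would use `wglGetCurrentContext`/`wglMakeCurrent` on Windows or `glXGetCurrentContext`/`glXMakeCurrent` on Linux. `ContextHandle`, `ScopedBind`, and this `Context` class are hypothetical names.

```cpp
#include <cassert>

// Stand-in for a platform handle (HGLRC on Windows, GLXContext on Linux).
using ContextHandle = int;
constexpr ContextHandle kNoContext = 0;

// Mocked "currently bound context" state (assumption: replaces the real
// wgl*/glX* get-current and make-current calls).
inline ContextHandle& current_context() {
    static ContextHandle bound = kNoContext;
    return bound;
}
inline void make_current(ContextHandle h) { current_context() = h; }

// Binds a context for the lifetime of the guard, then restores whatever
// was bound before, so the caller's binding is left untouched.
class ScopedBind {
public:
    explicit ScopedBind(ContextHandle h) : previous_(current_context()) {
        make_current(h);
    }
    ~ScopedBind() { make_current(previous_); }
    ScopedBind(const ScopedBind&) = delete;
    ScopedBind& operator=(const ScopedBind&) = delete;

private:
    ContextHandle previous_;
};

// Hypothetical "Context": binds itself only while its constructor runs.
class Context {
public:
    explicit Context(ContextHandle h) : handle_(h) {
        ScopedBind bind(handle_);
        // Per-context setup (e.g. loading API pointers) would go here.
        setup_saw_itself_bound_ = (current_context() == handle_);
    }
    bool setup_saw_itself_bound() const { return setup_saw_itself_bound_; }

private:
    ContextHandle handle_;
    bool setup_saw_itself_bound_ = false;
};
```

     With a guard like this, constructing a `Context` while another context is current leaves that other context current afterwards, which is the invariant described above.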
  3. Hi,

     I have created an OpenGL context in what I believe is the proper way, but on Linux, basic calls such as `glGetString(...)` and `glGetIntegerv(...)` are returning bogus values or acting as no-ops. Despite this, the context seems to render everything fine. On Windows, I follow the same algorithm and it works perfectly.

     Fortunately, since I've been having this problem for such a long time, I've had the opportunity to make some very pretty debug output. In case it isn't obvious, in the following, red is a frame/window, cyan is an OpenGL context, yellow is an API pointer, and violet is the display server handle.

     Here is the output on Windows:

     [sharedmedia=gallery:images:7659]

     The algorithm is as follows:

       (line 1): Create a basic context on a default frame.
       (lines 3-5): Set the basic context as current, load its API pointer for `wglCreateContextAttribsARB(...)`, unset context.
       (lines 7-9): Set the basic context as current, call the API pointer to create an attribute context.
       (lines 11-13): Set the attribute context as current, and set it up (including loading its API pointer, which happens to be the same; we don't use it ever, though).
       (line 15): Set attribute context as current in preparation for main loop.
       (line 15.5): [Main loop]
       (line 16): Unset the attribute context.
       (lines 18-19): Cleanup.

     Now, I try to do something very similar on Linux:

     [sharedmedia=gallery:images:7658]

     Unfortunately, it doesn't work. Notice the error after line 12. At that point, I called `glGetString(...)` and it returned null. This should not be possible: a context is bound (line 11). Crucially, there is no GL error--yet the only case in which the documentation says null is returned is if an error happened. In fact, no OpenGL error occurs at all, anywhere!

     Basically, I want to know why this happens, and how to prevent it. Did I screw up the context creation somehow? Why wouldn't it throw an error? Is this OpenGL driver just terrible?

     ---

     One other potentially-relevant fact: on AMD CodeXL, I get the following output on Windows:

     This should not be possible either; as you can see, that function is only ever called when the basic context is bound, using the pointer loaded while the basic context was bound. Additionally, at the time this message appears, only one context had been created, so . . .

     Thanks,
     -G
  4. Geometrian


    Scratch space for fora
  5. Whatever happened to "Write Games not Engines"?
  6. Nice article. Two things. First, one of your tags has a typo. Second (and I don't know if this works within the site's peer review process), while you make good points overall, I would like to see it be less . . . ramble-y. Some minor formatting (e.g. boldfacing some subsection titles) could even be enough to give it some structure.
  7. Geometrian

    Our New Game: I Am Dolphin

    I would definitely be interested in this. I'm going to ask a few more questions; perhaps you want to point them there instead of answering here?

    This was surprising to me. I looked for water detail specifically, but thought you were probably using only two triangles and a normalmap because the intersections I saw on jumping looked flat. So, am I to understand that all that variation in lighting is representative of underlying geometry? I did notice the more complicated BRDF and the reflection (which I assume is a scaled inverse impostor?).

    My impression was maybe a depth fog hack, without special consideration of the background. I didn't really think about it much, but I'd've guessed the gradient came from a scalar with elevation.

    I can see all the shading for non-creatures happening without a texture fetch (and besides, the compute for depth fog is pretty cheap), so I'd think this is only a bottleneck because it just covers a huge number of pixels. It's a pity; one's instinct is to try deferred shading--but the fragment cost is coming mostly from overhead, not compute or memory fetch. A rare situation for HPC.

    This is certainly the rule on commodity computer graphics cards, because compute is free compared to memory accesses. The vertex unit gets its data fed directly to it, but the fragment unit generally pulls most of its data indirectly from GPU memory. Even if it's coherent, it's still an issue for the memory controller. This is why thread processors have many register files to amortize accesses.

    I'm . . . less convinced for fewer thread processors. The scheduler plays a role because it batches less, but the most significant reason is that memory accesses for fragment programs become both more coherent and less frequent. Graphics memory gets fetched into thread warps' caches, and the memory controller needs to service multiple thread warps. However, for a simple GPU--like apparently mobile GPUs exclusively are--there's effectively only one thread warp, which is the memory controller's only customer.

    Probably more importantly, as fragment programs get cheaper, rasterization and vertex shading start being important. Assuming the area shaded remains constant, adding more vertices makes your application vertex-bound. My (software) rasterizer starts showing linear scaling with the number of vertices once pixels:vertices gets around 1000:1, for a pass-through fragment shader. Whatever the magic ratio is for your architecture, shading, and scene determines whether the render is vertex- or fragment-bound.

    I don't know. I feel like vertex shading should be a significant cost--both because of the simpler architecture and because your fragment shader is so simple. But, at the same time, you have so many pixels to shade that maybe both are dwarfed by rasterization.

    Best,
    -G
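    The pixels:vertices argument above can be written down as a toy classifier. This is only a sketch of the reasoning, not a measurement. It assumes a single break-even ratio at which vertex work and fragment work cost the same, and the caller has to supply that ratio for their own architecture, shading, and scene. The names `Bound` and `classify` are made up for the illustration.

```cpp
#include <cassert>

enum class Bound { Vertex, Fragment };

// Toy model: below the break-even pixels-per-vertex ratio, adding vertices
// dominates the cost (vertex-bound); at or above it, shaded area dominates
// (fragment-bound). break_even_ratio is an input, not a universal constant.
inline Bound classify(double pixels, double vertices, double break_even_ratio) {
    assert(pixels >= 0.0 && vertices > 0.0 && break_even_ratio > 0.0);
    return (pixels / vertices < break_even_ratio) ? Bound::Vertex
                                                  : Bound::Fragment;
}
```

    Plugging in the roughly 1000:1 figure from my software rasterizer with a pass-through fragment shader, a full-HD frame (1920 x 1080 pixels) drawn with 1,000 vertices lands on the fragment-bound side, while the same frame with 10,000 vertices lands on the vertex-bound side. A different renderer would need its own measured ratio.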
  8. Geometrian

    Our New Game: I Am Dolphin

    I see textured, diffuse shading on maybe up to 30 animated objects, maybe 100 particles, and in a few shots some distant terrain. A Raspberry Pi Model A can do the graphics part of that at 30Hz in full HD using its crappy 4-core Broadcom GPU.

    I was surprised then--nay amazed--to find that the 3rd- and 4th-generation iPads actually have comparable GPUs. The Air and even the iPhone 6 aren't much better. I was under the impression that mobile devices were maybe a decade back on the GPU curve. It's looking closer to two.

    Given that information, I'm actually impressed you got this kind of graphics. For caustics, I was going to suggest some large textured quads--but you're almost certainly fillrate-bound at this resolution, which also explains your simple shading model. Updating animation geometry is, I imagine, also a significant challenge--especially since it looks like you used almost all your polygon budget on animated geometry. I'd be interested to hear how you do skinning.

    I likewise believe that interactivity trumps quality. Further, I find anything less than 60Hz unplayable. As above, I'm impressed you managed even that.

    My research machine has 5,088 GPU cores. You're stuck with 4. Thank you for reminding me why I don't do mobile development; I retract my graphics criticism.

    ----

    For maneuverability, I was referring specifically to one scene in which a killer whale turns around in a half second or so (around 1:04 in the trailer above). My impression was that they are larger, too. I'm also pretty sure they can't do double backflips when jumping.

    But yes, certainly "More responsive and controllable wins over more realistic."

    -G
  9. Geometrian

    Our New Game: I Am Dolphin

    Gentle criticism: I'd like to see some better graphics--in particular some fake underwater caustics and some splashing on the surface. It doesn't look like you're taxing the GPU much. Regarding the animation, the characters look far too maneuverable.

    All said, knowing how difficult AI and realistic, physical animation are, I am very impressed. Good work!
  10. Geometrian

    Chamois - Fluent Assertion Syntax for C++

    I don't mean to sound insulting, but that looks a lot less readable to me. I don't know if it's just me.

    Maybe you could get used to it, but I don't like chained methods. A simple "<" is much more clear to me than "LessThan". It's less for me to parse, and I think in symbols instead of English when I'm doing anything vaguely mathematical.
  11. Geometrian

    Why Games Don't Have to be Good Anymore

    I would like to see a few more concrete examples in this article, especially since the article proper is expressing an opinion. You touched on this: new games can be rehashes of dead genres ad infinitum. Looking just at first-person shooters, I wrote: In my opinion, one of the major reasons for stagnation in the gaming industry is that it (both the companies producing and the gamers consuming) has collectively forgotten the new-frontier sort of brazen exploration that was its hallmark twenty or thirty years ago.
  12. I feel like the main disadvantage (after 4.) is that your ".get" method now needs to be parameterized on a particular type that may not be obvious. I don't want to have to remember that the type was "PathfindingService" and not "PathfinderService"--and heaven forbid what happens should I want to change any of these names. Plus, Intellisense-type features become a lot less helpful (it can't suggest reasonable types to put in the template in the same way it could suggest reasonable fields to dereference). All that said, I have used something similar in my projects myself. It's interesting that you cast it as a compile-time versus link-time tradeoff; my main application was abstraction. If you're feeling especially devilish, you can replace your implementation class with a std::vector of void* (or of some base type). That will let you only have two files (which IMO is much more clear than having three or four to fill one semantic purpose).
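     To make the "std::vector of void*" suggestion concrete, here is a minimal sketch. It assumes the article's locator exposes something like `get<T>()`; the `ServiceLocator` class, its `provide` method, and the two service structs are hypothetical and only illustrate the idea. Each service type gets a small integer ID on first use, and that ID indexes the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical service types, named after the example in the post.
struct PathfindingService { int max_nodes = 256; };
struct AudioService       { float volume = 1.0f; };

class ServiceLocator {
public:
    // Registers a service; the locator does not take ownership.
    template <typename T>
    void provide(T* service) {
        const std::size_t id = type_id<T>();
        if (id >= slots_.size()) slots_.resize(id + 1, nullptr);
        slots_[id] = service;
    }

    // Returns the registered service of type T, or nullptr if none was provided.
    template <typename T>
    T* get() const {
        const std::size_t id = type_id<T>();
        return id < slots_.size() ? static_cast<T*>(slots_[id]) : nullptr;
    }

private:
    // Hands out 0, 1, 2, ... in order of first use (not thread-safe).
    static std::size_t next_id() {
        static std::size_t counter = 0;
        return counter++;
    }
    // One stable ID per distinct T, assigned the first time T is seen.
    template <typename T>
    static std::size_t type_id() {
        static const std::size_t id = next_id();
        return id;
    }

    std::vector<void*> slots_;  // type-erased storage, indexed by type_id<T>()
};
```

     Note that this keeps the drawback described above: the caller still has to spell the exact type in `get<PathfindingService>()`, and the `static_cast` from `void*` is only safe because `provide<T>` and `get<T>` use the same `T` for a given slot.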
  13. Geometrian

    Texture Deletion Race Condition

    I'd expect to be able to allocate and delete textures when I choose. It's pretty simple stuff. Even if I'm doing it frequently, this shouldn't affect whether it works. I feel like that's a pretty reasonable expectation.

    Note: I was using a development version of the Kinect 1, which allows writing in C++ on a PC, not a console.
  14. Geometrian

    Texture Deletion Race Condition

    I found a silly bug related to the way I was handling mipmapping, but I have no evidence the problems were related.

    The deletion/reallocation was mainly for simplicity. I understood then as now that it's generally a bad approach--but I was still expecting it to be correct.
  15. Geometrian

    Texture Deletion Race Condition

    Nope. As I wrote, time was short. PBOs aren't used.