Geometrian
Member · Rank: Contributor
Interests: Education, Production, Programming
@ian_mallett · GitHub: imallett
1. Optimization Towards an Optimal VEX-SSE 3*3*float Matrix Transpose

Hi all,

More than a decade ago, a problem came up on this forum: computing a fast transpose of a 3x3 matrix using SSE. The most sensible implementation stores the matrix internally as a 3x4 matrix (so each row stores 4 elements, aligned in a vector). A version, which I believe to be the fastest previously known, was presented then.

I am pleased to report that I have been able to come up with a version which should be faster:

    inline void transpose(__m128& A, __m128& B, __m128& C) {
        //Input rows in A, B, and C.  Output in the same registers.
        __m128 T0 = _mm_unpacklo_ps(A,B);
        __m128 T1 = _mm_unpackhi_ps(A,B);
        A = _mm_movelh_ps (T0,C);
        B = _mm_shuffle_ps(T0,C,_MM_SHUFFLE(3,1,3,2));
        C = _mm_shuffle_ps(T1,C,_MM_SHUFFLE(3,2,1,0));
    }

This is 5 instructions instead of ajas95's 8. Of course, to get that level of performance with either version, you need to inline everything, or else you spend tons of time moving floating-point arguments to/from input registers.

The other crucial thing is that the instruction set be VEX-encoded. This allows generating instructions that take three arguments, like vunpcklps, instead of instructions like unpcklps that take only two. VEX is only available in AVX and higher (usually passing e.g. -mavx is sufficient to get the compiler to generate VEX instructions).

-G
2. OpenGL My Properly-Created OpenGL Context Is Lying To Me.

@_Silence_: The GLX version on this system is 1.4. The FB config loading is pretty standard, using glXChooseFBConfig, and the result is queried for validity. The code is open-source, albeit the latest version is not online. If you (or someone else) would like to see it, I can update the repo; the reason I didn't lead with that is that the code is quite lengthy, since much control and additional functionality needs to be exposed. This is also the reason I moved away from existing context/windowing libraries (I've worked extensively with wx, Qt, SDL, and GLUT previously, and am somewhat familiar with GLFW).   @NumberXaero: Basically, what's happening is that there is a "Context" object which sets the context at the beginning of its constructor and unsets it at the end. This, along with some other logic in there, ensures that the bound context is the same before/after making the "Context" object. The constructors are (lines 1-5) and (lines 7-13). Each context loads its own pointers automatically, which is why the second context gets its own pointer, even though it didn't have to.   I thought it best to present the problem as simply as possible, with just the raw API commands. Perhaps more commands would be helpful? Or perhaps someone wants to dig through the source (it's actually very readable; just long)? Suggestions?
3. OpenGL My Properly-Created OpenGL Context Is Lying To Me.

Hi,   I have created an OpenGL context in what I believe is the proper way, but on Linux, basic calls such as glGetString(...) and glGetIntegerv(...) are returning bogus values or acting as no-ops. Despite this, the context seems to render everything fine. On Windows, I follow the same algorithm and it works perfectly.   Fortunately, since I've been having this problem for such a long time, I've had the opportunity to make some very pretty debug output. In case it isn't obvious, in the following, red is a frame/window, cyan is an OpenGL context, yellow is an API pointer, and violet is the display server handle.   Here is the output on Windows: [sharedmedia=gallery:images:7659]
The algorithm is as follows:
(line 1): Create a basic context on a default frame.
(lines 3-5): Set the basic context as current, load its API pointer for wglCreateContextAttribsARB(...), unset the context.
(lines 7-9): Set the basic context as current, call the API pointer to create an attribute context.
(lines 11-13): Set the attribute context as current, and set it up (including loading its API pointer, which happens to be the same; we don't ever use it, though).
(line 15): Set the attribute context as current in preparation for the main loop.
(line 15.5): [Main loop]
(line 16): Unset the attribute context.
(lines 18-19): Cleanup.
Now, I try to do something very similar on Linux: [sharedmedia=gallery:images:7658]
Unfortunately, it doesn't work. Notice the error after line 12. At that point, I called glGetString(...) and it returned null. This should not be possible: a context is bound (line 11). Crucially, there is no GL error--yet the only case in which the documentation says null is returned is if an error happened. In fact, no OpenGL error occurs at all, anywhere!   Basically, I want to know why this happens, and how to prevent it. Did I screw up the context creation somehow? Why wouldn't it throw an error? Is this OpenGL driver just terrible?   
---   One other potentially-relevant fact: on AMD CodeXL, I get the following output on Windows:   This should not be possible either; as you can see, that function is only ever called when the basic context is bound, using the pointer loaded while the basic context was bound. Additionally, at the time this message appears, only one context had been created, so . . .   Thanks, -G
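For reference, here is the Windows sequence above sketched in pseudocode, using the real WGL entry points; the names dc, attrib_list, and set_up are placeholders of mine, and error checking is omitted:

```
basic = wglCreateContext(dc)                                         // line 1

wglMakeCurrent(dc, basic)                                            // lines 3-5
create_attribs = wglGetProcAddress("wglCreateContextAttribsARB")
wglMakeCurrent(dc, null)

wglMakeCurrent(dc, basic)                                            // lines 7-9
attrib = create_attribs(dc, null, attrib_list)
wglMakeCurrent(dc, null)

wglMakeCurrent(dc, attrib)                                           // lines 11-13
set_up(attrib)        // includes loading its own API pointers
wglMakeCurrent(dc, null)

wglMakeCurrent(dc, attrib)                                           // line 15
main_loop()                                                          // line 15.5
wglMakeCurrent(dc, null)                                             // line 16

wglDeleteContext(attrib)                                             // lines 18-19
wglDeleteContext(basic)
```

The Linux path substitutes the glX equivalents (glXCreateContextAttribsARB, glXMakeCurrent) for the wgl calls.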
4. /scratch

Scratch space for fora
5. A Rudimentary 3D Game Engine, Built with C++, OpenGL and GLSL

Whatever happened to "Write Games not Engines"?
6. 300 Employees On Multiple Continents: How We Work Without An Office

Nice article. Two things: one of your tags is typoed. Second, (and I don't know if this works within GDev.net's peer review process), while you have overall good points, I would like to see it be less . . . ramble-y. Some minor formatting (e.g. boldface some subsections) could even be enough to give it some structure.

8. Our New Game: I Am Dolphin

I see textured, diffuse shading on maybe up to 30 animated objects, maybe 100 particles, and in a few shots some distant terrain. A Raspberry Pi Model A can do the graphics part of that at 30Hz in full HD using its crappy 4-core Broadcom GPU.

I was surprised then--nay, amazed--to find that the 3rd- and 4th-generation iPads actually have comparable GPUs. The Air and even the iPhone 6 aren't much better. I was under the impression that mobile devices were maybe a decade back on the GPU curve. It's looking closer to two.

Given that, I'm actually impressed you got this kind of graphics. For caustics, I was going to suggest some large textured quads--but you're almost certainly fillrate-bound at this resolution, which also explains your simple shading model. Updating animation geometry I imagine is also a significant challenge--especially since it looks like you used almost all your polygon budget on animated geometry. I'd be interested to hear about how you do skinning.

I likewise believe that interactivity trumps quality. Further, I find anything less than 60Hz unplayable. As above, I'm impressed you managed even that.

My research machine has 5,088 GPU cores. You're stuck with 4. Thank you for reminding me why I don't do mobile development; I retract my graphics criticism.

----

For maneuverability, I was referring specifically to one scene in which a killer whale turns around in a half second or so (around 1:04 in the trailer above). My impression was that they are larger, too. I'm also pretty sure they can't do double backflips when jumping.

But yes, certainly "More responsive and controllable wins over more realistic."

-G
9. Our New Game: I Am Dolphin

Gentle criticism: I'd like to see some better graphics--in particular some fake underwater caustics and some splashing on the surface. It doesn't look like you're taxing the GPU much. Regarding the animation, the characters look like they are far too maneuverable.   All said, knowing how difficult AI and realistic, physical animation is, I am very impressed. Good work!
10. Chamois - Fluent Assertion Syntax for C++

I don't mean to sound insulting, but that looks a lot less readable to me. I don't know if it's just me.   Maybe you could get used to it, but I don't like chained methods. A simple "<" is much more clear to me than "LessThan". It's less for me to parse, and I think in symbols instead of English when I'm doing anything vaguely mathematical.
11. Abusing the Linker to Minimize Compilation Time

I feel like the main disadvantage (after 4.) is that your ".get" method now needs to be parameterized on a particular type that may not be obvious. I don't want to have to remember that the type was "PathfindingService" and not "PathfinderService"--and heaven forbid what happens should I want to change any of these names. Plus, Intellisense-type features become a lot less helpful (they can't suggest reasonable types to put in the template in the same way they could suggest reasonable fields to dereference).

All that said, I have used something similar in my own projects. It's interesting that you cast it as a compile-time versus link-time tradeoff; my main application was abstraction.

If you're feeling especially devilish, you can replace your implementation class with a std::vector of void* (or of some base type). That will let you have only two files (which IMO is much clearer than having three or four to fill one semantic purpose).
12. OpenGL Texture Deletion Race Condition

I'd expect to be able to allocate and delete textures when I choose. It's pretty simple stuff. Even if I'm doing it frequently, this shouldn't affect whether it works. I feel like that's a pretty reasonable expectation.   Note: I was using a development version of the Kinect 1, that allows writing in C++ on a PC, not a console.
13. OpenGL Texture Deletion Race Condition

I found a silly bug related to the way I was handling mipmapping, but I have no evidence the problems were related.   The deletion/reallocation was mainly for simplicity. I understood then as now that it's generally a bad approach--but I was still expecting it to be correct.
14. OpenGL Texture Deletion Race Condition

Nope. As I wrote, time was short. PBOs aren't used.
15. OpenGL Texture Deletion Race Condition

Hi,

I built a small demo using the Kinect for PC that simply reconstructs 3D position and color. Time was short, and the simplest solution just took the raw client-side data and copied it into a new texture for every frame.

The way this worked in the main loop was as follows:
--Grab new data from the Kinect (client-side bytes)
--Allocate new OpenGL textures and glTexImage2D the new data into them
--glFlush and glFinish all the things (just in case)
--Draw the frame using the new textures (vertex shader sorcery)
--glFlush and glFinish all the things (just in case)
--Flip buffers
--glFlush and glFinish all the things (just in case) (again)
--Delete the new textures (and delete the client-side data)

The above was my debugging attempt, and it's obviously redundant with all its flushes/finishes. Even with this code, however, I would get an error where the texture would essentially not exist every n ~= 15 frames. It would just appear black.

My workaround at the time was to queue textures for deletion over the last 10 frames (so each frame would delete the texture used 10 frames ago). This "fixed" the problem. However, I want to know why the original issue occurred. Some thoughts/notes:
--I want to say it's not because the client-side data is being deleted before being copied into the texture; the client-side data was deleted at the end of a frame, after the object had been drawn. The flushes everywhere and the buffer flipping should prevent it being a pipelining issue. However, I can't think of what else it could be.
--It's not some compiler reordering breaking it. Without optimizations, the problem still occurred.
--The draw-the-frame step actually draws different views into three separate windows, with a shared OpenGL context. The issue only seemed to affect one window at a time, which I found odd.
--The issue was not data from the Kinect being broken. This is evidenced by, among other things, queueing for deletion fixing the problem. Also, the Kinect copies its data into a client-side array which is reused, but this array is copied into a freshly-allocated client-side array associated with each texture. When the new OpenGL texture is created, the data is pulled from this newly-allocated array, which persists for the length of the frame and until the texture is deleted.
--My drivers are up-to-date; the GPU is an NVIDIA GeForce 580M.

I feel like it's some driver-related pipelining issue related to texture upload, but I'm not completely sure. Ideas?

Thanks,
-G