OpenGL ASM Experiment

12 comments, last by maxgpgpu 11 years, 10 months ago
Would asm be viable at runtime in a case where we have an if-else statement? We could do the branch in asm and avoid the doubled call each frame, since it would remove half the calls. I could see it being better than C++ in such a situation, or would I still be wrong?

It doesn't matter. Any reasonable optimizing C/C++ compiler (including MSVC and GCC) will generate good code for one simple if-else branch.
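To make that concrete, here is a minimal C sketch of a per-frame if-else. The names (frame_stats, draw_textured, draw_flat, render_frame) are made up for illustration and are not from the OP's code. The point is that only one of the two calls runs each frame, so there is no "doubled call" for asm to remove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for two different draw paths. */
typedef struct {
    int textured_frames;
    int flat_frames;
} frame_stats;

static void draw_textured(frame_stats *stats) { stats->textured_frames++; }
static void draw_flat(frame_stats *stats)     { stats->flat_frames++; }

/* Called once per frame. Exactly one of the two calls executes,
 * whichever way the condition goes. */
void render_frame(frame_stats *stats, bool textured)
{
    assert(stats != NULL);
    if (textured) {
        draw_textured(stats);
    } else {
        draw_flat(stats);
    }
}
```

If you want to see what the compiler actually does with it, compile with optimization and dump the assembly (for example with gcc -O2 -S) and compare it against what you would have written by hand.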
I will never understand this: people wasting time writing things like that in asm. The compiler will do a way better job than you.
It is interesting to have some understanding of assembler. It's like studying spoken languages, where learning Latin can be interesting. But most people don't have the time for it.

To optimize, you can almost always get better results by spending the effort on algorithms instead.

In the case of OpenGL, trying to optimize while using legacy OpenGL (as OP is doing) is like taking a modern car, replacing the engine with a steam engine, and trying to optimize the steam engine using different qualities of coal.
Current project: Ephenation.
Sharing OpenGL experiences: http://ephenationopengl.blogspot.com/
Hello dxCUDA. I just noticed your thread. Hey, I have two 64-bit assembly language files in my 3D engine. Most important for you and me, one of them contains a 4x4 matrix multiply function (of f64, AKA double, elements), plus a function that transforms my vertices from local to world coordinates. My vertices contain:

1 position
1 zenith vector (normal vector)
1 north vector (tangent vector)
1 east vector (bi-tangent vector)
1 texture-coordinate (just moved from local-vertex to world-vertex)
1 16-bit field of option bits (ditto)
1 texture-ID field (ditto)
1 matrix-ID field (ditto)
1 RGBA color (ditto)

So the function multiplies the input transformation matrix times each position, zenith vector, north vector and east vector in the local-coordinate vertex structure, then stores the result in two world-coordinate vertex structures (one contains f64 == double-precision position/vectors, and the other contains f32 == single-precision position/vectors). After the transformation my engine transfers the 32-bit structure to the GPU, then calls glDrawElements().
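For anyone who wants to picture that in plain C, below is a rough sketch of such a transform. It is not the engine's actual code. The struct layouts, the row-major matrix with translation in elements 3, 7 and 11, and the choice to apply translation to the position only (w = 1) and not to the three vectors (w = 0) are all assumptions, and the non-transformed fields (texture coordinate, option bits, IDs, color) are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } vec3d;
typedef struct { float  x, y, z; } vec3f;

/* Hypothetical layouts: only the four transformed members are shown. */
typedef struct { vec3d position, zenith, north, east; } local_vertex;
typedef struct { vec3d position, zenith, north, east; } world_vertex_f64;
typedef struct { vec3f position, zenith, north, east; } world_vertex_f32;

/* Multiply a row-major 4x4 matrix by (v.x, v.y, v.z, w).
 * Assumption: w = 1 for positions, w = 0 for direction vectors. */
static vec3d transform(const double m[16], vec3d v, double w)
{
    vec3d r;
    r.x = m[0] * v.x + m[1] * v.y + m[2]  * v.z + m[3]  * w;
    r.y = m[4] * v.x + m[5] * v.y + m[6]  * v.z + m[7]  * w;
    r.z = m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] * w;
    return r;
}

/* Narrow a double-precision vector to single precision. */
static vec3f narrow(vec3d v)
{
    vec3f r = { (float)v.x, (float)v.y, (float)v.z };
    return r;
}

/* Transform count vertices from local to world coordinates, writing
 * both an f64 copy and an f32 copy of each result. */
void transform_vertices(const double m[16], const local_vertex *in,
                        world_vertex_f64 *out64, world_vertex_f32 *out32,
                        size_t count)
{
    assert(m != NULL && in != NULL && out64 != NULL && out32 != NULL);
    for (size_t i = 0; i < count; ++i) {
        out64[i].position = transform(m, in[i].position, 1.0);
        out64[i].zenith   = transform(m, in[i].zenith,   0.0);
        out64[i].north    = transform(m, in[i].north,    0.0);
        out64[i].east     = transform(m, in[i].east,     0.0);

        out32[i].position = narrow(out64[i].position);
        out32[i].zenith   = narrow(out64[i].zenith);
        out32[i].north    = narrow(out64[i].north);
        out32[i].east     = narrow(out64[i].east);
    }
}
```

Note that multiplying normals by the same matrix is only correct for rotations, translations and uniform scale; with non-uniform scale you would need the inverse transpose, which this sketch ignores.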

Anyway, I have not decided whether to make my code open-source yet, but I'm willing to send it to you for educational purposes. As I recall, it has quite a few comments at the top about how 64-bit function calls work (where the arguments are, what needs to be preserved, etc). I also have 32-bit versions of the same functions in another file (since I can compile both 32-bit and 64-bit versions of my engine). I also have C code for the matrix multiply, and somewhere in my engine is equivalent C code for the vertex transformation function, so you can compare if you wish.
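As a baseline for that kind of comparison, a plain C 4x4 multiply of f64 elements might look roughly like the sketch below. This is not the engine's C code; the function name mat4_mul and the row-major storage are assumptions made for the example.

```c
#include <assert.h>

/* out = a * b for 4x4 matrices of doubles stored row-major
 * (element [row][col] lives at index row * 4 + col).
 * out must not alias a or b, because it is written while they are read. */
void mat4_mul(const double a[16], const double b[16], double out[16])
{
    assert(out != a && out != b);
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
}
```

Multiplying two translation matrices with this should give a matrix whose translation is the sum of the two, which is a quick sanity check before timing it against a hand-written SIMD version.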

I was absolutely blown away when I benchmarked these routines. It takes only a few nanoseconds to transform the position and three vectors in each vertex from local to world coordinates, and save it in both 64-bit and 32-bit form (the local-coords input is 64-bit form)... plus transfer the other fields too. And that is only running on one of my 8 cores so far!

PS: I don't think I have a 64-bit version in MASM yet, because my windoze computer is still windoze XP 64-bit edition, which DOES NOT support 16 SIMD registers and DOES NOT support the wider AVX/ymm registers (which hold and process four f64 values at once). What I definitely do have is 64-bit version in GAS (linux syntax), as well as 32-bit versions in GAS and MASM.

If you're interested, let me know. We can also chat on Skype if you wish to pick my brain about this topic.

This topic is closed to new replies.
