Fast 4x4 Matrix Inverse with SSE SIMD, Explained

Hi guys,

I'm writing my own math library and have implemented some matrix inverse functions that I'd like to share.

In my tests, the SIMD version is more than twice as fast as the non-SIMD version (which is what Unreal uses). It is also faster than the inverse in some other math libraries, such as Eigen and DirectX Math.

[Image: chart.jpg]

(Results from my test; the first three columns are my methods.)
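
For anyone who wants to sanity-check a comparison like this, here is a minimal scalar baseline sketch. It is not the code from the linked post, and it is not what Unreal, Eigen, or DirectX Math do internally; it is a plain Gauss-Jordan inverse with partial pivoting. The names `Mat4`, `invert_scalar`, `multiply`, and `max_identity_error` are made up for illustration. Something like this can serve as a reference to compare a SIMD inverse against, both for correctness and for timing.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <utility>

// Hypothetical row-major 4x4 matrix type, for illustration only.
using Mat4 = std::array<std::array<double, 4>, 4>;

// Scalar Gauss-Jordan inverse with partial pivoting.
// Assumption: this is a generic baseline, not the method from the post.
// Returns false if a pivot is smaller than eps (matrix treated as singular).
bool invert_scalar(const Mat4& m, Mat4& out, double eps = 1e-12) {
    Mat4 a = m;
    Mat4 inv{};  // zero-initialized, then set to identity
    for (int i = 0; i < 4; ++i) inv[i][i] = 1.0;

    for (int col = 0; col < 4; ++col) {
        // Choose the row at or below the diagonal with the largest entry.
        int pivot = col;
        for (int r = col + 1; r < 4; ++r) {
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        }
        if (std::fabs(a[pivot][col]) < eps) return false;
        std::swap(a[pivot], a[col]);
        std::swap(inv[pivot], inv[col]);

        // Scale the pivot row so the diagonal entry becomes 1.
        const double d = 1.0 / a[col][col];
        for (int c = 0; c < 4; ++c) {
            a[col][c] *= d;
            inv[col][c] *= d;
        }

        // Eliminate this column from every other row.
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double f = a[r][col];
            for (int c = 0; c < 4; ++c) {
                a[r][c] -= f * a[col][c];
                inv[r][c] -= f * inv[col][c];
            }
        }
    }
    out = inv;
    return true;
}

// Plain 4x4 matrix product, used only for the check below.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 p{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                p[i][j] += a[i][k] * b[k][j];
    return p;
}

// Largest absolute deviation of m * inv from the identity matrix.
double max_identity_error(const Mat4& m, const Mat4& inv) {
    const Mat4 p = multiply(m, inv);
    double err = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            err = std::max(err, std::fabs(p[i][j] - (i == j ? 1.0 : 0.0)));
    return err;
}
```

A typical check is to call `invert_scalar(m, inv)` and confirm that `max_identity_error(m, inv)` stays small; the tolerance that makes sense depends on how well conditioned `m` is.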

 

If you are interested in either the theory or the implementation, I have put together my math derivation and source code in this post:

https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html

I would appreciate any feedback :)

Edited by lxjk
