HW accel Matrices on android (NDK)

17 comments, last by Aks9 10 years, 10 months ago

On Windows/DirectX I have XMMATRIX. On iOS I have GLKit and GLKMatrix4. I can't seem to find a hardware-accelerated matrix library (for example, one that uses the NEON extension for matrix multiply on ARMv7) for Android that I can use on the C++/NDK side. Does anyone have any recommendations?

thanks a bunch.


Can you explain, please, how XMMATRIX, or any of the mentioned structures are hardware accelerated?

I assume that by hardware acceleration he means that they use SIMD.

There are some answers over here: http://stackoverflow.com/questions/981787/good-portable-simd-library

Hodgman,

I do appreciate the link to the SIMD library discussion. We don't currently use SIMD, but I would be willing to if I can get a performance increase out of it. I was looking for something that has the matrix math functions written in assembler, but I guess I'll have to dive into the assembler myself. Should be fun ^_^

for example on armv7 uses NEON extension for matrix multiply

NEON is a SIMD instruction set, which is why I thought that's what you were asking for.
That stackoverflow discussion is talking about math libraries that have been ported to use NEON intrinsics (and SSE, AltiVec, etc).

Math code that's hand-written in assembly won't be any better/worse than math code that's carefully written in C/C++. I'd recommend just using a higher level language and looking at the assembly output from your compiler to double-check that it's doing an OK job (and if it's not, don't resort to writing asm yourself; tweak the high level code so that the compiler performs better).

To use CPU-specific assembly instructions (like NEON ones) from high-level code, compilers provide "intrinsic functions". E.g. GCC's ones for NEON are here.
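As a rough illustration of what intrinsics look like in practice, here is a minimal sketch of a 4x4 float matrix multiply. The `Mat4` struct, the `mat4_mul` name and the column-major layout are assumptions made up for this example, not part of any library mentioned in this thread. The NEON path uses the standard `arm_neon.h` intrinsics `vld1q_f32`, `vmulq_n_f32`, `vmlaq_n_f32` and `vst1q_f32`; on any other target it falls back to plain C++ so the same file still builds.

```cpp
#include <cassert>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical column-major 4x4 matrix: m[c * 4 + r] is row r of column c.
struct Mat4 {
    float m[16];
};

// out = a * b. `out` must not alias `a` or `b`.
inline void mat4_mul(const Mat4& a, const Mat4& b, Mat4& out) {
    assert(&out != &a && &out != &b);
#if HAVE_NEON
    // Load the four columns of a into 128-bit registers.
    const float32x4_t a0 = vld1q_f32(a.m + 0);
    const float32x4_t a1 = vld1q_f32(a.m + 4);
    const float32x4_t a2 = vld1q_f32(a.m + 8);
    const float32x4_t a3 = vld1q_f32(a.m + 12);
    for (int c = 0; c < 4; ++c) {
        const float* bc = b.m + c * 4;
        // Column c of the result is a linear combination of a's columns,
        // weighted by the four entries of b's column c.
        float32x4_t r = vmulq_n_f32(a0, bc[0]);
        r = vmlaq_n_f32(r, a1, bc[1]);
        r = vmlaq_n_f32(r, a2, bc[2]);
        r = vmlaq_n_f32(r, a3, bc[3]);
        vst1q_f32(out.m + c * 4, r);
    }
#else
    // Scalar fallback with the same column-major convention.
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[k * 4 + r] * b.m[c * 4 + k];
            }
            out.m[c * 4 + r] = sum;
        }
    }
#endif
}
```

Whether the NEON path actually beats what the compiler generates from the scalar loop is something to check in the assembly output and a benchmark, as suggested above.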

Hodgman,

You truly are the "man". Thanks a bunch.

I'm sorry for interrupting this nice and polite conversation, but I still don't understand why hdxpete needs SSE or similar instructions to multiply two 4x4 matrices.

1. The system is as slow as its slowest part. Matrix multiplication is a very rare operation in a common visualization system, so improving calculation time by a fraction of a ms will not improve overall performance.

2. I haven't seen other approaches, but XMMATRIX uses floats for arguments. In all the applications I have developed I needed double precision for the matrix calculations. This also discredits usage of XMMATRIX.

3. Why do you think you really need a faster matrix manipulation library? Did you try to benchmark your application and find the bottleneck? Did you ever experience performance problems with matrix operations?

1. If you're processing a large mesh, transforming, distorting or whatever other usage you've encountered, then yes, matrix operations can dominate in that situation. hdxpete hasn't said that he's doing that, so it might be that he simply wants to use an optimised library throughout his project right from the start, rather than use one he writes himself. Why pick a slower option if there's an existing better one?

2. All applications you've developed required double precision? Then you're missing out on SIMD performance benefits, wasting memory needlessly and not running on a console/mobile platform. The only situation where I currently encounter doubles is in space games, and even there I'm pushing to get as much as possible back into float to avoid the conversions between double precision generation of the terrain and the floating point vector representation for rendering.

3. This is a valid question. However, it does come with the caveat that simply using a decent SIMD maths library might also force you to write code in a more data centric way, so having it from the start might also be a good idea.

"Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile"

"Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgement difficult."

I'm sorry for interrupting this nice and polite conversation, but I still don't understand why hdxpete needs SSE or similar instructions to multiply two 4x4 matrices.

1. The system is as slow as its slowest part. Matrix multiplication is a very rare operation in a common visualization system, so improving calculation time by a fraction of a ms will not improve overall performance.

The main benefit of SIMD is not computation time (although it's a very nice bonus), but the amount of time you spend reading / writing data. Loading four floats into one packed register is much quicker than the FPU equivalent. I'd also say that a matrix multiply is a prime candidate for SIMD, since it's used all over the place. Generally, you always optimise the biggest bottlenecks first, and typically memory access is a bigger bottleneck than computation time.

2. I haven't seen other approaches, but XMMATRIX uses floats for arguments. In all the applications I have developed I needed double precision for the matrix calculations. This also discredits usage of XMMATRIX.

1. It is entirely possible to use double precision SIMD instructions (ok, maybe not on NEON).

2. If you actually need double precision for a matrix, then you're either doing science (and need the accuracy), or you're doing something very wrong indeed. It may be that you have a huge game world, and are losing precision when the position is far from the origin (in which case, store an integer-based 3D grid reference along with the matrix). Otherwise the problem is possibly something that could be fixed by reordering your equations for better accuracy, or a simple orthogonalise might be what's needed.

3. Double precision typically halves the performance of your code (doubles the amount of time spent reading / writing data). If your ultimate criterion is speed, double precision is not a good idea....
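To make the grid reference idea from point 2 a bit more concrete, here is a minimal sketch. Everything in it (`GridPos`, `to_grid`, `relative`, the 1024-unit `kCellSize`) is a hypothetical name or value chosen for illustration. Each position is stored as an integer cell index plus a small float offset inside that cell, so the float part never has to represent a large magnitude.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical cell size in world units; any value that keeps the
// local offsets small enough for float would do.
constexpr double kCellSize = 1024.0;

struct GridPos {
    std::int32_t cell[3];  // integer grid reference
    float local[3];        // offset inside the cell, between 0 and kCellSize
};

// One-off conversion from a double-precision world position
// (e.g. at load time or in an offline tool).
inline GridPos to_grid(const double world[3]) {
    GridPos p{};
    for (int i = 0; i < 3; ++i) {
        const double c = std::floor(world[i] / kCellSize);
        p.cell[i] = static_cast<std::int32_t>(c);
        p.local[i] = static_cast<float>(world[i] - c * kCellSize);
    }
    return p;
}

// Offset from `from` to `to`. For nearby positions the cell difference
// is small, so the result stays accurate in single precision.
inline void relative(const GridPos& from, const GridPos& to, float out[3]) {
    for (int i = 0; i < 3; ++i) {
        const std::int32_t dc = to.cell[i] - from.cell[i];
        out[i] = static_cast<float>(dc) * static_cast<float>(kCellSize)
               + (to.local[i] - from.local[i]);
    }
}
```

For example, two points 0.25 units apart at a distance of about 6.4 million units from the origin collapse to the same value when stored directly as floats, but `relative` still recovers the 0.25 offset from their `GridPos` forms.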

Thank you guys for the replies, but I still have to clear up something...

I hope you don't mind, and the whole community could benefit from such polemics.

1. If you're processing a large mesh, transforming, distorting or whatever other usage you've encountered, then yes, matrix operations can dominate in that situation. hdxpete hasn't said that he's doing that, so it might be that he simply wants to use an optimised library throughout his project right from the start, rather than use one he writes himself. Why pick a slower option if there's an existing better one?

At first glance I also thought he had something serious to do with matrix calculus, but from the rest of his posts I really doubt it.

Transforming large meshes? On a CPU? Could you give an example? I really think that should be done on a GPU.

2. All applications you've developed required double precision? Then you're missing out on SIMD performance benefits, wasting memory needlessly and not running on a console/mobile platform. The only situation where I currently encounter doubles is in space games, and even there I'm pushing to get as much as possible back into float to avoid the conversions between double precision generation of the terrain and the floating point vector representation for rendering.

I have to admit that I've never developed for a console platform, but on a PC I have never noticed a performance loss when using doubles instead of single precision floats. As I've already said, those operations are too rare to be noticed. Texture streaming, for example, is something more noticeable than calculation of a transformation matrix. And that's where I'm using SIMD libraries, for example for texture compression. BTW, large terrain rendering is currently the focus of my graphics programming.

3. This is a valid question. However, it does come with the caveat that simply using a decent SIMD maths library might also force you to write code in a more data centric way, so having it from the start might also be a good idea.

It is not a bad idea per se, I just pointed out that this could cause complications without a real need for them.

2. If you actually need double precision for a matrix, then you're either doing science (and need the accuracy), or you're doing something very wrong indeed. It may be that you have a huge game world, and are losing precision when the position is far from the origin (in which case, store an integer-based 3D grid reference along with the matrix). Otherwise the problem is possibly something that could be fixed by reordering your equations for better accuracy, or a simple orthogonalise might be what's needed.

3. Double precision typically halves the performance of your code (doubles the amount of time spent reading / writing data). If your ultimate criterion is speed, double precision is not a good idea....

Currently I'm working on a high precision massive terrain algorithm. I've modeled Earth based on the WGS84 ellipsoid, geoid undulation, and a high precision (submeter precision for the whole Earth) DEM, with the ability to place the viewer 1 micron above the surface without visible artifacts. Of course, there is no need to place the viewer at such a height, but it is just a demonstration of the power.

Math is done partially on the CPU (in double precision, only for the viewer position) and partially on the GPU (in single precision for all vertices of the terrain). Frame-rate is really high so I'm probably doing things right.
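The post doesn't show how this CPU/GPU split is implemented, so the following is only a sketch of one common way to arrange it, under the assumption that each terrain tile keeps a double-precision origin and its vertices are stored as float offsets from that origin. All names (`Vec3d`, `Vec3f`, `Tile`, `tile_offset_from_eye`, `eye_space_position`) are hypothetical. The only double-precision work is one subtraction per tile per frame on the CPU; the per-vertex add stands in for what a vertex shader would do in single precision.

```cpp
struct Vec3d { double x, y, z; };
struct Vec3f { float x, y, z; };

// Assumed layout: one double-precision origin per tile; the tile's
// vertices (not shown) are float offsets from that origin.
struct Tile {
    Vec3d origin;
};

// CPU side, once per tile per frame: subtract the viewer position in
// double precision first, then narrow the (now small) result to float.
inline Vec3f tile_offset_from_eye(const Tile& t, const Vec3d& eye) {
    return Vec3f{static_cast<float>(t.origin.x - eye.x),
                 static_cast<float>(t.origin.y - eye.y),
                 static_cast<float>(t.origin.z - eye.z)};
}

// Stand-in for the single-precision work done per vertex on the GPU:
// vertex offset within the tile plus the tile's offset from the viewer.
inline Vec3f eye_space_position(const Vec3f& v, const Vec3f& tile_offset) {
    return Vec3f{v.x + tile_offset.x, v.y + tile_offset.y, v.z + tile_offset.z};
}
```

With a tile origin 6378137 m from the centre and a viewer 1 micron further out, narrowing both positions to float before subtracting gives exactly zero, while subtracting in double first keeps the roughly 1e-6 m offset.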

I know that double precision on a GPU requires SM5 cards, and is at least two times slower (in fact the factor is much higher and depends on the specific architecture). That's why I'm using single precision on a GPU. But on the CPU, I have never had performance problems with DP calculations. On the other hand, the things I'm working on would be impossible without DP.

