
Aks9

Member Since 10 Jul 2009
Online Last Active Today, 12:59 AM
-----

Posts I've Made

In Topic: HW accel Matrices on android (NDK)

13 June 2013 - 07:49 AM

Ah yes, I thought that you were storing vertices in double-precision format.

I guess you're reading in some compact data (e.g. 16-bit elevation), doing a bunch of double-precision transforms on it, then outputting 32-bit floats?

That's much less offensive to performance than what I assumed you were doing.

Nope! In fact, I'm generating the terrain completely on the GPU. Only 16-bit elevation data and the various overlays are sent, as textures. Everything is rendered without a single attribute (in the GLSL sense). The CPU calculates the precise position on the globe and the parameters the GPU needs to perform the full ellipsoid calculation and height correction per vertex. Everything on the GPU side is done in single precision (FP), but the coefficients are calculated on the CPU in double precision (DP), downcast to FP, and sent to the GPU as uniforms. Once again, no attributes are used. The representation cannot be more compact. But I still need DP to do accurate math on the CPU.
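To illustrate why the order of "calculate in DP, then downcast to FP" matters, here is a minimal C++ sketch. It is not the code from the post: the type aliases and function names are hypothetical, and it only shows the general idea of subtracting large coordinates in double precision before converting the small result to float.

```cpp
#include <array>

// Hypothetical aliases; the post does not show the actual data types.
using Vec3d = std::array<double, 3>;
using Vec3f = std::array<float, 3>;

// Subtract in double precision first, then downcast the small result.
Vec3f relativeToViewer(const Vec3d& point, const Vec3d& viewer) {
    return { static_cast<float>(point[0] - viewer[0]),
             static_cast<float>(point[1] - viewer[1]),
             static_cast<float>(point[2] - viewer[2]) };
}

// For contrast: downcast the large absolute coordinates first, then subtract.
Vec3f relativeToViewerNaive(const Vec3d& point, const Vec3d& viewer) {
    return { static_cast<float>(point[0]) - static_cast<float>(viewer[0]),
             static_cast<float>(point[1]) - static_cast<float>(viewer[1]),
             static_cast<float>(point[2]) - static_cast<float>(viewer[2]) };
}
```

With a viewer at x = 6378137.0 and a point 0.125 m further out, the first function returns 0.125 while the second returns 0, because adjacent floats near that magnitude are 0.5 apart.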

 

 

While on this topic though, it's worth noting that some compilers, such as MSVC, actually output really horribly bad assembly code when you use floats, depending on the compiler settings. MSVC has "Enhanced Instruction Set" and "Floating point model". With the FP model set to "strict" or "precise", it will produce assembly code with a LOT of redundant instructions to take every 80-bit intermediate value and round it down to 32-bit precision, so that your code behaves as if the FPU actually used 32-bit precision internally. When using double, it doesn't bother with all this redundant rounding code, which can actually make double seem like it's much faster than float!

Personally, I always set the instruction set to SSE2 and the FP model to "fast", which makes MSVC produce more sensible x86 code for floats.

Thank you for the advice! Although I've been using VS since version 4.1, I have never needed to tweak the compiler options. I'll try what you have suggested!


In Topic: HW accel Matrices on android (NDK)

12 June 2013 - 03:52 AM

I didn't quite note just that -- I pointed out that they have the potential to halve performance, because memory bandwidth is usually more of a bottleneck than CPU speed. 

 

We misunderstood each other again. I don't "promote" double-precision models, just double-precision calculation. There is no impact on bandwidth, since only a few floats are sent to the GPU.

 

 

Neither floats nor doubles are a great choice for storing globe surface points relative to the globe, because both formats dedicate the bulk of their precision to representing points within the globe's core. What a waste!
The surface of earth only varies vertically by about 20km, so if you need sub-metre height accuracy you could use a 16-bit int to store the height difference from average, or a 32-bit int would give you near-micron accuracy.
If you need the globe vertices displaced horizontally as well as vertically, then you could complement the height with two spherical coordinates, or smoothed-cube coordinates that are trendy in planetary renderers.
Why? Because a more efficient storage format takes up less space, and efficiency in memory layouts is one of the primary optimisations on modern computers (arguably more important than reducing CPU cycles -- in relative terms of bandwidth per CPU cycle, memory is getting slower and slower every day...). 

 

Can you elaborate on this, please?

I'm already using 16-bit storage for the height map (DEM). It is enough for 0.14m accuracy on the global level (without the need for average block values or differential coding). Quite enough for the global elevation data currently available.
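As a rough check of the 0.14m figure, here is a minimal C++ sketch of 16-bit elevation quantization. The elevation range is an assumption (the post does not state one), and the names are hypothetical; it simply maps a global range of about 9.3 km onto 65536 levels.

```cpp
#include <cmath>
#include <cstdint>

// Assumed global elevation range in metres (not stated in the post):
// roughly the Dead Sea shore up to the summit of Everest.
constexpr double kMinElevation = -430.0;
constexpr double kMaxElevation = 8850.0;

// Size of one quantization step: about 0.14 m for the range above.
constexpr double kStep = (kMaxElevation - kMinElevation) / 65535.0;

// Clamp to the assumed range and round to the nearest 16-bit level.
std::uint16_t encodeElevation(double metres) {
    const double clamped = std::fmin(std::fmax(metres, kMinElevation), kMaxElevation);
    return static_cast<std::uint16_t>(std::lround((clamped - kMinElevation) / kStep));
}

// Recover the elevation in metres from a 16-bit level.
double decodeElevation(std::uint16_t code) {
    return kMinElevation + code * kStep;
}
```

A round trip through `encodeElevation` and `decodeElevation` loses at most half a step, so about 0.07 m under these assumptions.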

 

Keep in mind I only jumped in here because you claimed that all applications you've developed required double precision -- that seems to be the same generalization on the other side of the fence.

 

You are right about this. Sorry!


In Topic: HW accel Matrices on android (NDK)

12 June 2013 - 01:58 AM

To quote Tom Forsyth - "Double precision has no place in games. If you think you need double precision, you either need 64-bit fixed point, or you don't understand the algorithm."

He's being a bit facetious; they might occasionally have a use... but they definitely should not be your default choice, especially on 32-bit architectures. In my experience, doubles are very, very, very rarely used in games (and as Tom says, when you do see them used, it's often done without understanding -- "oh float was having trouble so I just changed it to double"). Float and double have a huge range, yes, but their logarithmic precision is usually not the most efficient choice.

 

 

I agree that doubles are more expensive than floats on GPUs (not on CPUs, as you already noted), but they make many things easier and some of them even faster on a GPU. When precision is needed, using doubles instead of single-double floats is significantly faster.

 

In my example, how would one calculate (and draw) the precise position of the vertices on the globe (where even the radius cannot be represented to meter accuracy with floats) without doubles on the CPU side? Maybe with some mathematical gymnastics. But what for? The calculation runs at the same speed on the CPU side, only floats are transferred to the GPU (just a few floating-point values for hundreds of thousands of vertices generated on the GPU), and everything on the GPU side is done in FP arithmetic. With all due respect to you and Tom Forsyth, it does not make any sense. Please, before dismissing something in general, consider the cases where it might be the better solution.
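The precision gap at globe scale can be measured directly. The sketch below is only an illustration with a hypothetical helper name; it uses the WGS84 semi-major axis (6378137 m) as the test magnitude and reports the distance to the next representable value in each format.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
float spacingAbove(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Same measurement for double precision.
double spacingAbove(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

`spacingAbove(6378137.0f)` is 0.5, so single precision resolves positions at that radius only in half-meter steps, while `spacingAbove(6378137.0)` is on the order of a nanometer.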

 

Thank you for the link! I'll read it carefully.


In Topic: HW accel Matrices on android (NDK)

11 June 2013 - 02:44 PM

Thank you guys for the replies, but I still have to clear up something... 

I hope you don't mind; the whole community could benefit from such polemics.

 

1. If you're processing a large mesh, transforming, distorting or whatever other usage you've encountered, then yes, matrix operations can dominate in that situation. hdxpete hasn't said that he's doing so, so it might be that he simply wants to use an optimised library throughout his project right from the start, rather than use one he writes himself. Why pick a slower option if there's an existing better one?

At first glance I also thought he was doing something serious with matrix calculus, but from the rest of his posts I really doubt it.

Transforming large meshes? On a CPU? Could you give an example? I really think that should be done on a GPU. 

 

 

 

2. All applications you've developed required double precision? Then you're missing out on SIMD performance benefits, wasting memory needlessly and not running on a console/mobile platform. The only situation I currently encounter the usage of doubles is in space games and even there I'm pushing to get as much as possible back into float to avoid the conversions between double precision generation of the terrain and the floating point vector representation for rendering.

I have to admit that I've never developed for a console platform, but on a PC I have never noticed a performance loss when using doubles instead of single-precision floats. As I've already said, those operations are too rare to be noticed. Texture streaming, for example, is far more noticeable than the calculation of a transformation matrix. And that's where I'm using SIMD libraries, for example for texture compression. BTW, large terrain rendering is currently the focus of my graphics programming.

 

3. This is a valid question; however, it does come with the caveat that simply using a decent SIMD maths library might also force you to write code in a more data-centric way, so having it from the start might also be a good idea.

It is not a bad idea per se; I just pointed out that it could introduce complications without a real need.

 

 

 

 

2. If you actually need double precision for a matrix, then you're either doing science (and need the accuracy), or you're doing something very wrong indeed. It may be that you have a huge game world, and are losing precision when the position is far from the origin (in which case, store an integer-based 3D grid reference along with the matrix), otherwise the problem is possibly something that could be fixed by reordering your equations for better accuracy, or a simple orthogonalise might be what's needed.

 

3. Double precision typically halves the performance of your code (doubles the amount of time spent reading / writing data). If your ultimate criterion is speed, double precision is not a good idea....
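The "integer-based 3D grid reference" from point 2 above could look something like the following C++ sketch. The layout, the cell size, and all names are assumptions made for illustration; the quoted post gives no details.

```cpp
#include <array>
#include <cstdint>

// Assumed cell edge length in metres; a power of two keeps the products exact.
constexpr float kCellSize = 1024.0f;

// Hypothetical layout: an integer cell index plus a small float offset.
struct GridPosition {
    std::array<std::int32_t, 3> cell;  // which grid cell the point is in
    std::array<float, 3> local;        // offset inside that cell, in [0, kCellSize)
};

// Offset from a to b. The cell difference is computed in integers, so the
// float part only ever has to represent small local distances.
std::array<float, 3> offsetBetween(const GridPosition& a, const GridPosition& b) {
    std::array<float, 3> out{};
    for (int i = 0; i < 3; ++i) {
        const std::int32_t cellDelta = b.cell[i] - a.cell[i];
        out[i] = static_cast<float>(cellDelta) * kCellSize + (b.local[i] - a.local[i]);
    }
    return out;
}
```

Two points in neighbouring cells far from the origin still yield an accurate float offset, because the large absolute coordinate is never stored in a float.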
 

 

Currently I'm working on a high-precision massive terrain algorithm. I've modeled the Earth based on the WGS84 ellipsoid, geoid undulation, and a high-precision DEM (submeter precision for the whole Earth), with the ability to place the viewer 1 micron above the surface without visible artifacts. Of course, there is no need to place the viewer at such a height, but it is just a demonstration of the power.

The math is done partially on the CPU (in double precision, only for the viewer position) and partially on the GPU (in single precision, for all vertices of the terrain). The frame rate is really high, so I'm probably doing things right.

I know that double precision on a GPU requires SM5 cards, and is at least two times slower (in fact the factor is much higher and depends on the specific architecture). That's why I'm using single precision on the GPU. But on the CPU, I have never had performance problems with DP calculations. On the other hand, the things I'm working on would be impossible without DP.
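For readers unfamiliar with the ellipsoid part, here is a minimal C++ sketch of the standard geodetic-to-ECEF conversion on the WGS84 ellipsoid, done in double precision on the CPU. It is not the code from the post: the function and constant names are hypothetical, and geoid undulation and DEM heights are left out (the caller would fold them into the height argument).

```cpp
#include <array>
#include <cmath>

// WGS84 defining constants.
constexpr double kWgs84A = 6378137.0;                    // semi-major axis, metres
constexpr double kWgs84F = 1.0 / 298.257223563;          // flattening
constexpr double kWgs84E2 = kWgs84F * (2.0 - kWgs84F);   // first eccentricity squared

// Convert latitude/longitude (radians) and ellipsoidal height (metres)
// to Earth-centred, Earth-fixed Cartesian coordinates in metres.
std::array<double, 3> geodeticToEcef(double latRad, double lonRad, double heightM) {
    const double sinLat = std::sin(latRad);
    const double cosLat = std::cos(latRad);
    // Prime vertical radius of curvature at this latitude.
    const double n = kWgs84A / std::sqrt(1.0 - kWgs84E2 * sinLat * sinLat);
    return { (n + heightM) * cosLat * std::cos(lonRad),
             (n + heightM) * cosLat * std::sin(lonRad),
             (n * (1.0 - kWgs84E2) + heightM) * sinLat };
}
```

In double precision a height of 1 micron on the equator survives the conversion, whereas a float at that magnitude cannot distinguish it from zero height.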


In Topic: HW accel Matrices on android (NDK)

10 June 2013 - 04:52 AM

I'm sorry for interrupting this nice and polite conversation, but I still don't understand why hdxpete needs SSE or similar instructions to multiply two 4x4 matrices.

 

1. A system is only as fast as its slowest part. Matrix multiplication is a very rare operation in a typical visualization system, so shaving a fraction of a millisecond off the calculation time will not improve overall performance.

 

2. I haven't seen other approaches, but XMMATRIX uses floats for its arguments. In all the applications I have developed, I needed double precision for the matrix calculations. That also counts against using XMMATRIX.

 

3. Why do you think you really need a faster matrix manipulation library? Did you try to benchmark your application and find the bottleneck? Have you ever experienced performance problems with matrix operations?
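As a baseline for such a benchmark, a plain scalar 4x4 multiply in double precision is only a few lines of C++. This is a sketch with a hypothetical type alias and function name, not code from the thread; timing it inside the real application would show whether matrix math is a bottleneck at all.

```cpp
#include <array>

// Hypothetical alias: a row-major 4x4 matrix of doubles.
using Mat4d = std::array<std::array<double, 4>, 4>;

// Straightforward triple-loop product r = a * b, no SIMD intrinsics.
Mat4d multiply(const Mat4d& a, const Mat4d& b) {
    Mat4d r{};  // value-initialised, so every element starts at zero
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            for (int k = 0; k < 4; ++k) {
                r[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return r;
}
```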

