Erik Rufelt

Member Since 17 Apr 2002
Online Last Active Today, 10:57 AM

#5154766 OpenGL 2.1 / ES 2 streaming vertex buffer update performance

Posted by Erik Rufelt on 20 May 2014 - 02:24 AM

I've found it's often faster to use glBufferData to overwrite the entire buffer rather than use glBufferSubData, even if only part of the buffer actually needs to be updated. If you have lots of data that can be updated, divide it into multiple buffers of for example 256 vertices per buffer and try to use a method that updates as few buffers as possible each frame.
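A rough sketch of the bookkeeping behind that suggestion, assuming a hypothetical `ChunkedVertexStore` that splits the vertices into 256-vertex chunks, each of which would be backed by its own buffer object. The GL call itself is left as a callback so the sketch compiles without a GL context; in a real renderer the callback would bind that chunk's buffer and re-specify it in full with glBufferData. All names here are illustrative and not from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical helper: tracks which fixed-size chunks changed this frame.
class ChunkedVertexStore {
public:
    static constexpr std::size_t kChunkSize = 256; // vertices per buffer

    using UploadFn = std::function<void(std::size_t chunk, const Vertex *data, std::size_t count)>;

    explicit ChunkedVertexStore(std::size_t vertexCount)
        : vertices_(vertexCount),
          dirty_((vertexCount + kChunkSize - 1) / kChunkSize, false) {}

    // Write one vertex and mark the chunk that contains it as dirty.
    void set(std::size_t index, const Vertex &v) {
        vertices_[index] = v;
        dirty_[index / kChunkSize] = true;
    }

    // Call once per frame. Every dirty chunk is handed to 'upload' in full
    // (that is where glBufferData would go). Returns the number of uploads.
    std::size_t flush(const UploadFn &upload) {
        std::size_t uploaded = 0;
        for (std::size_t c = 0; c < dirty_.size(); ++c) {
            if (!dirty_[c]) continue;
            const std::size_t first = c * kChunkSize;
            const std::size_t remaining = vertices_.size() - first;
            const std::size_t count = remaining < kChunkSize ? remaining : kChunkSize;
            upload(c, vertices_.data() + first, count);
            dirty_[c] = false;
            ++uploaded;
        }
        return uploaded;
    }

private:
    std::vector<Vertex> vertices_;
    std::vector<bool> dirty_;
};

void exampleChunkedUpload() {
    ChunkedVertexStore store(1000);             // 4 chunks: 256 + 256 + 256 + 232
    store.set(3, Vertex{1.0f, 0.0f, 0.0f});
    store.set(7, Vertex{0.0f, 1.0f, 0.0f});     // same chunk as index 3
    store.set(999, Vertex{0.0f, 0.0f, 1.0f});   // last chunk
    const std::size_t uploads = store.flush(
        [](std::size_t, const Vertex *, std::size_t) { /* glBufferData would go here */ });
    assert(uploads == 2);                       // three writes, two buffers touched
    (void)uploads;
}
```

Grouping data that changes together into the same chunk keeps the number of dirty buffers per frame low, which is the point of the scheme.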

#5154634 Why wouldn't you *just* support raw mouse input?

Posted by Erik Rufelt on 19 May 2014 - 09:01 AM

Provided that they both have the same support, it can be desirable for some users to have the option to depend on control panel settings for mouse movement and smoothing.

#5151804 D3D11 and multiplication order in the GPU

Posted by Erik Rufelt on 06 May 2014 - 07:46 AM

I'm not sure I understand the question exactly, but mul(vector, matrix) always treats the vector as a single row and takes its dot product with each column of the matrix, while mul(matrix, vector) takes the dot product of each row of the matrix with the column vector (of course, since that's how matrix multiplication works). So mul(vector, matrix) == mul(T(matrix), vector).

Then for multiplication order: mul(vector, matrix1 * matrix2) == mul(T(matrix2) * T(matrix1), vector), where * stands for the matrix product (in HLSL that is mul(matrix1, matrix2); the HLSL * operator itself is component-wise).


Whether the driver then rearranges that behind the scenes to fit its preferred memory layout I don't know, but it won't matter for the calculation itself.
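The two identities can be checked numerically on the CPU. This is only a sketch in plain C++, with `std::array` standing in for the HLSL types; the function names (`mulVecMat`, `mulMatVec`, `matMul`, `transpose`) are made up for the illustration, and the product of two matrices is written as an explicit function.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>; // indexed m[row][col]

// Like mul(vector, matrix): row vector dotted with each column.
Vec4 mulVecMat(const Vec4 &v, const Mat4 &m) {
    Vec4 r{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t k = 0; k < 4; ++k)
            r[c] += v[k] * m[k][c];
    return r;
}

// Like mul(matrix, vector): each row dotted with the column vector.
Vec4 mulMatVec(const Mat4 &m, const Vec4 &v) {
    Vec4 r{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t k = 0; k < 4; ++k)
            r[row] += m[row][k] * v[k];
    return r;
}

// Ordinary matrix product a * b.
Mat4 matMul(const Mat4 &a, const Mat4 &b) {
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 transpose(const Mat4 &m) {
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            r[j][i] = m[i][j];
    return r;
}

bool nearlyEqual(const Vec4 &a, const Vec4 &b, float eps = 1e-4f) {
    for (std::size_t i = 0; i < 4; ++i)
        if (std::fabs(a[i] - b[i]) > eps) return false;
    return true;
}

void exampleMulOrder() {
    const Vec4 v = {1, 2, 3, 4};
    const Mat4 m1 = {{{1, 2, 0, 0}, {0, 1, 0, 3}, {2, 0, 1, 0}, {0, 0, 4, 1}}};
    const Mat4 m2 = {{{0, 1, 0, 2}, {3, 0, 1, 0}, {0, 2, 0, 1}, {1, 0, 3, 0}}};

    // mul(v, M) == mul(T(M), v)
    assert(nearlyEqual(mulVecMat(v, m1), mulMatVec(transpose(m1), v)));

    // mul(v, M1 * M2) == mul(T(M2) * T(M1), v)
    assert(nearlyEqual(mulVecMat(v, matMul(m1, m2)),
                       mulMatVec(matMul(transpose(m2), transpose(m1)), v)));
}
```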


Posted by Erik Rufelt on 04 May 2014 - 04:15 AM

hehe. yes


it might not work.

ok, but the idea is work. the previous logic will always be leveled up for the next speculation, always more

but petaflops is few in todays climate to necessary function.


please blog about development process so all can share in future the same

but i think no rock but let always learning conceptually further, but FORWARD

#5149345 Are square roots still really that evil?

Posted by Erik Rufelt on 25 April 2014 - 04:16 AM

Point in sphere and distance checks by themselves can be done by comparing the squared distance to the squared radius, thereby avoiding sqrt.
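A minimal sketch of that comparison, assuming a simple hypothetical `Vec3` struct and a non-negative radius:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Squared distance between two points; no sqrt involved.
float distanceSquared(const Vec3 &a, const Vec3 &b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// True if p lies inside or on the sphere. Comparing squared values gives
// the same answer as comparing distances as long as radius >= 0.
bool pointInSphere(const Vec3 &p, const Vec3 &center, float radius) {
    return distanceSquared(p, center) <= radius * radius;
}

void examplePointInSphere() {
    const Vec3 origin{0.0f, 0.0f, 0.0f};
    assert(pointInSphere(Vec3{1.0f, 0.0f, 0.0f}, origin, 1.0f));  // on the surface
    assert(!pointInSphere(Vec3{1.0f, 1.0f, 0.0f}, origin, 1.0f)); // distance^2 = 2
}
```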

When you actually need sqrt, it's not very evil on newer desktop processors, but other instructions have also gotten faster, so sqrt can still be slow relative to them.

Many newer processors also have dedicated instructions for square roots. One reference I found put sqrt for a single float in SSE at 19 clock cycles, while the instruction for 1 / sqrt, which is only an approximation accurate to some number of bits, takes only 3 cycles. If that accuracy is enough, it would probably be the fastest way.

#5149196 Performance difference btw tri-list and indexed tris.

Posted by Erik Rufelt on 24 April 2014 - 12:46 PM

Probably not.

#5148703 How to get float value that is less then 1 without the higher part

Posted by Erik Rufelt on 22 April 2014 - 06:43 AM

modf splits a floating-point number into its fractional and integer parts: it returns the fractional part and stores the integer part through a pointer argument.

If you do it with subtraction then remember to check that it gives you the answer you want for negative numbers.
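A small sketch of the difference for negative input. `std::modf` truncates toward zero, so the fractional part keeps the sign of the input, while subtracting `std::floor` always yields a value in [0, 1). The helper names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Fractional part with the same sign as x (truncation toward zero).
double fracTruncated(double x) {
    double integerPart = 0.0;
    return std::modf(x, &integerPart);
}

// Fractional part in [0, 1), obtained by subtracting the floor.
double fracFloored(double x) {
    return x - std::floor(x);
}

void exampleFractionalPart() {
    // 2.75 is exactly representable, so exact comparison is safe here.
    assert(fracTruncated(2.75) == 0.75);
    assert(fracFloored(2.75) == 0.75);

    // For negative numbers the two approaches disagree.
    assert(fracTruncated(-2.75) == -0.75);
    assert(fracFloored(-2.75) == 0.25);
}
```

Which of the two is "right" depends on what the fractional part is used for, so it's worth deciding explicitly.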

#5147803 10-bit Monitors

Posted by Erik Rufelt on 17 April 2014 - 09:20 PM

First, only Quadro and FirePro GPUs support 10 bit output. NOT GeForce or Radeon.


That actually isn't completely true. In fullscreen it works perfectly fine for GeForce and Radeon, and it works with HDMI. It's only 10-bit desktop modes that require the pro cards (like using it in Photoshop). The DirectX SDK has a '10-bit scanout' sample for D3D10 that shows the difference compared to 8-bit for fullscreen gradients, and it's quite a difference for such cases.

#5147510 Optimising my renderer

Posted by Erik Rufelt on 16 April 2014 - 07:41 PM

Am I doing something in-efficiently here? Would it be faster to just use a textured quad instead?



Probably.. but at only 1000 sprites it's quite surprising to see such a huge drop in performance. Do the sprites cover the same amount of screen space in both tests?

Your test seems to scale pretty linearly over the number of sprites, which indicates that the problem is either in setup per sprite, or in fillrate.

If the sprites completely cover each other, perhaps GM optimizes away those behind. Try with like 2x2 sprites instead of 256x256 to confirm whether it can be fillrate.

#5147464 Why discard pixel take a noticeable performance hit?

Posted by Erik Rufelt on 16 April 2014 - 03:06 PM

One reason to do what the OP does is that it can give smooth edges on magnified textures, and it can be easily used regardless of rendering order. The reasons for the slowdown seem to have been outlined by others, and I just wanted to point out that I have seen the opposite behavior, where adding discard for alpha < 0.5 increases performance for alpha-blended triangles that have large areas with alpha = 0. However, when alpha-blended geometry is drawn back to front, as it was in my case, there is no need for depth writes, so there were no conditional depth writes.

Using the technique only on triangles that need it (and if the reason for it is not rendering order, possibly combined with drawing affected geometry last) should limit the performance impact.

#5147124 Passing and Returning Arrays of Bytes

Posted by Erik Rufelt on 15 April 2014 - 09:04 AM

Either that or wrap the key-bytes, like std::vector<char> getKey(). You could have a constant static max-size to simplify using static arrays for the key if it's reasonably short.
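A sketch of both ideas, using hypothetical names (`KeyHolder`, `FixedKey`, `kMaxKeySize`) and an assumed maximum of 32 bytes purely for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Variant 1: return the bytes wrapped in a vector, so the size travels
// with the data and the caller owns the copy.
class KeyHolder {
public:
    explicit KeyHolder(std::vector<char> key) : key_(std::move(key)) {}
    std::vector<char> getKey() const { return key_; }

private:
    std::vector<char> key_;
};

// Variant 2: a constant max size lets callers use a plain static array.
constexpr std::size_t kMaxKeySize = 32; // assumed limit, not from the post

struct FixedKey {
    std::array<char, kMaxKeySize> bytes{}; // zero-initialized storage
    std::size_t size = 0;                  // number of bytes actually used
};

void exampleKeyHolder() {
    const std::vector<char> raw = {'a', 'b', 'c'};
    const KeyHolder holder(raw);
    const std::vector<char> copy = holder.getKey();
    assert(copy.size() == 3);
    assert(copy[1] == 'b');

    FixedKey fixed;
    fixed.size = copy.size();
    for (std::size_t i = 0; i < fixed.size; ++i) fixed.bytes[i] = copy[i];
    assert(fixed.size <= kMaxKeySize);
}
```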

#5147108 Passing and Returning Arrays of Bytes

Posted by Erik Rufelt on 15 April 2014 - 08:08 AM

Some variation of this perhaps?

int getKeySize();
bool getKey(void *dest, int destLen);
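One way such a pair could be implemented, as a sketch only: the hypothetical `KeyStore` class below owns the key bytes, the caller asks for the size, allocates a buffer, and then asks for the bytes. Returning false when the destination is null or too small is an assumption about the intended contract.

```cpp
#include <cassert>
#include <cstring>
#include <utility>
#include <vector>

class KeyStore {
public:
    explicit KeyStore(std::vector<unsigned char> key) : key_(std::move(key)) {}

    // Number of bytes the caller must provide to getKey.
    int getKeySize() const { return static_cast<int>(key_.size()); }

    // Copies the key into dest. Fails without writing anything if dest is
    // null or destLen is smaller than the key.
    bool getKey(void *dest, int destLen) const {
        if (dest == nullptr || destLen < getKeySize()) return false;
        if (!key_.empty()) std::memcpy(dest, key_.data(), key_.size());
        return true;
    }

private:
    std::vector<unsigned char> key_;
};

void exampleKeyStore() {
    const KeyStore store(std::vector<unsigned char>{1, 2, 3, 4});

    unsigned char tooSmall[2] = {0, 0};
    assert(!store.getKey(tooSmall, 2));           // rejected, buffer untouched
    assert(tooSmall[0] == 0);

    std::vector<unsigned char> buffer(static_cast<std::size_t>(store.getKeySize()));
    assert(store.getKey(buffer.data(), static_cast<int>(buffer.size())));
    assert(buffer[3] == 4);
}
```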

#5144048 [C++] Is there an easy way to mimic printf()'s behavior?

Posted by Erik Rufelt on 03 April 2014 - 03:43 AM

Perhaps this would work:

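#include <cstdio>

template<typename ...Args>
void myPrint(const char *format, Args ...args) {
	char str[255];
	snprintf(str, sizeof(str), format, args...);	// bounded, unlike sprintf
	fputs(str, stdout);	// assumed: write out the formatted string
}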

#5143050 Do Abusing Countries in game is illegal?

Posted by Erik Rufelt on 29 March 2014 - 07:18 AM

If you live in North Korea I wouldn't recommend letting the players smash your homeland. Other than that you're probably reasonably safe, though there might be similar concerns at a few other places in the world.

#5141543 Compiling opengl code using gcc through the command line

Posted by Erik Rufelt on 23 March 2014 - 03:43 PM

So don't put it at the top; put it under the others.