
Erik Rufelt

Member Since 17 Apr 2002

#5170849 1px Bordered Quad

Posted by Erik Rufelt on 01 August 2014 - 05:38 AM

Probably the rasterization rules, which only draw pixels whose center is inside the primitive (or perhaps on its edges; check the rasterization rules for your DX version). Try offsetting all coordinates by half a pixel (dx and dy are half-pixel offsets). Right now you begin your line on a pixel edge and include half a pixel in each direction, so you cover two halves of two different adjacent pixels instead of one full pixel.


When you use 2.0f / width, you include one full pixel in each direction, which covers the pixel centers of the adjacent pixels on both sides.
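A minimal sketch of the half-pixel idea, assuming a D3D10/11-style viewport where NDC x runs from -1 at the left edge to +1 at the right edge of a render target `width` pixels wide (D3D9 maps pixels to texels/coordinates differently, so check for your version). One pixel spans 2.0f / width in NDC, so half a pixel is 1.0f / width. The helper names are hypothetical.

```cpp
#include <cassert>

// Width of one pixel in normalized device coordinates (NDC spans [-1, 1]).
inline float pixelSizeNdc(int width) {
    assert(width > 0);
    return 2.0f / static_cast<float>(width);
}

// NDC x of the *center* of pixel column px. A 1px-wide line centered here
// (extending 1.0f / width to each side) reaches exactly to the pixel's edges
// and so covers exactly one pixel center.
inline float pixelCenterNdcX(int px, int width) {
    return -1.0f + (static_cast<float>(px) + 0.5f) * pixelSizeNdc(width);
}

// Same for rows; NDC y points up while pixel rows are counted downward.
inline float pixelCenterNdcY(int py, int height) {
    return 1.0f - (static_cast<float>(py) + 0.5f) * pixelSizeNdc(height);
}
```

For an 800-pixel-wide target, column 0's center sits at -1 + 1/800 = -0.99875 rather than on the edge at -1.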

#5170399 Equal distribution of points on the surface of a sphere

Posted by Erik Rufelt on 30 July 2014 - 11:12 AM

You can repeatedly subdivide an icosahedron (http://en.wikipedia.org/wiki/Icosahedron) to get a sphere covered in roughly equally sized triangles, something like a geodesic grid (http://en.wikipedia.org/wiki/Geodesic_grid).
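A sketch of the subdivision, assuming a unit sphere and an indexed triangle mesh; the type and function names are made up for illustration. Each step splits every triangle into four and pushes the new edge midpoints out onto the sphere, and a cache keyed by edge keeps shared midpoints from being duplicated. The triangles come out close to, but not exactly, equal in size.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

// Project a point onto the unit sphere.
Vec3 normalize(Vec3 v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0);
    return {v.x / len, v.y / len, v.z / len};
}

struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::array<int, 3>> faces;  // vertex indices per triangle
};

// Unit icosahedron: 12 vertices built from the golden ratio, 20 triangles.
Mesh makeIcosahedron() {
    const double t = (1.0 + std::sqrt(5.0)) / 2.0;
    const Vec3 raw[12] = {{-1, t, 0}, {1, t, 0}, {-1, -t, 0}, {1, -t, 0},
                          {0, -1, t}, {0, 1, t}, {0, -1, -t}, {0, 1, -t},
                          {t, 0, -1}, {t, 0, 1}, {-t, 0, -1}, {-t, 0, 1}};
    const int faceTable[20][3] = {
        {0, 11, 5}, {0, 5, 1},  {0, 1, 7},   {0, 7, 10}, {0, 10, 11},
        {1, 5, 9},  {5, 11, 4}, {11, 10, 2}, {10, 7, 6}, {7, 1, 8},
        {3, 9, 4},  {3, 4, 2},  {3, 2, 6},   {3, 6, 8},  {3, 8, 9},
        {4, 9, 5},  {2, 4, 11}, {6, 2, 10},  {8, 6, 7},  {9, 8, 1}};
    Mesh m;
    for (const Vec3& v : raw) m.vertices.push_back(normalize(v));
    for (const auto& f : faceTable) m.faces.push_back({f[0], f[1], f[2]});
    return m;
}

// One subdivision step: each triangle becomes four, and new edge midpoints
// are normalized onto the sphere. Shared edges reuse the same midpoint.
Mesh subdivide(const Mesh& in) {
    Mesh out;
    out.vertices = in.vertices;
    std::map<std::pair<int, int>, int> midpointCache;
    auto midpoint = [&](int a, int b) -> int {
        const std::pair<int, int> key(std::min(a, b), std::max(a, b));
        const auto it = midpointCache.find(key);
        if (it != midpointCache.end()) return it->second;
        const Vec3 p = out.vertices[a];
        const Vec3 q = out.vertices[b];
        out.vertices.push_back(
            normalize({(p.x + q.x) / 2, (p.y + q.y) / 2, (p.z + q.z) / 2}));
        const int index = static_cast<int>(out.vertices.size()) - 1;
        midpointCache[key] = index;
        return index;
    };
    for (const auto& f : in.faces) {
        const int ab = midpoint(f[0], f[1]);
        const int bc = midpoint(f[1], f[2]);
        const int ca = midpoint(f[2], f[0]);
        out.faces.push_back({f[0], ab, ca});
        out.faces.push_back({f[1], bc, ab});
        out.faces.push_back({f[2], ca, bc});
        out.faces.push_back({ab, bc, ca});
    }
    return out;
}
```

After n steps there are 20 * 4^n triangles and 10 * 4^n + 2 vertices (80 triangles and 42 vertices after one step).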

#5169298 Declaring temporary variable to save 1 multiply?

Posted by Erik Rufelt on 26 July 2014 - 08:36 AM

The compiler will do that automatically when it makes sense, if optimizations are turned on. If the temporary doesn't make sense, it will probably remove it and turn the second example into the first.


In general, however, I would probably recommend using a temporary for expensive operations, but I'm not sure a multiply qualifies. If it were sqrt or sin/cos, I would use the temporary.


EDIT: For float operations there is one more thing to consider: the order of operations can slightly alter the results (as floats are approximations). For example, (a * b) / c is not necessarily equal to a * (b / c). Therefore writing the temporary yourself can often be a good idea, since for floats the compiler may be prevented from performing rearrangements it could normally do (though this can depend on compiler settings).
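A small sketch of both points, assuming IEEE 754 floats and no fast-math style flags (which may let the compiler reassociate). The values a = b = c = 1e30f are deliberately extreme so the difference is obvious: a * b overflows float's range (max about 3.4e38) to infinity before the division, while b / c is exactly 1.0f. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Mathematically (a * b) / c == a * (b / c), but float results can differ.
float mulThenDiv(float a, float b, float c) { return (a * b) / c; }
float divThenMul(float a, float b, float c) { return a * (b / c); }

struct Vec3f { float x, y, z; };

// The length needs a sqrt, so it is computed once into a temporary and
// reused for all three components.
Vec3f normalized(Vec3f v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}
```

With those inputs, mulThenDiv returns infinity while divThenMul returns 1e30f.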

#5169003 Terraforming by aliens

Posted by Erik Rufelt on 24 July 2014 - 05:13 PM

From our perspective, I guess it would be derived from the name we have for them, or from the process they perform or the substance they replace or insert, like "deoxygenize" or "acidize".

#5156021 What am I forgetting to optimize?

Posted by Erik Rufelt on 26 May 2014 - 09:08 AM

Your screenshot indicates you're running on Intel graphics. If you're on a laptop with dual graphics, make sure your app uses the right GPU.

#5154766 OpenGL 2.1 / ES 2 streaming vertex buffer update performance

Posted by Erik Rufelt on 20 May 2014 - 02:24 AM

I've found it's often faster to use glBufferData to overwrite the entire buffer rather than glBufferSubData, even if only part of the buffer actually needs to be updated. If you have lots of data that can be updated, divide it into multiple buffers of, for example, 256 vertices each, and try to use a method that updates as few buffers as possible each frame.
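A sketch of the chunked approach, assuming a hypothetical layout where vertex i lives in chunk i / 256 and each chunk has its own buffer object. The helpers only work out which chunks to re-upload and which vertex range each covers; the upload itself needs a live GL context, so it is left as a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical layout: vertex i belongs to chunk i / verticesPerChunk.
std::set<std::size_t> dirtyChunks(const std::vector<std::size_t>& changedVertices,
                                  std::size_t verticesPerChunk = 256) {
    assert(verticesPerChunk > 0);
    std::set<std::size_t> chunks;  // sorted and de-duplicated
    for (std::size_t v : changedVertices) chunks.insert(v / verticesPerChunk);
    return chunks;
}

// First vertex and vertex count of a chunk; the last chunk may be partial.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t chunk,
                                               std::size_t totalVertices,
                                               std::size_t verticesPerChunk = 256) {
    const std::size_t first = chunk * verticesPerChunk;
    assert(first < totalVertices);
    return {first, std::min(verticesPerChunk, totalVertices - first)};
}

// Per frame, for each chunk returned by dirtyChunks(): bind that chunk's
// buffer and re-specify the whole chunk with
// glBufferData(GL_ARRAY_BUFFER, count * vertexSize, pointerToFirstVertex, GL_STREAM_DRAW)
// instead of patching part of it with glBufferSubData.
```

Changing vertices 0, 10, 255, 256 and 1000 of a 1000-vertex mesh touches only chunks 0, 1 and 3, and chunk 3 holds vertices 768 to 999.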

#5154634 Why wouldn't you *just* support raw mouse input?

Posted by Erik Rufelt on 19 May 2014 - 09:01 AM

Provided that both have the same support, some users may prefer having the option to rely on control panel settings for mouse movement and smoothing.

#5151804 D3D11 and multiplication order in the GPU

Posted by Erik Rufelt on 06 May 2014 - 07:46 AM

Not sure I understand the question exactly, but mul(vector, matrix) is always computed as a single-row vector dotted with each column of the matrix, while mul(matrix, vector) is each row of the matrix dotted with the column vector (of course, as that's how matrix multiplication works). So mul(vector, matrix) == mul(T(matrix), vector), where T() denotes the transpose.

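Then for multiplication order: mul(vector, matrix1 * matrix2) == mul(T(matrix2) * T(matrix1), vector), with * meaning the matrix product. (In HLSL code, write that product as mul(matrix1, matrix2); the * operator on two matrices is component-wise.)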


Whether the driver then rearranges that behind the scenes to fit its preferred memory layout I don't know, but it won't matter for the calculation itself.
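A quick numeric check of both identities in plain C++, outside any shader. Here `mulRowVec` plays the role of HLSL's mul(vector, matrix) and `mulColVec` of mul(matrix, vector); the helper names and test matrices are made up for illustration.

```cpp
#include <array>
#include <cassert>

using Vector3 = std::array<double, 3>;
using Matrix3 = std::array<std::array<double, 3>, 3>;  // m[row][column]

// Row vector times matrix: result[j] = dot(v, column j of m).
Vector3 mulRowVec(const Vector3& v, const Matrix3& m) {
    Vector3 r{};
    for (int j = 0; j < 3; ++j)
        for (int k = 0; k < 3; ++k) r[j] += v[k] * m[k][j];
    return r;
}

// Matrix times column vector: result[i] = dot(row i of m, v).
Vector3 mulColVec(const Matrix3& m, const Vector3& v) {
    Vector3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k) r[i] += m[i][k] * v[k];
    return r;
}

// Ordinary matrix product a * b.
Matrix3 mulMat(const Matrix3& a, const Matrix3& b) {
    Matrix3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

Matrix3 transpose(const Matrix3& m) {
    Matrix3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r[i][j] = m[j][i];
    return r;
}
```

With small integer-valued entries every sum is exact, so both sides of each identity compare equal with `==`.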


Posted by Erik Rufelt on 04 May 2014 - 04:15 AM

hehe. yes


it might not work.

ok, but the idea is work. the previous logic will always be leveled up for the next speculation, always more

but petaflops is few in todays climate to necessary function.


please blog about development process so all can share in future the same

but i think no rock but let always learning conceptually further, but FORWARD

#5149345 Are square roots still really that evil?

Posted by Erik Rufelt on 25 April 2014 - 04:16 AM

Point in sphere and distance checks by themselves can be done by comparing the squared distance to the squared radius, thereby avoiding sqrt.

When you actually need sqrt, it's not very evil on newer desktop processors, but the other instructions have also gotten faster, so they can still be relatively faster than sqrt.

There are also special instructions on many newer processors for calculating them. One reference I found put sqrt for a single float in SSE at 19 clock cycles, while the instruction for 1 / sqrt, which is only an approximation with a limited number of bits of accuracy, takes only 3 cycles. If that precision is good enough, it would probably be the fastest way.
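A sketch of both ideas, assuming an x86 compiler that exposes SSE intrinsics (`_mm_rsqrt_ss` from `<xmmintrin.h>`); on other targets it falls back to `1.0f / std::sqrt`. The Newton-Raphson refinement is an addition not mentioned above: the raw estimate is good to roughly 12 bits, and one step brings it much closer to full float precision. Cycle counts vary between CPUs, so measure before relying on either version.

```cpp
#include <cassert>
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_RSQRT 1
#endif

struct Point3 { float x, y, z; };

// Inside-or-on test without sqrt: compare squared distance to squared radius.
bool pointInSphere(Point3 p, Point3 center, float radius) {
    const float dx = p.x - center.x;
    const float dy = p.y - center.y;
    const float dz = p.z - center.z;
    return dx * dx + dy * dy + dz * dz <= radius * radius;
}

// Approximate 1 / sqrt(x) for x > 0.
float fastInvSqrt(float x) {
    assert(x > 0.0f);
#ifdef HAVE_SSE_RSQRT
    // rsqrtss: hardware estimate with about 12 bits of precision.
    const float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    // One Newton-Raphson step roughly doubles the number of correct bits.
    return y * (1.5f - 0.5f * x * y * y);
#else
    return 1.0f / std::sqrt(x);
#endif
}
```

The point (1, 2, 2) has squared distance 9 from the origin, so it is on a sphere of radius 3, while (2, 2, 2) at squared distance 12 is outside.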

#5149196 Performance difference btw tri-list and indexed tris.

Posted by Erik Rufelt on 24 April 2014 - 12:46 PM

Probably not.

#5148703 How to get float value that is less than 1 without the higher part

Posted by Erik Rufelt on 22 April 2014 - 06:43 AM

modf returns the fractional and integer parts of a floating-point number.

If you do it with subtraction, remember to check that it gives you the answer you want for negative numbers.
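A small comparison of the options; which result you want for negative inputs depends on the use (for example, wrapping texture coordinates usually wants the non-negative floor-based version). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// std::modf splits x into integer and fractional parts, both with x's sign.
double fracModf(double x) {
    double intPart;
    return std::modf(x, &intPart);
}

// x - floor(x) always lands in [0, 1), even for negative x.
double fracFloor(double x) { return x - std::floor(x); }

// x - trunc(x) matches modf's sign convention (truncation toward zero).
double fracTrunc(double x) { return x - std::trunc(x); }
```

For -2.75, modf and the trunc version give -0.75, while the floor version gives 0.25.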

#5147803 10-bit Monitors

Posted by Erik Rufelt on 17 April 2014 - 09:20 PM

Quote: "First, only Quadro and FirePro GPUs support 10 bit output. NOT GeForce or Radeon."


That actually isn't completely true. In fullscreen it works perfectly fine on GeForce and Radeon, and it works over HDMI. It's only 10-bit desktop modes (for example, for use in Photoshop) that require the pro cards. The DirectX SDK has a '10-bit scanout' sample for D3D10 that shows the difference compared to 8-bit for fullscreen gradients, and the difference is significant in such cases.
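A CPU-side illustration of why gradients show the difference so clearly (this is not D3D code): it quantizes a full-width horizontal gradient the way an 8-bit UNORM channel (as in DXGI_FORMAT_R8G8B8A8_UNORM) or a 10-bit one (as in DXGI_FORMAT_R10G10B10A2_UNORM) would store it, and counts the distinct levels. The 1920-pixel width is an arbitrary example and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>

// Quantize a 0..1 intensity to an unsigned normalized value with `bits` bits.
unsigned quantize(double value, int bits) {
    assert(bits > 0 && bits < 32);
    const unsigned maxLevel = (1u << bits) - 1u;
    return static_cast<unsigned>(std::lround(value * maxLevel));
}

// Distinct levels in a left-to-right gradient across `widthPixels` columns;
// fewer levels means wider, more visible bands.
std::size_t distinctLevels(int widthPixels, int bits) {
    std::set<unsigned> levels;
    for (int x = 0; x < widthPixels; ++x) {
        const double t = static_cast<double>(x) / (widthPixels - 1);
        levels.insert(quantize(t, bits));
    }
    return levels.size();
}
```

At 1920 pixels that is 256 levels for 8-bit (bands about 7.5 pixels wide) versus 1024 levels for 10-bit (bands under 2 pixels wide).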

#5147510 Optimising my renderer

Posted by Erik Rufelt on 16 April 2014 - 07:41 PM

Quote: "Am I doing something inefficiently here? Would it be faster to just use a textured quad instead?"

Probably, but with only 1000 sprites it's quite surprising to see such a huge drop in performance. Do the sprites cover the same amount of screen space in both tests?

Your test seems to scale pretty linearly with the number of sprites, which indicates that the problem is either in per-sprite setup or in fillrate.

If the sprites completely cover each other, perhaps GM optimizes away the ones behind. Try something like 2x2 sprites instead of 256x256 to confirm whether fillrate is the problem.
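A back-of-the-envelope fillrate estimate, assuming every 256x256 sprite is fully drawn with no culling of hidden sprites, and an assumed target of 60 frames per second. 1000 such sprites are about 65.5 million pixels per frame, or roughly 3.9 billion pixels per second, while 2x2 sprites cut that to 4000 pixels per frame. If the 2x2 run is dramatically faster, fillrate is the likely bottleneck.

```cpp
#include <cassert>
#include <cstdint>

// Pixels covered per frame when every sprite is fully drawn.
std::uint64_t pixelsPerFrame(std::uint64_t spriteCount, std::uint64_t spriteWidth,
                             std::uint64_t spriteHeight) {
    return spriteCount * spriteWidth * spriteHeight;
}

// Fill required per second at a given frame rate.
std::uint64_t pixelsPerSecond(std::uint64_t framePixels, std::uint64_t framesPerSecond) {
    return framePixels * framesPerSecond;
}
```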

#5147464 Why discard pixel take a noticeable performance hit?

Posted by Erik Rufelt on 16 April 2014 - 03:06 PM

One reason to do what the OP does is that it can give smooth edges on magnified textures and can easily be used regardless of rendering order. Others seem to have outlined the reasons for the slowdown; I just wanted to point out that I have seen the opposite behavior, where adding discard for alpha < 0.5 increased performance for alpha-blended triangles with large areas where alpha = 0. However, when alpha-blended geometry is drawn back to front, as it was in my case, there is no need for depth writes, so there were no conditional depth writes.

Using the technique only on triangles that need it (and if the reason for it is not rendering order, possibly combined with drawing affected geometry last) should limit the performance impact.
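One way to apply that last suggestion, sketched under assumptions: a hypothetical `DrawItem` with a flag saying whether its shader uses discard (clip() in HLSL). `std::stable_partition` moves the alpha-tested items to the end of the draw list while keeping each group's original order, so the discard shader is bound only for geometry that needs it, and that geometry is drawn last.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical draw item: only the flag matters for this sketch.
struct DrawItem {
    std::string name;
    bool needsAlphaTest;  // true if its shader uses discard / clip()
};

// Moves alpha-tested items to the end, preserving relative order within each
// group. Returns an iterator to the first alpha-tested item.
std::vector<DrawItem>::iterator orderForAlphaTest(std::vector<DrawItem>& items) {
    return std::stable_partition(items.begin(), items.end(),
                                 [](const DrawItem& d) { return !d.needsAlphaTest; });
}
```

Everything before the returned iterator can be drawn with the plain shader; everything from it onward uses the discard variant.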