
Member Since 04 Oct 2010
Last Active Jul 24 2016 03:32 PM

#5292929 c++: sprintf but return chars. Possible?

Posted on 22 May 2016 - 02:54 PM

If you are using C++, then you should be using iostreams (for example std::ostringstream) instead of stdio. Alternatively, you could at least use snprintf.
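A minimal sketch of both suggestions, assuming C++11 or later. `string_printf` and `describe` are hypothetical helper names, not library functions: the first calls `std::vsnprintf` once with a null buffer to measure the output and again to fill a buffer of exactly that size, and the second lets `std::ostringstream` handle formatting and sizing.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// printf-style formatting that returns a std::string (hypothetical helper).
std::string string_printf(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);  // a va_list can only be consumed once
    int len = std::vsnprintf(nullptr, 0, fmt, args_copy);  // measure only
    va_end(args_copy);
    if (len < 0) {  // encoding error
        va_end(args);
        return std::string();
    }
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);  // +1 for '\0'
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    va_end(args);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}

// The iostream route: std::ostringstream does the formatting and sizing.
std::string describe(int hp, float x)
{
    std::ostringstream ss;
    ss << "hp=" << hp << " x=" << x;
    return ss.str();
}
```

The snprintf route keeps printf-style format strings (and their lack of type checking); the ostringstream route is type-safe but more verbose.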

#5268979 Criticism of C++

Posted on 03 January 2016 - 06:48 AM

There is nothing wrong with C++.


C++ is love, C++ is life.

#5267106 What will change with HDR monitor ?

Posted on 20 December 2015 - 12:58 AM

10-bit monitors aren't HDR. An HDR monitor is one that makes you go blind if you look at a picture of the sun.

#5266571 Opengl 2d texture bleeding

Posted on 15 December 2015 - 05:58 PM

In 2015, where have you found hardware that only supports OpenGL up to 1.4?

#5263189 Phong versus Screen-Space Ambient Occlusion (with source code)

Posted on 22 November 2015 - 03:51 PM

Which do you like better?


I prefer apples to oranges.

#5250505 Most efficient way of designing a vector class in 3D

Posted on 03 September 2015 - 07:42 PM

If you really need vector types like this, then I would say just use GLM, but if you want performance, I would say stay away from class VectorN...


If you do want a simple vector type and don't want to use GLM, you could use the following in Clang, anyway:


using Vector3 = float __attribute__((ext_vector_type(3)));
//Vector3 is pretty much identical to vec3 or float3 from GLSL or HLSL
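`ext_vector_type` is a Clang extension that GCC does not support. As a sketch assuming GCC or Clang, the related `vector_size` attribute, which both compilers accept, gives element-wise arithmetic and `[]` lane access, though not GLSL-style `.x/.y/.z` swizzles. `Vec4` and `dot3` are hypothetical names.

```cpp
// Four floats packed into one 16-byte vector; vector_size takes the total
// size in bytes. Supported by both GCC and Clang.
typedef float Vec4 __attribute__((vector_size(16)));

// Dot product of the first three lanes (lane 3 is treated as padding).
inline float dot3(Vec4 a, Vec4 b)
{
    Vec4 p = a * b;             // element-wise multiply
    return p[0] + p[1] + p[2];  // individual lanes via []
}
```

Using a 4-wide type for 3D data wastes one lane but keeps each vector in a single SIMD register.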

#5249999 C++ duck typing with optional members

Posted on 31 August 2015 - 03:37 PM

But what if you want to try and access a member, and only use it if it actually exists?


Then you would use Objective-C instead.
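For comparison, standard C++ can approximate this at compile time with the detection idiom. A minimal sketch, assuming C++17 (`std::void_t`, `if constexpr`); `Circle`, `Square`, `radius`, and `radius_or` are hypothetical names. Unlike Objective-C messaging, this only works when the type is known at compile time.

```cpp
#include <type_traits>
#include <utility>

// has_radius<T>::value is true only if `obj.radius` is a valid expression for T.
template <typename T, typename = void>
struct has_radius : std::false_type {};

template <typename T>
struct has_radius<T, std::void_t<decltype(std::declval<T&>().radius)>>
    : std::true_type {};

// Hypothetical example types.
struct Circle { float radius = 2.0f; };
struct Square { float side = 3.0f; };

// Use obj.radius if the member exists, otherwise fall back to a default.
template <typename T>
float radius_or(const T& obj, float fallback)
{
    if constexpr (has_radius<T>::value) {
        return obj.radius;
    } else {
        return fallback;
    }
}
```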

#5247985 Fast sqrt for 64bit

Posted on 21 August 2015 - 12:28 AM

If you really need a fast square root and you aren't too concerned about precision, you can use this. It may not actually be much faster, depending on your architecture.

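```cpp
#include <cstdint>
#include <cstring>

//use 0x1FBD1DF5 for (0.0, 1.0) RME: 1.42%
//use 0x1FBD22DF for (0.0, 1000.0) RME: 1.62%
inline float fast_sqrt(float f)
{
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof i);  // read the float's bits (no aliasing UB)
    i = 0x1FBD1DF5 + (i >> 1);      // halve the exponent, add back ~half the bias
    std::memcpy(&f, &i, sizeof f);
    return f;
}
```
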
Edit: If you want to do the same for doubles you might have luck with this magic number 0x1FF776E75B46DF3D.
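A sketch of the double-precision version, assuming 64-bit IEEE 754 doubles and using the magic number from the edit above unchanged; its error bounds are not given in the post, so treat them as unmeasured. `fast_sqrt_d` is a hypothetical name, and `std::memcpy` is used instead of a pointer cast to avoid strict-aliasing problems.

```cpp
#include <cstdint>
#include <cstring>

// Double-precision version of the bit trick, assuming IEEE 754 binary64.
// The magic constant is the one suggested in the post.
inline double fast_sqrt_d(double d)
{
    std::uint64_t i;
    std::memcpy(&i, &d, sizeof i);          // read the double's bits
    i = 0x1FF776E75B46DF3DULL + (i >> 1);   // halve exponent, add magic constant
    std::memcpy(&d, &i, sizeof d);
    return d;
}
```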

#5247967 Fast sqrt for 64bit

Posted on 20 August 2015 - 06:34 PM

I can pretty much guarantee that sqrt is not the source of your performance problems...

Micro-optimizations like "fast SQRT" rarely get you anything these days. Modern processors are so fast that you're going to get much more bang for your buck by looking at memory usage patterns and lowering cache misses.


I was writing some SIMD code the other day and replacing a single instruction with an approximation gave me a 25% performance boost. "Fast SQRT" could be really damn useful if you are calculating 4 billion of them every second.





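```cpp
#include <cstdint>
#include <cstring>
float Sqrt(float x){
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);      // reinterpret the float's bits
    i = 0x5f3759df - (i >> 1);          // initial guess for 1/sqrt(x)
    float r;
    std::memcpy(&r, &i, sizeof r);
    r = r * (1.5f - 0.5f * x * r * r);  // one Newton-Raphson step
    return r * x;                       // x * (1/sqrt(x)) = sqrt(x)
}
```

Just don't ask me how it works...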


That code computes a fast approximation to 1/sqrt(x) and multiplies it by x.
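Spelled out, the trick relies on sqrt(x) = x * (1/sqrt(x)), and the line `r = r*(1.5f - 0.5f*x*r*r)` is one Newton-Raphson iteration on f(y) = 1/y^2 - x, whose positive root is y = 1/sqrt(x):

```latex
\sqrt{x} = x \cdot \frac{1}{\sqrt{x}}, \qquad
f(y) = \frac{1}{y^{2}} - x, \qquad
f'(y) = -\frac{2}{y^{3}}

\begin{aligned}
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^{3}}{2}\left(\frac{1}{y_n^{2}} - x\right) \\
        &= y_n\left(\frac{3}{2} - \frac{x\,y_n^{2}}{2}\right)
\end{aligned}
```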


On modern CPUs, reinterpreting a float as an int and back can force data to move between different register files, and potentially through memory, which can cause stalls.

I wouldn't be surprised if it performs poorly today.



SSE2 and AVX2 give you integer SIMD instructions that operate on the same register file as the floating-point instructions. This means you get float -> int punning for free. See clb's post below: apparently, on Intel at least, there is a penalty for switching between floating-point and integer SIMD instructions. It seems this has to do with the way their pipeline is structured and how results are forwarded from one operation to the next.
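To illustrate the shared-register-file point, here is a sketch of the 0x1FBD1DF5 square-root trick from the other fast-sqrt post applied to four floats at once with SSE2 intrinsics, assuming an x86/x86-64 target and non-negative inputs. `_mm_castps_si128` and `_mm_castsi128_ps` only reinterpret the register and emit no instructions; the shift and add still execute as integer SIMD operations, which is where the bypass penalty described above may appear. `fast_sqrt4` is a hypothetical name.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// Approximate sqrt of four non-negative floats using the integer bit trick.
inline __m128 fast_sqrt4(__m128 x)
{
    // Reinterpret float lanes as int lanes; this cast emits no instruction.
    __m128i bits = _mm_castps_si128(x);
    bits = _mm_srli_epi32(bits, 1);                          // halve exponent
    bits = _mm_add_epi32(bits, _mm_set1_epi32(0x1FBD1DF5));  // magic constant
    return _mm_castsi128_ps(bits);                           // back to floats
}
```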

#5244044 Trilinear texture filtering

Posted on 01 August 2015 - 02:16 PM

I think I'd be more interested in how they manage to hide the memory latency, considering that for a monochrome texture they have to read 8 bytes (four texels from each of two mip levels), possibly from 8 different cache lines.

#5243901 Trilinear texture filtering

Posted on 31 July 2015 - 03:40 PM

I was wondering if anyone knew where I could find some information on how trilinear texture filtering is implemented on GPUs. I remember that in the past GPU vendors would claim that their cards could perform a trilinear filtered texture sample per cycle. It would be interesting to know the architectural details of how that was accomplished and how things may have changed now that GPU architectures have become more general purpose. Information on how it might be efficiently implemented in software using SIMD would also be welcome.
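For reference, the arithmetic a trilinear sample performs can be sketched in scalar C++: two bilinear lookups (four texels each) in the two mip levels nearest the level of detail, blended by the fractional LOD. This is a minimal sketch assuming an 8-bit single-channel texture, clamp-to-edge addressing, a non-empty mip chain, and texel centers at half-integer coordinates; `MipLevel`, `bilinear`, and `trilinear` are hypothetical names, and it says nothing about how any particular GPU schedules the eight reads.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit single-channel mip level.
struct MipLevel {
    int width, height;
    std::vector<std::uint8_t> texels;  // row-major, width * height bytes

    // Clamp-to-edge texel fetch.
    float fetch(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Bilinear: 4 texel reads weighted by the fractional sample position.
float bilinear(const MipLevel& m, float u, float v) {
    float x = u * m.width - 0.5f;  // texel centers sit at half-integer coords
    float y = v * m.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;
    float top = m.fetch(x0, y0) * (1 - fx) + m.fetch(x0 + 1, y0) * fx;
    float bot = m.fetch(x0, y0 + 1) * (1 - fx) + m.fetch(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bot * fy;
}

// Trilinear: bilinear in the two nearest mip levels (8 reads), then blend by
// the fractional level of detail. Assumes mips is non-empty.
float trilinear(const std::vector<MipLevel>& mips, float u, float v, float lod) {
    int last = static_cast<int>(mips.size()) - 1;
    if (lod <= 0.0f) return bilinear(mips[0], u, v);
    if (lod >= static_cast<float>(last)) return bilinear(mips[last], u, v);
    int l0 = static_cast<int>(lod);
    float f = lod - static_cast<float>(l0);
    return bilinear(mips[l0], u, v) * (1 - f) + bilinear(mips[l0 + 1], u, v) * f;
}
```

A SIMD version would typically evaluate the eight fetches and the three nested lerps for several samples in parallel; that is not shown here.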

#5237778 mipmapping a texture after being written by FBO

Posted on 30 June 2015 - 07:54 PM

well it doesn't seem to work


I'm sorry to hear that.

#5224112 Glossy reflections - how to do a proper blur

Posted on 17 April 2015 - 07:14 PM


#5215501 Vulkan is Next-Gen OpenGL

Posted on 09 March 2015 - 03:20 PM


No, Intel dedicates a huge portion of the die to the HD/Iris graphics cores.

According to AnandTech's article on Iris Pro (see last paragraph), Intel is dedicating somewhere around 65% of their total die area to the GPU in this generation.



Yeah, and it feels like such a waste of space and transistors when you have a dedicated GPU and the integrated one goes unused. Those transistors would be better spent on more cores. 16-core i7 when? :P

#5215466 Vulkan is Next-Gen OpenGL

Posted on 09 March 2015 - 12:35 PM

It seems you can explicitly query and use different GPUs in the system. Does that mean I can use dedicated and integrated GPUs in the same application?


That's the idea. You should even be able to use radically different GPUs from different vendors together, though the level of support for this will vary.


Thinking about it, do integrated GPUs like Intel's HD/Iris use separate hardware on the die or do they basically use the vector stuff (AVX/SSE) of the available cores?


No, Intel dedicates a huge portion of the die to the HD/Iris graphics cores.