Recommended Posts

I've started optimizing my code in any way possible. I saw that some CPUs actually have 256-bit SIMD, but I was wondering whether there is a way to detect this and fall back to the 128-bit path on an unsupported CPU, or how else to deal with this.


Remember, the intrinsics determine what code is generated at compile time. If you want a runtime fallback in a single executable, you need cpuid-based feature detection as well.
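For illustration, here is a minimal sketch of that dispatch pattern, assuming C# on .NET Core 3.0+ with its hardware intrinsics and unsafe code enabled (in C/C++ the equivalent is querying cpuid directly, or via compiler helpers such as GCC/Clang's __builtin_cpu_supports). The idea is the same either way: check the CPU's capabilities at runtime and pick the widest path available.

using System;
using System.Runtime.Intrinsics.X86;

static class VectorAdd
{
    // The IsSupported checks happen at runtime, so a single executable can
    // use 256-bit SIMD where it exists and fall back to 128-bit (or plain
    // scalar code) everywhere else. Assumes all three arrays have equal length.
    public static unsafe void Add(float[] a, float[] b, float[] dst)
    {
        int i = 0;
        fixed (float* pa = a, pb = b, pd = dst)
        {
            if (Avx.IsSupported)
            {
                for (; i + 8 <= a.Length; i += 8)   // 8 floats per 256-bit register
                    Avx.Store(pd + i, Avx.Add(Avx.LoadVector256(pa + i),
                                              Avx.LoadVector256(pb + i)));
            }
            else if (Sse.IsSupported)
            {
                for (; i + 4 <= a.Length; i += 4)   // 4 floats per 128-bit register
                    Sse.Store(pd + i, Sse.Add(Sse.LoadVector128(pa + i),
                                              Sse.LoadVector128(pb + i)));
            }
        }
        for (; i < a.Length; i++)                   // scalar tail, and the fallback for old CPUs
            dst[i] = a[i] + b[i];
    }
}

The JIT treats the IsSupported properties as constants, so the branches that don't apply on the running CPU are compiled away.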

Quote

If you want a runtime fallback in a single executable, you need cpuid-based feature detection as well.

Right, that was on that website; runtime detection was exactly what I was looking for.



  • Similar Content

    • By hiya83
(Posted this in the graphics forum too, which was perhaps the wrong forum for it.)
Hey, I was wondering whether mobile development (Android mainly, but iOS as well if you know of it) has a GPUView equivalent for whole-system debugging, so we can figure out whether the CPU and GPU are being pipelined efficiently, whether there are bubbles, etc. Also, a slightly tangential question: do mobile GPUs have a DMA engine exposed as a dedicated transfer queue in Vulkan?
      Thanks!
    • By pabloreda
       
I am coding the rasterization of triangles using the barycentric-coordinate method.
I have looked at a lot of code and tutorials on the web about optimizing this algorithm.
I found a way to optimize it that I have not seen anywhere else.
I only code the filling of triangles, without a Z-buffer and without textures. I am not comparing speeds and am not interested in doing so; I am simply reducing the number of instructions executed in the inner loop.
The idea is simple; someone must have done it before but not published the code, or maybe hardware already works this way.
The point is that for each horizontal line drawn, of the three edges you only need to watch one going from negative to positive to start drawing, and one going from positive to negative to stop drawing (a sketch of one way to realize this follows below).
I tried it and it works well; now I am implementing a regular version with texture and Z-buffer to work out how to combine them with this optimization.
Does anyone know if this optimization has already been done?
The code is at https://github.com/phreda4/reda4/blob/master/r4/Dev/graficos/rasterize2.txt
from line 92 to 155.
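For illustration, a minimal C# sketch of the same idea (the original is in r4; the names here are made up). Since the covered pixels on a scanline form one contiguous span, each edge function E(x, y) = A*x + B*y + C contributes either the start or the end of that span, so the inner loop needs no per-pixel edge tests at all.

using System;

static class SpanRaster
{
    // Fills a triangle by solving each edge function for its x-crossing on
    // the current scanline: edges with A > 0 bound the start of the span,
    // edges with A < 0 bound its end. Winding is assumed such that "inside"
    // means all three edge functions are >= 0.
    public static void Fill(int[] fb, int w, int h, int color,
                            int x0, int y0, int x1, int y1, int x2, int y2)
    {
        int[] A = { y0 - y1, y1 - y2, y2 - y0 };
        int[] B = { x1 - x0, x2 - x1, x0 - x2 };
        int[] C = { x0 * y1 - x1 * y0, x1 * y2 - x2 * y1, x2 * y0 - x0 * y2 };

        int yTop = Math.Max(0, Math.Min(y0, Math.Min(y1, y2)));
        int yBot = Math.Min(h - 1, Math.Max(y0, Math.Max(y1, y2)));

        for (int y = yTop; y <= yBot; y++)
        {
            double xMin = 0, xMax = w - 1;
            bool visible = true;
            for (int e = 0; e < 3; e++)
            {
                double k = B[e] * (double)y + C[e];          // E(x, y) = A*x + k on this row
                if (A[e] > 0) xMin = Math.Max(xMin, Math.Ceiling(-k / A[e]));
                else if (A[e] < 0) xMax = Math.Min(xMax, Math.Floor(-k / A[e]));
                else if (k < 0) { visible = false; break; }  // horizontal edge: row fully outside
            }
            if (!visible) continue;
            for (int x = (int)xMin; x <= (int)xMax; x++)     // inner loop: plain fill only
                fb[y * w + x] = color;
        }
    }
}

This is algebraically the same as watching the sign change while stepping x across the row, just solved in closed form once per scanline.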
       
    • By FFA702
Hi. It's been a while since I posted here, and my last posts were about almost exactly this same subject. I mention that only to show how much this has been bugging me.
Here is the problem: I'm trying to make a decent raycaster in C#. The main issue is that, for this to happen, I need pixel-by-pixel drawing. My previous raycaster used GDI+, and through several tricks involving pointers and filling a bitmap byte by byte, I was able to get half-decent results and build an online, server-client style 3D engine, complete with a map builder and several really cool features. I unfortunately wasn't able to expand the project further due to poorly written code (I am a hobbyist; I study Business Administration at uni) and the fact that my quick performance hack could barely carry the bare minimum needed to make a very bare-bones raycaster possible. This came with very real sadness: the realization that the project I had spent almost two years on was essentially useless, bloated, and impossible to expand on.
Enough background. Starting anew, my main concern is to find a way to gain fast pixel-by-pixel control over the screen. I'm using SFML and C#. My current testbench is pretty simple: I'm using a method I found on the internet, written for C++, which I adapted for C#. I fill a Color[,] array (each Color is a pixel), then copy the RGB values into a byte[] array before moving them into the texture buffer, and finally display the texture on the screen. I'm not sure what the bottleneck is; the application is faster than my previous one, but it's still too slow for my liking. Raycasters work by redrawing stuff on top of other stuff, and I fear that adding more would creep it to a halt. I'm posting what I have as a testbench right now; any help would be greatly appreciated. Keep in mind I am not a professional programmer by any means. I am pretty uncomfortable with pointers and true 3D stuff, but I will use them if I must.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using SFML.Audio;
using SFML.Graphics;
using SFML.System;
using SFML.Window;

namespace RayCastFF
{
    class Program
    {
        public static int TestCounter = 0;
        // An array containing the color of every pixel; this is intended to be
        // the main target of all manipulation and draw calls.
        public static Color[,] ScreenBuffer = new Color[640, 360];
        public static Texture MainViewPort = new Texture(640, 360); // main screen texture

        unsafe static void Main(string[] args)
        {
            // MAIN WINDOW SETUP
            RenderWindow window = new RenderWindow(new VideoMode(640, 360), "RayCaster");

            while (window.IsOpen) // MAIN GAME LOOP
            {
                // CALL FOR UPDATE
                Update();

                // DRAW
                window.Clear();
                window.DispatchEvents();
                Sprite mainviewport = new Sprite(MainViewPort);
                window.Draw(mainviewport); // draw the texture over the screen
                window.Display();

                // TAKE INPUT
            }
        }

        static void Update()
        {
            TestCounter++;
            if (TestCounter > 639)
            {
                TestCounter = 0;
            }

            // RESET THE BUFFER (COULD BE REMOVED LATER I GUESS)
            for (int x = 0; x < 640; x++)
            {
                for (int y = 0; y < 360; y++)
                {
                    ScreenBuffer[x, y] = Color.Black;
                }
            }

            // DO STUFF
            DrawLine(Color.Red, TestCounter, 200, 100); // (for this test, I simply draw a moving line)

            // WRITING THE BUFFER INTO THE IMAGE
            // THIS SHOULD ALWAYS BE THE LAST STEP OF THE UPDATE METHOD
            byte[] pixels = new byte[640 * 360 * 4]; // 640 x 360 pixels x 4 bytes per pixel
            Color[] cpixels = new Color[640 * 360];  // intermediate step to keep everything clear
            for (int x = 0; x < 640; x++)
            {
                for (int y = 0; y < 360; y++)
                {
                    // Make an intermediate array of the correct dimension and arrange the
                    // pixels in the position they will be drawn in (separate step to keep
                    // everything clean; I find this operation incredibly confusing, mainly
                    // because I had no idea how the pixels are supposed to be arranged in
                    // the first place (still kind of don't)).
                    cpixels[x + (640 * y)] = ScreenBuffer[x, y];
                }
            }
            for (int i = 0; i < 640 * 360 * 4; i += 4) // fill the byte array
            {
                pixels[i + 0] = cpixels[i / 4].R;
                pixels[i + 1] = cpixels[i / 4].G;
                pixels[i + 2] = cpixels[i / 4].B;
                pixels[i + 3] = cpixels[i / 4].A;
            }
            MainViewPort.Update(pixels); // update the texture with the array
        }

        // [X, Y]
        static void DrawLine(Color color, int Wpos, int Ytop, int Ybottom) // simple draw method making a vertical line
        {
            for (int y = Ybottom; y < Ytop; y++)
            {
                ScreenBuffer[Wpos, y] = color;
            }
        }
    }
}

What I'd like to end up with is a very fast way to draw pixels to the window through the abstraction of a single 640x360 2D array that I can easily and simply manipulate. However, while simple, it's also somewhat slow, and for some reason it sits at 30% GPU load on a GTX 1070 8GB. Again, any help would be greatly appreciated (a sketch of one possible simplification follows below).
      Thanks in advance.
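For illustration, a minimal sketch of one way to cut down the copying described above, assuming SFML.Net: keep a single reusable byte[] in the row-major RGBA layout that Texture.Update expects, and write pixels into it directly, skipping the Color[,] and Color[] intermediates entirely (PixelSurface and PutPixel are made-up names):

using System;
using SFML.Graphics;

class PixelSurface
{
    public const int W = 640, H = 360;
    readonly byte[] pixels = new byte[W * H * 4];        // allocated once, reused every frame
    public readonly Texture Texture = new Texture(W, H);

    // Writes one pixel straight into the byte buffer, in the row-major
    // RGBA order that Texture.Update expects.
    public void PutPixel(int x, int y, Color c)
    {
        int i = (y * W + x) * 4;
        pixels[i + 0] = c.R;
        pixels[i + 1] = c.G;
        pixels[i + 2] = c.B;
        pixels[i + 3] = c.A;
    }

    // Clears to transparent black; fill every fourth byte with 255 instead
    // if the sprite is drawn with alpha blending.
    public void Clear() => Array.Clear(pixels, 0, pixels.Length);

    public void Present() => Texture.Update(pixels);     // one upload per frame
}

Constructing the Sprite once outside the main loop instead of every frame would also remove a per-frame allocation.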
    • By Tanzan
      Hello all,
My question is a bit hard to describe, but hopefully this will do...
I just wonder what you think is the 'best' way of getting info about the model into your view(s).
To clarify (I hope ;-) ):
If the model updates itself every game cycle and the (deeply) nested objects all do their jobs, how do you get hold of the info that only the view is interested in?
So my question is not how to do it, but rather: what do you think is the best way to do it? (A sketch of one common option follows below.)
       
      Regards,
       
      Alex   
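For what it's worth, one common answer to this is the observer pattern: the model raises an event when something a view cares about changes, and the view just subscribes, so nothing is polled during the game cycle. A minimal C# sketch (Player, HealthChanged, and HealthBarView are made-up names):

using System;

class Player
{
    // The model announces changes; it knows nothing about views.
    public event Action<int> HealthChanged;

    int health = 100;
    public int Health
    {
        get => health;
        set
        {
            if (value == health) return;
            health = value;
            HealthChanged?.Invoke(health);
        }
    }
}

class HealthBarView
{
    int shownHealth;

    // The view caches only what it needs, kept fresh by the event.
    public HealthBarView(Player model) => model.HealthChanged += h => shownHealth = h;

    public void Draw() => Console.WriteLine($"HP: {shownHealth}");
}

The trade-off versus polling the model each frame is that deeply nested objects need some way to surface their events, e.g. by forwarding them upward or through a shared event bus.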
    • By aejt
Sorry for making a new thread about this, but I have a specific question which I couldn't find an answer to in any of the other threads I've looked at.
I've been trying to get the method shown here to work for several days now, and I've run out of things to try.
I've more or less resorted to using the bare-bones example shown there (with some very minor modifications, as it wouldn't run otherwise), but I still can't get it to work. Either I have misunderstood something completely, or there's a mistake somewhere.
       
      My shader code looks like this:
      Vertex shader:
#version 330 core
// Vertex shader

// Half the size of the near plane: {tan(fovy/2.0) * aspect, tan(fovy/2.0)}
uniform vec2 halfSizeNearPlane;

layout (location = 0) in vec3 clipPos;
// UV for the depth buffer/screen access.
// (0,0) in the bottom left corner, (1,1) in the top right corner.
layout (location = 1) in vec2 texCoord;

out vec3 eyeDirection;
out vec2 uv;

void main()
{
    uv = texCoord;
    eyeDirection = vec3((2.0 * halfSizeNearPlane * texCoord) - halfSizeNearPlane, -1.0);
    gl_Position = vec4(clipPos.xy, 0, 1);
}

Fragment shader:

#version 330 core
// Fragment shader

layout (location = 0) out vec3 fragColor;

in vec3 eyeDirection;
in vec2 uv;

uniform mat4 persMatrix;
uniform vec2 depthrange;
uniform sampler2D depth;

vec4 CalcEyeFromWindow(in float windowZ, in vec3 eyeDirection, in vec2 depthrange)
{
    float ndcZ = (2.0 * windowZ - depthrange.x - depthrange.y) / (depthrange.y - depthrange.x);
    float eyeZ = persMatrix[3][2] / ((persMatrix[2][3] * ndcZ) - persMatrix[2][2]);
    return vec4(eyeDirection * eyeZ, 1);
}

void main()
{
    vec4 eyeSpace = CalcEyeFromWindow(texture(depth, uv).x, eyeDirection, depthrange);
    fragColor = eyeSpace.rbg;
}

My camera settings are:

float fov = glm::radians(60.0f);
float aspect = 800.0f / 600.0f;

And my uniforms equal:

uniform mat4 persMatrix = glm::perspective(fov, aspect, 0.1f, 100.0f)
uniform vec2 halfSizeNearPlane = glm::vec2(glm::tan(fov/2.0) * aspect, glm::tan(fov/2.0))
uniform vec2 depthrange = glm::vec2(0.0f, 1.0f)

uniform sampler2D depth is a GL_DEPTH24_STENCIL8 texture which has depth values from an earlier pass (if I linearize it and set fragColor = vec3(linearizedZ), it shows up like it should, so nothing seems wrong there).
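As a checkpoint, this is the algebra that eyeZ line encodes, assuming glm::perspective with the default [-1, 1] clip-depth convention and GLSL's column-major M[col][row] indexing (so M[2][3] = -1 and w_clip = -z_eye):

\[
z_{ndc} \;=\; \frac{M[2][2]\, z_{eye} + M[3][2]}{M[2][3]\, z_{eye}}
\quad\Longrightarrow\quad
z_{eye} \;=\; \frac{M[3][2]}{M[2][3]\, z_{ndc} - M[2][2]}
\]

which is exactly the expression in CalcEyeFromWindow, so the matrix indexing at least is consistent with a glm::perspective matrix.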
I can confirm that it's wrong because it doesn't give me results similar to what saving the position in the G-buffer, or reconstructing with inverse matrices, does.
Is there something obvious I'm missing? To me the logic seems sound, and from the description on the Khronos wiki I can't see where I'm going wrong.
      Thanks!