Jump to content
  • Advertisement

CommanderLake

Member
  • Content Count

    8
  • Joined

  • Last visited

Community Reputation

108 Neutral

About CommanderLake

  • Rank
    Newbie

Personal Information

  • Interests
    |programmer|
  1. I'd love to add a 3rd dimension and try it on my Vive, how complex would such a thing be using OpenVR just drawing simple points?
  2. I get the same framerate with 65536 points on an old nVIDIA CUDA demo on an overclocked water cooled GTX1080 so the Xeons are doing about 25% as many FLOPS as the 1080 which is impressive. I've seen the FMM mentioned a lot, sounds like a nightmare to implement without a code sample, how accurate is it?
  3. Oh well I've changed it now. You are right, I was using separate source and destination buffers up to a point and I didn't realise what I was doing when I changed it! Its fixed and its faster: #include "PPC.h" #include <thread> static const float G = 0.0000001F; const int count = 4096; __declspec(align(64)) float px0[count]; __declspec(align(64)) float py0[count]; __declspec(align(64)) float px1[count]; __declspec(align(64)) float py1[count]; float* pointsx0 = px0; float* pointsy0 = py0; float* pointsx1 = px1; float* pointsy1 = py1; void SetData(float* x, float* y){ memcpy(px0, x, count * sizeof(float)); memcpy(py0, y, count * sizeof(float)); memcpy(px1, x, count * sizeof(float)); memcpy(py1, y, count * sizeof(float)); } void Compute(float* points, float* velx, float* vely, long pcount, float aspect, float zoom) { #pragma omp parallel for for (auto i = 0; i < count; ++i) { auto forcex = 0.0F; auto forcey = 0.0F; for (auto j = 0; j < count; ++j) { if(j == i)continue; const auto distx = pointsx0[i] - pointsx0[j]; const auto disty = pointsy0[i] - pointsy0[j]; const auto force = G / (distx * distx + disty * disty); forcex += distx * force; forcey += disty * force; } pointsx1[i] = pointsx0[i] + (velx[i] -= forcex); pointsy1[i] = pointsy0[i] + (vely[i] -= forcey); if (zoom != 1) { points[i * 2] = pointsx1[i] * zoom / aspect; points[i * 2 + 1] = pointsy1[i] * zoom; } else { points[i * 2] = pointsx1[i] / aspect; points[i * 2 + 1] = pointsy1[i]; } } const auto pxt = pointsx0; const auto pyt = pointsy0; pointsx0 = pointsx1; pointsy0 = pointsy1; pointsx1 = pxt; pointsy1 = pyt; } The code is now SO fast i can do 32768 points at 58 FPS on a server with 2x 14 core Xeon E5 2695 v3's and its absolutely mesmerising!
  4. I should have mentioned it does the same thing without multithreading, I did the same thing in the multithreaded one that works properly, I see what you mean though and thanks for the tips. I will update the OP with the changes I have made.
  5. I've been experimenting with my own n-body simulation for some time and I recently discovered how to optimize it for efficient multithreading and vectorization with the Intel compiler. It did exactly the same thing after making it multithreaded and scaled very well on my ancient i7 3820 (4.3GHz). Then I changed the interleaved xy coordinates to separate arrays for x and y to eliminate the strided loads to improve AVX scaling and copy the coordinates to an interleaved array for OpenTK to render as points. Now the physics is all wrong, the points form clumps that interact with each other but they are unusually dense and accelerate faster than they decelerate causing the clumps to randomly fly off into the distance and after several seconds I get a NaN where 2 points somehow occupy exactly the same x and y float coordinates. This is the C++ DLL: #include "PPC.h" #include <thread> static const float G = 0.0000001F; const int count = 4096; __declspec(align(64)) float pointsx[count]; __declspec(align(64)) float pointsy[count]; void SetData(float* x, float* y){ memcpy(pointsx, x, count * sizeof(float)); memcpy(pointsy, y, count * sizeof(float)); } void Compute(float* points, float* velx, float* vely, long pcount, float aspect, float zoom) { #pragma omp parallel for for (auto i = 0; i < count; ++i) { auto forcex = 0.0F; auto forcey = 0.0F; for (auto j = 0; j < count; ++j) { if(j == i)continue; const auto distx = pointsx[i] - pointsx[j]; const auto disty = pointsy[i] - pointsy[j]; //if(px != px) continue; //most efficient way to avoid a NaN failure const auto force = G / (distx * distx + disty * disty); forcex += distx * force; forcey += disty * force; } pointsx[i] += velx[i] -= forcex; pointsy[i] += vely[i] -= forcey; if (zoom != 1) { points[i * 2] = pointsx[i] * zoom / aspect; points[i * 2 + 1] = pointsy[i] * zoom; } else { points[i * 2] = pointsx[i] / aspect; points[i * 2 + 1] = pointsy[i]; } /*points[i * 2] = pointsx[i]; points[i * 2 + 1] = pointsy[i];*/ } } This is the relevant part of the C# OpenTK GameWindow: private void PhysicsLoop(){ while(true){ if(stop){ for(var i = 0; i < pcount; ++i) { velx[i] = vely[i] = 0F; } } if(reset){ reset = false; var r = new Random(); for(var i = 0; i < Startcount; ++i){ do{ pointsx[i] = (float)(r.NextDouble()*2.0F - 1.0F); pointsy[i] = (float)(r.NextDouble()*2.0F - 1.0F); } while(pointsx[i]*pointsx[i] + pointsy[i]*pointsy[i] > 1.0F); velx[i] = vely[i] = 0.0F; } NativeMethods.SetData(pointsx, pointsy); pcount = Startcount; buffersize = (IntPtr)(pcount*8); } are.WaitOne(); NativeMethods.Compute(points0, velx, vely, pcount, aspect, zoom); var pointstemp = points0; points0 = points1; points1 = pointstemp; are1.Set(); } } protected override void OnRenderFrame(FrameEventArgs e){ GL.Clear(ClearBufferMask.ColorBufferBit); GL.EnableVertexAttribArray(0); GL.BindBuffer(BufferTarget.ArrayBuffer, vbo); mre1.Wait(); are1.WaitOne(); GL.BufferData(BufferTarget.ArrayBuffer, buffersize, points1, BufferUsageHint.StaticDraw); are.Set(); GL.VertexAttribPointer(0, 2, VertexAttribPointerType.Float, false, 0, 0); GL.DrawArrays(PrimitiveType.Points, 0, pcount); GL.DisableVertexAttribArray(0); SwapBuffers(); } These are the array declarations: private const int Startcount = 4096; private readonly float[] pointsx = new float[Startcount]; private readonly float[] pointsy = new float[Startcount]; private float[] points0 = new float[Startcount*2]; private float[] points1 = new float[Startcount*2]; private readonly float[] velx = new float[Startcount]; private readonly float[] vely = new float[Startcount]; Edit 0: It seems that adding 3 zeros to G increases the accuracy of the simulation but I'm at a loss as to why its different without interleaved coordinates. Edit 1: I somehow achieved an 8.3x performance increase with AVX over scalar with the new code above!
  6. CommanderLake

    Converting to OpenGL 4

    I'd love to be able to use Vulcan but from what I've read its difficult even for an experienced graphics programmer to use and I can barely get my head around my bit of GL.
  7. CommanderLake

    Converting to OpenGL 4

    Thanks for the advice, Is there by any chance a more direct way of drawing pixel data without scaling etc. with modern OpenGL?
  8. CommanderLake

    Converting to OpenGL 4

    I use an OpenTK GLcontrol as a faster alternative to GDI+ for displaying images and I would like to convert the old deprecated code to OpenGL 4 but I'm having trouble adapting code samples in guides I have found to my needs so would anyone kindly show me how to convert this: GL.TexSubImage2D(TextureTarget.Texture2D, 0, 0, 0, rez.Width, rez.Height, PixelFormat.Bgr, PixelType.UnsignedByte, currentframe); GL.MatrixMode(MatrixMode.Projection); GL.LoadIdentity(); GL.Ortho(-1.0, 1.0, -1.0, 1.0, -1.0, 1.0); GL.MatrixMode(MatrixMode.Modelview); GL.LoadIdentity(); GL.Enable(EnableCap.Texture2D); GL.BindTexture(TextureTarget.Texture2D, tex); GL.Begin(PrimitiveType.Quads); GL.TexCoord2(0.0f, 1.0f); GL.Vertex2(-1.0f, -1.0f); GL.TexCoord2(1.0f, 1.0f); GL.Vertex2(1.0f, -1.0f); GL.TexCoord2(1.0f, 0.0f); GL.Vertex2(1.0f, 1.0f); GL.TexCoord2(0.0f, 0.0f); GL.Vertex2(-1.0f, 1.0f); GL.End(); OGLControl.SwapBuffers();
  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!