OpenGL Poor OpenCL performance


Recommended Posts

I'm learning OpenCL for a project, and so far I'm a little disappointed in the performance I've been getting with a really basic kernel, so I'm hoping there's just something I'm missing. Here's the kernel I'm using; all it does is calculate a gradient that's written to a 640x480 texture:

__kernel void debug(__write_only image2d_t resultTexture)
{
    int2 imgCoords = (int2)(get_global_id(0), get_global_id(1));
    int2 imgDims = (int2)(get_image_width(resultTexture), get_image_height(resultTexture));

    float4 imgVal = (float4)((float)imgCoords.x / (float)imgDims.x,
                             (float)imgCoords.y / (float)imgDims.y,
                             0.0f, 1.0f);
    write_imagef(resultTexture, imgCoords, imgVal);
}


My video card is an NVIDIA GeForce GTX 285M; with this kernel running in a release build (C++) I'm getting ~750 FPS. That's not low, but it's not as high as I was expecting. I figure calculating this gradient on this card in GLSL would give me quite a bit more. I know GLSL is optimized for this sort of thing whereas raw OpenCL is not, so it could just be that, but I want to make sure before I get into more complex things, since I have plans to really tax this card once I figure out the intricacies of OpenCL. Here is the code I'm using each frame to execute the kernel:

void CLContext::runKernelForScreen(int screenWidth, int screenHeight)
{
    cl_int result = CL_SUCCESS;
    cl::Event ev;
    cl::NDRange localRange = cl::NDRange(32, 16); // note: declared but never used; cl::NullRange is passed below
    cl::NDRange globalRange = cl::NDRange(screenWidth, screenHeight);

    // make sure OpenGL isn't using anything
    glFlush();

    // acquire the OpenGL shared objects
    result = _commandQueue.enqueueAcquireGLObjects(&_glObjects, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    // set the argument to be the image
    _primaryKernel.setArg(0, _screenTextureImage);

    // enqueue operations to perform on the texture
    result = _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange, globalRange, cl::NullRange, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    result = _commandQueue.enqueueReleaseGLObjects(&_glObjects, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    _commandQueue.finish();
}


I profiled this and found that the bulk of the time is spent on the ev.wait() lines. Commenting those out does no direct harm but only yields around a 100 FPS gain, and at that point the execution time is almost entirely in _commandQueue.finish(), for obvious reasons.
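One way to separate the kernel's own execution time from host-side waiting is to read the event's profiling counters. This is only a sketch against the cl.hpp wrapper used above, and it assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE (which the posted code doesn't show):

```
// Sketch: time the kernel itself via event profiling counters.
// Requires the queue to have been created with CL_QUEUE_PROFILING_ENABLE.
cl::Event ev;
_commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange,
                                   globalRange, cl::NullRange, 0, &ev);
ev.wait();

cl_ulong start = ev.getProfilingInfo<CL_PROFILING_COMMAND_START>();
cl_ulong end   = ev.getProfilingInfo<CL_PROFILING_COMMAND_END>();
double kernelMs = (end - start) * 1e-6; // counters are in nanoseconds
```

If kernelMs is tiny compared to the frame time, the cost is in synchronization, not in the kernel.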

If it matters at all, I'm initializing the OpenGL texture as such:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, screenWidth, screenHeight, 0, GL_RGBA, GL_FLOAT, NULL);

And the respective OpenCL texture object is created with:
_screenTextureImage = cl::Image2DGL(_context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, textureId, &err);

Lastly, in addition to profiling from the host side, I've also used gDebugger to try to see where the issue is, but the tool (at least as far as I'm capable of using it) doesn't yield much performance data other than to say that on average the kernel uses around 17% of the GPU to run. I've tried Parallel Nsight as well, but its complexity seems a bit daunting to me.

Hopefully I've preempted most of the questions about how I'm doing things and someone can make sense of all this. Is my head on straight here? I won't be surprised either way if I hear that this is the kind of performance I should or shouldn't expect from OpenCL on this hardware, but like I said, I feel like I'd be getting a bit more from GLSL at this stage.

I have a similar GPU and have experienced similar performance with OpenCL. I've seen CUDA take about half as much time in some cases for the exact same (simple) kernel! You can also verify this by looking at NVIDIA's compute SDK: it includes a few samples implemented in OpenCL, CUDA, and DirectCompute, and the OpenCL versions are always the slowest for me.

I can't imagine what's going on behind the scenes, but I eventually stopped trying to speed up my OpenCL and switched to using pixel shaders for just about everything. OpenGL 4.2 now supports atomic operations in shaders, so you can do almost the same things in a shader that you would in OpenCL, except it runs much faster, for who knows what reason.

Thanks for the reply. Well, that sucks! I'm doing this for an independent study at my college and I don't really have the time to start over in CUDA :( . I'm going to wait and see if anyone else weighs in with similar experience and then make a decision. Thanks again for the info, even though it's not at all what I wanted to hear.

The fact that commenting out ev.wait(), which in fact does "nothing", gives a huge boost suggests that OpenCL as such is not really to blame; it's a scheduling thing. Waiting on an event twice means being taken off the ready-to-run list, being put back on it when the event is set, and being scheduled again when the next time slice becomes available. If you do this thousands of times and time slices are, say, 15-20 milliseconds, this can add up to a long, long time.

Have you tried increasing the scheduler's frequency? (I'm not sure how to do it under any OS other than Windows, where that would be timeBeginPeriod(1).)

Alternatively, push a hundred kernels onto the task queue and let them execute, then block in finish() and see how long it took all of them to run. I'm sure it will be much faster. You're not benchmarking OpenCL here, you're benchmarking waiting on an event...
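That suggestion might look something like this, as a sketch based on the code posted earlier (_commandQueue, _primaryKernel, _glObjects and globalRange are the names from that post): the per-launch ev.wait() calls are dropped and the only blocking call is the final finish().

```
// Sketch: enqueue many kernel launches back-to-back, then block once.
// Error checking omitted for brevity.
_commandQueue.enqueueAcquireGLObjects(&_glObjects);

for (int i = 0; i < 100; ++i) {
    _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange,
                                       globalRange, cl::NullRange);
}

_commandQueue.enqueueReleaseGLObjects(&_glObjects);
_commandQueue.finish(); // single synchronization point for all 100 launches
```

Timing around this whole block, then dividing by 100, should give a much better estimate of per-kernel cost than timing one launch bracketed by waits.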




Sorry, I'm really new to OpenCL and a lot of what you said is lost on me...

What do you mean by pushing a hundred kernels onto the task queue at a time? Currently I'm just enqueueing one NDRange kernel and letting it execute with a global range equal to the dimensions of the texture I'm writing to. I don't think I understand what you mean.

Also, I can't quite see what you're talking about with the ev.wait() calls. Are you saying they should be slowing it down, or that they shouldn't be? Do I have too many or too few? I just figured out how to use the CUDA Toolkit's Visual Profiler, and it reported that I have very low compute utilization (~24%), so if my GPU is idle most of the time, I'm sure I'm not getting the performance I could be in theory; I just don't quite get how to go about fixing that. I've pretty much split up the tasks each work item carries out as much as possible (I'm using a different kernel than the one originally posted, but it's still fairly basic), so I'm unsure how to increase the amount that gets done at one time. I'm still using an NDRange of width x height and letting the drivers decide what local workgroup size to use; could that be the problem?
