Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 05 Jul 2007
Offline Last Active May 12 2015 01:04 AM

Topics I've Started

Binding consecutive attributes in buffer to non-consecutive indices

05 March 2015 - 10:38 AM

Wondering if this is an issue people deal with or if I'm complicating something unnecessarily. If I make a series of calls like the following during shader compilation / linking:

glBindAttribLocation(program, 0, "position");
glBindAttribLocation(program, 1, "normal");
glBindAttribLocation(program, 2, "color");

and I later have a vertex array that only contains position and color, I sort of expected the following to work:

glVertexAttribPointer(0, ...); // assume I've proven that all other parameters are valid
glVertexAttribPointer(2, ...);

However it doesn't. My triangles are rendered in the correct positions but their color is a solid red (when it should be a gradient). Binding color to attrib location 1 instead of 2 obviously solves the problem, but I was expecting that the first parameter to glVertexAttribPointer was just a logical relation to the index we previously bound to that attribute in the shader we're using -- not something that actually mandated the ordering of our data in memory (which I thought was the purpose of our attribute 'offset').


The motivation behind this sounds simple to me: I want to be able to define a standard ordering for all attributes and bind them to every shader during compilation. Is it really assumed that you always put your attributes in the same order in memory? If so, what's the point of specifying an offset to glVertexAttribPointer? I've seen suggestions on various sites and forums (including this one) that suggest you should explicitly set the order of all of your attributes. I just don't see how that would work in the case that you have some shaders that know about, say, Position and Normal while others only know about Position and Color.


Am I missing something? Is there a simpler solution?

How much thought to put into terrain gen for specific use case.

01 November 2012 - 01:03 AM

So this is more about terrain complexity than anything else. Basically I'm working on a prototype simcity like game, and I'm trying to work out exactly how I want terrain to behave before I get too far so I don't have to deal with it later when other things rely on it. Right now I'm doing my generation using noise functions and heightmaps the traditional way with one vertex per pixel. At the moment what has me most confused is the complexity of the terrain. At what point on modern hardware should I be worried about simplifying these meshes? For instance, is it acceptable to have a 1024x1024 vertex mesh that is in a VBO for use as the base terrain? Or should I be looking at methods of terrain simplification?

I've been toying with a terrain simplifying routine based on http://www.gamasutra.com/view/feature/131841/continuous_lod_terrain_meshing_.php but at the moment the result is really ugly because I haven't written any of the code for actually stitching the terrain back together afterward. Before I got into that, I wanted to make sure what I was doing was worth my time. I know I could just keep going with my 1024x1024 vertex mesh and wait until it's actually a problem before simplifying, but I feel like this is something that I'll regret not figuring out sooner rather than later. Another big factor in this decision is that I'd like to be able to modify the terrain at runtime with mininal effort (i.e. not sending the entire mesh every frame that it's been changed ideally), but I'm not sure how that's typically done alongside mesh simplification.

Are there any typical paths for this sort of thing when the terrain must be large, detailed and editable during runtime? Should I be looking at a paging system instead, with something like CDLOD working out what to feed into the renderer each frame instead? That seemed like overkill at first glance because I'm not planning a very large environment, but I'd still like it to be detailed.

Any comments and suggestions much appreciated!

Vertex Attributes, VAOs and Shaders

14 June 2012 - 09:19 PM

I'm trying to come up with a clever abstraction for shaders for my project and I'm stumbling on this issue concerning VAOs. So, when you've got a shader bound and you want to bind a VBO for rendering you ask the shader for the location of each attribute your VBO contains data for. This way it's just a step when you're binding a VBO / Shader, correct? If I've got this part right, then where do VAOs fit in with their ability to essentially cache the operations necessary to bind a VBO? To clarify, say I've got ShaderA bound, and I set up a VAO that records what is necessary to bind a VBO. I find out from the shader that "position" is at location 1, "color" is at location 2 and "normal" is at location 3. When I call glEnableVertexAttribute and similar operations, the VAO is recording simply the integer value going into these functions (1, 2, or 3 in this example) not the fact that I've queried the current shader to find out where they actually are. So now if I swap ShaderA for ShaderB and bind that same VAO for drawing, but ShaderB happens to have "position", "color" and "normal" at 2, 3, and 1 respectively, the VAO is going to bind the data in the wrong order and mess everything up.

All I can think of is to statically enforce the location of vertex attributes in the shader code, but this seems like the wrong thing to do. Is this my only option?

Acceleration structures for photon mapping on a GPU

12 April 2012 - 08:44 AM

I'm working with OpenCL and I've been trying to figure this out for a few days now and I've got nothing and don't quite understand how others do it. My main hang-up is this: if I store my photons like I do right now in a completely unsorted, 1-dimensional array (which is horribly slow) then I can define that array statically at program startup with something like SCENE_MAX_PHOTONS as the array size. That way it doesn't matter how dense the photons in specific areas are as long as the total number does not exceed the array limits. Now I want to implement some sort of acceleration structure (kd-tree, uniform grids, spatial hashing, etc.) but I don't see how with the static requirements in OpenCL this is possible at all. For example, if I had a uniform grid that I defined as having resolution 256x256x256, each grid cell would have to have a statically defined container for all the photons that land within it...but that means that there's an upper bound on how many photons can be in one area that isn't the same as the maximum number of photons allowed in the scene like it should be.

The same problem comes up with kd-trees when I try to think of them. Getting around the lack of recursion is tricky but not impossible for implementing one, but at the end of the day you're left with a leaf-node that's got to contain some of the photons, and that size has to be defined statically at the time that the kd-tree is built.

The only way I can think of to resolve this is to store photons as a linked-list structure at each leaf / grid cell / bucket. But there's no way that's going to be even remotely efficient on the GPU...right?

Poor OpenCL performance

21 March 2012 - 11:17 AM

I'm learning OpenCL for a project and so far I'm a little disappointed in the performance I've been getting with a really basic kernel so I'm hoping there's just something I'm missing. Here's the kernel that I'm using, all it does is calculate a gradient that's applied to a texture 640x480 resolution texture:

__kernel void debug(__write_only image2d_t resultTexture)
   int2 imgCoords = (int2)(get_global_id(0), get_global_id(1));
   int2 imgDims = (int2)(get_image_width(resultTexture), get_image_height(resultTexture));
   float4 imgVal = (float4)((float)imgCoords.x / (float)imgDims.x, (float)imgCoords.y / (float)imgDims.y, 0.0f, 1.0f);
   write_imagef(resultTexture, imgCoords, imgVal);

My video card is an Nvidia Geforce 285M GTX, with this kernel running in a release build (C++) I'm getting ~750 FPS. That's not low...but it's not as high as I was expecting. I figure calculating this gradient on this card in GLSL would probably give me quite a bit more. Now I know that GLSL is optimized for this sort of thing whereas raw OpenCL is not so it could just be that, but I thought I should make sure before I get into more complex things since I have plans to really tax this card once I figure out the intricacies of OpenCL. Also here is the code I'm using each frame to execute the kernel:

void CLContext::runKernelForScreen(int screenWidth, int screenHeight) {
cl_int result = CL_SUCCESS;
cl::Event ev;
cl::NDRange localRange = cl::NDRange(32, 16);
cl::NDRange globalRange = cl::NDRange(screenWidth, screenHeight);
//make sure OpenGL isn't using anything
//get the OpenGL shared objects
result = _commandQueue.enqueueAcquireGLObjects(&_glObjects, 0, &ev);
if(result != CL_SUCCESS) {
  throw OCException(LookupErrorString(result));

//set the argument to be the image
_primaryKernel.setArg(0, _screenTextureImage);
//enqueue operations to perform on the texture
result = _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange, globalRange, cl::NullRange, 0, &ev);
if (result != CL_SUCCESS) {
  throw OCException(LookupErrorString(result));
result = _commandQueue.enqueueReleaseGLObjects(&_glObjects, 0, &ev);
if (result != CL_SUCCESS) {
  throw OCException(LookupErrorString(result));

I profiled this and found that the bulk of the time is spent on the ev.wait() lines, and commenting those out doesn't do any direct harm but only yields around a 100 FPS gain, also at that point the execution time is almost entirely in _commandQueue.finish() for obvious reasons.

If it matters at all, I'm initializing the OpenGL texture as such:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, screenWidth, screenHeight, 0, GL_RGBA, GL_FLOAT, NULL);

And the respective OpenCL texture object is created with:
_screenTextureImage = cl::Image2DGL(_context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, textureId, &err);

Lastly in addition to profiling from the host side, I've also used gDebugger to try and see where the issue is, but the tool (at least as I'm capable of using it) doesn't yield much performance data other than to say that on average the kernel uses around 70% (17%) of the GPU to run. I've tried Parallel NSight as well, but it seems a bit daunting to me in it's complexity.

Hopefully I've preempted most of the questions concerning how I'm doing things and someone can make some sense of all this. Is my head on straight here? I don't think I'll be surprised either way if I hear that this is the kind of performance I should or should not expect from OpenCL on this hardware, but like I said I feel like I'd be getting a bit more at this stage from GLSL.