

Community Reputation

187 Neutral

About ChugginWindex


  1. In case anyone was interested, I figured out the issue. As always seems to be the case, this was user error, and the functionality works as expected when used correctly. I was trying to explicitly bind my attributes AFTER my GLSL program was linked, rather than before. glBindAttribLocation only takes effect the next time the program is linked, so my explicit choices were being ignored and the linker was using its automatically assigned locations instead.
  2. Wondering if this is an issue people deal with or if I'm complicating something unnecessarily. If I make a series of calls like the following during shader compilation / linking:

         glBindAttribLocation(program, 0, "position");
         glBindAttribLocation(program, 1, "normal");
         glBindAttribLocation(program, 2, "color");

     and I later have a vertex array that only contains position and color, I sort of expected the following to work:

         glEnableVertexAttribArray(0);
         glEnableVertexAttribArray(2);
         glVertexAttribPointer(0, ...); // assume I've proven that all other parameters are valid
         glVertexAttribPointer(2, ...);

     However, it doesn't. My triangles are rendered in the correct positions but their color is a solid red (when it should be a gradient). Binding color to attrib location 1 instead of 2 obviously solves the problem, but I was expecting that the first parameter to glVertexAttribPointer was just a logical relation to the index we previously bound to that attribute in the shader we're using, not something that actually mandated the ordering of our data in memory (which I thought was the purpose of our attribute 'offset').

     The motivation behind this sounds simple to me: I want to be able to define a standard ordering for all attributes and bind them to every shader during compilation. Is it really assumed that you always put your attributes in the same order in memory? If so, what's the point of specifying an offset to glVertexAttribPointer? I've seen suggestions on various sites and forums (including this one) that you should explicitly set the order of all of your attributes. I just don't see how that would work in the case that you have some shaders that know about, say, Position and Normal while others only know about Position and Color.

     Am I missing something? Is there a simpler solution?
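To make the distinction in the question concrete, here is a minimal sketch (plain C++, no GL calls, so it runs without a context) of the two independent numbers involved: the attribute location, which is the first argument to glVertexAttribPointer, and the byte offset inside the vertex, which is the last. The enum, the `PosColorVertex` struct and the `AttribPointer` record are hypothetical names made up for illustration, not anything from the original code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical project-wide convention: one fixed location per attribute name.
enum AttribLocation : unsigned { kPosition = 0, kNormal = 1, kColor = 2 };

// A vertex format that carries no normal at all.
struct PosColorVertex {
    float position[3];
    float color[4];
};

// The values one would hand to glVertexAttribPointer for a single attribute.
struct AttribPointer {
    unsigned location;   // shader-side index (first argument)
    int components;      // floats per vertex for this attribute
    std::size_t stride;  // bytes between consecutive vertices
    std::size_t offset;  // byte offset inside one vertex (last argument)
};

inline AttribPointer positionPointer() {
    return {kPosition, 3, sizeof(PosColorVertex), offsetof(PosColorVertex, position)};
}

inline AttribPointer colorPointer() {
    // Location 2 even though color is the second field in memory:
    // the location names the shader input, the offset names the bytes.
    return {kColor, 4, sizeof(PosColorVertex), offsetof(PosColorVertex, color)};
}
```

With this layout, color keeps location 2 while sitting 12 bytes into a 28-byte vertex, and location 1 is simply never enabled for this buffer.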
  3. Nothing archaic, but I'd rather not spend the polygon throughput of low-end devices on a plain terrain mesh if I don't have to. 1024^2 is sort of my upper limit. Like I said, I'm going for a SimCity-type system. I'd like to be able to raise and lower sections of the terrain and place units/buildings/whatever where the smallest thing you can place takes up one cell of the underlying terrain grid. I've never worked on anything like this before so I'm not sure if I'll need a 1024x1024 grid to represent a decent amount of space to play around with or if I'll need something more like 512 or even 256 cells to a side. Ultimately I feel like I shouldn't have to limit myself too much here, so I'm looking for ideas and suggestions on how to manage terrain for this type of system. There's the obvious sample-from-heightmap approach that brute-forces it, which seems very slow but would otherwise make editing the terrain in real time a breeze. There's also a quadtree-based approach that I've already semi-implemented for terrain mesh simplification, but the actual simplification process introduces tears in the terrain that require complex stitching that I haven't tackled yet. Also, the quadtree approach seems like it wouldn't be the best way to handle terrain I want to allow editing of. I've seen some resources online about paging the data in chunks, which sounds promising, but I wasn't really planning on having a landscape so vast that you couldn't see all of it at one time if you wanted. I guess what I'd really like to know is how (modern) RTS engines handle this type of thing. I feel like most of the games I've played recently handle maps that are much higher resolution than 1024x1024 when it comes to editing terrain or placing objects.
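One way to keep the brute-force heightmap editable without re-sending the whole mesh is to split it into fixed-size chunks and track which chunks were touched. The sketch below is only an illustration under that assumption: `ChunkedHeightmap` and its methods are hypothetical names, and the actual upload (for example one buffer update per dirty chunk) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square heightmap divided into square chunks; edits mark the owning chunk dirty.
class ChunkedHeightmap {
public:
    ChunkedHeightmap(int size, int chunkSize)
        : size_(size), chunkSize_(chunkSize),
          chunksPerSide_((size + chunkSize - 1) / chunkSize),
          heights_(static_cast<std::size_t>(size) * size, 0.0f),
          dirty_(static_cast<std::size_t>(chunksPerSide_) * chunksPerSide_, false) {
        assert(size > 0 && chunkSize > 0);
    }

    void setHeight(int x, int y, float h) {
        assert(x >= 0 && x < size_ && y >= 0 && y < size_);
        heights_[cellIndex(x, y)] = h;
        dirty_[chunkIndex(x, y)] = true;
    }

    float height(int x, int y) const { return heights_[cellIndex(x, y)]; }

    int chunkCount() const { return chunksPerSide_ * chunksPerSide_; }

    // Returns the chunks edited since the last call and clears their flags.
    // The caller would re-upload only these chunks.
    std::vector<int> takeDirtyChunks() {
        std::vector<int> out;
        for (std::size_t i = 0; i < dirty_.size(); ++i) {
            if (dirty_[i]) {
                out.push_back(static_cast<int>(i));
                dirty_[i] = false;
            }
        }
        return out;
    }

private:
    std::size_t cellIndex(int x, int y) const {
        return static_cast<std::size_t>(y) * size_ + x;
    }
    std::size_t chunkIndex(int x, int y) const {
        return static_cast<std::size_t>(y / chunkSize_) * chunksPerSide_ + x / chunkSize_;
    }

    int size_;
    int chunkSize_;
    int chunksPerSide_;
    std::vector<float> heights_;
    std::vector<bool> dirty_;
};
```

A 1024x1024 map with 64-cell chunks gives 256 chunks, so a single-cell edit touches one chunk rather than the whole mesh. The sketch ignores vertices shared along chunk borders; a real version would also mark the neighbouring chunk when a border cell changes.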
  4. So this is more about terrain complexity than anything else. Basically I'm working on a prototype SimCity-like game, and I'm trying to work out exactly how I want terrain to behave before I get too far so I don't have to deal with it later when other things rely on it. Right now I'm doing my generation using noise functions and heightmaps the traditional way, with one vertex per pixel. At the moment what has me most confused is the complexity of the terrain. At what point on modern hardware should I be worried about simplifying these meshes? For instance, is it acceptable to have a 1024x1024 vertex mesh that is in a VBO for use as the base terrain? Or should I be looking at methods of terrain simplification? I've been toying with a terrain simplifying routine based on http://www.gamasutra.com/view/feature/131841/continuous_lod_terrain_meshing_.php but at the moment the result is really ugly because I haven't written any of the code for actually stitching the terrain back together afterward. Before I got into that, I wanted to make sure what I was doing was worth my time. I know I could just keep going with my 1024x1024 vertex mesh and wait until it's actually a problem before simplifying, but I feel like this is something that I'll regret not figuring out sooner rather than later. Another big factor in this decision is that I'd like to be able to modify the terrain at runtime with minimal effort (ideally without re-sending the entire mesh on every frame in which it has changed), but I'm not sure how that's typically done alongside mesh simplification. Are there any typical paths for this sort of thing when the terrain must be large, detailed and editable during runtime? Should I be looking at a paging system instead, with something like CDLOD working out what to feed into the renderer each frame? That seemed like overkill at first glance because I'm not planning a very large environment, but I'd still like it to be detailed.
Any comments and suggestions much appreciated!
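For a sense of scale, the size of the brute-force mesh can be worked out directly. This is a small sketch of that arithmetic; `gridCost` is a hypothetical helper, and the per-vertex and per-index byte sizes are inputs you would choose yourself (12 bytes assumes position-only vertices of three floats, 4 bytes assumes 32-bit indices).

```cpp
#include <cassert>
#include <cstdint>

struct GridCost {
    std::uint64_t vertices;
    std::uint64_t triangles;
    std::uint64_t indices;
    std::uint64_t vertexBytes;
    std::uint64_t indexBytes;
};

// n is the number of vertices per side of a regular grid drawn as an
// indexed triangle list (two triangles per cell, three indices per triangle).
constexpr GridCost gridCost(std::uint64_t n,
                            std::uint64_t bytesPerVertex,
                            std::uint64_t bytesPerIndex) {
    return GridCost{n * n,
                    2 * (n - 1) * (n - 1),
                    6 * (n - 1) * (n - 1),
                    n * n * bytesPerVertex,
                    6 * (n - 1) * (n - 1) * bytesPerIndex};
}
```

Under those assumptions a 1024x1024 grid comes to 1,048,576 vertices and 2,093,058 triangles, about 12 MiB of vertex data plus roughly 24 MiB of indices. 16-bit indices cannot address that many vertices, which is one practical argument for chunking even without any LOD.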
  5. I don't get what you mean. If you bind a frame buffer with a texture, and then write to your quad as a fullscreen quad, you'd have the contents of the shader of the quad rendered to said texture. Is this what you're asking for?
  6. ChugginWindex

    Vertex Attributes, VAOs and Shaders

    Heh, I was just coming back here to update and say that I spent the last hour or so implementing a static system just to see how it worked out, and I think I like it. Like I said before, it was clever to be able to do it dynamically with hash maps of attribute names and locations but not being able to use VAOs was a huge strike against it. The new code is also simpler. I'm glad to see someone else taking this approach as well as it gives me confidence that this is viable (not that I saw many other options like I said). I'm still interested in any other opinions / techniques people have got though!
  7. I'm trying to come up with a clever abstraction for shaders for my project and I'm stumbling on this issue concerning VAOs. So, when you've got a shader bound and you want to bind a VBO for rendering, you ask the shader for the location of each attribute your VBO contains data for. This way it's just a step when you're binding a VBO / shader, correct? If I've got this part right, then where do VAOs fit in with their ability to essentially cache the operations necessary to bind a VBO? To clarify, say I've got ShaderA bound, and I set up a VAO that records what is necessary to bind a VBO. I find out from the shader that "position" is at location 1, "color" is at location 2 and "normal" is at location 3. When I call glEnableVertexAttribArray and similar operations, the VAO simply records the integer value going into these functions (1, 2, or 3 in this example), not the fact that I queried the current shader to find out where the attributes actually are. So now if I swap ShaderA for ShaderB and bind that same VAO for drawing, but ShaderB happens to have "position", "color" and "normal" at 2, 3, and 1 respectively, the VAO is going to feed each attribute the wrong data and mess everything up. All I can think of is to statically enforce the location of vertex attributes in the shader code, but this seems like the wrong thing to do. Is this my only option?
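The mismatch described above can be modelled without any GL at all. The following is a toy model, not real VAO behaviour in full: `LocationMap`, `VaoState`, `recordVao` and `dataSeenBy` are hypothetical names, and a "VAO" here is reduced to a map from integer location to the name of the data that was attached there.

```cpp
#include <cassert>
#include <map>
#include <string>

// Attribute name -> location, as a linked shader would report it.
using LocationMap = std::map<std::string, unsigned>;
// Location -> which buffer data was attached there when the VAO was set up.
using VaoState = std::map<unsigned, std::string>;

// Records state the way the post describes: only the integers survive,
// not the fact that they came from a particular shader.
VaoState recordVao(const LocationMap& shaderAtSetupTime) {
    VaoState vao;
    for (const auto& entry : shaderAtSetupTime) {
        vao[entry.second] = entry.first;
    }
    return vao;
}

// Which data does `attribute` of `shader` actually read when `vao` is bound?
// Returns an empty string if the attribute or the slot does not exist.
std::string dataSeenBy(const LocationMap& shader, const VaoState& vao,
                       const std::string& attribute) {
    auto loc = shader.find(attribute);
    if (loc == shader.end()) return "";
    auto slot = vao.find(loc->second);
    return slot == vao.end() ? "" : slot->second;
}
```

Recording with ShaderA's locations (1, 2, 3) and drawing with ShaderB's (2, 3, 1) makes ShaderB's "position" read the color data. If both shaders are forced to share one table, the same VAO is valid for either, which is the static approach the question ends on.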
  8. Yeah, unfortunately sorting is pretty complex to implement on OpenCL (not impossible though) and recursion is impossible. You can simulate it with a stack, but that's the kind of stuff I was hoping to avoid. I'm actually looking toward http://www.peterw.eu/photon-flipping.pdf instead. On page 43 he details a spatial hashing solution that exchanges high memory usage for not having to sort at all. It's actually really clever and seems to cater to the kind of work I'm doing at the moment nicely. I might still go with a kD-tree solution however, I just haven't made up my mind yet ;)
  9. Okay, I think I understand. It seems I was confusing kD-trees with octrees and hadn't realized that there's a difference in what nodes actually represent. Thanks! I'm working on an implementation as you suggested right now, but I'm not entirely sure how to build the kD-tree within OpenCL since I don't think there are atomic operations I can utilize to set left / right children =/
  10. But how does that eliminate the problem IN the leaf nodes of the tree? I know how much space the tree itself takes up, sure, but I don't know how many photons are contained between each leaf. Leaf A may have 2 photons stored within it, but leaf G may have 200. Since I can't allocate the memory dynamically within the kernel, I have to define something like LEAF_MAX_PHOTONS and use that in the node struct itself. If that max is 300 photons per leaf, and leaf L has 500 photons I'm losing information. Or am I not understanding your meaning?
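One common way to sidestep a per-leaf capacity is to have no leaf containers at all: keep every photon in the single scene-sized array and let the ordering of that array be the tree. The sketch below shows that idea on the host in plain C++, as an illustration only and not as the approach suggested in the thread. `Photon`, `buildImplicitKdTree` and `countInRadius` are hypothetical names, and the `std::vector` used as an explicit stack stands in for the fixed-size array a kernel without recursion would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Photon { float p[3]; };

struct Range { std::size_t lo, hi; int axis; };

// Reorders `photons` in place so that the midpoint of every range is the
// median along that range's axis. No node structs, no per-leaf storage.
void buildImplicitKdTree(std::vector<Photon>& photons) {
    std::vector<Range> stack;
    stack.push_back({0, photons.size(), 0});
    while (!stack.empty()) {
        Range r = stack.back();
        stack.pop_back();
        if (r.hi - r.lo <= 1) continue;
        std::size_t mid = r.lo + (r.hi - r.lo) / 2;
        auto first = photons.begin();
        std::nth_element(first + static_cast<std::ptrdiff_t>(r.lo),
                         first + static_cast<std::ptrdiff_t>(mid),
                         first + static_cast<std::ptrdiff_t>(r.hi),
                         [&r](const Photon& a, const Photon& b) {
                             return a.p[r.axis] < b.p[r.axis];
                         });
        int next = (r.axis + 1) % 3;
        stack.push_back({r.lo, mid, next});
        stack.push_back({mid + 1, r.hi, next});
    }
}

// Counts photons within `radius` of `q`, walking the implicit tree with an
// explicit stack instead of recursion.
std::size_t countInRadius(const std::vector<Photon>& photons,
                          const float q[3], float radius) {
    std::vector<Range> stack;
    stack.push_back({0, photons.size(), 0});
    const float r2 = radius * radius;
    std::size_t count = 0;
    while (!stack.empty()) {
        Range r = stack.back();
        stack.pop_back();
        if (r.lo >= r.hi) continue;
        std::size_t mid = r.lo + (r.hi - r.lo) / 2;
        const Photon& node = photons[mid];
        float d2 = 0.0f;
        for (int k = 0; k < 3; ++k) {
            float d = node.p[k] - q[k];
            d2 += d * d;
        }
        if (d2 <= r2) ++count;
        // Signed distance from the splitting plane along this range's axis.
        float delta = q[r.axis] - node.p[r.axis];
        int next = (r.axis + 1) % 3;
        if (delta - radius <= 0.0f) stack.push_back({r.lo, mid, next});      // sphere reaches the lower side
        if (delta + radius >= 0.0f) stack.push_back({mid + 1, r.hi, next});  // sphere reaches the upper side
    }
    return count;
}
```

Because the photons never leave the one array, the only static limit is the total photon count; a dense region just becomes a longer run of that array.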
  11. I'm working with OpenCL and I've been trying to figure this out for a few days now and I've got nothing and don't quite understand how others do it. My main hang-up is this: if I store my photons like I do right now in a completely unsorted, 1-dimensional array (which is horribly slow) then I can define that array statically at program startup with something like SCENE_MAX_PHOTONS as the array size. That way it doesn't matter how dense the photons in specific areas are as long as the total number does not exceed the array limits. Now I want to implement some sort of acceleration structure (kd-tree, uniform grids, spatial hashing, etc.) but I don't see how with the static requirements in OpenCL this is possible at all. For example, if I had a uniform grid that I defined as having resolution 256x256x256, each grid cell would have to have a statically defined container for all the photons that land within it...but that means that there's an upper bound on how many photons can be in one area that isn't the same as the maximum number of photons allowed in the scene like it should be. The same problem comes up with kd-trees when I try to think of them. Getting around the lack of recursion is tricky but not impossible for implementing one, but at the end of the day you're left with a leaf-node that's got to contain some of the photons, and that size has to be defined statically at the time that the kd-tree is built. The only way I can think of to resolve this is to store photons as a linked-list structure at each leaf / grid cell / bucket. But there's no way that's going to be even remotely efficient on the GPU...right?
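For the uniform-grid case, a standard alternative to both fixed-size cell containers and linked lists is a counting sort into one flat array plus a table of per-cell start offsets. The sketch below shows the idea sequentially in host C++; it is an assumption about what could work here, not something proposed in the thread. `CellGrid` and `buildCellGrid` are hypothetical names, and a GPU version would need atomic counters and a parallel prefix sum in place of the simple loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CellGrid {
    // cellStart has cellCount + 1 entries; the photons of cell c are
    // sorted[cellStart[c]] .. sorted[cellStart[c + 1] - 1].
    std::vector<std::size_t> cellStart;
    // Photon indices grouped by cell; same length as the photon array.
    std::vector<std::size_t> sorted;
};

// cellOfPhoton[i] is the grid cell that photon i falls into
// (computed elsewhere, for example from its position).
CellGrid buildCellGrid(const std::vector<std::size_t>& cellOfPhoton,
                       std::size_t cellCount) {
    CellGrid grid;
    grid.cellStart.assign(cellCount + 1, 0);

    // Pass 1: count photons per cell, stored one slot to the right.
    for (std::size_t cell : cellOfPhoton) {
        assert(cell < cellCount);
        ++grid.cellStart[cell + 1];
    }

    // Pass 2: running sum turns the counts into start offsets.
    for (std::size_t c = 0; c < cellCount; ++c) {
        grid.cellStart[c + 1] += grid.cellStart[c];
    }

    // Pass 3: scatter each photon index into its cell's slice.
    std::vector<std::size_t> cursor(grid.cellStart.begin(), grid.cellStart.end() - 1);
    grid.sorted.resize(cellOfPhoton.size());
    for (std::size_t i = 0; i < cellOfPhoton.size(); ++i) {
        grid.sorted[cursor[cellOfPhoton[i]]++] = i;
    }
    return grid;
}
```

Storage is one entry per photon plus one offset per cell, so a cell holding 2 photons and a cell holding 500 cost exactly what they contain, and the only static bound left is SCENE_MAX_PHOTONS.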
  12. ChugginWindex

    OpenGL Poor OpenCL performance

    Sorry, I'm really new to OpenCL and a lot of what you said is lost on me... What do you mean by pushing a hundred kernels onto the task queue at a time? Currently I'm just enqueueing an NDRange kernel and letting it execute with a global work size equal to the dimensions of the texture I'm writing to. I don't think I understand what you mean. Also I can't quite see what you're talking about with the ev.wait() commands. Are you saying that they should be slowing it down or that they shouldn't be? Do I have too many or too few? I just figured out how to use the CUDA Toolkit's Visual Profiler, and that spat back that I have a very low compute utilization (~24%), so if my GPU is idle most of the time I'm sure I'm not getting the performance I could be in theory... I just don't quite get how to go about fixing that. I've pretty much split up the tasks each work item carries out as much as possible (I'm using a different kernel than the one originally posted, but it's still fairly basic) so I'm unsure how to increase the amount that gets done at one time. I'm still using an NDRange of width x height and letting the drivers decide what local workgroup size to use. Could that be the problem?
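If you do try an explicit local work-group size instead of letting the driver pick, note that in OpenCL 1.x the global size has to be an exact multiple of the local size in each dimension. A minimal sketch of the usual padding arithmetic follows; `roundUpToMultiple` and `dividesEvenly` are hypothetical helper names, and a kernel launched with a padded range is assumed to skip work items that fall outside the image.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of `local` that is greater than or equal to `global`.
constexpr std::size_t roundUpToMultiple(std::size_t global, std::size_t local) {
    return ((global + local - 1) / local) * local;
}

// True when `global` can be used with `local` without padding.
constexpr bool dividesEvenly(std::size_t global, std::size_t local) {
    return global % local == 0;
}
```

For the 640x480 texture from the original post, a 32x16 local size already divides evenly, so no padding would be needed; a 1366-wide target with a local width of 32 would be padded to 1376.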
  13. ChugginWindex

    OpenGL Poor OpenCL performance

    Thanks for the reply. Well, that sucks! I'm doing this for an independent study at my college and I don't really have the time to be restarting in CUDA. I'm gonna wait and see if anyone else weighs in with similar experience and then make a decision. Thanks again for the info, even though it's not at all what I wanted to hear haha.
  14. I'm learning OpenCL for a project and so far I'm a little disappointed in the performance I've been getting with a really basic kernel, so I'm hoping there's just something I'm missing. Here's the kernel that I'm using. All it does is calculate a gradient that's written to a 640x480 texture:

         __kernel void debug(__write_only image2d_t resultTexture)
         {
             int2 imgCoords = (int2)(get_global_id(0), get_global_id(1));
             int2 imgDims = (int2)(get_image_width(resultTexture), get_image_height(resultTexture));
             float4 imgVal = (float4)((float)imgCoords.x / (float)imgDims.x, (float)imgCoords.y / (float)imgDims.y, 0.0f, 1.0f);
             write_imagef(resultTexture, imgCoords, imgVal);
         }

     My video card is an NVIDIA GeForce GTX 285M. With this kernel running in a release build (C++) I'm getting ~750 FPS. That's not low... but it's not as high as I was expecting. I figure calculating this gradient on this card in GLSL would probably give me quite a bit more. Now I know that GLSL is optimized for this sort of thing whereas raw OpenCL is not, so it could just be that, but I thought I should make sure before I get into more complex things since I have plans to really tax this card once I figure out the intricacies of OpenCL.

     Also, here is the code I'm using each frame to execute the kernel:

         void CLContext::runKernelForScreen(int screenWidth, int screenHeight)
         {
             cl_int result = CL_SUCCESS;
             cl::Event ev;
             // declared but not passed below; cl::NullRange lets the driver pick the local size
             cl::NDRange localRange = cl::NDRange(32, 16);
             cl::NDRange globalRange = cl::NDRange(screenWidth, screenHeight);

             //make sure OpenGL isn't using anything
             glFlush();

             //get the OpenGL shared objects
             result = _commandQueue.enqueueAcquireGLObjects(&_glObjects, 0, &ev);
             ev.wait();
             if (result != CL_SUCCESS) {
                 throw OCException(LookupErrorString(result));
             }

             //set the argument to be the image
             _primaryKernel.setArg(0, _screenTextureImage);

             //enqueue operations to perform on the texture
             result = _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange, globalRange, cl::NullRange, 0, &ev);
             ev.wait();
             if (result != CL_SUCCESS) {
                 throw OCException(LookupErrorString(result));
             }

             result = _commandQueue.enqueueReleaseGLObjects(&_glObjects, 0, &ev);
             ev.wait();
             if (result != CL_SUCCESS) {
                 throw OCException(LookupErrorString(result));
             }

             _commandQueue.finish();
         }

     I profiled this and found that the bulk of the time is spent on the ev.wait() lines. Commenting those out doesn't do any direct harm but only yields around a 100 FPS gain, and at that point the execution time is almost entirely in _commandQueue.finish(), for obvious reasons. If it matters at all, I'm initializing the OpenGL texture like this:

         glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, screenWidth, screenHeight, 0, GL_RGBA, GL_FLOAT, NULL);

     And the respective OpenCL texture object is created with:

         _screenTextureImage = cl::Image2DGL(_context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, textureId, &err);

     Lastly, in addition to profiling from the host side, I've also used gDebugger to try and see where the issue is, but the tool (at least as I'm capable of using it) doesn't yield much performance data other than to say that on average the kernel uses around 17% of the GPU to run (corrected from the 70% I first posted).

     I've tried Parallel Nsight as well, but it seems a bit daunting to me in its complexity. Hopefully I've preempted most of the questions concerning how I'm doing things and someone can make some sense of all this. Is my head on straight here? I don't think I'll be surprised either way if I hear that this is the kind of performance I should or should not expect from OpenCL on this hardware, but like I said, I feel like I'd be getting a bit more at this stage from GLSL.
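At frame rates this high it can help to read the numbers as time per frame rather than FPS. A tiny sketch of that conversion, using only the figures quoted in the post (`frameTimeMs` and `savedMs` are hypothetical helper names):

```cpp
#include <cassert>

// Milliseconds spent per frame at a given frame rate.
constexpr double frameTimeMs(double fps) {
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the frame rate rises from one value to another.
constexpr double savedMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}
```

750 FPS is about 1.33 ms per frame, and the roughly 100 FPS gained by removing the ev.wait() calls corresponds to about 0.16 ms per frame, so at this point fixed per-frame overhead is being measured as much as the kernel itself.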
  15. ChugginWindex

    OpenCL ports and wrappers

    Yeah after I posted this I did a bit more research and quickly discovered the awesomeness of the C++ bindings to OpenCL. Maybe that's my answer then. I suppose given the amount of experience I have with C++ vs Python, it'd probably be faster for me to go that route than practically learn a new language entirely.