• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here.

      The Curated Books

      The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell
      Presents 100+ sets of questions, or different lenses, for viewing a game's design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here.

      A Practical Guide to Indie Game Marketing, by Joel Dreskin
      Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here.

      An Architectural Approach to Level Design
      This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here.

      Learn more and download the ebook by clicking here.

      Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.


Community Reputation

187 Neutral

About ChugginWindex

  1. In case anyone was interested, I figured out the issue. As always seems to be the case, this was user error; the functionality works as expected when used correctly. I was explicitly binding my attributes AFTER my GLSL program was linked rather than before, so my explicit choices were being ignored and the linker was using its automatically assigned locations instead.
  2. Wondering if this is an issue people deal with or if I'm complicating something unnecessarily. If I make a series of calls like the following during shader compilation/linking:
[code]
glBindAttribLocation(program, 0, "position");
glBindAttribLocation(program, 1, "normal");
glBindAttribLocation(program, 2, "color");
[/code]
and I later have a vertex array that only contains position and color, I sort of expected the following to work:
[code]
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(2);
glVertexAttribPointer(0, ...); // assume I've proven that all other parameters are valid
glVertexAttribPointer(2, ...);
[/code]
However, it doesn't. My triangles are rendered in the correct positions, but their color is solid red (when it should be a gradient). Binding color to attrib location 1 instead of 2 obviously solves the problem, but I was expecting the first parameter to glVertexAttribPointer to be just a logical reference to the index we previously bound to that attribute in the shader we're using, not something that actually dictates the ordering of our data in memory (which I thought was the purpose of the attribute 'offset'). The motivation behind this seems simple to me: I want to be able to define a standard ordering for all attributes and bind it to every shader during compilation. Is it really assumed that you always put your attributes in the same order in memory? If so, what's the point of specifying an offset to glVertexAttribPointer? I've seen suggestions on various sites and forums (including this one) that you should explicitly set the order of all of your attributes. I just don't see how that would work when some shaders know about, say, Position and Normal while others only know about Position and Color. Am I missing something? Is there a simpler solution?
  3. [quote name='Hodgman' timestamp='1351766191' post='4996137'] I'm assuming your 1024[sup]^2[/sup] verts make up a grid of 1023*1023 quads, which is about 2 million triangles. That's a lot, but not an infeasible amount for modern GPUs. Profile it and see how long this brute-force method takes ;) What kind of hardware are you targeting as your minimum spec? [/quote] Nothing archaic, but I'd rather not waste my polygon throughput on low-end devices on a stupid terrain mesh if I don't have to. 1024^2 is sort of my upper limit. Like I said, I'm going for a SimCity-type system. I'd like to be able to raise and lower sections of the terrain and place units/buildings/whatever, where the smallest thing you can place takes up one cell of the underlying terrain grid. I've never worked on anything like this before, so I'm not sure whether I'll need a 1024x1024 grid to represent a decent amount of space to play around with, or whether something more like 512 or even 256 cells to a side will do. Ultimately I feel like I shouldn't have to limit myself too much here, so I'm looking for ideas and suggestions on how to manage terrain for this type of system. There's the obvious sample-from-heightmap approach that brute-forces it, which seems very slow but would otherwise make editing the terrain in real time a breeze. There's also a quadtree-based approach that I've already semi-implemented for terrain mesh simplification, but the simplification process introduces tears in the terrain that require complex stitching I haven't tackled yet. Also, the quadtree approach seems like it wouldn't be the best way to handle terrain I want to allow editing of. I've seen some resources online about paging the data in chunks that sound promising, but I wasn't really planning on a landscape so vast that you couldn't see all of it at once if you wanted to.
I guess what I'd really like to know is how (modern) RTS engines handle this type of thing. I feel like most of the games I've played recently handle maps of much higher resolution than 1024x1024 when it comes to editing terrain or placing objects.
  4. So this is more about terrain complexity than anything else. Basically, I'm working on a prototype SimCity-like game, and I'm trying to work out exactly how I want terrain to behave before I get too far, so I don't have to deal with it later when other things rely on it. Right now I'm doing my generation the traditional way, using noise functions and heightmaps with one vertex per pixel. What has me most confused at the moment is the complexity of the terrain. At what point on modern hardware should I start worrying about simplifying these meshes? For instance, is it acceptable to have a 1024x1024-vertex mesh in a VBO as the base terrain, or should I be looking at methods of terrain simplification? I've been toying with a terrain simplification routine based on [url="http://www.gamasutra.com/view/feature/131841/continuous_lod_terrain_meshing_.php"]http://www.gamasutra.com/view/feature/131841/continuous_lod_terrain_meshing_.php[/url], but at the moment the result is really ugly because I haven't written any of the code for stitching the terrain back together afterward. Before I got into that, I wanted to make sure what I was doing was worth my time. I know I could just keep going with my 1024x1024-vertex mesh and wait until it's actually a problem before simplifying, but I feel like this is something I'll regret not figuring out sooner rather than later. Another big factor in this decision is that I'd like to be able to modify the terrain at runtime with minimal effort (ideally without re-sending the entire mesh every frame in which it has changed), but I'm not sure how that's typically done alongside mesh simplification. Are there any typical approaches for this sort of thing when the terrain must be large, detailed, and editable at runtime? Should I be looking at a paging system instead, with something like CDLOD working out what to feed into the renderer each frame?
That seemed like overkill at first glance because I'm not planning a very large environment, but I'd still like it to be detailed. Any comments and suggestions are much appreciated!
  5. I don't get what you mean. If you bind a framebuffer with a texture attached and then draw your quad as a fullscreen quad, the output of the quad's shader ends up rendered to that texture. Is that what you're asking for?
  6. [quote name='clb' timestamp='1339739842' post='4949446'] So, essentially, in my case, I am able to get around the potential problems by rules of convention. [/quote] Heh, I was just coming back here to update and say that I spent the last hour or so implementing a static system just to see how it worked out, and I think I like it. Like I said before, doing it dynamically with hash maps of attribute names and locations was clever, but not being able to use VAOs was a huge strike against it. The new code is also simpler. I'm glad to see someone else taking this approach as well, as it gives me confidence that it's viable (not that I saw many other options, like I said). I'm still interested in any other opinions/techniques people have, though!
  7. I'm trying to come up with a clever abstraction for shaders in my project, and I'm stumbling on this issue concerning VAOs. When you've got a shader bound and want to bind a VBO for rendering, you ask the shader for the location of each attribute your VBO contains data for. So it's just a step when you're binding a VBO/shader pair, correct? If I've got this part right, then where do VAOs fit in, given their ability to essentially cache the operations needed to bind a VBO? To clarify, say I've got ShaderA bound and I set up a VAO that records what is needed to bind a VBO. I find out from the shader that "position" is at location 1, "color" is at location 2, and "normal" is at location 3. When I call glEnableVertexAttribArray and similar functions, the VAO records only the integer values passed to them (1, 2, or 3 in this example), not the fact that I queried the current shader to find out where the attributes actually are. So if I now swap ShaderA for ShaderB and bind that same VAO for drawing, but ShaderB happens to have "position", "color", and "normal" at 2, 3, and 1 respectively, the VAO will feed the data to the wrong attributes and mess everything up. All I can think of is to statically enforce the location of vertex attributes in the shader code, but that seems like the wrong thing to do. Is this my only option?
  8. [quote name='Pragma' timestamp='1334275325' post='4930753'] I'm not really familiar with OpenCL, but in my implementation I just sorted all the photons along one axis (say the x axis). Then the middle photon (call it n) becomes the root of the tree. You can then recurse, sorting 0..n-1 and n+1 .. N-1 along the next axis (in this case y). That way the whole KD Tree building algorithm reduces to sorting, which presumably has some fast OpenCL implementations. [/quote] Yeah, unfortunately sorting is pretty complex to implement in OpenCL (not impossible, though), and recursion is impossible. You can simulate it with a stack, but that's the kind of stuff I was hoping to avoid. I'm actually looking toward [url="http://www.peterw.eu/photon-flipping.pdf"]http://www.peterw.eu/photon-flipping.pdf[/url] instead. On page 43 he details a spatial hashing solution that trades high memory usage for not having to sort at all. It's really clever and seems to suit the kind of work I'm doing at the moment nicely. I might still go with a kD-tree solution, however; I just haven't made up my mind yet ;)
  9. [quote name='Pragma' timestamp='1334265393' post='4930703'] You don't store the photons only in the leaf nodes. Every node of the tree IS a photon - i.e. each node of the tree has exactly one photon. The number of nodes in your tree is equal to SCENE_MAX_PHOTONS and you can store them in a preallocated array. [/quote] Okay, I think I understand. It seems I was confusing kD-trees with octrees and hadn't realized that there's a difference in what the nodes actually represent. Thanks! I'm working on an implementation as you suggested right now, but I'm not entirely sure how to build the kD-tree within OpenCL, since I don't think there are atomic operations I can use to set left/right children =/
  10. But how does that eliminate the problem IN the leaf nodes of the tree? I know how much space the tree itself takes up, sure, but I don't know how many photons are contained in each leaf. Leaf A may have 2 photons stored in it, but leaf G may have 200. Since I can't allocate memory dynamically within the kernel, I have to define something like LEAF_MAX_PHOTONS and use that in the node struct itself. If that max is 300 photons per leaf and leaf L has 500 photons, I'm losing information. Or am I not understanding what you mean?
  11. I'm working with OpenCL and I've been trying to figure this out for a few days now; I've got nothing, and I don't quite understand how others do it. My main hang-up is this: if I store my photons the way I do right now, in a completely unsorted one-dimensional array (which is horribly slow), then I can define that array statically at program startup with something like SCENE_MAX_PHOTONS as the array size. That way it doesn't matter how dense the photons are in specific areas, as long as the total number does not exceed the array limit. Now I want to implement some sort of acceleration structure (kd-tree, uniform grid, spatial hashing, etc.), but given OpenCL's static allocation requirements I don't see how that's possible at all. For example, if I had a uniform grid with a resolution of 256x256x256, each grid cell would need a statically sized container for all the photons that land within it...but that means there's an upper bound on how many photons can be in one area that [i]isn't[/i] the same as the maximum number of photons allowed in the scene, as it should be. The same problem comes up when I think about kd-trees. Getting around the lack of recursion is tricky but not impossible, but at the end of the day you're left with a leaf node that has to [i]contain[/i] some of the photons, and that size has to be defined statically when the kd-tree is built. The only way I can think of to resolve this is to store photons in a linked-list structure at each leaf/grid cell/bucket. But there's no way that's going to be even remotely efficient on the GPU...right?
  12. OpenGL

    [quote name='samoth' timestamp='1332765173' post='4925324'] The fact that commenting out ev.wait(), which in fact does "nothing", gives a huge boost suggests that OpenCL as such is not really to blame, but it is a scheduling thing. Waiting on an event twice means being taken off the ready-to-run list and put on it again when the event is set, and being scheduled again when the next time slice becomes available. If you do this thousands of times and time slices are, say, 15-20 milliseconds, this can be a long, long time. Have you tried increasing the scheduler's frequency (I'm not sure how to do it under any other OS but Windows, where that would be timeBeginPeriod(1))? Alternatively, push a hundred kernels onto the task queue and let them execute, then block in finish() and see how long it took all of them to run. I'm sure it will be much faster. You're not benchmarking OpenCL here, you're benchmarking waiting on an event... [/quote] Sorry, I'm really new to OpenCL and a lot of what you said is lost on me... What do you mean by pushing a hundred kernels onto the task queue at a time? Currently I'm just enqueueing an NDRange kernel and letting it execute with a global range equal to the dimensions of the texture I'm writing to. I don't think I understand what you mean. Also, I can't quite see what you're getting at with the ev.wait() calls. Are you saying that they should be slowing it down or that they shouldn't be? Do I have too many or too few? I just figured out how to use the CUDA Toolkit's Visual Profiler, and it reported a very low compute utilization (~24%), so if my GPU is idle most of the time, I'm sure I'm not getting the performance I could in theory...I just don't quite get how to go about improving that.
I've pretty much split up the tasks each work-item carries out as much as possible (I'm using a different kernel than the one originally posted, but it's still fairly basic), so I'm unsure how to increase the amount of work that gets done at one time. I'm still using an NDRange of width x height and letting the driver decide what local work-group size to use; could that be the problem?
  13. OpenGL

    Thanks for the reply. Well, that sucks! I'm doing this for an independent study at my college, and I don't really have the time to restart in CUDA. I'm going to wait and see if anyone else weighs in with similar experience and then make a decision. Thanks again for the info, even though it's not at all what I wanted to hear, haha.
  14. I'm learning OpenCL for a project, and so far I'm a little disappointed in the performance I've been getting with a really basic kernel, so I'm hoping there's just something I'm missing. Here's the kernel I'm using; all it does is calculate a gradient that's applied to a 640x480 texture:
[code]
__kernel void debug(__write_only image2d_t resultTexture)
{
    int2 imgCoords = (int2)(get_global_id(0), get_global_id(1));
    int2 imgDims = (int2)(get_image_width(resultTexture), get_image_height(resultTexture));
    float4 imgVal = (float4)((float)imgCoords.x / (float)imgDims.x,
                             (float)imgCoords.y / (float)imgDims.y,
                             0.0f, 1.0f);
    write_imagef(resultTexture, imgCoords, imgVal);
}
[/code]
My video card is an NVIDIA GeForce GTX 285M, and with this kernel running in a release build (C++) I'm getting ~750 FPS. That's not low, but it's not as high as I was expecting. I figure calculating this gradient on this card in GLSL would probably give me quite a bit more. Now, I know that GLSL is optimized for this sort of thing whereas raw OpenCL is not, so it could just be that, but I thought I should make sure before I get into more complex things, since I plan to really tax this card once I figure out the intricacies of OpenCL.
Also, here is the code I'm using each frame to execute the kernel:
[code]
void CLContext::runKernelForScreen(int screenWidth, int screenHeight)
{
    cl_int result = CL_SUCCESS;
    cl::Event ev;
    cl::NDRange localRange = cl::NDRange(32, 16);
    cl::NDRange globalRange = cl::NDRange(screenWidth, screenHeight);

    //make sure OpenGL isn't using anything
    glFlush();

    //get the OpenGL shared objects
    result = _commandQueue.enqueueAcquireGLObjects(&_glObjects, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    //set the argument to be the image
    _primaryKernel.setArg(0, _screenTextureImage);

    //enqueue operations to perform on the texture
    result = _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange, globalRange, cl::NullRange, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    result = _commandQueue.enqueueReleaseGLObjects(&_glObjects, 0, &ev);
    ev.wait();
    if (result != CL_SUCCESS) {
        throw OCException(LookupErrorString(result));
    }

    _commandQueue.finish();
}
[/code]
I profiled this and found that the bulk of the time is spent on the ev.wait() lines. Commenting those out doesn't do any direct harm but only yields a gain of around 100 FPS, and at that point the execution time is almost entirely in _commandQueue.finish(), for obvious reasons. If it matters at all, I'm initializing the OpenGL texture like this:
[code]glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, screenWidth, screenHeight, 0, GL_RGBA, GL_FLOAT, NULL);[/code]
And the corresponding OpenCL image object is created with:
[code]_screenTextureImage = cl::Image2DGL(_context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, textureId, &err);[/code]
Lastly, in addition to profiling from the host side, I've also used gDebugger to try to see where the issue is, but the tool (at least as I'm capable of using it) doesn't yield much performance data other than to say that on average the kernel uses around [s]70%[/s] (17%) of the GPU to run.
I've tried Parallel Nsight as well, but it seems a bit daunting to me in its complexity. Hopefully I've preempted most of the questions about how I'm doing things and someone can make some sense of all this. Is my head on straight here? I don't think I'll be surprised either way if I hear that this is or isn't the kind of performance I should expect from OpenCL on this hardware, but like I said, I feel like I'd be getting a bit more at this stage from GLSL.
  15. Yeah, after I posted this I did a bit more research and quickly discovered the awesomeness of the C++ bindings for OpenCL. Maybe that's my answer, then. Given how much more experience I have with C++ than with Python, it would probably be faster for me to go that route than to essentially learn a new language from scratch.
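Items 1, 2, 6 and 7 circle around one idea: a fixed, engine-wide set of attribute locations bound before linking, with the memory layout described separately by stride and offset. The C++ sketch below illustrates this under assumptions: the location values, attribute names and helper types (`AttribLocation`, `AttribDesc`, `AttribPointer`, `interleavedLayout`) are hypothetical, and it makes no OpenGL calls so that it stays self-contained. For an interleaved float buffer, it computes the values one would pass to `glVertexAttribPointer`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical engine-wide convention: every shader gets these locations,
// applied with glBindAttribLocation BEFORE glLinkProgram (bindings made
// after linking only take effect at the next link).
enum AttribLocation : unsigned {
    kPosition = 0,
    kNormal = 1,
    kColor = 2,
    kTexCoord = 3,
    kAttribCount = 4
};

// Attribute names the shaders are assumed to declare for each location.
constexpr std::array<const char*, kAttribCount> kAttribNames = {
    "position", "normal", "color", "texcoord"};

// One float attribute inside an interleaved vertex.
struct AttribDesc {
    AttribLocation location;  // which shader input receives the data
    int components;           // 1 to 4 floats
};

// The values a glVertexAttribPointer call needs (type assumed GL_FLOAT).
struct AttribPointer {
    AttribLocation location;  // first argument: the attribute index
    int components;           // "size" argument
    std::size_t stride;       // bytes between consecutive vertices
    std::size_t offset;       // byte offset of this attribute within a vertex
};

// Computes stride and offsets for a tightly packed, interleaved float layout.
// The location says WHICH input gets the data; the offset says WHERE it is in
// memory. A position+color buffer can therefore feed locations 0 and 2
// without supplying anything for location 1.
std::vector<AttribPointer> interleavedLayout(const std::vector<AttribDesc>& attribs) {
    std::size_t stride = 0;
    for (const AttribDesc& a : attribs) {
        assert(a.components >= 1 && a.components <= 4);
        stride += static_cast<std::size_t>(a.components) * sizeof(float);
    }

    std::vector<AttribPointer> out;
    std::size_t offset = 0;
    for (const AttribDesc& a : attribs) {
        out.push_back({a.location, a.components, stride, offset});
        offset += static_cast<std::size_t>(a.components) * sizeof(float);
    }
    return out;
}
```

In a real renderer, a loop calling `glBindAttribLocation(program, i, kAttribNames[i])` for every location before `glLinkProgram` would apply the convention; binding a name that a particular shader never declares is permitted, so the same loop can serve both Position+Normal and Position+Color shaders. Each `AttribPointer` then maps onto `glEnableVertexAttribArray(p.location)` and `glVertexAttribPointer(p.location, p.components, GL_FLOAT, GL_FALSE, (GLsizei)p.stride, (const void*)p.offset)`. Because every shader agrees on the locations, a VAO recorded while one shader is bound stays valid for another.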
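For the terrain questions in items 3 and 4, one common middle ground between a single brute-force mesh and full mesh simplification is to split the heightmap into chunks, each with its own vertex buffer, so that an edit only re-uploads the chunks it touches. The C++ sketch below assumes a hypothetical `ChunkedHeightmap` with 2^k + 1 vertices per side (a common choice because the quads then divide evenly into power-of-two chunks) and only tracks which chunks are dirty; the actual upload (for example with `glBufferSubData`) and any per-chunk LOD are left out. `gridTriangleCount` reproduces Hodgman's brute-force estimate from item 3.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Triangles in a brute-force grid of n x n vertices:
// (n - 1)^2 quads, two triangles each.
std::size_t gridTriangleCount(std::size_t n) {
    return n < 2 ? 0 : 2 * (n - 1) * (n - 1);
}

// Hypothetical chunked heightmap: the vertex grid is split into square chunks
// of chunkQuads x chunkQuads quads. Neighbouring chunks share their border
// row/column of vertices, so an edit on a border dirties more than one chunk.
class ChunkedHeightmap {
public:
    ChunkedHeightmap(int verticesPerSide, int chunkQuads)
        : n_(verticesPerSide), chunkQuads_(chunkQuads),
          heights_(static_cast<std::size_t>(verticesPerSide) * verticesPerSide, 0.0f) {
        assert(chunkQuads > 0 && (verticesPerSide - 1) % chunkQuads == 0);
    }

    int chunksPerSide() const { return (n_ - 1) / chunkQuads_; }

    // Raises or lowers one vertex and records every chunk whose vertex
    // buffer must be re-uploaded before the next draw.
    void adjustHeight(int x, int z, float delta) {
        assert(x >= 0 && x < n_ && z >= 0 && z < n_);
        heights_[static_cast<std::size_t>(z) * n_ + x] += delta;
        for (int cz : chunksContaining(z))
            for (int cx : chunksContaining(x))
                dirty_.emplace(cx, cz);
    }

    // Returns the dirty chunk coordinates (cx, cz) and clears the set.
    std::set<std::pair<int, int>> takeDirtyChunks() {
        std::set<std::pair<int, int>> out;
        out.swap(dirty_);
        return out;
    }

    float height(int x, int z) const {
        return heights_[static_cast<std::size_t>(z) * n_ + x];
    }

private:
    // Chunk indices along one axis whose vertex range includes coordinate v.
    // Chunk c covers vertices [c * chunkQuads_, (c + 1) * chunkQuads_].
    std::vector<int> chunksContaining(int v) const {
        std::vector<int> result;
        int c = v / chunkQuads_;
        if (c < chunksPerSide()) result.push_back(c);                // interior or left border
        if (v % chunkQuads_ == 0 && c > 0) result.push_back(c - 1);  // right border of previous chunk
        return result;
    }

    int n_;
    int chunkQuads_;
    std::vector<float> heights_;
    std::set<std::pair<int, int>> dirty_;
};
```

With 1025 vertices per side and 64-quad chunks, the grid is 16x16 chunks; raising a vertex on a chunk corner dirties four chunks, while an interior vertex dirties only one, so a typical brush edit re-sends a few small buffers instead of the whole mesh.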
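Pragma's suggestion in items 8 and 9 (every node is a photon, and the tree is built by repeatedly splitting at the median along alternating axes) can be sketched on the host in C++ as follows. This is a sketch under assumptions: the `Photon` struct and function names are hypothetical, and the "root at the middle of its index range" layout is one common way to store a balanced kD-tree without child pointers, not necessarily the layout Pragma used. Because children are implied by index ranges, nothing has to set left/right children (atomically or otherwise), and an explicit stack replaces recursion.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

struct Photon {
    std::array<float, 3> pos;  // power, direction, etc. omitted
};

// Index range [lo, hi) of one subtree and the axis it splits on.
struct KdRange {
    std::size_t lo, hi;
    int axis;
};

// Root of the subtree stored in [lo, hi).
inline std::size_t kdRoot(const KdRange& r) { return r.lo + (r.hi - r.lo) / 2; }

// Builds a balanced kD-tree in place: each array element IS one node.
// The subtree in [lo, hi) has its root at the median index, its left subtree
// in [lo, root) and its right subtree in [root + 1, hi). std::nth_element
// (which only places the median) stands in for a full sort along each axis.
void buildImplicitKdTree(std::vector<Photon>& photons) {
    std::vector<KdRange> stack;
    stack.push_back({0, photons.size(), 0});
    while (!stack.empty()) {
        KdRange r = stack.back();
        stack.pop_back();
        if (r.hi - r.lo < 2) continue;  // 0 or 1 photon: already a valid subtree
        const std::size_t root = kdRoot(r);
        const int axis = r.axis;
        auto first = photons.begin();
        std::nth_element(first + static_cast<std::ptrdiff_t>(r.lo),
                         first + static_cast<std::ptrdiff_t>(root),
                         first + static_cast<std::ptrdiff_t>(r.hi),
                         [axis](const Photon& a, const Photon& b) {
                             return a.pos[axis] < b.pos[axis];
                         });
        const int next = (axis + 1) % 3;  // cycle x -> y -> z
        stack.push_back({r.lo, root, next});
        stack.push_back({root + 1, r.hi, next});
    }
}

// Checks the kD-tree invariant using the same implicit layout: everything
// left of a root is <= it on the split axis, everything right of it is >=.
bool isValidKdTree(const std::vector<Photon>& photons) {
    std::vector<KdRange> stack;
    stack.push_back({0, photons.size(), 0});
    while (!stack.empty()) {
        KdRange r = stack.back();
        stack.pop_back();
        if (r.hi - r.lo < 2) continue;
        const std::size_t root = kdRoot(r);
        const float split = photons[root].pos[r.axis];
        for (std::size_t i = r.lo; i < root; ++i)
            if (photons[i].pos[r.axis] > split) return false;
        for (std::size_t i = root + 1; i < r.hi; ++i)
            if (photons[i].pos[r.axis] < split) return false;
        const int next = (r.axis + 1) % 3;
        stack.push_back({r.lo, root, next});
        stack.push_back({root + 1, r.hi, next});
    }
    return true;
}
```

A lookup walks the same ranges: compare the query position with the root on the current axis, descend into [lo, root) or [root + 1, hi), and push the other side onto a small fixed-size stack when it might still contain photons within the search radius. The array itself stays exactly `SCENE_MAX_PHOTONS` long, as Pragma describes.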
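Items 10 and 11 ask how to store a variable number of photons per cell when every buffer must be sized up front. One standard answer, swapped in here rather than taken from the thread, is a counting sort. First count the photons in each cell, then turn the counts into start offsets with an exclusive prefix sum, and finally scatter photon indices into a single array sized by the total photon count (for example `SCENE_MAX_PHOTONS`). A cell can then hold anywhere from zero photons to all of them. The C++ sketch below runs on the host, and its names (`Vec3`, `PhotonGrid`, `buildPhotonGrid`, `photonsInCell`) are hypothetical. On a GPU, the counting pass would typically use an atomic increment per photon and the offsets a parallel scan, subject to which atomics the device's OpenCL version supports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Uniform grid over [0, worldSize)^3 with res^3 cells. Photon indices live in
// ONE array sized to the total photon count; cell c owns the slice
// photonIndex[cellStart[c]] .. photonIndex[cellStart[c + 1] - 1].
struct PhotonGrid {
    int res = 0;
    float cellSize = 0.0f;
    std::vector<unsigned> cellStart;    // res^3 + 1 entries
    std::vector<unsigned> photonIndex;  // one entry per photon

    int cellOf(const Vec3& p) const {
        auto axisCell = [this](float v) {
            int i = static_cast<int>(v / cellSize);
            return i < 0 ? 0 : (i >= res ? res - 1 : i);  // clamp to the grid
        };
        return (axisCell(p.z) * res + axisCell(p.y)) * res + axisCell(p.x);
    }
};

// Counting sort of photons into grid cells.
PhotonGrid buildPhotonGrid(const std::vector<Vec3>& positions, int res, float worldSize) {
    assert(res > 0 && worldSize > 0.0f);
    PhotonGrid g;
    g.res = res;
    g.cellSize = worldSize / static_cast<float>(res);
    const std::size_t cells = static_cast<std::size_t>(res) * res * res;

    // Pass 1: count photons per cell.
    std::vector<unsigned> count(cells, 0);
    std::vector<int> cellOfPhoton(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        cellOfPhoton[i] = g.cellOf(positions[i]);
        ++count[static_cast<std::size_t>(cellOfPhoton[i])];
    }

    // Pass 2: exclusive prefix sum gives the first slot of each cell.
    g.cellStart.assign(cells + 1, 0);
    for (std::size_t c = 0; c < cells; ++c)
        g.cellStart[c + 1] = g.cellStart[c] + count[c];

    // Pass 3: scatter each photon's index into the next free slot of its cell.
    std::vector<unsigned> cursor(g.cellStart.begin(), g.cellStart.end() - 1);
    g.photonIndex.resize(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        unsigned& slot = cursor[static_cast<std::size_t>(cellOfPhoton[i])];
        g.photonIndex[slot++] = static_cast<unsigned>(i);
    }
    return g;
}

// Number of photons stored in the cell that contains p.
std::size_t photonsInCell(const PhotonGrid& g, const Vec3& p) {
    const std::size_t c = static_cast<std::size_t>(g.cellOf(p));
    return g.cellStart[c + 1] - g.cellStart[c];
}
```

For example, 500 photons landing in the same cell (more than a hypothetical per-cell limit of 300, as in item 10) simply occupy 500 consecutive entries of `photonIndex`. The only fixed sizes are the cell count and the total photon count, which matches the "upper bound equals the scene maximum" requirement from item 11.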
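Frame rates this high are easier to compare as frame times. The small C++ sketch below (function names hypothetical) converts the figures from item 14, about 750 FPS plus roughly 100 FPS from removing the `ev.wait()` calls. At 750 FPS a frame takes about 1.33 ms, and the jump to ~850 FPS saves only about 0.16 ms per frame. This is only arithmetic on the numbers in the post, not a measurement. It is consistent with samoth's point in item 12 that fixed per-frame synchronization costs, rather than the kernel itself, can dominate at these rates.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the frame rate goes from fpsBefore to fpsAfter.
double frameTimeSavedMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}
```

Comparing changes in milliseconds keeps them additive: a 0.16 ms saving is the same amount of work whether the application runs at 60 FPS or 750 FPS, whereas the same saving shows up as very different FPS deltas.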