
#5264057 Instancing, and the various ways to supply per-instance data

Posted by CDProp on 29 November 2015 - 12:08 AM

From the reading I've been doing, it seems like there are a few different ways to supply per-instance data while using instancing in OpenGL. I've tried a couple of these. Here is a rundown, as I understand it. With each example, I'll use the mvp matrix (modelViewProjection) as the per-instance item. I'm hoping that you can help correct any errors in my understanding.


Array Uniforms w/ gl_InstanceID



layout(location = 0) in vec4 pos;
uniform mat4 mvp[1024];


void main() {
    gl_Position = mvp[gl_InstanceID]*pos;
}
With this method, you're just declaring an array of mat4 as a uniform, and you're using gl_InstanceID to index that array. The main advantage of this method is that it's easy, because it's hardly different from the normal way of using uniforms. However, each element in the array is given its own separate uniform location, and uniform storage is in limited supply: GL_MAX_VERTEX_UNIFORM_COMPONENTS can be as low as 1024 components, which is only 64 mat4s.


Vertex Attributes with Divisor=1


OpenGL example:

#define MVP_INDEX 2


glBindBuffer(GL_ARRAY_BUFFER, mvpBuffer);
for (int i = 0; i < 4; ++i) {
    GLuint index = MVP_INDEX + i;
    glEnableVertexAttribArray(index);
    glVertexAttribPointer(index, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat)*16, (GLvoid*)(sizeof(GLfloat)* i * 4));
    glVertexAttribDivisor(index, 1);
}
glBindBuffer(GL_ARRAY_BUFFER, 0);

GLSL example:

layout(location = 0) in vec4 pos;
layout(location = 2) in mat4 mvp;


void main() {
    gl_Position = mvp*pos;
}
With this method, the mvp matrix just looks like a vertex attribute from the GLSL side of things. However, since a divisor of 1 was specified on the OpenGL side, there is only one matrix stored per instance, rather than one per vertex. This allows very clean access to a large number of matrices (as many as a buffer object can hold). You also get all of the advantages that other buffer objects have, such as streaming using orphaning or mapping strategies. However, each matrix uses four vertex attrib locations. There may be as few as 16 total vertex attrib locations available. If you plan on using a shader that requires multiple sets of UV coordinates, blend weights, etc., then you may not have enough vertex attrib locations to use this method.


So, I'm trying to find a method that will allow thousands of instances without using up precious vertex attrib locations. I am hoping that Uniform Buffer Objects or SSBOs will come to the rescue. I haven't yet attempted to use them for this purpose, nor have I found many examples of people using them for this purpose online. Maybe there is a reason for that. :) So here's my current understanding of how it works. I would be much obliged if someone could read it over and tell me where I'm wrong.


Uniform Buffer Objects


OpenGL example:

GLuint mvpBuffer;
// GenBuffers, BufferData, etc.
glBindBufferBase(GL_UNIFORM_BUFFER, 0, mvpBuffer);
GLuint uniformBlockIndex = glGetUniformBlockIndex(myProgram, "mvpBlock");
glUniformBlockBinding(myProgram, uniformBlockIndex, 0);

GLSL example:

layout(location = 0) in vec4 pos;

layout(row_major) uniform mvpBlock {
    mat4 mvp[1024];
};

void main() {
    gl_Position = mvp[gl_InstanceID]*pos;
}

It seems like this could alleviate the restrictions on attrib locations. However, you are limited by GL_MAX_UNIFORM_BLOCK_SIZE, which bounds the entire mvp array inside the block. This can be as low as 64 KB, which at 64 bytes per mat4 would only allow for 1024 instances -- still far short of the thousands I'm after.


Shader Storage Buffer Objects


This method would be essentially identical to the Uniform Buffer method, except the interface block type is buffer and you can use a lot more memory. You can also write to the SSBO from within the shader, but that is not necessary for this application. On the down side, the Wiki says that this method is slower than Uniform buffers. Again, I haven't tested this myself, so I may be mistaken about how this works.
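For reference, my guess at what the SSBO version of the shader would look like (a sketch, assuming GL 4.3+ and binding point 0; the unsized array lets the buffer hold as many matrices as you can allocate):

```glsl
#version 430

layout(location = 0) in vec4 pos;

// "buffer" instead of "uniform"; std430 gives mat4 a tight 64-byte stride.
layout(std430, binding = 0) buffer mvpBlock {
    mat4 mvp[];   // unsized: the length is determined by the bound buffer's size
};

void main() {
    gl_Position = mvp[gl_InstanceID] * pos;
}
```

On the C side, the buffer would presumably be attached with glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, mvpBuffer).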





#5257292 I'm having trouble making sense of these performance numbers (OpenGL)

Posted by CDProp on 14 October 2015 - 11:34 PM

Alright, so I wasn't able to start on this until late this evening, but I do have some results to share. The following graph shows the time vs. frame number for 50,000 cubes rendered using DrawElementsInstanced (no camera panning):



So, it seems that the GPU is the bottleneck in this case. Almost the entire frame time is spent waiting for SwapBuffers to return. I tried this same experiment with 5,000 cubes and got the same results (albeit with smaller frame times). That is, gpuTime and swapBuffersTime were very close to the total frame time.


I then tried running the same experiments with DrawElements (not instanced), and I got a very different plot. This time, the frame time and GPU time were still about equal, but the swap-buffers time was way lower:



This looks to me like the GPU is still taking the same amount of time to draw the cubes as in the instanced case, but since the CPU is spending so much more time submitting draw calls, there is much less time left over for waiting on the buffer swap. Does that sound right?


I also tried using an object that is more complex than a cube -- just a quick mesh I made in Blender that has 804 unique verts. Once again, there is no performance difference between the DrawArrays, DrawElements, and DrawElementsInstanced cases. However, the good news is that the triangles-per-second increased by more than 2X with the more complex model, just as you predicted.


So, it appears that my test cases are not great -- they take long enough to draw on the GPU that there is plenty of time on the CPU side to submit all of the draw calls individually.


However, the vertex processing stage does not seem to be the culprit, since there is no difference in GPU time between the indexed and non-indexed cases. Next, I'll experiment more with fragment processing and reducing the number of single- and sub-pixel triangles in the scene.

#5257272 Why are voxels more efficient than polygons?

Posted by CDProp on 14 October 2015 - 09:00 PM

It reminds me of those "infinite detail" demos from years back.

#5191161 How do triangle strips improve cache coherency ?

Posted by CDProp on 04 November 2014 - 12:43 PM

With an index buffer you also get cache coherency; the hardware can just store the most recently accessed indices, and if one of them comes up again, it doesn't need to retransform the vertex.  Of course you need to order your vertices to more optimally enable this, but in the best case adding indices can get you significantly better vertex reuse than strips or fans (memory usage isn't everything).


I think this is a really important point, and here is a good link for more information about it:



#5190163 General Programmer Salary

Posted by CDProp on 30 October 2014 - 09:58 AM

Well, I don't want to derail the thread any further, but I do want to thank Tom, Quat, and stupid_programmer for all of your advice. Very helpful, and I appreciate it.

#5189962 A radiometry question for those of you who own Real Time Rendering, 3rd Edition

Posted by CDProp on 29 October 2014 - 11:01 AM

Radiance is sort of an abstract quantity, but I think it's not so bad if you think about it in terms of its dimensions. When rendering, we like to think of light in terms of geometrical optics. So, instead of light being waves with a continuous spread of energy, it is discrete rays that shoot straight from the surface you're rendering to the pixel element on the screen. This takes some doing, however, because in reality, light is a continuous wave (classically speaking -- no intention of modeling things at the quantum level here).

So how do you turn a continuous quantity like an EM wave into a discrete ray? By analogy, consider mass. As a human, you have a certain mass. However, that mass is not distributed evenly in your body. Some tissue is more dense than others. For instance, bone is more dense than muscle. Let's say you knew the mass density function for your body. That is, if someone gives you a coordinate (x,y,z) that is inside your body, you can plug it into the function and the result will be the mass density at that coordinate. How would you calculate the total mass of your body with this function? Well, you would split the volume up into a bunch of tiny cubes, sample the density function at the center (say) of each cube, multiply that density by the cube's volume to get the cube's mass, and then add up the masses of all the tiny cubes. The tinier the cubes, the more of them you'll have to use, but this will make your mass calculation more accurate. Where integral calculus comes into play is that it tells you the mass you get in the limiting case where the cubes are infinitely tiny and there are infinitely many of them. In my opinion, it's easier to reason about it as "a zillion tiny cubes" and just remember that the only difference with integral calculus is that you get an exact answer rather than an approximation.

So consider a surface that you're rendering. It is reflecting a certain amount of light that it has received from a light source. We want to think of the light in terms of energy, unlike the mass example. The surface as a whole is reflecting a certain amount of energy every second, which we call the energy flux (measured in Watts, also known as Joules/sec). However, we don't really care what the entire surface is doing. We just want the energy density along a specific ray. So, let's break the surface down into tiny little area elements (squares) and figure out how much flux is coming from each tiny area element. We only care about the area element that is under our pixel. That gives us a flux density per unit area, which is called Irradiance (or Exitance, depending on the situation). So now we know the energy flux density being emitted from the area under our pixel. But wait! Not all of that energy is moving toward our pixel. That little surface element is emitting energy in all directions. We only want to know how much energy is moving in the specific direction of our pixel. So, we need to further break down that Irradiance according to direction, to find out how much of that Irradiance is being emitted along each direction (a.k.a. infinitesimal solid angle). This gives us an energy density with respect to time, area, and solid angle, known as Radiance.
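In symbols (my notation; note the paragraph above glosses over the projected-area cosine factor that the formal definition includes):

```latex
% Flux -> irradiance/exitance -> radiance, step by step:
E = \frac{d\Phi}{dA}
\qquad
L = \frac{d^2\Phi}{dA\,\cos\theta\,d\omega}
% \Phi: flux (W); A: surface area; \omega: solid angle;
% \theta: angle between the surface normal and the direction of interest.
% Units: E in W/m^2, L in W/(m^2 sr).
```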

#5189698 General Programmer Salary

Posted by CDProp on 28 October 2014 - 08:16 AM

Ok, I've got a salary negotiation question, and hopefully no one will mind me tacking it onto this thread.


When I first accepted my current position, I accepted a low-ball offer in part because a) the job was (is) awesome, b) I had no higher education, and c) I only had a few years of professional programming experience. I just felt the job was much cooler than any job I could hope to find at that stage of my life, and for the most part, I was right. That was about five years ago. Since then, I have gone back to school and have almost completed my bachelor's in Physics. I have received incremental raises (cost-of-living, or slightly above), but I do not believe these raises have been commensurate with my increase in experience and education. I am now well below the salary indicated as "average" by every salary survey I can find (Salary.com, Salary Fairy, Gamasutra, etc.) -- about 25% below.


This is not a game programming job, by the way. I'm a graphics programmer for a company that makes training simulators (which are very similar to your video games in most respects!). I want to keep the company identity confidential, so I'll leave it at that.


So here's the rub: my company is not doing very well, and they are in no position to be handing out raises. Additionally, I am in no position to be searching for another job because I still have 6 months left at school. I talked to our CEO to see if we could maybe come up with a plan to bring me up to speed (say) by the time I graduate next year, and I was rebuffed. I was told that we could revisit the question in a year or so, and if the company is doing better, then maybe.


So after having become "that guy" who brought up salary negotiations in the company's time of need (yeesh), and was turned away, I don't know what to do. My main concern isn't the short-term earnings, it's what it will mean for my salary track in the long-term. What if (heaven forbid) the company folds, and I find myself looking for a new job? I will get low-balled by every company out there on the basis of my previous salary. In addition to that risk, I feel that they're essentially asking me to take a pay cut for the company, which wouldn't even be out of the question if I felt like it would be appreciated, but I don't think they see it that way. Lastly, we are a small company, but our overall costs run in the millions of dollars per year, and so even if the company is not doing well, I hardly think that a $15k salary bump for one employee is going to affect things very much.


What do decency and decorum demand that I do here? Should I just drop the issue until the company is doing better? Or at least until I graduate? Should I be looking for other work, or should I not even bother until I'm done with school?

#5189614 General Programmer Salary

Posted by CDProp on 27 October 2014 - 11:38 PM

I'll take $20 per page, if that includes pages generated by the script / template that I write. :P

#5160846 Just a couple of Data-Oriented Design questions.

Posted by CDProp on 16 June 2014 - 08:13 AM

I'm not trying to use it for every piece of code. I'm trying to use it to speed up entity/component updates. I must say, what I'm asking seems like a totally reasonable request. That is, help me understand how data-oriented design is used in games. So far, I've gotten some good replies along with at least two lectures about how I shouldn't be using data-oriented design like a golden hammer. Is that what I'm doing? Because most of the reading I've done on the subject uses this sort of entity-component update as an example of precisely where DOD comes in handy. Is there some way in which I'm misapplying the concept? If so, it would be most helpful if you would be explicit about it.

#5134174 Arithmetic vs. geometric mean avg luminance during nighttime scenes

Posted by CDProp on 24 February 2014 - 01:20 PM

Greetings, all.


I was wondering if we could discuss this issue a bit. For the purposes of simple exposure control, it seems common to store the log of the luminance of each pixel in your luminance texture. That way, when you sample the 1x1 mip map level and exponentiate it, you end up with the geometric mean luminance of the scene. This is done to prevent small, bright lights from dimming the scene.


I find that this works really well, but perhaps a little too well. I am using a 16F texture, and so the brightest value I can store is 65504. If I have a really dark nighttime scene, such that things are barely visible without any lights, and then I point a bright spotlight at the player (just a disc with a 65504-unit emission), it hardly affects the exposure at all. I would expect a bright light to sort of blind the player a bit and ruin her dark adaptation, so that the rest of the scene looks really dark. I have found that the light needs to cover nearly 20% of the pixels on the screen before it begins to have this effect.


So I switched over to using an arithmetic mean (just got rid of the log/exp on each end) and now it works more like what I would expect.


If you were in my shoes, would you switch to an arithmetic mean, or would you try to find exposure settings that will work better with a geometric mean? 

FWIW, my exposure-control/calibration function is just hdrColor*key/avgLum, where key is an artist-chosen float value, and avgLum is the mean luminance (float). After that, I'm tone mapping with Hable's filmic curve. If you have any suggestions on how to improve it, that would be most helpful. I suppose I could also experiment with histograms and so forth, but I'm not sure if they're meant to solve this particular problem.

#5130724 How much planning do game programmer before writing a single line of code and...

Posted by CDProp on 11 February 2014 - 11:58 PM

I'll tell you what my experience over the years has been. When I was very green, I didn't do much planning at all. I just sat down and started writing code. If the problem was geometrical in nature (as is often the case in 3D games), then I maybe had some diagrams that I drew, but those were really only for the geometry of the problem, and nothing to do with the final code design. My code designs were awful, by the way. I remember one class that took 12,000 lines of code because it did everything. Obviously, this concept of coding by the seat of my pants wasn't working.


So, I switched philosophies. I decided that I was going to plan everything beforehand, with class diagrams and whatnot. My new motto was, "A receptionist who doesn't know C++ ought to be able to implement this from my documentation alone." I felt that I shouldn't write a line of code unless I can prove that it works on paper. Yeah, that didn't work out too well for me, either. It's like trying to see 100 moves ahead in chess. 


Here's the thing. A new programmer doesn't have the ability to take a complex problem, break it down in their head, and then immediately sit down and start coding -- nor do they have the ability to sit down and plan the whole thing out with UML-like diagrams. I can't speak for everybody, but I had to go through this rigmarole of trying solutions, realizing they were crappy, and doing it differently next time. For me, there was no way to short-cut that process.


These days, I can do a much better job reasoning about complex problems in my head, and having a good intuition about what form the solution should take. I'm also better at planning complex problems on paper. So, I used to suck at both. As I gained experience, I got better at both. Go figure.


I always write a little something beforehand. Doesn't have to be much. At the very least, it helps to write down a set of goals and requirements so that I don't go off-track. And, of course, I'm working on a team, so it's often the case that I need to communicate my ideas to others, and that usually entails writing some documentation. Other than that, I tend to use notes and diagrams as a sort of secondary storage -- it's difficult for me to keep zillions of details in my head, so if I think of something that I don't want to lose, I write it down. That about sums up the balance I've found for myself.

#5086144 Deferred Shading lighting stage

Posted by CDProp on 15 August 2013 - 09:31 AM

That material class probably covers all that you need at this stage in terms of material options. However, you might find it useful to have separate shaders to cover the cases where a) you have untextured geometry, b) you have geometry with a diffuse texture, but no specular or normal map, c) you have a diffuse texture and a normal map, but no specular, d) etc. If you handle all of these cases with the same shader, you end up sampling textures that aren't there (or else introducing conditional branching into your shader code). Of course, if everything in your game has all of these textures, then it isn't a problem. Unfortunately, I am not that lucky because I have to render some legacy models, some of which have untextured geometry.


As you start working with more advanced materials, you may find that your shader inputs grow in number and become more specialized, and so the number of material shaders you use will grow as well.

#5086126 Deferred Shading lighting stage

Posted by CDProp on 15 August 2013 - 08:35 AM

What I would recommend doing is outputting your positions and normals in view space. If you have any tangent-space normals, transform them in your material shader before outputting them to your g-buffer. That way, you don't need to store tangents or bitangents. I do think it would be a good idea to output specular and ambient properties. If you find that memory bandwidth becomes a problem, then there are some optimizations you could try. For instance, you could reconstruct the position from the depth buffer, thus getting rid of the position texture in your g-buffer. You could also store just two components of each normal (say, x and y) and then use math to reconstruct the third in your light shaders. Even though these reconstructions take time, it's often worth it because of the memory bandwidth savings. Also, if you haven't already done so, you can try using a 16-bit half-float format, instead of a full 32-bit floating point format.

#5085850 Deferred Shading lighting stage

Posted by CDProp on 14 August 2013 - 09:58 AM

That seems more or less correct to me. At some point you'll want to implement some sort of light culling so that you're not shading every pixel with every light. But yeah, typically you'll have one shader for every material type, and one shader for every light type (directional, point, ambient, etc.). In your first pass, you bind the G-buffer as your render target. You render the scene using the material shaders, which output normals, diffuse, etc., to their respective textures in the G-buffer. In the second stage, you go one light at a time, and you render a full-screen quad using the correct light shader for each light. The light shader samples from each of the G-buffer textures, and applies the correct lighting equations to get the final shading value for that light. Make sure to additively blend the light fragments at this stage.


What a lot of people do, in order to cut down on the number of pixels being shaded with a given light, is to have some geometric representation for the light (a sphere for a point light, for example). Before rendering the full-screen quad with the point light shader, they'll stencil the sphere in so that they can be sure to only shade pixels within that sphere. Some even go as far as to treat the sphere almost like a shadow volume, stenciling in only the portions where scene geometry intersects the sphere. This gives you near pixel-perfect accuracy, but it might be overkill. I've been reading lately that some people just approximate the range of the point light using a billboarded quad, because the overhead in rendering a sphere into the stencil buffer (let alone doing the shadow volume thing) is greater than the time spent unnecessarily shading pixels inside the quad that the light can't reach.


Of course, a real point light can reach anywhere. If you were to use a sphere to approximate the extent of the point light, you'd have to use a somewhat phony falloff function so that the light attenuates to zero at the edge of the sphere.

#5085730 Radiometry and BRDFs

Posted by CDProp on 13 August 2013 - 08:55 PM

Thanks, Hodgman. I have some thoughts on that idea, although I may have it wrong. I agree that the pixel will subtend a solid angle from the point of view of the surface point that I'm shading. However, I am not certain that it matters in this case. Because we are rendering a perfectly focused image, I believe that each point on our pixel "retina" can only receive light from a certain incoming direction. Here's what I mean (I apologize if a lot of this is rudimentary and/or wrong, but it helps me to go through it all).


If you have a "retina" of pixels, but no focusing device, then a light will hit every point on the retina and so each pixel will be brightened:



If you add a focusing device, like a pinhole lens, then you block all rays except those that can make it through the aperture:



So now, only one pixel sees the light, and so the light shows up as it should: as a point. We now have a focused image, albeit an inverted one. If you widen the aperture and put a lens in there, you'll catch more rays, but they'll all be focused back on that same pixel:



And so I might as well return to the pinhole case, since it is simpler to diagram. I believe that having a wider aperture/lens setup adds some depth of field complications to the focus, but for all intents and purposes here, it can be said (I think) that a focusing device has the effect of making it so that each pixel (and indeed, each sub-pixel point) on the retina can only receive light from one direction:



The orange ray shows what direction the pixel in question is able to "see", and any surface that intersects this ray will be in the pixel's "line of sight." Each pixel has its own such line of sight:



With rasterization, we have things sort of flipped around. The aperture is behind the retina, but the effect is more or less the same. If I put the retina on the other side of the aperture, at an equal distance, I get this:



Now we can see the aperture as the "eye position", the retina as the near plane, etc. The orange rays are now just view vectors, and they are the only directions we care about for each pixel. The resulting image is the same as before, except it has the added bonus of not being inverted (like what a real lens would do).


So with that said, here is what happens if I redraw your diagram, with 5 sub-pixel view vectors going through a single pixel:



So, the single pixel ends up covering the entire light blue surface. You can see that view vectors form a sort of frustum, when confined to that pixel.


I've also added a green point light, with 5 rays indicating the range of rays that will hit that light blue patch. All 5 of those green rays will end up hitting the retina somewhere, but only one of those rays comes in co-linearly with one of the orange "valid directions".