Grouping/batching to optimize rendering



#1 spek   Prime Members   -  Reputation: 996


Posted 06 March 2012 - 10:18 AM

Hey,

As we all know, batching things is good for performance. However, there are several ways to sort/group things, and I'm wondering if I'm going the right way. I have 3 examples in my engine, and would like to know if the sorting can be done more efficiently.

* For the info, this is mainly an indoor situation, and I'm using OpenGL, though I assume the principles are the same regardless of the rendering platform.


Example1: Objects
----------------------------------
Several rooms are visible, each with props like furniture, boxes, lamps, decorations, and so on. Typically, such an object has 500 to 2,000 triangles, and 1 "material". A material is:
- a vertex / fragment shader
- Sometimes a geometry shader (rare)
- Shader parameters (textures, color-vectors, factors, ...)

Let's say I have 100 objects to render. Currently, I sort on material. So if there are 50 boxes and 50 barrels, the rendering loop could look as follows:
1- apply material "box" (shaders + set textures/parameters)
2- for each box in boxes
              glPushMatrix;
              glMultMatrix( box.matrix );
                  box.renderVBO;  // Render the raw mesh via a VBO
              glPopMatrix;
3- apply material "barrel"
4- for each barrel .... see step 2
So, we only had to switch materials/textures twice instead of 100 times. However, I could also sort on mesh VBO. Now for each object, the VBO mesh gets set (glBindBufferARB), rendered, unset.
* DirectX has "instancing"... can that be compared to this?

In that case I only have to activate the VBO twice, then repeat the render call. But obviously, this causes more material switches. In practice all the objects sharing a VBO use the same material as well, yet sorting on VBO still means more material switches: 4 different box models may share the same material, and would thus get grouped together under a single material switch if I sort per material.

Anyway, how would you sort / batch it?



Example2: Static geometry (walls, floors, pillars, ...)
----------------------------------
Since I'm using a portal-rendering approach, the world gets split up into rooms. Each room currently gets split up into material groups. For example:
1- All polygons using material "wood floor"
2- All polygons using material "brick walls"
3- All polygons using material "metal support bars"
4- All polygons using material "concrete ceiling"

Usually the polycount for each "polygon chunk" is very small, since walls/floors are simple shapes. A 6-face box room would have 2 tris for group1 (wood floor), 8 tris for the walls, and so on. In practice, the polycount is a bit higher in my case; the average room uses ~2,000 polygons in total. And 1 to 10 rooms are visible at the same time, usually.

Before rendering, all the chunks from all visible rooms are grouped together. So if there are 10 rooms visible, all using the same brick-walls, all those small chunks get grouped together. The render-loop would look like this:
1- material "wood floor" apply (set shaders, textures, parameters)
2- for each floorChunk_using_Wood
                       chunk.render;  // glDrawArray (vertices, normals, tangents, weights, texcoords)
This minimizes the amount of material switches. But in case multiple rooms are visible, I'm rendering quite a lot of small geometry pieces, using glDrawArrays (no VBOs).


Another method I used is rendering per room & a VBO. First a VBO that contains all the geometry (vertices, normals, tangents, weights, texcoords) for the entire room will be activated. Then for each material, I apply the material and render a piece of that VBO via indices:
for each visibleRoom
      room.activateVBO;
      for each chunkSortedOnMaterial in roomChunks
            chunk.material.apply
            chunk.renderVBO_viaIndices

      room.deactivateVBO;
Geometry-wise, this seems a smarter way, although I'm not sure the relatively small polycounts (~2k per room) make a real difference. The big disadvantage however is that when there are 10 rooms visible, we need to repeat this 10 times too, even if all rooms are using the same materials.
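For reference, chunk.renderVBO_viaIndices boils down to something like this (made-up field names; assuming the room keeps one shared index buffer next to its vertex VBO):

// The room's vertex VBO and index buffer stay bound for the whole room;
// each material chunk only draws its own index range.
glBindBuffer(GL_ARRAY_BUFFER, room.vertexVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, room.indexVBO);
// ... set vertex / normal / tangent / texcoord pointers once per room ...
glDrawElements(GL_TRIANGLES, chunk.indexCount, GL_UNSIGNED_INT,
               (const GLvoid*)(chunk.firstIndex * sizeof(GLuint)));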

* Got to mention that each surface (floor, wall, ...) can use quite a lot of texture data (up to 16 MB of texture data all together).

In both cases, I'm still rendering small parts of the room. Combining rooms and thus their geometry would fix that, but does not cope well with the portal-rendering approach of course.



Example3: Transparent objects (trees, plants, windows, bottles, ...)
----------------------------------
This one is nasty. I could sort in similar ways, but obviously that causes depth issues. If I sort purely on depth, batching is nearly impossible. If I sort on material and/or mesh instead, the depth order gets screwed up.

Fortunately, I have few transparent objects. But I wonder how a game like Crysis renders its jungle. I mean, that uses a lot of transparent surfaces, right?


Cheers,
Rick


#2 Tsus   Members   -  Reputation: 1035


Posted 06 March 2012 - 05:33 PM

Hi Rick!

So, we only had to switch materials/textures twice instead of 100 times. However, I could also sort on mesh VBO. Now for each object, the VBO mesh gets set (glBindBufferARB), rendered, unset.
* DirectX has "instancing"... can that be compared to this?

OpenGL has instancing, too. (Since 3.1 or 3.2 if I recall correctly.) With instancing you would only submit one draw call to render many objects (keeps the command buffer smaller). You can add in some variation by reading the instanceID in the shader and then fetching from another texture or using other constants for colors.
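A minimal sketch of what that looks like (assuming a GL 3.1+ context; the buffer and variable names are made up):

// One draw call renders numBoxes copies of the box mesh.
glBindVertexArray(boxVAO);
glDrawElementsInstanced(GL_TRIANGLES, boxIndexCount, GL_UNSIGNED_INT, 0, numBoxes);

// In the vertex shader, gl_InstanceID selects the per-instance data,
// e.g. a transform from a uniform array or a buffer texture:
//   mat4 world = instanceMatrices[gl_InstanceID];
//   gl_Position = viewProj * world * vec4(position, 1.0);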

Anyway, how would you sort / batch it?

I have the feeling that material switches are more expensive, since more stuff has to be set (probably several states, uniforms and, worst of all, textures). Perhaps you could profile to find out which costs more? I think it depends on the size of the textures and VBOs, so it can’t be generalized.

If you group by materials you could batch geometry with the same material into a few VBOs, probably clustered somehow so that you can cull them. I think I would place objects that appear just once in the room at the beginning of the VBO (all of them could then be drawn with a single draw call if they are all visible) and objects that can be drawn instanced at the end; this way you can still use instancing by drawing only from parts of the VBO.

In both cases, I'm still rendering small parts of the room. Combining rooms and thus their geometry would fix that, but does not cope well with the portal-rendering approach of course.

You could probably find out which objects appear multiple times in the scene and draw them by instancing (even if they are in different rooms). It gets a little more complicated by that, and to be honest: I’m not sure whether it is worth the trouble.

Fortunately, I have few transparent objects. But I wonder how a game like Crysis renders its jungle. I mean, that uses a lot of transparent surfaces, right?

Most titles try to avoid alpha-blended objects if possible (also because they don't work well with deferred shading...). For foliage one would rather use alpha to coverage. I’m not sure whether it is in the GL standard by now, but there are Nvidia extensions that do the work for you. If you don’t want to use them you can build the sample mask based on your alpha value in the fragment shader yourself and set it to gl_SampleMask (the more opaque, the more sample bits are 1). With this you (hopefully) only have a few “true” alpha objects left (so batching wouldn’t be of much use).
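A rough fragment shader sketch of that idea (GLSL 4.00+, where gl_SampleMask is writable; 4x MSAA and the texture name are assumptions):

// #version 400
uniform sampler2D leafTexture;
in vec2 texcoord;
out vec4 fragColor;

void main()
{
    vec4 color = texture(leafTexture, texcoord);
    // Turn alpha into a coverage mask instead of blending:
    // the more opaque the fragment, the more of the 4 sample bits are set.
    int coveredSamples = int(color.a * 4.0 + 0.5);
    gl_SampleMask[0] = (1 << coveredSamples) - 1;
    fragColor = color;
}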

Cheers!

#3 spek   Prime Members   -  Reputation: 996


Posted 06 March 2012 - 06:43 PM

OpenGL 3... I was afraid I had to take that step some day. Oh shit, I just discovered OpenGL 4 is there as well! Will it ever stop? I'm spending more time on upgrading graphics / shaders / sound / physics libraries than on the game itself these days.

by reading the instanceID in the shader and then fetching from another texture or using other constants

In my oldskool knowledge, you can only pass 16 textures at a time to a shader (unless you pack more in a texArray). But a short while ago I read some stuff about c- and tbuffers. Does that mean I can let an instance decide which textures (loaded in the video memory) to grab before it renders? Same thing for constants. I have, for example, 1,000 different materials. If each had 4 parameters, it could fit in one cbuffer. But are the shaders/instances flexible enough to decide, on the GPU, where to find the correct parameters? And moreover, does it really gain some speed? I think yes, because currently the CPU has to define all parameters each time before rendering something. But just checking...

Excuse me, but I'm a bit prehistoric with OpenGL 2.x, Delphi 7, a 32-bit computer, and an older Cg shader language!

build the sample mask based on your alpha value in the fragment shader yourself and set it to gl_SampleMask

Another new thing. Never looked at multisample demos, so could you please explain what's going on here? From my little understanding, you don't use traditional glBlend / glAlpha, but you sort it out yourself by defining a number of layers, and keeping track of previous results somehow? That means the shader does the (limited) depth-sort? Not very precise, but who will notice between a few million tree leaves / grass blades... If so, I don't have to bother about rendering order, and I can try to implement foliage into the deferred pipeline. Although I still wonder how the edges around leaves / metal fences / barbed wire can be done, since blending values in a deferred renderer isn't really an option.

#4 SimonForsman   Crossbones+   -  Reputation: 6108


Posted 06 March 2012 - 08:22 PM

OpenGL 3... I was afraid I had to take that step some day. Oh shit, I just discovered OpenGL 4 is there as well! Will it ever stop?


You could just use extensions if you prefer that, no need to switch OpenGL version for one feature; look at the EXT_draw_instanced extension (or NV_draw_instanced for the even older Nvidia version of the extension). Any Nvidia card that supports the NV version will support the EXT version as well, unless you use a very old driver.
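The usual startup check looks roughly like this (sketch; how you load the entry point depends on your loader):

// See if the extension is advertised before using it.
const char* extensions = (const char*)glGetString(GL_EXTENSIONS);
bool hasDrawInstanced = extensions && strstr(extensions, "GL_EXT_draw_instanced") != NULL;
// If it is, fetch glDrawElementsInstancedEXT through wglGetProcAddress / GLEW /
// your own loader and call it exactly like the core GL 3.1 version.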

#5 Tsus   Members   -  Reputation: 1035


Posted 07 March 2012 - 03:52 AM

Hi,

In my oldskool knowledge, you can only pass 16 textures at a time to a shader (unless you pack more in a texArray).

The actual number of active texture units you can have depends a little on the graphics card. (On current hardware it goes up to about 160, see here.)

But a short while ago I read some stuff about c- and tbuffers. Does that mean I can let an instance decide which textures (loaded in the video memory) to grab before it renders? Same thing for constants. I have, for example, 1,000 different materials. If each had 4 parameters, it could fit in one cbuffer. But are the shaders/instances flexible enough to decide, on the GPU, where to find the correct parameters?

The shader decides from which texture to fetch. Therefore each possible texture must be bound to a texture unit. In Cg you have a special input semantic (INSTANCEID) in the vertex shader. You can use this value to index into a sort of material index buffer.
int myTexIndex = materialIndexBuffer[instanceID];            // do this in the vertex shader
float4 color = tex2D( inputTexture[myTexIndex], texcoord );  // do this in the fragment shader
Same thing for reading colors and transformations.

And moreover, does it really gain some speed? I think yes, because currently the CPU has to define all parameters each time before rendering something. But just checking...

The binding time is nearly the same, since you have to bind all at once. But the rendering should be faster (fewer draw calls). Though, it is only really beneficial when you have many instances.

Another new thing. Never looked at multisample demo's, so could you please explain what's going on here?

Multi-sampling is an approach to anti-aliasing. For each pixel a certain number of subpixels is evaluated. The positions of the subpixels are randomized (but stratified) and are not defined by the GL standard (which means they are up to the hardware vendors). Though there is one “default” subpixel pattern that is the same on all cards.
For 4xMSAA it could look like this:
|__x____|
|______x|
|x______|
|____x__| // Positions of subpixels in a pixel.


You can use multi-sampling in two ways: shading at sample-frequency or pixel-frequency. Sample-frequency means the fragment shader is executed for every subpixel (storing color and depth). In the end, once all objects have been rendered, the subpixels are averaged to give the final pixel color. You should prefer this mode if you have strong discontinuities in your textures, since shading at pixel-frequency only anti-aliases polygon edges.
The other mode is shading at pixel-frequency. Here, the fragment shader is executed just once per pixel(!) and only a depth value and a coverage value are generated for each subpixel. (Coverage is a bit vector telling for each subpixel whether the geometry covers it or not.)
|..x../_|
|..../_x|
|x../___|
|../_x__| // In this example only two subpixels cover the object.


With alpha to coverage you modify the coverage vector by writing to gl_SampleMask (only possible when running at pixel-frequency). This means, for instance, that when you have an alpha value under 25% you set one bit to 1, so the color of the fragment is used for only that one subpixel. The other three bits can be set by objects behind. In the end all values get averaged and you get smooth borders. Of course this is only a fake and not as precise as sorting all fragments and blending them correctly, but it still gives good results.

Cheers!

#6 spek   Prime Members   -  Reputation: 996


Posted 07 March 2012 - 04:56 AM

Extensions of course. I wonder though, does the usage of extensions have a negative impact somehow, compared to GL 3.x or 4.x where a lot of functions are part of the core now? I need to enhance the performance of the engine, so all the little things that can help should be considered.

On current hardware it goes up to about 160, see here.

Really? With passing I mean I bind a texture to one of the channels (glActiveTextureARB( channelNum )). If I let a (Cg) shader explicitly refer to "TEXUNIT16" or higher, it crashes AFAIK. Tried that on a ~1.5 year old nVidia card btw. Correct me if I'm wrong, but the table you showed is the maximum number of texture-reads within a shader program, not the amount of active textures, right?
If so, that still means I'm limited to 16 different textures when rendering (instanced) objects...

Well, it doesn't really matter. I assume instancing only makes sense if you have a lot of the same objects, which is usually not the case for my maps and their contents. Although plants or floor junk could share the same material (texture set) to minimize the switches, then get sorted on VBO instead and rendered in packs with instancing. A chair that appears 4 times probably doesn't benefit much from instancing, so I keep it sorted on material instead.


Constant-buffers
You said binding time is nearly the same. Could be, but the main difference is that I only have to fill the constant buffer once, while with traditional parameters I have to pass the parameters each time I apply a shader. Example: I have a material "wood1a". It uses 2 textures, and has a specularColor float4 vector. When I apply "wood1a", I have to bind 2 textures and pass the vector, each time again. Thousands of parameter passes happen each render cycle currently.

Aside from global dynamic stuff like the camera position or light color, those parameters never change though, so isn't it just possible (in OpenGL) to create a buffer that contains *all* static parameters, and upload it once? So I don't have to pass them anymore? I read somewhere such a buffer (in DirectX) has 4096 floats or vectors. Dunno if that is enough to store all numeric parameters from all materials I have, but it is quite a lot. As you showed, I could then do something like:
- float4 parameter1 = cbuffer[ material_offsetIndex + 0 ];
- float4 parameter2 = cbuffer[ material_offsetIndex + 1 ];
- Use the parameters, whatever they are
Then I only have to bind that cbuffer once, and pass the "material_offsetIndex" parameter for each different material. Would that make sense (if possible at all)?
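In shader terms I picture something like this (GLSL-style notation; the block and variable names are made up, just to illustrate the idea):

layout(std140) uniform MaterialParams
{
    vec4 params[1024];             // packed static parameters of all materials
};
uniform int material_offsetIndex;  // the only per-material value still set by the CPU

// ... in the shader body:
//   vec4 parameter1 = params[material_offsetIndex + 0];  // e.g. specular color
//   vec4 parameter2 = params[material_offsetIndex + 1];  // e.g. other factors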



Multi-Sample
I'm a slow learner, but that gives a better understanding. There are plenty of MSAA demos so I should look there, and hopefully find one that uses the masking/coverage technique. Faulty ordering doesn't matter as long as the viewer doesn't really see it... which I doubt with foliage. I guess it only works properly for "black-white" transparency (a metal fence for example) and not for translucent surfaces such as glass. But that's OK for grass and the like.

I still wonder though how edges will look when using it in a deferred rendering pipeline. Averaging colors is not a problem, but normals, and especially positions/depth creates weird results. I could only let the (near) opaque pixels write normals/depth, but that gives a blocky edge when applying a light on it, right?



- Thanks again for the explanation!!
Rick

#7 Tsus   Members   -  Reputation: 1035


Posted 07 March 2012 - 05:00 PM

Hey Rick,

Extensions of course. I wonder though, does the usage of extensions have a negative impact somehow, compared to GL 3.x or 4.x where a lot of functions are part of the core now? I need to enhance the performance of the engine, so all the little things that can help should be considered.

Extensions are probably not that well optimized. (But I don't think that it makes a big difference.)


On current hardware it goes up to about 160, see here.

Really? With passing I mean I bind a texture to one of the channels (glActiveTextureARB( channelNum )). If I let a (Cg) shader explicitly refer to "TEXUNIT16" or higher, it crashes AFAIK. Tried that on a ~1.5 year old nVidia card btw. Correct me if I'm wrong, but the table you showed is the maximum number of texture-reads within a shader program, not the amount of active textures, right?
If so, that still means I'm limited to 16 different textures when rendering (instanced) objects...

Hm, the OpenGL docs say you can bind with glActiveTexture exactly that number of textures.
You can test yourself how many are supported on your machine:
GLint value;
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &value);

I just looked in the Cg docs for shader model 5 and surprisingly they really only have 16, as you said.
The number of shader resource views that can be bound in DirectX is also much higher than 16.

Well, it doesn't really matter. I assume instancing only makes sense if you have a lot of the same objects, which is usually not the case for my maps and their contents. Although plants or floor junk could share the same material (texture set) to minimize the switches, then get sorted on VBO instead and rendered in packs with instancing. A chair that appears 4 times probably doesn't benefit much from instancing, so I keep it sorted on material instead.

Makes sense, considering the time it would take to setup the instancing. This would have been my choice here, too.

Aside from global dynamic stuff like the camera position or light color, those parameters never change though, so isn't it just possible (in OpenGL) to create a buffer that contains *all* static parameters, and upload it once? So I don't have to pass them anymore? I read somewhere such a buffer (in DirectX) has 4096 floats or vectors.

You can create constant buffers (in OpenGL they are called uniform buffer objects) that are even bigger than 64 KB (= 4096 32-bit 4-component constants). But you can only bind 64 KB of it at a time.
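Setting one up is only a few calls; a minimal sketch with made-up names (bind a 64 KB window once, which the shaders then index into):

GLuint materialUBO;
glGenBuffers(1, &materialUBO);
glBindBuffer(GL_UNIFORM_BUFFER, materialUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(allMaterialParams), allMaterialParams, GL_STATIC_DRAW);

// Bind (up to) a 64 KB window of the buffer to uniform block binding point 0.
glBindBufferRange(GL_UNIFORM_BUFFER, 0, materialUBO, 0, 64 * 1024);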

Dunno if that is enough to store all numeric parameters from all materials I have, but it is quite a lot. As you showed, I could then do something like:
- float4 parameter1 = cbuffer[ material_offsetIndex + 0 ];
- float4 parameter2 = cbuffer[ material_offsetIndex + 1 ];
- Use the parameters, whatever they are
Then I only have to bind that cbuffer once, and pass the "material_offsetIndex" parameter for each different material. Would that make sense (if possible at all)?

Perhaps. Reading from a constant buffer is very fast (unless each fragment/vertex tries to read from a different location; that’s why there are tbuffers, because constant buffers suffer from constant waterfalling). Though you have one more indirection: reading from the materialIndex uniform. I think you have to test whether it is faster, but it would definitely decrease the number of state changes.

Faulty ordering doesn't matter as long as the viewer doesn't really see it... which I doubt with foliage. I guess it only works properly for "black-white" transparency (a metal fence for example) and not for translucent surfaces such as glass. But that's OK for grass and the like.

Indeed, that’s where stochastic transparency comes in. Unfortunately it is still too expensive for practical usage.

I still wonder though how edges will look when using it in a deferred rendering pipeline. Averaging colors is not a problem, but normals, and especially positions/depth creates weird results. I could only let the (near) opaque pixels write normals/depth, but that gives a blocky edge when applying a light on it, right?

You could take a look at Johan Andersson’s slides. Aside from other cool stuff (like tile-based deferred shading) he has code for the alpha to coverage technique in the slides (see page 52 for a comparison). Yes, deferred shading can look crappy at the boundaries if you shade at pixel frequency. In the slides I linked is also a note (page 18) on how this is resolved. (They adaptively switch to sample-frequency at boundaries.)

Cheers!

#8 MJP   Moderators   -  Reputation: 11315


Posted 07 March 2012 - 05:02 PM

With deferred rendering you would never resolve (average) your G-Buffer contents. You would render the G-Buffer with MSAA enabled, which would give you unique G-Buffer samples in the subsamples belonging to pixels along triangle edges. Then in your lighting you would light each of those subsamples individually, and either write them all out to an MSAA render target (to be resolved later) or resolve it on the fly (which can cause artifacts for certain cases, see the article in ShaderX7 for details). What this really boils down to is a form of selective supersampling, where you have the same memory footprint but only supersample the shading on triangle edges (actually with deferred rendering you can even limit supersampled shading to edges with depth, normal, or material discontinuities to avoid wasted work inside mesh silhouettes). This extends naturally to alpha testing, where you can supersample the alpha test in your shader and use that to have per-subsample visibility rather than per-pixel visibility. After that point everything just works, and when you resolve you get properly antialiased edges. There are no issues with sorting or blending, since each subsample is still depth tested.

Alpha to coverage is similar, but a little different. Instead of supersampling the alpha test you use the alpha value to drive a dither pattern that works across subsamples. But after that point it's the same as alpha test: the G-Buffer will contain normal/depth/albedo data for multiple subsamples, you'll light each subsample individually, and then resolve the result. Again there's no issues with sorting or blending, since each subsample is being treated as opaque and is properly depth tested. You can think of it as if you rendered the whole screen at a higher resolution, and the resolve would then be a downscale to the display resolution.
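A rough sketch of that per-sample lighting pass (GLSL; the G-buffer layout and the computeLighting() helper are made up for illustration):

uniform sampler2DMS gbufferNormalDepth;   // MSAA G-buffer targets
uniform sampler2DMS gbufferAlbedo;
uniform int sampleCount;

vec3 shadePixel(ivec2 pixel)
{
    vec3 result = vec3(0.0);
    for (int s = 0; s < sampleCount; ++s)              // light every subsample
    {
        vec4 nd     = texelFetch(gbufferNormalDepth, pixel, s);
        vec4 albedo = texelFetch(gbufferAlbedo, pixel, s);
        result += computeLighting(nd.xyz, nd.w, albedo.rgb);
    }
    return result / float(sampleCount);                // resolve on the fly
}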

#9 mhagain   Crossbones+   -  Reputation: 7965


Posted 07 March 2012 - 06:31 PM

There should be no performance impact from use of extensions vs core OpenGL.

In many cases a GL_ARB_ extension is promoted directly to the core API completely unmodified.

In recent times features from newer GL_VERSIONs have been back-ported to extension status so they could be used on older hardware.

In some cases an extension is ubiquitous but never promoted to core - anisotropic filtering and S3TC texture compression are two examples; they can't go to core because of patent/IP/legal crap, but everyone uses them and there is no performance impact due to their extension status.

In all cases the extension version should run just as well as the core version. The exception is where the driver exposes an extension but the hardware doesn't actually support it (i.e. the driver emulates it in software) - the GeForce FX series notoriously exposed GL_ARB_texture_non_power_of_two in order to claim OpenGL 2.0 status but would drop you right back to full software emulation if you actually tried to use it. So long as you catch instances of that happening (and you'll know it pretty quick when your performance falls to less than 1 fps) you'll be OK. Otherwise - no cause to be concerned about using extensions.



#10 spek   Prime Members   -  Reputation: 996


Posted 08 March 2012 - 11:49 AM

Ah, I get well served here.

As for OpenGL versions, that means I can safely keep using GL2 for now. Maybe it's time to drop the ancient OpenGL 1997 way of thinking, and force myself to use VBOs and shaders for everything (as far as I'm not already doing that). But... upgrading is boring, so I might delay it a bit longer hehe.

Texture amount
Maybe Cg is still a bit behind with that then? I can probably get by with 16 textures though. But having more would allow some more flexibility, and less multi-passing in some cases maybe.

Uniform Buffer Objects
I don't yet understand the difference between c- and tbuffers (I thought tbuffers = texture-buffer), but in practice the parameters will be grouped together in the buffer, and the materialIndex does not vary per pixel normally. One object or one wall has 1 materialID normally.
-edit-
Setting up a UBO in GL is child's play, but... how the hell do I pass & use it in Cg?
-edit again-
Last February, Cg 3.1 was released with support for UBOs. Now it's just a matter of updating my Delphi header for the DLL :|


Deferred lighting
Thanks for the Battlefield paper, awesome stuff. Also the tiled deferred rendering caught my interest (I hate it when you want to try 100 things at the same time). Little question about that... is it better to make a different shader for each possible light count ("shader_1lamp", "shader_2lamps", ...) or is looping through a variable-sized list totally OK these days?

As for Alpha-Coverage... I can ask, but it's better just to try it myself first. I can see why you have to take multiple samples along edges and light them individually, then combine them as one in the end. But the techniques/steps to do that efficiently are new to me. So again, does anyone know a good (OpenGL) demo or paper on alpha to coverage?

Thank you all!

#11 Tsus   Members   -  Reputation: 1035


Posted 08 March 2012 - 03:24 PM

Hey,

Uniform Buffer Objects
I don't yet understand the difference between c- and tbuffers (I thought tbuffers = texture-buffer), but in practice the parameters will be grouped together in the buffer, and the materialIndex does not vary per pixel normally. One object or one wall has 1 materialID normally.

The HLSL docs explain the difference between cbuffers (constant buffers) and tbuffers (texture buffers). This will very likely apply to OpenGL as well. In your case cbuffers seem to be the better choice.

Last February, Cg 3.1 was released with support for UBOs. Now it's just a matter of updating my Delphi header for the DLL :|

Alright, you've seen it yourself.

Deferred lighting
Thanks for the Battlefield paper, awesome stuff. Also the tiled deferred rendering caught my interest (I hate it when you want to try 100 things at the same time). Little question about that... is it better to make a different shader for each possible light count ("shader_1lamp", "shader_2lamps", ...) or is looping through a variable-sized list totally OK these days?

This particular approach handles lights a little differently. The degree of freedom here is the tile size (e.g. 16x16 = 256 threads). Each tile can cull 256 lights in parallel (and loops if there are more), compiles a list in shared memory, and afterwards each pixel iterates over this list. This means the number of lights being processed differs from tile to tile and is also dynamic.

In general, when your shaders directly loop over a list of lights, it shouldn’t be so crucial how long the list is, as long as you read the number of lights directly from a constant buffer (so that the driver has a chance to see which constant value defines the length of the loop). The driver will try to generate multiple versions of the shader byte code for the most often used configurations of states/constants etc. and switch to the best-optimized version for the respective case. (That’s why it sometimes takes a little longer when you run a shader the first time: the driver does the final optimizations, e.g. replaces parts of the compiled code with hand-optimized code written in Santa Clara. Nice, isn’t it?) There is so much to say about the driver; it is best you start reading here.
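For example (GLSL-style sketch; the block layout and names are made up):

layout(std140) uniform LightList
{
    int  numLights;                // the loop bound lives in a constant buffer
    vec4 lightPosRadius[256];
    vec4 lightColor[256];
};

// ... in the fragment shader, the driver can see what bounds the loop:
//   for (int i = 0; i < numLights; ++i)
//       color += shadeOneLight(lightPosRadius[i], lightColor[i]);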

As for Alpha-Coverage... I can ask, but it's better just to try it myself first. I can see why you have to take multiple samples along edges and light them individually, then combine them as one in the end. But the techniques/steps to do that efficiently are new to me. So again, does anyone know a good (OpenGL) demo or paper on alpha to coverage?

Humus did some work on that with OpenGL. I haven’t looked at the code, though.

Cheers!

#12 spek   Prime Members   -  Reputation: 996


Posted 08 March 2012 - 05:43 PM

And thanks again!

Still couldn't test the buffers; the game is crashing with the new DLLs, geometry shaders cause an "invalid item" error. Backward compatibility my ass, that's why I hate the endless upgrades :D Oh well, it keeps me busy.


I think I have enough papers, examples and entry points now. Time to read, code (and fix annoying bugs)!

#13 spek   Prime Members   -  Reputation: 996


Posted 09 March 2012 - 06:40 AM

Crap, as for sorting/batching, I realised I have another problem that makes the sorting even more difficult. As a first test, all objects now get sorted on material. But... since this engine uses portal-culling, I used to do this:
1- set a scissor-test rect around a portal that makes sector X(a room) visible
2- render the sector behind that portal, + all its contents. Many of its pixels may get clipped by the scissor though.
3- ...repeat for all visible portals


In this indoor situation, you often see only (small) parts of a sector. So a scissor makes sure I don't render pixels that aren't visible anyway. However, when sorting all objects/walls/floors on material, things can get mixed up. For example:
* Render all boxes using material "cardboardBox1"
- render box in sector1
- render box in sector2
- render box in sector1
- ...

So, if I scissor background sector2, it might happen that boxes from other sectors will be clipped. I could do a couple of things:
A. Adjust the scissor rect for each object before rendering it. That means many, many glScissor() calls though.
B. Don't use scissors at all, but risk overdraws / fillrate problems
C. Sort per sector instead of all objects (that is what I did so far actually). But the more sectors, the more chance you activate the same materials multiple times = more state switching.
D. Use other ways to clip... For each entity, test if it is visible before inserting it into the renderQueue, for example by testing a bounding box against the portal it is behind (see the sketch below).
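(Sketch of D, with made-up types; both rects are assumed to be in screen space:)

struct Rect { float x0, y0, x1, y1; };   // hypothetical screen-space rectangle

// Conservative test: does the object's projected bounding rect overlap the
// portal's rect at all? If not, don't even insert it into the renderQueue.
bool visibleThroughPortal(Rect objectRect, Rect portalRect)
{
    return objectRect.x1 >= portalRect.x0 && objectRect.x0 <= portalRect.x1 &&
           objectRect.y1 >= portalRect.y0 && objectRect.y0 <= portalRect.y1;
}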


None of the 4 seem very attractive to me. D makes the most sense, but costs some extra CPU power, certainly if you have many objects (though an octree or something could help by making clusters of objects visible/invisible instead of doing it per instance).
And neither does it work well for large entities such as a concrete wall that covers the entire screen but is only partially visible through a door/window portal from a foreground sector...

Another thing I could do is A, but first sort per material, then sub-sort per sector. That decreases the number of glScissor calls, though it will still be a lot. Currently, if there are 10 sectors visible, that means 10 glScissor calls. If I do it with sub-sorting, 10 sectors and 40 different objects spread all over, there will be 40 to 400 glScissor calls in the worst case.
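In the same pseudocode style as before (names made up), the sub-sorted loop would be:

for each materialGroup                         // outer sort: material
      materialGroup.material.apply
      for each sectorBatch in materialGroup    // inner sort: sector
            glScissor( sectorBatch.portalRect );   // one scissor change per sector, per material
            for each object in sectorBatch
                  object.render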



