High poly or normal mapped?

Butabee    274
Basically I'm wondering if I should just make higher poly models with no shaders or make lower polygon models with normal map shaders. Which would perform better?

karwosts    840
Normal mapped low polygon models should render much faster than equivalent high poly models. That is the entire point of normal maps, that they allow you to 'fake' a high polygon mesh without the expense of transforming and rasterizing all of the extra polygons.

Also, as a minor note: you don't really use high poly models with "no shaders". They use a shader just like the normal-mapped version does, and it will look 90% the same. The only difference is that the normal-mapped model pulls its normals from a texture and performs lighting in the fragment shader, while the other gets its normals from the vertices and performs lighting in the vertex shader.
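To illustrate the point, here's a minimal C++ sketch (types and names are mine, nothing engine-specific): the lighting formula is identical either way; only the source of the normal differs.

#include <algorithm>

// Illustrative types, not from any particular API.
struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// The same Lambert term serves both paths:
//  - per-vertex lighting feeds it the interpolated vertex normal;
//  - normal mapping feeds it a normal fetched from a texture and
//    remapped from [0,1] to [-1,1] first.
inline float Lambert(const Vec3& n, const Vec3& lightDir)
{
    return std::max(0.0f, Dot(n, lightDir));
}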

Hodgman    51235
Normal mapping is definitely "standard" these days ;)

Another thing to keep in mind though is memory usage.
Non-normal mapped mesh:
Vertex layout:
position(3 x float)
colour (3 x float)
normal (3 x float)
UV (2 x float)
Total = 44 bytes per vertex

Textures:
512x512 diffuse (RGB8)
Total = 786432 bytes

Normal mapped mesh:
Vertex layout:
position(3 x float)
colour (3 x float)
normal (3 x float)
binormal(3 x float)
tangent (3 x float)
UV (2 x float)
Total = 68 bytes per vertex

Textures:
512x512 diffuse (RGB8)
512x512 normal map (RGB8)
Total = 1572864 bytes
If a 512 normal map takes 786432 bytes, and a vertex from a non-normal mapped mesh takes 44 bytes, that means that for the same memory-cost as an n-map, you could've had an extra ~17,000 verts instead.
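To make those numbers concrete, here's a minimal C++ sketch of the two layouts (struct names are mine):

#include <cstdio>

struct VertexPlain {            // non-normal-mapped layout
    float position[3];          // 12 bytes
    float colour[3];            // 12 bytes
    float normal[3];            // 12 bytes
    float uv[2];                //  8 bytes
};                              // sizeof == 44

struct VertexNormalMapped {     // adds the tangent frame
    float position[3];
    float colour[3];
    float normal[3];
    float binormal[3];          // +12 bytes
    float tangent[3];           // +12 bytes
    float uv[2];
};                              // sizeof == 68

int main()
{
    printf("plain: %u bytes, normal mapped: %u bytes\n",
           (unsigned)sizeof(VertexPlain), (unsigned)sizeof(VertexNormalMapped));
    // A 512x512 RGB8 normal map is 512*512*3 = 786432 bytes,
    // and 786432 / 44 is roughly 17,873 extra plain vertices.
    return 0;
}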

MarkS    180
Quote:
If a 512 normal map takes 786432 bytes, and a vertex from a non-normal mapped mesh takes 44 bytes, that means that for the same memory-cost as an n-map, you could've had an extra ~17,000 verts instead.


Ah. But the normal map simulates 262,144 vertices (one per texel of a 512x512 map) without the performance hit. It's a trade-off, of course, but with the average video card having at least 512 MB of video RAM, memory consumption is a minor concern.

kauna    2922
Quote:
Original post by Hodgman


Another thing to keep in mind though is memory usage.

...

Normal mapped mesh:
Vertex layout:
position(3 x float)
colour (3 x float)
normal (3 x float)
binormal(3 x float)
tangent (3 x float)
UV (2 x float)
Total = 68 bytes per vertex

...



Under Direct3D the equivalent vertex structure would be something like

position(3 x float)
colour (1 dword) //if needed
normal (3 x float)
tangent (3 x float)
UV (2 x float)
Total = 44-48 bytes per vertex

The bitangent can be computed in the shader, just to point out that the vertex structure doesn't have to be as heavy as presented.
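Something like this, sketched in plain C++ (mirroring the math the shader would run; the handedness sign is assumed to be exported with the mesh):

struct Vec3 { float x, y, z; };

inline Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Rebuild the bitangent from the normal, the tangent and a handedness sign
// (+1 or -1, depending on the UV winding) instead of storing it per vertex.
inline Vec3 ReconstructBitangent(const Vec3& n, const Vec3& t, float sign)
{
    Vec3 b = Cross(n, t);
    return { b.x * sign, b.y * sign, b.z * sign };
}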

Anyway, in my opinion, hi-poly models and normal maps aren't really exclusive. Normal mapping is a technique that gives more detail to surfaces, and in certain cases it may make low-poly objects look smoother. At the surface level, normal mapping can create the impression of details that would be almost impossible to build out of polygons alone (at any reasonable polygon count).

So, make your models as high-poly as required and profile. GPUs can easily push millions of normal mapped polygons. Also, you can use LOD techniques for distant models.

Cheers!

cignox1    735
Quote:
Original post by Hodgman

...

If a 512 normal map takes 786432 bytes, and a vertex from a non-normal mapped mesh takes 44 bytes, that means that for the same memory-cost as an n-map, you could've had an extra ~17,000 verts instead.


But normal maps can often be reused for different models (or tiled on the same model), without requiring more memory.

Decibit    140
Quote:
Original post by kauna
...
The bitangent can be computed in the shader, just to point out that the vertex structure doesn't have to be as heavy as presented.
...
Cheers!

Both the tangent and the bitangent can also be computed in the geometry shader. I think that, depending on the quality requirements, the normal maps could also be compressed significantly. So you should design the rendering part with some flexibility, and then test and profile your game with various settings.
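For reference, here is the per-triangle math such a geometry shader would run, sketched in C++ (types and names are mine): solve e1 = du1*T + dv1*B and e2 = du2*T + dv2*B for T and B from the position and UV edge deltas.

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

void TriangleTangentBasis(const Vec3 p[3], const Vec2 uv[3],
                          Vec3& tangent, Vec3& bitangent)
{
    // Edge vectors in position space and UV space.
    Vec3 e1 = { p[1].x - p[0].x, p[1].y - p[0].y, p[1].z - p[0].z };
    Vec3 e2 = { p[2].x - p[0].x, p[2].y - p[0].y, p[2].z - p[0].z };
    float du1 = uv[1].u - uv[0].u, dv1 = uv[1].v - uv[0].v;
    float du2 = uv[2].u - uv[0].u, dv2 = uv[2].v - uv[0].v;

    // Solve the 2x2 system for T and B.
    float r = 1.0f / (du1 * dv2 - du2 * dv1); // assumes non-degenerate UVs

    tangent   = { (e1.x * dv2 - e2.x * dv1) * r,
                  (e1.y * dv2 - e2.y * dv1) * r,
                  (e1.z * dv2 - e2.z * dv1) * r };
    bitangent = { (e2.x * du1 - e1.x * du2) * r,
                  (e2.y * du1 - e1.y * du2) * r,
                  (e2.z * du1 - e1.z * du2) * r };
}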

kauna    2922
Quote:
Both the tangent and the bitangent can also be computed in geometry shader.


I am still unaware of all the potential of geometry shaders. I am hoping that some day I'll have the chance to look into them. Thanks for pointing this out.

Best regards!

Quat    568
With D3D11, tessellation is one of the big new features; some of the tech demos come close to rendering per-pixel triangles. Of course, you'd want an LOD system.

Digitalfragment    1504
Quote:
Original post by maspeir
Quote:
If a 512 normal map takes 786432 bytes, and a vertex from a non-normal mapped mesh takes 44 bytes, that means that for the same memory-cost as an n-map, you could've had an extra ~17,000 verts instead.


Ah. But the normal map simulates 262,144 vertices (one per texel of a 512x512 map) without the performance hit. It's a trade-off, of course, but with the average video card having at least 512 MB of video RAM, memory consumption is a minor concern.


I beg to differ. Texture sampling hurts performance much more than having more vertex data. The cost also goes up significantly with trilinear and anisotropic filtering. It's made even worse by the fact that the texture fetch stalls the pipeline: the shader must immediately perform a MAD to remap the sample from [0,1] to [-1,1], and then a matrix multiply to take the normal from tangent space to world space.

Of course you can transform the light from world space to tangent space in the vertex shader to get rid of that little matrix multiply, but then you end up with some terribly disgusting artifacts.
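For reference, that per-pixel work amounts to the following (a C++ sketch of the fragment-shader math; the types are illustrative placeholders):

struct Vec3 { float x, y, z; };

// Remap the sampled texel from [0,1] to [-1,1] (the MAD), then rotate it
// from tangent space to world space with the TBN basis (the matrix multiply).
inline Vec3 TangentToWorld(const Vec3& texel,
                           const Vec3& t, const Vec3& b, const Vec3& n)
{
    Vec3 m = { texel.x * 2.0f - 1.0f,
               texel.y * 2.0f - 1.0f,
               texel.z * 2.0f - 1.0f };
    // worldNormal = m.x * T + m.y * B + m.z * N
    return { t.x * m.x + b.x * m.y + n.x * m.z,
             t.y * m.x + b.y * m.y + n.y * m.z,
             t.z * m.x + b.z * m.y + n.z * m.z };
}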

MarkS    180
Quote:
Original post by Exorcist
Quote:
Original post by maspeir
Quote:
If a 512 normal map takes 786432 bytes, and a vertex from a non-normal mapped mesh takes 44 bytes, that means that for the same memory-cost as an n-map, you could've had an extra ~17,000 verts instead.


Ah. But the normal map simulates 262,144 vertices (one per texel of a 512x512 map) without the performance hit. It's a trade-off, of course, but with the average video card having at least 512 MB of video RAM, memory consumption is a minor concern.


I beg to differ. Texture sampling hurts performance much more than having more vertex data. The cost also goes up significantly with trilinear and anisotropic filtering. It's made even worse by the fact that the texture fetch stalls the pipeline: the shader must immediately perform a MAD to remap the sample from [0,1] to [-1,1], and then a matrix multiply to take the normal from tangent space to world space.

Of course you can transform the light from world space to tangent space in the vertex shader to get rid of that little matrix multiply, but then you end up with some terribly disgusting artifacts.


So, you're saying that a model with over 260,000 visible vertices will display faster than a much lower quality model with a 512x512 normal map? I may be wrong and am willing to accept so, but I seriously doubt this. Remember, I'm not talking about the entire scene, just one model.

karwosts    840
Quote:

So, you're saying that a model with over 260,000 visible vertices will display faster than a much lower quality model with a 512x512 normal map? I may be wrong and am willing to accept so, but I seriously doubt this. Remember, I'm not talking about the entire scene, just one model.


A quick test in my simple renderer, just for kicks:

One directional light, all meshes rendered at roughly 1000x1000 pixels.

Single quad with 512x512 normal map.
Average: 1.4ms

Quad subdivided to 17k verts, 32k faces, no textures.
Average: 2.0ms

Quad subdivided to 260k verts, 520k faces, no textures.
Average: 6.0ms




MJP    19755
Quote:
Original post by Exorcist
I beg to differ. Texture sampling hurts performance much more than having more vertex data. The cost also goes up significantly with trilinear and anisotropic filtering. It's made even worse by the fact that the texture fetch stalls the pipeline: the shader must immediately perform a MAD to remap the sample from [0,1] to [-1,1], and then a matrix multiply to take the normal from tangent space to world space.


I have to say...that's a terribly short-sighted and misleading statement right there. The cost of sampling a texture versus adding more primitives will vary tremendously depending on a variety of factors. This includes (but isn't limited to):

-The size of the texture
-The format of the texture
-The number of texture units in the GPU
-The architecture of texture units in the GPU
-The GPU's texture cache architecture
-The GPU's vertex cache architecture
-The size of a vertex
-The number of vertices you're adding
-The complexity of the vertex shader being used
-The complexity of the pixel shader being used
-The GPU's bandwidth
-The amount of memory available on the GPU
-The GPU's pixel shader architecture
-The GPU's vertex shader architecture
-The size of a triangle on-screen (triangles that don't result in quads of pixels are extremely inefficient for a GPU)
-The GPU's triangle setup rate
-The GPU's rasterization rate
-etc.

I absolutely hate broad, sweeping statements about performance. Don't guess or assume...profile! We have the tools! And of course, don't apply your profiling results to situations that are too far removed from the situation in which they were gathered.

Sorry about the rant, it's over now. :P

rubicondev    296
Quote:
Original post by Hodgman
Normal mapping is definitely "standard" these days ;)

Another thing to keep in mind though is memory usage.

Non-normal mapped mesh:
Vertex layout:
position(3 x float)
colour (3 x float)
normal (3 x float)
UV (2 x float)
Total = 44 bytes per vertex

Normal mapped mesh:
Vertex layout:
position(3 x float)
colour (3 x float)
normal (3 x float)
binormal(3 x float)
tangent (3 x float)
UV (2 x float)
Total = 68 bytes per vertex



Hmmm. Just to add a bit of reality to this: your vertex would ideally be 32 bytes tops, so it fits better in the post-transform cache *and* reduces load on the bus.

Using 3x floats for normals instead of a UBYTE4, for example, is pathologically bad - you won't spot any difference at all between the two.

My engine only supports two key formats for models (though there are others for special stuff), shown below in DirectX speak. This alone makes things very simple; the skinned version is admittedly >32 bytes, but it can do everything the non-skinned one can. Which is practically anything.


D3DVERTEXELEMENT9 Elements_PNNCTT[] = {
    {0,  0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION,     0}, // position
    {0, 12, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,       0}, // packed normal
    {0, 16, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,       1}, // second packed normal (tangent, for bump-mapping)
    {0, 20, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,        0}, // vertex colour
    {0, 24, D3DDECLTYPE_SHORT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,     0}, // UV set 0 (fixed point)
    {0, 28, D3DDECLTYPE_SHORT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,     1}, // UV set 1 (fixed point)
    D3DDECL_END()
}; // 32 bytes per vertex

D3DVERTEXELEMENT9 Elements_SKIN[] = {
    {0,  0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION,     0},
    {0, 12, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,       0},
    {0, 16, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,       1},
    {0, 20, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,        0},
    {0, 24, D3DDECLTYPE_SHORT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,     0},
    {0, 28, D3DDECLTYPE_SHORT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD,     1},
    {0, 32, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_BLENDINDICES, 0}, // bone indices
    {0, 36, D3DDECLTYPE_UBYTE4,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_BLENDWEIGHT,  0}, // bone weights
    D3DDECL_END()
}; // 40 bytes per vertex

There are 2x UV coords in there, and two normals (normal and tangent) for bump-mapping et al. You can rebuild the third basis vector in the VS, where you also unpack all these packed formats into floats.

If you change your renderer to work with vertices like that above, it'll be the single biggest speed boost you'll ever see - this is important.
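For what it's worth, the UBYTE4 normal packing boils down to this (a plain C++ sketch; helper names are mine):

#include <cstdint>

// Pack at export time: remap each component from [-1,1] to [0,255].
inline void PackNormalUByte4(const float n[3], uint8_t out[4])
{
    for (int i = 0; i < 3; ++i)
        out[i] = (uint8_t)((n[i] * 0.5f + 0.5f) * 255.0f + 0.5f);
    out[3] = 255; // spare component - handy for a handedness flag
}

// Unpack in the vertex shader (same math once the bytes arrive as floats):
inline void UnpackNormalUByte4(const uint8_t in[4], float n[3])
{
    for (int i = 0; i < 3; ++i)
        n[i] = in[i] / 255.0f * 2.0f - 1.0f;
}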

Martin    194
Graphics cards work by concurrently rasterizing quads of pixels, so small triangles are expensive: the GPU only achieves a fraction of its potential because pixels outside the triangle are masked off. This poor utilisation on very small triangles is one of the strongest arguments for moving away from rasterization on GPUs and towards alternatives such as ray tracing.

Another factor in GPU performance is memory bandwidth: a texel in a normal map is a hell of a lot smaller than a vertex and a bunch of triangle data.

In short, the normal map will perform better; higher poly models, however, do look better.

B_old    689
@Rubicon:
How is the quality of having 16 bit texture coords vs. 32 bit?

You build the tangent space from two vectors. Wouldn't you at least need the sign of the third vector to always get correct results?

[Edited by - B_old on March 14, 2010 4:24:24 AM]

rubicondev    296
Quote:
Original post by B_old
@Rubicon:
How is the quality of having 16 bit texture coords vs. 32 bit?

You build the tangent space from two vectors. Wouldn't you at least need the sign of the third vector to always get correct results?

RE: UV - Nobody has ever complained. :) My system supports 4 bits for repeats (i.e. 16 tiles) and 12 bits for accuracy, which is pretty damned accurate tbh - that's about half a texel on a 2048x2048. I guess a situation could be contrived to break it, but it hasn't cropped up in a lot of normal usage. Plus the gains are worth it imo.

RE: Cross product - Correct. I basically store a bool in the 4th component of one of the normals.


B_old    689
Quote:
Original post by Rubicon
RE: UV - Nobody has ever complained. :) My system supports 4 bits for repeats (i.e. 16 tiles) and 12 bits for accuracy, which is pretty damned accurate tbh - that's about half a texel on a 2048x2048. I guess a situation could be contrived to break it, but it hasn't cropped up in a lot of normal usage. Plus the gains are worth it imo.

Could you elaborate on the tiling vs. accuracy thing a bit? I considered just using 16 bit floats. Bad idea?

rubicondev    296
Sure. I divide my incoming shorts by 4095 and stick them into a float to pass up to the PS. That gives me a max of 16 whole integers for texture tiling, plus 12 bits of accuracy for the fractional part.
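As a sketch (my helper names, mirroring the divide-by-4095 just described):

#include <cstdint>

// 4.12-style fixed point UV: the integer part carries the tile repeats,
// the fractional part gets 4095 steps per whole UV unit.
inline int16_t PackUV(float uv)
{
    return (int16_t)(uv * 4095.0f);  // done once, at export time
}

inline float UnpackUV(int16_t packed)
{
    return packed / 4095.0f;         // done in the vertex shader
}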

In practical usage I've never once had any issue at all from artists about the textures going awry at edges etc. 4095 fractional steps still gives you a granularity of 0.000244140625, and if an artist tells you he needs more than that, he's just being lazy or argumentative! :)

That leaves the 16 tileable repeats issue, and this one I have had cause problems before tbh. Once I've explained the problem though, the next rev of art has usually solved it. After all, >16 repeats probably means you should have more detail or more geometry in your level!

A 16 bit float will let you be a bit more cavalier with the ranges, but it won't get you any more accuracy: if you have a texture that repeats 1000 times, the fractional precision drops as the float gets scaled up accordingly. You also lose some accuracy because the float16 still needs bits to express the exponent, iyswim.

Having said that, you still end up with more choices, and I would've used them myself except that older cards might not support them - I try to pick formats that are as backwards compatible as possible. I honestly don't know how far back in time you could go before finding a card that can't take that vertex decl, but UBYTE4 and SHORT2 should work on anything.

Hell, if you're happy with only one set of UVs, you could still fit a pair of proper floats in there. Most effects that need a second set of UVs can often generate them in the VS, but it does limit you: your normal mapping has to fit exactly over your real textures etc. Mostly that's OK, but not always.

In any case, fitting what you're trying to do into 32 bytes is still something you should pay almost any price for. Maybe have two format variants: one with a pair of proper floats, and one with two UV sets using the smaller values.

Hope that helps...

B_old    689
Thanks for the explanation! Currently I don't see the need for a second pair of UV coordinates, so I think I'll take your advice and stick with 2x 32-bit floats for now.
Just for my future reference: when you say 16 bit float, do you mean a short divided by 4095 as in your first sentence, or a real floating point format? (Without the tiling!)

I still have some room for compressing my normals/tangent space though, as you have seen in the other thread.

rubicondev    296
16 bit floats as in proper floats, but smaller. Don't go there, just in case - they don't really gain you anything in practical terms. Just to clarify though, you can still have whole numbers: 17.5 is a valid number to put into a float16 and would get you 17 repeats and a bit more, it's just that the accuracy of the "bit more" would be really shit.

What I'm doing with this /4095 thing is usually referred to as "fixed point" (in this case "4.12 fixed point"), which in all other circumstances is a nasty throwback to the dark days of CPUs without decent floating point support :)

I'm not actually sure atm why I divide by 4095 instead of 4096, but there was a reason for it.

