Problem: Textures kill my FPS... it's bad

Hi =) I thought that a very high triangle count was the main cause of low FPS, with large textures being a smaller factor. Instead, I'm getting low performance with small textures. I don't use any special filtering (only linear filtering)... perhaps I should use more filters? My textures are 100x100 bitmaps, so they aren't big. Can you give me some advice? PS: I render meshes grouped by type (i.e. first all trees, then all houses, then all rocks, etc.). Thanks!!!

First off, you should make your texture sizes powers of 2 (i.e. 64x64, 128x128, 256x256, etc.). At the very least that makes memory management more efficient, though I doubt it is the cause of your performance hit.
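
As a quick illustration of the power-of-2 rule, here is a minimal sketch that checks a few texture sizes. The function name is made up for this example and the sizes are simply the ones mentioned so far in the thread; nothing here is part of any graphics API.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


# Sizes mentioned in this thread; purely illustrative input.
for size in (64, 100, 128, 256):
    print(size, is_power_of_two(size))
```

Only the 100 fails the check here, which is the size used for the original poster's bitmaps.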

What kind of hardware are you running, and how many objects do you actually render? And is it smooth without textures?

MenTaL

A few more things:

1) Group your rendering by texture so that you avoid calling SetTexture more than once for the same texture if at all possible.

2) Use mipmapping; this helps a lot if the texture has to be minified.

3) Don't render too few triangles per Draw* call - if each Draw*Primitive renders fewer than, say, 100 polygons, it'll hurt your performance.
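
To make point 1 concrete, here is a rough sketch that counts how often the bound texture would change for a given draw order, before and after sorting by texture. The mesh and texture names are invented, and the counter only stands in for where a SetTexture call would happen; it does not call any real API.

```python
from typing import List, Tuple

# (mesh_name, texture_name) pairs; the names are made up for illustration.
DrawItem = Tuple[str, str]


def count_texture_changes(items: List[DrawItem]) -> int:
    """Count how many times the bound texture changes while drawing items in order."""
    changes = 0
    current = None
    for _mesh, texture in items:
        if texture != current:
            changes += 1  # this is where a SetTexture call would be needed
            current = texture
    return changes


scene = [
    ("tree_1", "bark"), ("rock_1", "stone"), ("tree_2", "bark"),
    ("wall_1", "stone"), ("tree_3", "bark"),
]
by_texture = sorted(scene, key=lambda item: item[1])

print(count_texture_changes(scene))       # original order
print(count_texture_changes(by_texture))  # grouped by texture
```

In this toy scene the original order needs a texture change for every item, while the sorted order needs one per distinct texture.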

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Can you elaborate on why calling Draw* with fewer than 100 polys would hurt performance? And how bad a hit is it to use vertex buffers (since you only draw 2 triangles per quad) for drawing UIs, for example?

[edited by - Oordeel on March 24, 2003 10:55:41 AM]

quote:
Can you elaborate on why calling Draw* with fewer than 100 polys would hurt performance? And how bad a hit is it to use vertex buffers (since you only draw 2 triangles per quad) for drawing UIs, for example?


1) There is a CPU cost in calling the API, internal processing in the D3D runtime, device driver interaction, etc. If you're passing fewer than 100 polys per call, then the CPU "setup" costs can be much higher than the time it takes for the hardware to actually render those polygons.

2) Each draw call gets turned into a "command" ("draw these, using this"), and these go into a queue, usually a FIFO. The graphics chip pulls these commands out of the FIFO for processing. The size and nature of the FIFO depend on the driver and chip, but it is often fixed in length. Once the FIFO is full, no new commands can be added until the first in the list has been processed, i.e. read by the chip.

3) The combination of the above two things means that the CPU stalls until there is room in the FIFO (effectively killing parallelism, which is key to good performance!), and the GPU is also starved of data, so it renders much less per frame than it could.

4) For some things, small batches may be unavoidable. However, for your UI, say, many things can still be batched together into a single draw call. For example, all characters using the same font texture could be rendered with one call. Using and recycling one dynamic VB for all UI elements of the same vertex format is good too.
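
Point 4 can be sketched as follows: every character of a string becomes one quad (two triangles), and all quads go into a single vertex list that could be drawn with one call. The font layout is an assumption made for this example (a square texture holding a 16x16 grid of fixed-width glyphs indexed by character code), and the function and parameter names are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float, float]  # x, y, u, v


def build_text_batch(text: str, x: float, y: float,
                     size: float = 16.0, grid: int = 16) -> List[Vertex]:
    """Return two triangles (6 vertices) per character, all in one vertex list."""
    cell = 1.0 / grid  # width/height of one glyph cell in UV space
    vertices: List[Vertex] = []
    for i, ch in enumerate(text):
        code = ord(ch) % (grid * grid)
        u0 = (code % grid) * cell
        v0 = (code // grid) * cell
        u1, v1 = u0 + cell, v0 + cell
        x0, y0 = x + i * size, y
        x1, y1 = x0 + size, y0 + size
        vertices += [
            (x0, y0, u0, v0), (x1, y0, u1, v0), (x0, y1, u0, v1),
            (x1, y0, u1, v0), (x1, y1, u1, v1), (x0, y1, u0, v1),
        ]
    return vertices


batch = build_text_batch("FPS: 30", x=10.0, y=10.0)
print(len(batch) // 3, "triangles in one batch")
```

With a real dynamic vertex buffer, the same list would be written into the buffer each frame and submitted once per font texture, rather than once per character.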


The above assumes, of course, that you're drawing more than, say, 100 polys per frame in total - you only see the difference when you're shifting decent amounts of polys per frame. For example, 100,000 polygons rendered in 20 batches of 5,000 polygons should give you better performance than the same 100,000 polygons rendered in 1,000 batches of 100 polygons.
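
The 100,000 polygon comparison can be played with in a very rough cost model. This is only a sketch: it assumes a fixed CPU cost per draw call, a fixed GPU rate per polygon, and perfect CPU/GPU overlap, and the two cost figures are invented placeholders rather than measurements.

```python
def frame_time_ms(total_polys: int, polys_per_batch: int,
                  call_overhead_ms: float, polys_per_ms: float) -> float:
    """Very rough model: CPU pays a fixed cost per draw call, GPU pays per polygon.

    CPU and GPU are assumed to run in parallel, so the frame takes as long
    as the slower of the two.
    """
    batches = -(-total_polys // polys_per_batch)  # ceiling division
    cpu_ms = batches * call_overhead_ms
    gpu_ms = total_polys / polys_per_ms
    return max(cpu_ms, gpu_ms)


# Assumed, made-up costs: 0.04 ms of CPU per call, 10,000 polys per ms on the GPU.
big = frame_time_ms(100_000, 5_000, call_overhead_ms=0.04, polys_per_ms=10_000)
small = frame_time_ms(100_000, 100, call_overhead_ms=0.04, polys_per_ms=10_000)
print(big, small)
```

With these placeholder numbers the 20-batch frame is limited by the GPU, while the 1,000-batch frame is limited by the CPU's draw call overhead, which is the effect described above.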

IIRC there's an old Excel spreadsheet on the nVidia website (developer.nvidia.com) from the GeForce2 days where they've profiled a) batch sizes and b) vertex formats, and plotted the performance effects.


--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

[edited by - S1CA on March 24, 2003 11:51:46 AM]

Oordeel, I am in the same boat as you. I have written a rather sophisticated 2D engine, but it does not use batching (drawing multiple quads at once).

One of the main reasons I have not used batching in my engine is that many of the games I use it for have quads that need to be sorted and drawn without the ZBuffer (they have alpha and color keys). Since they are sorted by position rather than by texture, I have to constantly reset the texture. This kills batching, because you have to stop, reset the texture, then draw the next set of sprites, and so on. For 2D, not batching is still somewhat acceptable; however, I anticipate a lot of people's opinions contradicting mine.

That said, if you can sort by texture to limit the number of times you have to interrupt your drawPrimitive function, your performance will increase greatly. I believe the optimum is between 1,000 and 2,000 triangles on newer cards.
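
Even when sprites must stay sorted by position, neighbouring sprites that happen to share a texture can still go into one batch. The sketch below shows that idea under a few assumptions: sprites are just (depth, texture name) pairs, a larger depth means farther away, and the names are invented.

```python
from itertools import groupby
from typing import List, Tuple

Sprite = Tuple[float, str]  # (depth, texture_name); both fields are illustrative


def build_batches(sprites: List[Sprite]) -> List[Tuple[str, int]]:
    """Sort back to front, then merge neighbouring sprites that share a texture.

    Returns (texture_name, sprite_count) per batch; the draw order is preserved.
    """
    ordered = sorted(sprites, key=lambda s: s[0], reverse=True)  # far to near
    return [(tex, len(list(run))) for tex, run in groupby(ordered, key=lambda s: s[1])]


sprites = [(5.0, "grass"), (4.0, "grass"), (3.0, "hero"), (2.0, "grass"), (1.0, "ui")]
print(build_batches(sprites))
```

Here five sprites end up in four batches; how much this helps in practice depends entirely on how often adjacent sprites share a texture.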

Bluechip, let me see if I can give you suggestions, or at least things to think about:

How large the texture is on the screen can often be as important as the size of the texture itself. If I have 100 sprites in the distance with small textures, the framerate will drop significantly when I move closer (or make the sprites larger). The reason is that if you are not using a ZBuffer (where the program checks whether a pixel should be drawn or not depending on what has been drawn so far), then pixels will be drawn and redrawn for each sprite. So even though you are using 100x100 textures, if they are being redrawn across the entire screen, your graphics card will choke.
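
A back-of-the-envelope way to see this effect is to compare the number of pixels written per frame with the number of pixels on the screen. The sketch assumes no pixel is ever rejected (no Z test) and uses hypothetical sprite and screen sizes.

```python
def overdraw_factor(sprite_count: int, sprite_w: int, sprite_h: int,
                    screen_w: int, screen_h: int) -> float:
    """Pixels written per frame divided by pixels on screen (no Z rejection assumed)."""
    pixels_written = sprite_count * sprite_w * sprite_h
    return pixels_written / (screen_w * screen_h)


# Hypothetical numbers: 100 sprites on a 640x480 screen.
far = overdraw_factor(100, 32, 32, 640, 480)     # sprites small on screen
near = overdraw_factor(100, 640, 480, 640, 480)  # each sprite fills the screen
print(far, near)
```

The texture size does not appear in the formula at all; only the on-screen size of each sprite does.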

Also, how many sprites are you drawing? Try playing with different zoom levels and numbers of sprites. Also try textured and untextured. Perhaps your graphics card isn't that great, and it's simply to blame. There are so many problems that can be caused by both how you code and your computer's hardware. Hope this all helps,

--Vic--

Guest Anonymous Poster
If you check the latest GDC 2003 papers on nVidia's developer site, you'll see one called Batching something.

Basically, they've really, really, really analysed the performance hit of drawing only a few polys at a time.

The results vary depending on the CPU, but a 100-300x loss in performance is to be expected when drawing very small numbers of polygons per call.

They then go on to say you should aim for approx. 300 DrawIndexedPrimitive() calls per frame so as not to waste all your CPU on initializing draw calls.

They also mention that 25K DrawIndexedPrimitive() calls per second will completely saturate a 1GHz CPU, such that it's doing nothing except looping tightly and issuing batches to draw... no animation, no AI, just setup of draw calls. If you assume a frame rate of 60Hz, 25000/60 = 416... so if you have no AI or gameplay at all, and a 1 GHz CPU, expect 416 to be your maximum number of draw calls per frame to maintain your framerate.
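
The 25000/60 arithmetic generalises to a small budget calculation. The 25K calls per second figure is the one quoted above for a 1GHz CPU; the cpu_share parameter is an assumption added for this sketch to model how much of the CPU is left over for issuing draw calls.

```python
def max_draw_calls_per_frame(calls_per_second: int, fps: int, cpu_share: float = 1.0) -> int:
    """Draw calls that fit in one frame if cpu_share of the CPU goes to issuing them."""
    return int(calls_per_second * cpu_share) // fps


print(max_draw_calls_per_frame(25_000, 60))       # whole CPU spent on draw calls
print(max_draw_calls_per_frame(25_000, 60, 0.5))  # half the CPU kept for the game
print(max_draw_calls_per_frame(25_000, 30))       # lower target frame rate
```

With half the CPU reserved for gameplay, the budget at 60Hz drops to roughly 200 calls per frame, which is in the same ballpark as the approx. 300 calls recommended above.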


The number one thing you can do for your game is to go to nVidia's site and start reading whitepapers, so you'll know your technical limitations.

Hi... thanks very much for these answers.
Here is some more information...

******* bergfald ***********
In my 3D environment I use:

100 trees - 230 triangles each - 2 BMP textures [64x64, 100x100]
25 terrain squares - 2 triangles each - 1 JPG texture [756x512]
1 house - 450 triangles - 5 BMP textures [all about 256x256]
20 walls - 12 triangles each - 1 JPG texture [756x512]
1 skydome - 64 triangles - 1 JPG texture [256x256]
1 carrot - 6000 triangles - 2 BMP textures [both 100x100]

I don't think that is too many triangles [about 29804]...
is that right?
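
The triangle total can be double-checked with a short tally of the scene list. The per-object draw count is an assumption: the sketch treats every object as at least one draw call of its own, which may undercount if a mesh is drawn once per material.

```python
# (name, object_count, triangles_per_object), copied from the list above.
scene = [
    ("tree", 100, 230),
    ("terrain square", 25, 2),
    ("house", 1, 450),
    ("wall", 20, 12),
    ("skydome", 1, 64),
    ("carrot", 1, 6000),
]

total_triangles = sum(count * tris for _, count, tris in scene)
total_objects = sum(count for _, count, _ in scene)

print(total_triangles, total_objects)
for name, count, tris in scene:
    # Assumption: each object is drawn with at least one draw call of its own.
    print(f"{name}: {count} draws of {tris} triangles")
```

The total matches the 29804 quoted above. Under the one-call-per-object assumption, the terrain squares and walls are drawn in batches of 2 and 12 triangles, far below the roughly 100 polygons per call suggested earlier in the thread.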

I have a Celeron II 566@892, 256 MB of RAM, and a GeForce DDR, and I get 20 to 30 FPS.

***************** S1CA **************************
I use X files, so I put all identical meshes in a list, and for each element in the list I call the render method.
Then I take the next list and do it again...
Is that a silly way to work?
Is there a better way than this?


**************** Roof Top Pew Wee ********************
For now I don't use sprites (you mean 2D objects, right?) and I always use a Z-buffer (the only exception is the skydome).



Thanks again folks ^___^

Guest Anonymous Poster
Hmm... some of your textures are quite big and not a power of 2. [756x512] especially is very bad! I think any API has to create a texture of size [1024x1024] to store that, and I don't think that's good for your texture cache. Use the nVidia Stats driver with the capture tool (in the download section of the registered member area) to find out more about your problems. Have a look at the docs there too, to figure out how to detect CPU-GPU stalls and critical function calls in your code.
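
The rounding described above can be sketched as plain arithmetic. Whether a given driver pads, rescales, or rejects such a texture is hardware and driver dependent, so this only shows the sizes involved; the square flag is an assumption that models hardware which additionally requires square textures, and both function names are made up.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def padded_size(width: int, height: int, square: bool = False):
    """Size after rounding each side up; square=True models square-only hardware."""
    w, h = next_power_of_two(width), next_power_of_two(height)
    if square:
        w = h = max(w, h)
    return w, h


for w, h in ((756, 512), (100, 100)):
    pw, ph = padded_size(w, h)
    wasted = 1 - (w * h) / (pw * ph)
    print(f"{w}x{h} -> {pw}x{ph}, {wasted:.0%} of the padded texture unused")
```

Rounding each side independently gives 1024x512 for the 756x512 texture; the 1024x1024 figure corresponds to calling padded_size(756, 512, square=True).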

Good luck,
narbo

Interesting thread. Let me ask you this. Hypothetically, say you have 60,000 vertices and your hardware caps show that your maximum count is 65535 (GF2). Would it be better to send them all in one DrawIndexedPrimitive call, or would it be better to make multiple calls using 2000-3000 per call? For the sake of argument, assume that they are all being rendered using the same texture.

Guest Anonymous Poster
Check the nVidia paper I mentioned. They have a graph for each card with the two axes being Polys Per Batch and Max Poly/Frame.

http://developer.nvidia.com/view.asp?IO=presentations then select Batch, Batch, Batch, what does it really mean?

The short answer is that you won't see any noticeable increase going from 3K to 60K per batch, except possibly on a GeForce FX card (can't say, the graph doesn't go that far).


Basically, the graphs for GeForce2, 3, and 4 pretty much level out at 700 polys. The GeForce FX continues to rise (at an incredible pace!)

A GeForce2 can see a 40x increase by moving from 10 poly to 130 poly batches.

A GeForce3 can see a 60x increase by moving from 10 poly to 200 poly batches.

A GeForce4 can see a 90x increase by moving from 10 poly to 700 poly batches.

A GeForce FX can see a 1000x increase by moving from 10 poly to 700 poly batches.

Guest Anonymous Poster
Switching textures is a killer. Make your whole world use 1 huge texture and just UVW unwrap it; that's what I did and I noticed a GIANT boost.

Guest Anonymous Poster
Render front to back. First render the objects most likely to be seen (the houses), then render your rocks and carrots, then render your ground.
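
The post orders objects by type as a rule of thumb. A more literal reading of "front to back" is to sort opaque objects by distance from the camera, which is what this sketch does; that is a swapped-in technique, not necessarily what the poster had in mind, and the object names and positions are invented.

```python
import math
from typing import List, Tuple

Obj = Tuple[str, Tuple[float, float, float]]  # (name, world position); illustrative


def front_to_back(objects: List[Obj], camera: Tuple[float, float, float]) -> List[Obj]:
    """Sort opaque objects so that the ones nearest the camera are drawn first."""
    return sorted(objects, key=lambda o: math.dist(o[1], camera))


camera = (0.0, 0.0, 0.0)
objects = [("rock", (0.0, 0.0, 30.0)), ("house", (0.0, 0.0, 10.0)), ("tree", (5.0, 0.0, 20.0))]
print([name for name, _ in front_to_back(objects, camera)])
```

Note that a strict distance sort can work against grouping by texture, so the two orderings have to be balanced against each other.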

@bpopp

Sure... because every Draw..Primitive call has an overhead - that's what BlueChip describes. If you use a Draw..Primitive call for too few vertices (depends on the gfx card) then you stall your CPU because setup takes longer than drawing the batch - and that's what you should avoid.

Makes sense. I was just double-checking because I've heard in multiple threads that the "ideal" vertex count for various cards was between 2000 and 4000. I guess what they really meant was that those were the ideal minimums. Thanks for the reply.

Anonymous (or someone) can you expand on what you mean by using one big texture and uvw unwrapping it?

[edited by - bpopp on March 24, 2003 6:31:08 PM]

What the AP means by one big texture is this:

Most GFX hardware is currently optimized for 256x256 textures. If you have four 128x128 textures, or two 256x128 textures, put them in one texture and change your UV coords to point to the appropriate place in the map.

Also, nVidia points out that if you're using the textures anyway, ignore the 256x256 rule. If you have four 256x256 textures, put them on a single 512x512 map. More maps? Use a bigger texture. Use as few textures as you can.

This assumes you don't use texture wrapping, i.e. texcoords greater than 1 or less than 0.
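
Changing the UV coords to point at the right place in the combined map is a scale and an offset. The sketch below assumes the textures are packed into a regular grid of equally sized slots and that slot (0, 0) is the top-left one; the function and parameter names are hypothetical.

```python
from typing import Tuple


def atlas_uv(u: float, v: float, slot_x: int, slot_y: int,
             slots_per_row: int, slots_per_col: int) -> Tuple[float, float]:
    """Map a UV in [0, 1] on the original texture to the matching UV inside the atlas."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("wrapping UVs cannot be remapped into an atlas slot")
    return ((slot_x + u) / slots_per_row, (slot_y + v) / slots_per_col)


# Four 256x256 textures packed 2x2 into one 512x512 map; slot (1, 0) is the top-right one.
print(atlas_uv(0.0, 0.0, 1, 0, 2, 2))
print(atlas_uv(1.0, 1.0, 1, 0, 2, 2))
print(atlas_uv(0.5, 0.5, 0, 1, 2, 2))
```

The range check reflects the restriction stated above: coordinates outside 0 to 1 would sample a neighbouring texture in the combined map instead of wrapping.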

UV unwrap tools do this. Not being an artist, I've only heard the term thrown about, but have no idea what the tools actually do (Getting our artists to explain things to programmers isn't easy. They're afraid of what we'll do with knowledge, instead of embracing a better mutual understanding... oh well)

This is one thing you could do, assuming you're fine with slightly smaller textures. Let's say your 256x256 textures look just as nice at 192x192, and never wrap more than 64 pixels in any given polygon. Now let's pretend each character of ASCII art represents a 64x64 pixel block of art.

Let's make two 192x192 textures in ASCII art:

123     ABC
456 and DEF
789     GHI

Now let's put them into one texture

123ABC
456DEF
789GHI

Now let's assume we have a poly where we want it textured like

1231
45

We can't do this with the newly joined textures, there's no way to wrap from 123 back to 1 on a poly without going through ABC... but wait, we made them 192x192 for a reason

Reorganize our textures such that 192x192 becomes 256x256, with the two textures together as a 512x256 map.

1231ABCA
4564DEFD
7897GHIG
1231ABCA

Since we know our UVs never exceeded 64 pixels of wrapping, we can "wrap" into our extended texture area, then begin the next poly back near the top or left edge. We're using a more optimal texture layout for hardware, and still support wrapping somewhat, though the artists need to be aware of a 64 pixel wrapping limitation.
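
The padded layout above can be generated mechanically. In this sketch each character stands for one 64x64 block, as in the ASCII art; the two helper functions are made up for illustration and simply repeat the first row and column of each texture at its far edges before placing the two results side by side.

```python
from typing import List


def pad_with_wrap(tiles: List[str], extra: int = 1) -> List[str]:
    """Extend a square grid of tiles by repeating its first rows/columns at the far edges."""
    n = len(tiles)
    size = n + extra
    return ["".join(tiles[r % n][c % n] for c in range(size)) for r in range(size)]


def side_by_side(left: List[str], right: List[str]) -> List[str]:
    """Place two equally tall tile grids next to each other in one map."""
    return [a + b for a, b in zip(left, right)]


first = pad_with_wrap(["123", "456", "789"])
second = pad_with_wrap(["ABC", "DEF", "GHI"])
for row in side_by_side(first, second):
    print(row)
```

This reproduces the 512x256 layout shown above, with one block of wrap room on the right and bottom of each texture.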

I'm assuming UV unwrappers do something similar, automagically: moving bits of texture around to make things not wrap, to allow packing of multiple textures into a single texture page. A proper UV unwrapper would modify your UV coords for you, not be limited to 64 pixels of wrap, and generally be invisible in your art pipeline, but the idea SHOULD be the same.

