
Lack of Performance: What Might I Be Doing Wrong?

Hi all,

Whenever I write a DirectX rendering engine, I never seem to get the performance I expect. In fact, I never even come close. For now my engine simply renders quads and nothing else. Here is a basic overview of how my static scene is built and rendered:

1) All quad descriptions are parsed from a file into a quad manager.
2) A Scene class runs through all of the quads in the quad manager and does three things:
   a) The pointers in the quad manager are sorted by material ID, which is a string.
   b) A vector of structs is filled with batch information. Batches are formed by texture, which keeps the number of DrawIndexedPrimitive (DIP) calls down to the number of textures.
   c) One index buffer and one vertex buffer are built.
3) In the main loop, the Scene class sets the VB and IB on the device and runs through the batches.

In testing I had 1000 quads described in the file. The engine builds them into an IB with 6000 indices and a VB with 4000 vertices. With this load I get about 60 fps. Isn't that awfully slow? Since this is a static scene, everything above is done once, and all that happens per frame is one DIP call (only one material is used for now) for 2000 triangles. By my calculation the throughput of my renderer is about 120,000 triangles per second (2000 triangles × 60 fps). According to some recent engine specs I found, I should be pushing through 30,000,000.

So if you can think of any obvious reasons why my throughput is so poor, I'd be glad to hear them.

Thanks,
ace
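Steps 2a and 2b can be sketched as follows. This is a minimal illustration, not ace's code: `Quad`, `Batch` and `buildBatches` are hypothetical names, and it assumes each quad contributes 6 indices (two triangles) to the shared index buffer in the same order as the sorted pointers. Each run of equal IDs in the sorted list becomes one batch, i.e. one DIP call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical quad record: only the field needed for batching.
struct Quad {
    std::string materialId;  // material/texture name parsed from the scene file
};

// One draw call's worth of work: a contiguous run of quads sharing a material.
struct Batch {
    std::string materialId;
    std::size_t startIndex;      // first index in the shared index buffer
    std::size_t primitiveCount;  // triangles in this batch (2 per quad)
};

// Sorts the quad pointers by material ID, then groups equal IDs into batches.
// Assumes the index buffer is later filled in this same sorted order, 6 indices per quad.
std::vector<Batch> buildBatches(std::vector<const Quad*>& quads) {
    std::stable_sort(quads.begin(), quads.end(),
                     [](const Quad* a, const Quad* b) { return a->materialId < b->materialId; });

    std::vector<Batch> batches;
    for (std::size_t i = 0; i < quads.size(); ++i) {
        assert(quads[i] != nullptr);
        // Start a new batch whenever the material changes.
        if (batches.empty() || batches.back().materialId != quads[i]->materialId) {
            batches.push_back({quads[i]->materialId, i * 6, 0});
        }
        batches.back().primitiveCount += 2;  // each quad adds two triangles
    }
    return batches;
}
```

With a single material, as in the test scene, this yields exactly one batch of 2000 triangles for 1000 quads.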

Hmm, if I understand correctly, you rebuild your buffers every frame? That's what takes the time. Try static buffers; you'll probably get a lot more FPS ^^

Also, for the structs you use in the vector: try to make them 16-byte aligned, and use an optimized version of vector (I assume you're using the std one; try writing your own that takes advantage of the struct alignment). It might run a little faster.
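A minimal sketch of a 16-byte-aligned record, assuming a hypothetical `BatchInfo` layout. Since C++17, `std::vector` allocates with at least the element type's alignment (using aligned `operator new` for over-aligned types), so a custom container isn't required just to get the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-batch record. alignas(16) raises its alignment to 16 bytes,
// and the compiler pads sizeof up to a multiple of 16 (12 bytes of data + 4 padding).
struct alignas(16) BatchInfo {
    std::uint32_t startIndex;
    std::uint32_t primitiveCount;
    std::uint32_t textureId;
};

static_assert(alignof(BatchInfo) == 16, "BatchInfo should be 16-byte aligned");
static_assert(sizeof(BatchInfo) == 16, "three 32-bit fields padded to 16 bytes");

// Returns true if p sits on a 16-byte boundary.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Since C++17, std::allocator honours alignof(BatchInfo), so every element
// of the returned vector lands on a 16-byte boundary.
inline std::vector<BatchInfo> makeBatchInfos(std::size_t count) {
    return std::vector<BatchInfo>(count);
}
```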

Actually, I'm not rebuilding the buffers each frame; sorry if it read that way. So optimising std::vector seems to be of little importance here. There must be a bigger bottleneck somewhere.

ace

Your engine will be capped at 60 fps if that's your monitor's refresh rate and VSync is on. So first make sure VSync is off.

Also, what primitives are you rendering? Triangle lists? Strips? Are you sending many of them together in one batch, or drawing one quad per primitive call?
(You can combine strips into one call using degenerate triangles.)
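Degenerate triangles apply to triangle strips: repeating indices creates zero-area triangles that the hardware discards, so separate strips can be joined and drawn with one call. A minimal sketch with a hypothetical `appendStrip` helper on 16-bit indices. It also pads with one extra index when the existing strip has an odd length, so the appended strip keeps its original winding order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends triangle strip `next` to `strip`, inserting degenerate (zero-area)
// triangles so both can be drawn with a single strip draw call.
void appendStrip(std::vector<std::uint16_t>& strip, const std::vector<std::uint16_t>& next) {
    if (next.empty()) return;
    if (!strip.empty()) {
        // Strip triangles alternate winding by position. Padding an odd-length
        // strip makes `next` start at an even position, preserving its winding.
        if (strip.size() % 2 == 1) strip.push_back(strip.back());
        strip.push_back(strip.back());  // repeat last index of the previous strip
        strip.push_back(next.front());  // repeat first index of the next strip
    }
    strip.insert(strip.end(), next.begin(), next.end());
}
```

For example, joining `{0, 1, 2, 3}` and `{4, 5, 6, 7}` produces `{0, 1, 2, 3, 3, 4, 4, 5, 6, 7}`.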

Two things off the top of my head: You don't have vertical sync enabled, and also if you double the triangle count - what happens to the framerate?

I'm rendering them as a triangle list, 2000 triangles in one batch, and the presentation interval is set to immediate.

ace

Quote:
Original post by Source
Two things off the top of my head: You don't have vertical sync enabled, and also if you double the triangle count - what happens to the framerate?


I can't answer that for sure right now because I'm not in front of the project, but off the top of my head, I think the frame rate halves.

ace
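One way to read the doubling test is to fit a two-point linear model: frame time = fixed cost + per-triangle cost × triangles. A sketch with hypothetical `Sample`/`CostModel` types. Ace wasn't certain of his answer, so assume the frame rate halves exactly when the triangle count doubles. In that case the fixed cost comes out as zero, and all of the frame time scales with the amount of geometry drawn.

```cpp
#include <cassert>

// One frame-time measurement at a given triangle count.
struct Sample {
    double triangles;
    double frameMs;
};

// frameMs = fixedMs + perTriangleMs * triangles
struct CostModel {
    double fixedMs;        // per-frame overhead independent of geometry
    double perTriangleMs;  // cost that grows with each triangle drawn
};

// Fits the line through two samples taken at different triangle counts.
CostModel fitCost(Sample a, Sample b) {
    assert(a.triangles != b.triangles);
    double perTri = (b.frameMs - a.frameMs) / (b.triangles - a.triangles);
    return {a.frameMs - perTri * a.triangles, perTri};
}
```

With 2000 triangles at 60 fps (16.7 ms) and 4000 at 30 fps (33.3 ms), `fixedMs` is about 0 and `perTriangleMs` is about 0.0083 ms. In this scene, more quads also means more pixels, so the "per-triangle" cost includes fill cost. Zooming out is what separates the two.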

I don't mean to imply that you don't know what you're doing; I get the feeling you know your way around just fine. Still, I'd recheck the VSync thing. Run the program under PIX and check that you really call SetRS with the correct parameter, and that you don't call it again with 'default' or something else that would re-enable VSync. That 60 FPS figure looks too much like a refresh rate to rule VSync out. From the description of your scene, I think you should be closer to 400 FPS.

Regarding the VSync thing, it's worth bearing in mind that most Nvidia/ATI driver control panels I've seen allow a forced override. I know the 9800 Pro I've got now can go from "Always Off" -> "Application Preference" -> "Always On". If I switch to the stock "Quality" profile, it sets VSync to "Always On".

Might be worth checking that your driver isn't ignoring your application [wink]

hth
Jack

Thanks for the help so far,

I can reassure you that this isn't a VSync problem. It runs at about 600 FPS when I have fewer quads; I use FRAPS to measure this. It most certainly isn't snapping to the illustrious 60 fps [smile].

ace

Any more ideas? The only reason I'm nagging is that I don't have internet access at home, and I only have a small window at uni.

thanks,

ace

Is there a specific point at which the frame rate drops significantly? You said fewer quads run faster. Maybe something specific happens after 572 quads or so. I'd look for something tied to the number of quads: for example, run 100 frames for each quad count and collect some statistics.
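A sketch of the sweep idea. It assumes average frame times have already been collected for several quad counts (for example, over 100 frames each). The hypothetical `findCostJump` reports the first quad count whose per-quad cost rises sharply compared with the previous sample. The `jumpFactor` threshold is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average frame time measured at one quad count.
struct SweepPoint {
    std::size_t quads;
    double avgFrameMs;
};

// Returns the first quad count whose cost per quad exceeds the previous
// sample's cost per quad by more than `jumpFactor`, or 0 if there is no such jump.
// Points must be sorted by strictly ascending quad count, with quads > 0.
std::size_t findCostJump(const std::vector<SweepPoint>& points, double jumpFactor) {
    for (std::size_t i = 1; i < points.size(); ++i) {
        assert(points[i - 1].quads > 0 && points[i].quads > points[i - 1].quads);
        double prevPerQuad = points[i - 1].avgFrameMs / points[i - 1].quads;
        double perQuad = points[i].avgFrameMs / points[i].quads;
        if (perQuad > prevPerQuad * jumpFactor) return points[i].quads;
    }
    return 0;
}
```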

Maybe you are using the debug version of DirectX.
Or maybe you are using the REF device rather than HAL.
Another application, such as antivirus software, could be slowing your application down or using so much memory that your program has to fall back on virtual memory.

Ace, you should really explain in more detail how you're doing things, because I don't really understand it ^^ You say one vertex buffer and one index buffer, but you don't rebuild them each frame? If you don't, what is the vector of structs for? (And what do you store in those structs?)

Maybe with more details we will be able to point out something ^^

Well, I have a primitivebase class, from which primitivequad inherits. The base class has vectors for the vertices and indices of the derived primitives. These vertices and indices (which, incidentally, will differ for panels, cubes and other primitives) are filled in by the derived primitive. So when the actual vertex and index buffers are built, the vertices and indices are taken from each primitive.

The vertex buffer and index buffer are only built once.

Understand? [smile]

ace
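A minimal sketch of the structure ace describes. The names (`PrimitiveBase`, `PrimitiveQuad`, `mergePrimitives`) and the position + texture-coordinate vertex layout are assumptions for illustration. The key step when concatenating primitives into one buffer is to rebase each primitive's local indices by the number of vertices already in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Vertex { float x, y, z, u, v; };  // hypothetical vertex layout

// Derived primitives fill their own vertex/index lists; indices are local
// to the primitive (0 refers to the primitive's first vertex).
class PrimitiveBase {
public:
    virtual ~PrimitiveBase() = default;
    const std::vector<Vertex>& vertices() const { return vertices_; }
    const std::vector<std::uint16_t>& indices() const { return indices_; }
protected:
    std::vector<Vertex> vertices_;
    std::vector<std::uint16_t> indices_;
};

class PrimitiveQuad : public PrimitiveBase {
public:
    PrimitiveQuad(float x, float y, float w, float h) {
        vertices_ = {{x, y, 0, 0, 1}, {x, y + h, 0, 0, 0},
                     {x + w, y + h, 0, 1, 0}, {x + w, y, 0, 1, 1}};
        indices_ = {0, 1, 2, 0, 2, 3};  // two triangles, local indices
    }
};

// Concatenates all primitives into one vertex list and one index list,
// offsetting each primitive's indices by its base vertex.
void mergePrimitives(const std::vector<std::unique_ptr<PrimitiveBase>>& prims,
                     std::vector<Vertex>& vb, std::vector<std::uint16_t>& ib) {
    for (const auto& p : prims) {
        std::size_t base = vb.size();
        assert(base + p->vertices().size() <= 65536 && "16-bit indices would overflow");
        vb.insert(vb.end(), p->vertices().begin(), p->vertices().end());
        for (std::uint16_t i : p->indices())
            ib.push_back(static_cast<std::uint16_t>(base + i));
    }
}
```

The merged vectors would then be copied once into the real vertex and index buffers at load time. 1000 quads (4000 vertices) fit comfortably within 16-bit indices.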

So the only things you do in the main loop (every frame) are:
- setting the VB and IB as sources
- doing one DIP call for about 2000 tris
and you get only 60 fps?
What hardware are you running this on?

A bit of a long shot, but try running your application against the debug runtimes (maximum output) and see if they complain about anything. Maybe they emit a warning saying that a particular part of your code isn't using the API optimally...?

A typical example (even though you say you're not changing buffers) is the debug runtimes churning out thousands of lines about resources being locked in the "wrong" pool and the "performance penalty could be severe"...

hth
Jack

Not sure ^^ But that's probably due to my English skills (not so good ^^)
Well, let's forget about the details of how it works. The last thing I can suggest is to check the parameters you used to create your buffers: the wrong parameters can really hurt performance. If you create the buffers only once (at load time, for instance), then create static buffers (usage: 0 and pool: managed, if I remember correctly).

You can also switch the DirectX runtime to debug and check the output in Visual Studio. For example, it will tell you if your FVF is wrong for a given vertex buffer, and if you create a buffer with certain flags, it will warn you about performance issues. So switch to the debug runtime, run your app for a few seconds, and check Visual Studio's debug output for anything like "Direct3D9: (WARN) : ...." or "Direct3D9: (ERROR) : ...."

I don't see anything else... if you really have one buffer built once and no locks at runtime, I don't see why it runs so slowly :/

Oh, and about what I said regarding std::vector: be careful, though. It can easily become the bottleneck in an engine. I don't remember where I read about that, but since then I've completely stopped using it ^^

Yes, the only things I'm doing are setting the sources and calling DIP. I don't recall any such debug messages from DirectX; the debug output was one of the first things I checked. I do get 2 or 3 redundant render state changes per frame, but that's nothing to be concerned about.

I'm running this on an Athlon64 3800+ with 1GB of RAM and a Radeon 9700 Pro, so there are no real chokepoints there.

Anything else?

Thanks BTW! [smile]

I apologise for not being able to post code, but I'm writing an engine and we all know how large the code gets.

ace

EDIT: new post - anything that involved std::vector was done before the main loop. I'm thinking I might load up the simple cylinder SDK example, ramp up the DIP calls and so on, and see how the FPS drops.

A few things I'd like to add:

- If you have any doubt about a CPU bottleneck, use a profiler to make sure nothing is taking too long. This should be easy. (You just don't sound really sure there's no CPU problem.)

- Try increasing and decreasing the demand on the VS and PS (or on the pipeline stages, if you're using the FFP). Add more or fewer texture lookups, and add or remove lighting, lights and the like for the vertex pipeline. This will help determine which part of the pipeline is under stress and limiting your performance.

- Try making the triangles smaller (zoom out with the camera). This could make a HUGE difference, especially if you are currently drawing many pixels. Try limiting overdraw somehow, maybe even sorting front to back if you have the time to implement it. If zooming out increases your performance significantly, you're simply maxing out the pixel pipeline.
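For the first point, a real profiler is the better tool, but a rough CPU-side check can be sketched with `std::chrono::steady_clock`. `ScopedTimer` is a hypothetical helper. Direct3D queues work for the GPU, so a CPU timer around a draw call measures submission cost, not how long the GPU takes to render.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock time spent in a scope (in milliseconds) to an accumulator.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatorMs)
        : acc_(accumulatorMs), start_(std::chrono::steady_clock::now()) {
        assert(accumulatorMs >= 0.0);
    }
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        acc_ += std::chrono::duration<double, std::milli>(end - start_).count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& acc_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage: wrap the per-frame CPU work in a block such as `{ ScopedTimer t(renderMs); /* set sources, issue DIP */ }` and compare the accumulated time with the total frame time.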

Hope this helps :).

[Edited by - sirob on October 11, 2005 12:19:30 PM]

Ace, if you really want, I'd be more than happy to give you some nice timing code. That way you can keep track of your exact FPS and a few other useful details. Just PM me.
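As an illustration of the kind of timing code meant here, the following is a minimal frame-rate tracker that averages over the last N frames. It is a sketch, not the offered code: `FrameStats` is a hypothetical name, and it assumes frame times in milliseconds come from whatever clock the engine uses. Averaging over a window smooths out the jitter that a single 1/frameTime reading would show.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Keeps the last `window` frame times and reports their average and the FPS.
class FrameStats {
public:
    explicit FrameStats(std::size_t window) : window_(window) { assert(window > 0); }

    void addFrame(double frameMs) {
        samples_.push_back(frameMs);
        totalMs_ += frameMs;
        if (samples_.size() > window_) {  // drop the oldest sample
            totalMs_ -= samples_.front();
            samples_.pop_front();
        }
    }
    double averageMs() const { return samples_.empty() ? 0.0 : totalMs_ / samples_.size(); }
    double fps() const {
        double avg = averageMs();
        return avg > 0.0 ? 1000.0 / avg : 0.0;
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
    double totalMs_ = 0.0;
};
```

Throughput then follows as triangles per frame × FPS. For example, 2000 × 60 gives the 120,000 triangles per second quoted in the first post.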

How are you rebuilding your vertex buffer/index buffer? Are you using the correct flags (dynamic|discard|readonly, whatever)...

It might be better to sort by texture then material... though maybe not.
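Sorting by two keys is straightforward with `std::tie`, which compares lexicographically. A sketch assuming integer texture and material IDs on a hypothetical `DrawItem`. Counting texture changes in draw order shows how much the sort saves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw item: IDs for the texture and the remaining material state.
struct DrawItem {
    int textureId;
    int materialId;
};

// Sorts by texture first, then material, so items sharing a texture are adjacent.
void sortByTextureThenMaterial(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.textureId, a.materialId) < std::tie(b.textureId, b.materialId);
    });
}

// Counts how many times the texture must be (re)bound when drawing in order.
std::size_t countTextureChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (i == 0 || items[i].textureId != items[i - 1].textureId) ++changes;
    return changes;
}
```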
