Erik Rufelt

OpenGL Getting Max Tri/sec rate


Hi, I've been experimenting with different ways of drawing triangles, and no matter what I try I seem to be getting very low tris/sec rates. What's the optimal way to achieve as high a geometry rate as possible?

I've tried both OpenGL and Direct3D 9.0, with about the same results. I have an ATI X800 XT 256 MB graphics card, and I get the best results drawing with vertex buffers in D3D and vertex buffer objects in OpenGL. D3D maxes out at ~90 MTris/sec, and OpenGL at ~70 MTris/sec. I've tried larger and smaller vertex buffers and found that this is as high as it gets. I draw without textures or anything, and it's the same if I cull away all the tris.

ATI's website says the card should be able to draw something like 500 MTris/sec. I realize this is impossible to reach =) but I'm not even getting 1/5 of it. I've found sites where other people have posted much higher rates, but I have not managed to reproduce them myself. An example app that came with the DX9 SDK reported about 160 MTris/sec using optimized meshes.

I've tried all the different settings I could think of, and the data is stored on the card, not in system memory. If I keep it in system memory I get something like 10 MTris/sec =) Are there better ways to draw triangles, or is something wrong with my computer?

Thx,
/Erik

Hello..

There are a lot of factors to consider.
nVidia has a nice PDF with step-by-step instructions to get the highest tris/sec out of your card. ATI should have something similar.

Some pointers:
* Use as few batches as you can (and still try to use shorts for indices ;]).
* Remember to store the indices on the graphics card as well...
* Some vertex formats are more card-friendly than others. For example, one 32-bit int for the colors is better than 4 floats...

And then there's the obvious stuff like vertex and fragment programs.. textures..
Try to just use vertices and nothing else (less data == more speed.. usually).
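
Putting those pointers together, something like this minimal sketch (GL 1.5-style VBOs; numVerts, numIndices and the buffer contents are placeholders, not your actual data):

```cpp
// Hypothetical setup: static geometry stored entirely on the card via VBO/IBO,
// with a compact 16-byte vertex (12 bytes position + 4 bytes packed colour).
struct PackedVertex {
    float x, y, z;
    unsigned char r, g, b, a;   // one 32-bit colour instead of four floats
};

GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numVerts * sizeof(PackedVertex), verts, GL_STATIC_DRAW);

glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);   // the indices live on the card too
glBufferData(GL_ELEMENT_ARRAY_BUFFER, numIndices * sizeof(GLushort), indices, GL_STATIC_DRAW);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(3, GL_FLOAT, sizeof(PackedVertex), (void*)0);
glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(PackedVertex), (void*)12);

// One big batch, 16-bit indices:
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, (void*)0);
```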

Good luck!

Quote:
Original post by Erik Rufelt

ATI's website says the card should be able to draw like 500MTris/sec. I realize this is impossible to reach =) but i'm not even getting 1/5 of this. I've found sites where other people have posted much higher rates, but I have not managed to get them myself.
An example app that came with the DX9 SDK reported about 160MTris/sec while using optimized meshes.


Congratulations, you've been bitten by the PC hardware marketing BS bug. Your card would have no problem drawing 500M triangles/second... if your machine had enough bandwidth to send triangles to the GPU that fast. The PC hardware scene is full of meaningless specs that can never be achieved because there is a limiting bottleneck somewhere else in the system. You could probably get a little higher than 90M triangles/second, as evidenced by the program that comes with the DX9 SDK you mentioned, but give up any hope of ever reaching 500M; it's just not gonna happen unless you have a PCI Express card.

Hi,

Thanks for the reply
I use just one batch; I've tried drawing it once per frame as well as 10+ times per frame, with the same result.
I have also tried more batches.
I don't have any colors or anything, just the vertices, D3DFVF_XYZ.. adding D3DFVF_DIFFUSE and a color (32-bit int) made the rate drop by 40%.. which seems to imply that the problem is memory..
Everything is stored on the card..
I've tried using 16-bit indices as well, but it didn't result in any change..
I get 125 MTris/sec now with:
device: D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DCREATE_HARDWARE_VERTEXPROCESSING
vertex buffer: D3DFVF_XYZ, D3DUSAGE_WRITEONLY, D3DPOOL_DEFAULT
index buffer: D3DFMT_INDEX32, D3DUSAGE_WRITEONLY, D3DPOOL_DEFAULT
SetFVF(D3DFVF_XYZ)
SetIndices
SetStreamSource(0, buf, 0, sizeof(D3DXVECTOR3))
and DrawIndexedPrimitive(D3DPT_TRIANGLELIST, ..);
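In code the setup above is roughly this (a sketch; numVerts, numIndices and the buffer-filling code are placeholders):

```cpp
// Rough sketch of the setup listed above; buffer contents and counts are placeholders.
IDirect3DVertexBuffer9* vb = NULL;
IDirect3DIndexBuffer9*  ib = NULL;

device->CreateVertexBuffer(numVerts * sizeof(D3DXVECTOR3),
                           D3DUSAGE_WRITEONLY, D3DFVF_XYZ, D3DPOOL_DEFAULT, &vb, NULL);
device->CreateIndexBuffer(numIndices * sizeof(DWORD),
                          D3DUSAGE_WRITEONLY, D3DFMT_INDEX32, D3DPOOL_DEFAULT, &ib, NULL);
// ... Lock() both buffers once, fill them, Unlock() ...

device->SetFVF(D3DFVF_XYZ);
device->SetStreamSource(0, vb, 0, sizeof(D3DXVECTOR3));
device->SetIndices(ib);
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVerts, 0, numIndices / 3);
```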
Is it possible to tell D3D how to store the vertices on the card?
I was thinking perhaps it somehow defaults to double precision or something..

Maybe someone could try my program and see what tri rate you get?
So I'm not trying to fix something in my program when the problem is really my computer =)
http://gys.mine.nu:180/~rufelt/D3DGeometryRateTest.exe


cwhite: that shouldn't be a problem.. everything is stored on the card, so hardly anything is sent to it per frame..


thx,
/Erik

I get 78 MTris/sec on my Radeon 9800 Pro with 256 MB of RAM.

Try drawing in wireframe mode; it might cut out the fill-rate cost.

Guest Anonymous Poster
It hasn't been mentioned, but I think you'll also discover that the triangles they claim to render in huge volumes are *single pixel*, i.e. not a true triangle, just one pixel (technically a polygon, just a very small one!). Try that and I bet your throughput is higher still.

Cunning buggers aren't they?

Guest Anonymous Poster
Have you tried optimizing your vertex buffer?
NVidia has made a tool called NvTriStrip, which reorganizes the data into an efficient strip, with the vertex order optimized for memory paging and the access pattern optimized to reduce cache misses.
Also, one primitive type can be faster than another, depending on the type of mesh.
A very smooth mesh is faster to draw than a mesh with sharp edges, because more vertices are shared between triangles. When you draw a triangle, there is a better chance its vertices have already been transformed and are sitting in the cache. So the perfect case is a very smooth mesh (like a sphere) reorganized by NvTriStrip.
The only way to reach the maximum number of triangles is to have perfect geometry in a perfect strip, but in practice that never happens because it's too restrictive for artists.
I don't think PCI Express will increase the polygon count if the data is already in video memory.
I once saw more than 13M triangles per second on a GeForce 1, where the theoretical maximum was 14M. That was a perfect-strip case...
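
A rough sketch of drawing a pre-stripped mesh, assuming the indices were reordered offline (for instance by NvTriStrip) and uploaded to buffers like in the earlier VBO example (vbo, ibo, PackedVertex and the counts are the placeholders from that sketch):

```cpp
// Hypothetical draw call for a mesh whose indices were stripified offline.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glVertexPointer(3, GL_FLOAT, sizeof(PackedVertex), (void*)0);

// glDrawRangeElements states the vertex range up front, which can help the driver;
// one long strip maximizes reuse of the post-transform vertex cache.
glDrawRangeElements(GL_TRIANGLE_STRIP, 0, numVerts - 1, numStripIndices,
                    GL_UNSIGNED_SHORT, (void*)0);
```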

Ronan

Guest Anonymous Poster
Yes, I suspect that the so-called "triangle transformation rate" is actually the vertex transform rate. If you use a triangle strip/fan, the two are almost exactly equivalent, because if you don't change render state a vertex is guaranteed to transform identically, and of course "triangle" sounds more impressive than "vertex".

I also have no idea how clipping works in hardware, so it's possible that you'll only get full speed when the clipping cases are trivial (i.e. completely on screen or completely behind a single clipping plane). I don't think that would make much of a dent though, because in any normal scene almost all triangles fall into this category.

Hi,

Thanks for all the replies!
I will try to optimize the mesh more and use tri strips; I guess that's the only way.
I found this, though:
http://www.delphi3d.net/forums/viewtopic.php?t=364&view=previous
Someone posted that they get 200 MTris/sec with an X800 Pro using the demo in that thread.. but when I download it I only get 80.. guess it must be my system :(

/Erik

I get 118 MTris/sec with the Delphi3D.net app, and 30 MTris/sec with your app. This is a pretty big difference [looksaround].

Additionally, my card underperforms in general, so take my stats with a pinch of salt :)
Example framerates:
Counter-Strike 1.6 at 640x480x32 on de_aztec: 30 fps
DesertCombat at 1280x1024x32, all details on high: 70 fps


Hardware:
Radeon 9600 Pro 256MB VRAM (Catalyst 3.5 Drivers)
MSI NEO-FIS2R Mainboard
Intel P4 non-HT 2.40b Ghz
512MB DDR300 RAM

Quote:
ATI's website says the card should be able to draw like 500MTris/sec. I realize this is impossible to reach =)

You should be able to get close. Last time I tested, I got exactly what the card can do (GF2 MX: 20 MTris/sec actually rendered). I haven't tested it fully with my GFFX 5900 (once it went over 100 MTris/sec it sort of became pointless), but I'm 99% sure that if the specs say it can do 345 MTris/sec, it is possible to hit 345 MTris/sec (benchmarked) on this card.
For OpenGL, use glCullFace(GL_FRONT_AND_BACK), though you should be able to get a similar framerate when actually drawing them.
There are quite a few PDFs on the web on how to achieve better performance,
e.g. here's a D3D-centered one: http://www.ati.com/developer/gdc/D3DTutorial3_Pipeline_Performance.pdf
Have a read, and check out other benchmarking apps to see what they achieve.
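The culling trick is just a couple of lines, something like this (drawScene() is a placeholder for whatever you're timing):

```cpp
// Cull every polygon so the rasterizer does no work; what's left is roughly
// the pure transform/setup rate. drawScene() is a hypothetical stand-in.
glEnable(GL_CULL_FACE);
glCullFace(GL_FRONT_AND_BACK);   // both faces culled: triangles are processed but never drawn
drawScene();
glCullFace(GL_BACK);             // restore the usual setting afterwards
```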

Hi,

I reinstalled WinXP and suddenly the Delphi3D demo got 230 MTris/sec, while my D3D9 app went from 120 to 80 MTris/sec, and my own GL test app from 75 to 60. This was all very strange.
AGP didn't work because I hadn't installed the AGP motherboard drivers. After I installed those, my D3D9 app went up to 110 MTris/sec, my GL app back up to 75, and the Delphi3D app dropped down to 80, as it was on my old system.
Removing the AGP drivers and reinstalling the ATI drivers got it back to what it was when I had just installed WinXP.
Has anyone experienced anything similar to this?
Or have any suggestions on how to fix it?
That my own apps got lower rates without AGP seems to suggest that perhaps the indices are not stored on the card (even though I create buffers on the card for them), but I really can't understand how the demo app could possibly drop from 230 to 80 MTris/sec.. just from enabling AGP..

Very messed up :(

[Edited by - Erik Rufelt on February 10, 2005 10:00:49 AM]

The problem is AGP.
With AGP disabled, the Delphi3D demo gets 230 MTris/sec.
AGP 8x = 80
AGP 4x = 40

Seems like it stops storing the indices on the graphics card, or something like that, when AGP is enabled..

The max transform speed given by video card manufacturers is just a plain lie.
Just like the PS2 cannot actually render 25 MTris/sec.
There seems to be a sort of marketing inflation. I suppose the guys from the marketing division come to see the devs and ask: under ideal conditions, assuming a ridiculously small frame buffer, using an index buffer that indexes the same few small triangles over and over again, using the largest batch size possible, without shaders, without even displaying the triangles, how fast can we go?
Our competitor said 100M, can we do more?
BTW, is there any chance of a version of that throughput test app (D3DGeometryRateTest) that runs in fullscreen, since vsync is usually forced on in windowed mode?

I've tested it in fullscreen, with the same results.
I can't upload a new version since I accidentally deleted the code hahaha =)
Anyway, the problem is not reaching the 500 MTris/sec. I don't really care how many tris/sec the card can theoretically handle, I just want it to handle all the tris it can. Something it seems very reluctant to do when AGP is enabled.
The same app, whose code I have searched for any kind of system-related checks (there is nothing like that), runs at 230 MTris/sec with AGP off, and 80 with AGP 8x (40 with AGP 4x).
It seems like for some reason the card uses system memory when AGP is enabled.. has anyone else ever experienced something like this??

I e-mailed ATI and got an auto-response with some BIOS settings one could try for problems matching my keywords (?), so I think I'll try that and hope..

I think you are mistaken. The difference can probably be explained like this:
- when AGP is OFF, the geometry data is put in video memory directly (no bus transfer needed, the fastest memory pool available).
- when AGP is ON, the geometry data is put in AGP memory, which is slower than video memory (but much faster than system memory), since every time you render this data it has to be transferred from AGP to video memory.

Although in theory it's better to put data in video memory, don't forget that the amount of video memory is much lower than the amount of AGP memory, and that textures and color buffers have to reside in video memory. In a real application, AGP is probably the only good solution.
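
As a rough illustration (the driver has the final say; the pool is only a hint about placement), in D3D9 this corresponds to the pool you pick at creation time:

```cpp
// Hypothetical comparison; actual placement is up to the driver.
// D3DPOOL_DEFAULT + WRITEONLY lets the driver pick the fastest pool it can
// (typically video or AGP memory for static data).
device->CreateVertexBuffer(size, D3DUSAGE_WRITEONLY, D3DFVF_XYZ,
                           D3DPOOL_DEFAULT, &vbFast, NULL);

// D3DPOOL_SYSTEMMEM keeps the data in system memory, so every draw has to pull
// it across the bus -- the ~10 MTris/sec path mentioned earlier in the thread.
device->CreateVertexBuffer(size, 0, D3DFVF_XYZ,
                           D3DPOOL_SYSTEMMEM, &vbSlow, NULL);
```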

Y.

The geometry is not put in system memory, since that yields a geometry rate of only 10 MTris/sec; no more than that can be sent over 8x AGP, it's that much slower.
However, the indices might be put in system memory..
They definitely should not be.
A friend of mine has run the same apps as me and doesn't have this problem. He has a much older card and gets a higher geometry rate with AGP enabled than I do, but when I disable AGP mine is higher, with the geometry stored on the card (as it should be).
Somehow my system does something that makes the whole process slower (like moving things to system memory) when I enable AGP.
I have tried changing BIOS settings for AGP memory and some other things, with no result.
I have 256 MB of memory on the card, and I draw maybe 1 million tris + colors + 1 million indices = roughly 20 bytes per vertex = about 20 MB, a tenth of what's available.
No textures or anything.
I've checked the free memory with some system scan apps; it says about 220 MB.

Done some more testing; it seems like the problem only appears in OpenGL using VBOs.. still very strange, and it doesn't happen on other people's systems.. at least not for anyone I have asked..
Thanks for all the replies.

Guest Anonymous Poster
Marketing numbers made simple!

Step 1: Create a back-facing triangle (3 vertices).
Step 2: Turn on back-face culling.
Step 3: Create the biggest index array that you can (65000 indices?) using the smallest index size (i.e. shorts, maybe even bytes, although you'll have to play around to see what is fastest on your hardware).
Step 4: Populate the index array with (0, 1, 2) repeated until the index array is full.
Step 5: Draw your entire index array (which comes out to thousands of triangles).
Doing it this way, you get the fastest possible triangle rate the card is capable of: all vertices will always hit in the post-transform cache.

Is this a bogus test? Yes and no. The difference between a 10% post-transform cache hit rate and a 50% post-transform cache hit rate can have a *huge* effect on performance, especially if the vertex program is long (these days, most fixed function is just a vertex program inside the driver).

Will you get this in practice? No way....

This is just one portion of your back-of-the-envelope performance figure... (transfer rate and cache hit rate being the other ones, at which point it may not matter if you are fill limited).
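
Putting the recipe above into code, a sketch of what such a "benchmark" might look like in OpenGL (the vertex positions are arbitrary placeholders):

```cpp
// Hypothetical peak-triangle-rate test following the steps above (OpenGL 1.5-era API).
#include <GL/gl.h>

void drawMarketingTriangles()
{
    // Step 1: one triangle wound clockwise so it is back-facing under the
    // default CCW front-face convention (assuming identity transforms).
    static const float verts[9] = {
        0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f,
        1.0f, 0.0f, 0.0f
    };

    // Step 2: cull back faces so nothing is ever rasterized.
    glEnable(GL_CULL_FACE);
    glCullFace(GL_BACK);

    // Steps 3-4: a large index array filled with the same three indices over and over.
    static const int kNumIndices = 65000 - (65000 % 3);   // keep it a multiple of 3
    static unsigned short indices[kNumIndices];
    for (int i = 0; i < kNumIndices; ++i)
        indices[i] = (unsigned short)(i % 3);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);

    // Step 5: draw the whole index array; every vertex hits the post-transform cache.
    glDrawElements(GL_TRIANGLES, kNumIndices, GL_UNSIGNED_SHORT, indices);

    glDisableClientState(GL_VERTEX_ARRAY);
}
```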

