This topic is now archived and is closed to further replies.

Lyve

Drawing many points in OpenGL

Lyve    116
Hello list, maybe you can help me: I want to render a large number of points in OpenGL, and I need a higher frame rate than I have at the moment, if possible.

At the moment, I'm using glDrawArrays to draw them. My data type for each point consists of three 32-bit floats for the position and three additional 32-bit floats for the normal. Yes, I need the normal because the points have to be lit. The rendering consists of:

glVertexPointer(...);
glNormalPointer(...);
glDrawArrays(...);

Not very much, but I need to draw MORE points if possible. I also made sure that no senseless render states are set. This is the setup of my render states:

glDisable( GL_CULL_FACE );
glEnable( GL_LIGHTING ); // As mentioned, I need lighting
glEnable( GL_DEPTH_TEST ); // Sorry, I need it as well
glDisable( GL_BLEND );
glDisable( GL_NORMALIZE );
glDisable( GL_DITHER );
glDisable( GL_POINT_SMOOTH );
glDisable( GL_ALPHA_TEST );

Some other things I checked, without success:

glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST );
glHint(GL_POLYGON_SMOOTH_HINT, GL_NICEST );
glHint(GL_POINT_SMOOTH_HINT, GL_NICEST );

My current speed: 50fps for 320.000 points with point size one on an AMD XP 1800+ with a GeForce4 Ti 4200. I also tried to put the points into a display list, but that is even slower: 29fps for the configuration mentioned above. Rendering with single calls to glVertex3fv, glNormal3fv etc. results in a frame rate of ~32.

Any tips on how to increase the rendering speed? Thanks in advance!

Lyve
_____________________________________
http://www.winmaze.de, a 3D shoot em up in OpenGL, nice graphics, multiplayer, chat rooms, a nice community, worth visiting!
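To put the numbers from this post in perspective, here is a rough sketch of the per-point data layout and the amount of vertex data it implies per second. The struct name PointVertex and the function vertexBytesPerSecond are made up for illustration, and the interleaved layout is an assumption: the post only says that each point has three floats for the position and three for the normal, not how they are arranged in memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout matching the description in the post: three 32-bit
// floats for the position followed by three 32-bit floats for the normal.
struct PointVertex {
    float position[3];
    float normal[3];
};

// The struct holds only floats, so no padding is expected.
static_assert(sizeof(PointVertex) == 6 * sizeof(float),
              "PointVertex is expected to be tightly packed");

// Bytes of vertex data handed to the driver per second, assuming that
// every point is resubmitted every frame (as with plain vertex arrays).
inline double vertexBytesPerSecond(std::size_t pointCount, double framesPerSecond) {
    assert(framesPerSecond >= 0.0);
    return static_cast<double>(pointCount) *
           static_cast<double>(sizeof(PointVertex)) * framesPerSecond;
}
```

With 320000 points at 50fps this comes to 384,000,000 bytes per second. If the data really is interleaved like this, sizeof(PointVertex) would be the stride argument passed to glVertexPointer and glNormalPointer.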

Sander    1332
Try removing all the glHint calls. On some of the newer drivers they can cause your video card to crap on itself. I don't know why, though. It seems that nobody knows why.

The only other thing I can think of: use Vertex Buffer Objects (VBOs) to create a write-only buffer in video memory and render from that. That's about all I can think of with the little code you posted.

One question though: why do you need so many points anyway? It looks like massive overkill to me.

Sander Maréchal
[Lone Wolves Game Development][RoboBlast][Articles][GD Emporium][Webdesign][E-mail]


GSACP: GameDev Society Against Crap Posting
To join: Put these lines in your signature and don't post crap!

Lyve    116
>Try removing all the glHint calls.

Checked it, no speed increase.

>The only other thing I can think of: Use Vertex Buffer Objects (VBOs) to create a write-only buffer in video memory and render those. That's about all I can think of with the little code you posted.

This is not possible for us because the positions of the points change frequently.
Anyway, thanks for the tip, I will check it with my co-workers, maybe we can come up with a solution. Something like storing points in video memory temporarily while they don't have to be touched.

>One question though: Why do you need so many points anyway? It looks like massive overkill to me.

No, it's not overkill, I'm sorry. I'm working in a company that simulates milling operations in realtime. We have a workpiece, a quad, that consists of tons of very small points.
The number of points that I mentioned is not even very large. We have large objects of more than half a meter here that have to be simulated with a resolution of more than one needle per millimeter.
We need power, power, power. MORE power, if possible.


Sander    1332
quote:

>The only other thing I can think of: Use Vertex Buffer Objects (VBOs) to create a write-only buffer in video memory and render those. That's about all I can think of with the little code you posted.

This is not possible for us because the positions of the points change frequently.
Anyway, thanks for the tip, I will check it with my co-workers, maybe we can come up with a solution. Something like storing points in video memory temporarily while they don't have to be touched.

I still recommend them. If you create a write-only video buffer for your VBO, you will get the speed of rendering from video memory while still being able to update the information inside it. That's the nice thing about VBOs. Only reading from video memory is slow. Writing is pretty fast :-)

By the way, are you sure that the rendering pass is the slowest part of your app? Have you profiled it? What did it say? How about updating all the positions? For math: try to eliminate any and all sqrt(), sin(), cos() and similar functions. There are much faster alternatives for those, depending on how much accuracy you need. Inline everything. Use a lot of consts and pass by reference. Avoid unnecessary creation of variables. Use ASM wisely.


Lyve    116
Do you have a link to a sample where VBOs are used? The only information I can find comes directly from SGI and is not very intuitive. The ATI website doesn't seem to carry their sample any longer.

Sander    1332
First thing Google coughed up: Delphi3D. Grab the topmost zip file. There was another article that specifically created the write-only VBO but I can't remember which one right now. I'll post it if I can find it.

Lyve    116
Hi Sander!

I've finished my implementation for VBOs, I'm impressed how easy it was. However, I don't get any speed increase on my GeForce3 here at home. I'll check it tomorrow at my workplace with the GF4.

Thanks for all the help!

Lyve

Sander    1332
Are you sure you created a write-only VBO? There are many ways to create VBOs. Some are faster, some are not. If you created a VBO that you could read from, this would explain why you don't see any increase in framerate. Or maybe your card just doesn't go any faster.

I checked the nvidia site. The geometry limit for the GeForce3 isn't listed (my best guess: about 40 million vertices / second). The limit for the GeForce4 Ti is about 113 million. Those numbers are for unlit vertices, however. You render 16 million lit vertices per second. Looks like you have not reached the limit yet. Here are a couple of things I can think of. Not sure if they work, but they might be worth a try:

- Render GL_QUADS instead of GL_POINTS. It could very well be that the video card is optimized to handle triangles, as they are the most important geometry type. A quad is automatically split into 2 tris so you can still put it all in one buffer. It will increase your geometry and bandwidth four times but the card might optimize it a lot better.

- Drop the lighting and do it manually. Use a color array (GL_COLOR_ARRAY) to set the final color of the particles manually. This will work if you are GPU limited rather than CPU limited.

- Get a GeForceFX. The 5900 has a geometry limit of 338 million vertices a second. That's thrice your GeForce4 Ti.

- Choose a different PixelFormatDescriptor. 16 color bits. No stencil. No alpha. 8 bits of depth. No additional buffer. Only 1 backbuffer.

- Profile, profile, profile. Maybe you can speed up other parts of your app, like the particle updates (math).
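As an illustration of the "drop the lighting and do it manually" suggestion, the following is a minimal sketch of per-point lighting computed on the CPU. It assumes a single directional light, unit-length normals and a simple ambient plus diffuse model; none of that is stated in the thread, and the names Vec3, shade and lightPoints are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// One colour channel: ambient plus diffuse scaled by k, clamped to [0, 1].
inline float shade(float ambient, float diffuse, float k) {
    return std::min(1.0f, std::max(0.0f, ambient + k * diffuse));
}

// Computes one RGB colour per point for a single directional light.
// `toLight` is the unit vector pointing from the surface towards the light;
// the normals are assumed to be unit length as well.
std::vector<Vec3> lightPoints(const std::vector<Vec3>& normals,
                              const Vec3& toLight,
                              const Vec3& diffuse,
                              const Vec3& ambient) {
    std::vector<Vec3> colors;
    colors.reserve(normals.size());
    for (const Vec3& n : normals) {
        // Lambert term: points facing away from the light get no diffuse part.
        const float k = std::max(0.0f, dot(n, toLight));
        colors.push_back(Vec3{shade(ambient.x, diffuse.x, k),
                              shade(ambient.y, diffuse.y, k),
                              shade(ambient.z, diffuse.z, k)});
    }
    assert(colors.size() == normals.size());
    return colors;
}
```

The resulting array could then be submitted with glColorPointer(3, GL_FLOAT, 0, colors.data()) while GL_COLOR_ARRAY is enabled and GL_LIGHTING is disabled. Whether this is actually faster depends on whether the application is limited by the CPU or by the graphics card.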


Dredge-Master    175
Is that 320 or 320000 points? The decimal point is confusing.

Anyway, a few things I did to optimise my voxel engine (which used points or cubes, and up to 24 lights): a few changes to the code, a 3-way (get that thought out of your head) linked list, and ONLY rendering the VISIBLE surface.

The visible surface bit is the important part. Let's say you have a cube.

the cross section is

0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

you sure as hell don't want to render all of them, so instead render

0 0 0 0 0
0       0
0       0
0       0
0 0 0 0 0

I hope that didn't go all skewy.
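The idea of drawing only the outer shell can be sketched as follows. This assumes a completely filled n*n*n block, which is a simplification: for a partly milled workpiece the test would have to look at whether neighbouring cells are empty. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// True when cell (x, y, z) lies on the outer shell of a solid n*n*n block.
inline bool isOnSurface(std::size_t x, std::size_t y, std::size_t z, std::size_t n) {
    assert(n > 0 && x < n && y < n && z < n);
    const std::size_t last = n - 1;
    return x == 0 || y == 0 || z == 0 || x == last || y == last || z == last;
}

// Number of cells that remain to be drawn when only the shell is rendered.
inline std::size_t countSurfaceCells(std::size_t n) {
    std::size_t count = 0;
    for (std::size_t z = 0; z < n; ++z) {
        for (std::size_t y = 0; y < n; ++y) {
            for (std::size_t x = 0; x < n; ++x) {
                if (isOnSurface(x, y, z, n)) {
                    ++count;
                }
            }
        }
    }
    return count;
}
```

For the 5*5*5 block from the diagram, 98 of the 125 cells are on the shell, and a middle slice keeps 16 of its 25 cells. The saving grows quickly with the block size, since the shell grows with n squared while the volume grows with n cubed.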


Another one is instead of using use

Another is to re-write your math code. This is really helpful for lighting.
Software lighting is also really handy (e.g. the 24 lights mine had) and it allows you to lose some detail if you wish to speed it up.

You can change some of your math functions to use lookup tables. Sin, sin^-1, cos, cos^-1, tan and tan^-1 lookup tables with 65534 entries are pretty effective.
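A lookup table of the kind mentioned here might look like the sketch below. The class name SineTable and its interface are invented for illustration. The table size defaults to 65536 rather than the 65534 mentioned in the post, because a power of two allows wrapping the index with a bit mask; that choice is an assumption of this sketch, not something taken from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kTwoPi = 6.283185307179586;

// Precomputed sine values for one full turn, indexed by angle.
class SineTable {
public:
    explicit SineTable(std::size_t size = 65536) : table_(size) {
        // The size must be a power of two so that the index can be masked.
        assert(size > 0 && (size & (size - 1)) == 0);
        for (std::size_t i = 0; i < size; ++i) {
            const double angle = kTwoPi * static_cast<double>(i) / static_cast<double>(size);
            table_[i] = static_cast<float>(std::sin(angle));
        }
    }

    // Approximate sine: no interpolation, the angle is rounded down to the
    // nearest table entry, so the error is at most about 2*pi / size.
    float lookup(float radians) const {
        double turns = static_cast<double>(radians) / kTwoPi;
        turns -= std::floor(turns);  // fractional part, in [0, 1]
        const std::size_t index =
            static_cast<std::size_t>(turns * static_cast<double>(table_.size())) &
            (table_.size() - 1);
        return table_[index];
    }

private:
    std::vector<float> table_;
};
```

Whether such a table beats the library sin() depends heavily on the CPU and its cache behaviour, so it is worth measuring before committing to it.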

Another tip if it is still too slow: if you have a static, or at least a very slow moving, light (or one that only moves if you rotate the block), have the normals for the lights preset. Basically the lighting will be static, as opposed to dynamic (which is probably what it is doing now, since I think you are calculating the normals once per frame).


Oh and the best thing is to use a cute font, and really spiffy looking buttons



Oh, and optimise your GL code. Change states as few times as possible etc., use pointers to the floats if you can (which you are doing), and you may also want to fiddle with things like glFlush: if you have it on, try not using it, and so on.
If the bottleneck is both your code AND the frequency of your monitor, it can affect it.

Let's say your monitor is doing 75hz. If you have OpenGL waiting for it to finish the blit, and let's say your code is cycling at 3000hz, then that basically means your actual frequency should come out at 73.2hz.

If your monitor was set to 60hz, and your code was cycling at 1000hz, then your overall frequency would be 56.6hz.

If your monitor was set to 60hz, and your code was only capable of a cycle of 200hz, then you would be capped at 46.1hz.

and so on.
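The post does not state the formula behind these figures, but they are consistent with treating the refresh wait and the code as two stages that run strictly one after the other, so that their per-frame times add up. The sketch below reproduces the three numbers under that assumption; the function name serialRateHz is made up, and whether a real driver with vsync behaves this way is a separate question.

```cpp
#include <cassert>

// Combined rate of two stages that run strictly one after the other,
// so that their per-frame times add: 1/f = 1/a + 1/b.
inline double serialRateHz(double rateA, double rateB) {
    assert(rateA > 0.0 && rateB > 0.0);
    return 1.0 / (1.0 / rateA + 1.0 / rateB);
}
```

With this model, 75hz and 3000hz give about 73.17hz, 60hz and 1000hz give about 56.60hz, and 60hz and 200hz give about 46.15hz, matching the 73.2, 56.6 and 46.1 quoted above.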

But let's say you didn't have OpenGL set to wait for the refresh to end: you would only be restricted by the lowest rate (in most situations the monitor's refresh rate), which in the above examples is 60.

I guess that in your case, since you are getting a refresh around 30, it is the rendering code that is causing the bottleneck.

All the same, if you stop OpenGL from waiting (which it usually does by default) you can gain one or two fps.

Anyway you get the drift

Dredge-Master    175
Here is a link to the voxel program I wrote.
http://members.optushome.com.au/jlferry/download/voxel_sfx.exe

currently the model is around 61600 voxels.
default is 6 lights.
click on the ? for instructions

Lyve    116
Hi Sander,

I don't know how to create the "write-only" buffer, as you call it. I do it the usual way with the type GL_STATIC_DRAW_ARB.

This is also the way it is used in all examples I've found. While trying to get it faster, I found out that there is a huge difference between VBO and vertex arrays if I disable the lighting. It seems that the lighting is the bottleneck of the graphics accelerator.

I will continue my research this evening, I think I'm on the right track. We will also check the idea of doing it with quads / polygons, thanks for the hint.

Lyve    116
Hi Dredge-Master,

I mean 320000 points of course.

Only the top and the bottom of our cube are rendered as points. The four sides are rendered as lines, which actually don't cost very much.
If I remove the rendering of the cube I get well over 1000fps, so state changes are not the problem.

Guest Anonymous Poster   
Dredge-Master, you are really, REALLY confused. :D

Dredge-Master    175
quote:
Original post by Anonymous Poster
Dredge-Master, you are really, REALLY confused. :D


LOL!
with regard to the
quote:
Originally posted by me
Is that 320 or 320000 points. The decimal point is confusing.



It's because international standards use , for the thousands separators. We use decimal points for...you guessed it...decimal points only. And punctuation of course. I wasn't sure if Lyve was from the US, or just typing 320 with a bunch of decimal places (which I've seen a lot of in the engineering field).
Why does one country always have to have a bunch of weird ANSI specs, whilst the rest use ISO, or at least measure everything in furlongs?
Then again, I guess gallons and miles are quite novel.


Regarding the rendering - other than compiler settings, I can't really suggest much except trying to optimise the lighting.



forgot the slash in the ubb quote tag

[edited by - Dredge-Master on August 13, 2003 8:46:34 AM]

Lyve    116
Lol, this is getting interesting. I only used the "." because I thought it was the "usual" thing that is expected in other countries. I'm from Germany; in a German forum I wouldn't have used the . hehe

Guest Anonymous Poster   
I liked the ending with glFlush and your speculations about the monitor best.
Dredge-Master, you ARE really, REALLY confused. :D

Dredge-Master    175
it should have been glFinish. typo. didn't realise it, sorry.

btw - the monitor refresh rates are correct. the card doesn't slow it down much. check out some of the educational pdfs at the sgi website, and also their slide shows.

Dredge-Master    175
oh and also disable vsync for the monitor refresh. then glFinish will only depend on the rendering side. not using finish, flush and vsync will only make it dependent on your code.
I always leave vsync on though, but it won't make the rendering wait for the monitor. I still don't trust it though.

Guest Anonymous Poster   
So your performance tips are software lighting and glFinish.
I also want to entertain with those calculations but could not find the pdf. Could you please post the exact link? Thanks in advance!

Lyve    116
Face culling doesn't work for points, so I'm disabling it (enabling it wouldn't change the speed because even with culling all points are rendered).

Crispy    556
quote:
Original post by Lyve
Face culling doesn't work for points, so I'm disabling it (enabling it wouldn't change the speed because even with culling all points are rendered).


Right - I should've read more carefully. I had polygons in mind....

vincoof    514
50fps × 320,000pts gives ~16,000,000 points per second.
I'd say that's really good for lit points on a GF4Ti4200, especially if the points move every frame. Don't expect to get much more. A bit more probably, but not much.
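The arithmetic behind this estimate can be written out as a small sketch. The function names are made up, and the peak of 113 million vertices per second used in the note below is the figure quoted earlier in the thread for unlit vertices on a GeForce4 Ti, not a measured value.

```cpp
#include <cassert>

// Points submitted per second for a given frame rate.
inline double pointsPerSecond(double pointsPerFrame, double framesPerSecond) {
    assert(pointsPerFrame >= 0.0 && framesPerSecond >= 0.0);
    return pointsPerFrame * framesPerSecond;
}

// Fraction of a quoted peak vertex rate that is actually being reached.
inline double fractionOfPeak(double achievedPerSecond, double peakPerSecond) {
    assert(peakPerSecond > 0.0);
    return achievedPerSecond / peakPerSecond;
}
```

320000 points at 50fps is exactly 16,000,000 points per second, which is about 14% of the quoted 113 million. Since that peak is for unlit vertices, the comparison only gives a rough upper bound for lit points.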

What are your lighting parameters? How many lights? Directional lights? Don't tell us you have spot lights.

VBOs are very good for speeding things up, but:
- if the driver is clever enough, it can do this optimization for you without you even asking for it,
- nVidia had performance issues with VBOs in their first implementations. Please download the very latest drivers.

Are your particles always in the field of view? If not, maybe you can group some and perform frustum culling on their bounding boxes. It may save rendering thousands of points, but you need to be able to group points and compute their bounding boxes at a very low cost, otherwise you won't get a performance boost.
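The grouping idea could be sketched like this: compute an axis-aligned bounding box per group of points and skip the whole group when the box lies completely outside one of the frustum planes. All names here (Vec3, Plane, Box, boundsOf, mayBeVisible) are made up for illustration, and extracting the planes from the current projection and modelview matrices is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane written as dot(normal, p) + d >= 0 for points on the inside.
struct Plane { Vec3 normal; float d; };

// Axis-aligned bounding box given by its lowest and highest corner.
struct Box { Vec3 lo; Vec3 hi; };

// Bounding box of a non-empty group of points.
Box boundsOf(const std::vector<Vec3>& points) {
    assert(!points.empty());
    Box box{points.front(), points.front()};
    for (const Vec3& p : points) {
        box.lo.x = std::min(box.lo.x, p.x);
        box.lo.y = std::min(box.lo.y, p.y);
        box.lo.z = std::min(box.lo.z, p.z);
        box.hi.x = std::max(box.hi.x, p.x);
        box.hi.y = std::max(box.hi.y, p.y);
        box.hi.z = std::max(box.hi.z, p.z);
    }
    return box;
}

// Conservative test: returns false only when the box lies entirely on the
// outside of at least one plane, so a visible group is never rejected.
bool mayBeVisible(const Box& box, const std::vector<Plane>& planes) {
    for (const Plane& plane : planes) {
        // Corner of the box that lies furthest along the plane normal.
        const Vec3 corner{plane.normal.x >= 0.0f ? box.hi.x : box.lo.x,
                          plane.normal.y >= 0.0f ? box.hi.y : box.lo.y,
                          plane.normal.z >= 0.0f ? box.hi.z : box.lo.z};
        const float distance = plane.normal.x * corner.x +
                               plane.normal.y * corner.y +
                               plane.normal.z * corner.z + plane.d;
        if (distance < 0.0f) {
            return false;
        }
    }
    return true;
}
```

If the groups are fixed blocks of the workpiece, the boxes only need to be recomputed for blocks whose points changed, which keeps the cost of the test low compared to drawing the points.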

Have you benchmarked the CPU load? What processor do you have? Depending on the balance between GPU power and CPU power, maybe you will have to let the graphics card do more of the work, or the central processor, with things like computing the lighting in software.

What is the point movement policy? Can't you compute their positions in a vertex program? That would be very useful if the CPU is the limit.

And one last thing: never set up a 16-bit color buffer with nVidia cards. It will kill your performance. Moreover, if your desktop is currently in 16-bit mode, you should set it to 32 bits.
