Question about GPU draw calls

9 comments, last by Sollum 6 years, 2 months ago

Hello!

I am experimenting with 3D using the LibGDX framework. I want to do an "engine trail" effect in 3D space. Well, a simpler version would be plain lines.
I tried GL Immediate Mode and it looked nice, but the problem is the number of draw calls!
It took around 150 calls to draw a nicely curved line. One line.

And the solution itself isn't viable, because I have to create separate model instances and transform each of them individually to form a curved line before I can draw it in a proper model batch.

Is there any spreadsheet or information on what is considered a "normal" number of draw calls for a mobile/desktop game?
It seems that most 3D effects take up a lot of resources, but I have no idea how to benchmark what is normal and what is a no-no.

 

Example of what i am trying to achieve.

 


You should be able to do that effect with 1 draw call. How does your current solution work such that you need >100 draws?

12 hours ago, Sollum said:

Is there any spreadsheet or information on what is considered a "normal" number of draw calls for a mobile/desktop game?

IMHO: For a mobile game, 100 to 200, for a desktop game 1k to 10k.

As Hodgman implies, a "ribbon particle trail" is typically just a single very long strip of quads (or triangles). You can store them all in a single vertex buffer and draw them in one call.

This is generally a very cheap effect. For fixed length trails, you can even treat the vertex buffer as a circular buffer, and just keep overwriting the oldest quad in the buffer with the new quad as the tracked object moves.
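A minimal sketch of that circular-buffer idea, in plain Java with no LibGDX or GL calls. The class name TrailRing, the position-only vertex layout (x, y, z) and the choice of storing each quad as four independent vertices with a static index list are all assumptions made for illustration, not something the posters specified. Independent quads are used here so that overwriting the oldest slot never breaks the ordering of the others.

```java
/** Fixed-length ribbon trail kept as independent quads in one reusable array (hypothetical helper). */
final class TrailRing {
    static final int FLOATS_PER_VERTEX = 3;                 // x, y, z only; no color/UV in this sketch
    static final int FLOATS_PER_QUAD = 4 * FLOATS_PER_VERTEX;

    final float[] vertices;  // contents you would upload to ONE vertex buffer
    final short[] indices;   // built once: two triangles per quad (keep capacity * 4 within short range)
    private final int capacity;
    private int head = 0;    // slot the next quad overwrites (the oldest once the ring is full)
    private int count = 0;   // number of live quads, never more than capacity
    private final float[] lastEdge = new float[6]; // previous left + right edge points
    private boolean hasEdge = false;

    TrailRing(int capacity) {
        this.capacity = capacity;
        vertices = new float[capacity * FLOATS_PER_QUAD];
        indices = new short[capacity * 6];
        for (int q = 0; q < capacity; q++) {
            int v = q * 4, i = q * 6;
            indices[i] = (short) v;           indices[i + 1] = (short) (v + 1); indices[i + 2] = (short) (v + 2);
            indices[i + 3] = (short) (v + 2); indices[i + 4] = (short) (v + 1); indices[i + 5] = (short) (v + 3);
        }
    }

    /** Adds the newest cross-section of the ribbon; a quad is formed with the previous one. */
    void push(float lx, float ly, float lz, float rx, float ry, float rz) {
        if (hasEdge) {
            int o = head * FLOATS_PER_QUAD;
            System.arraycopy(lastEdge, 0, vertices, o, 6);  // vertices 0 and 1: previous edge
            vertices[o + 6] = lx; vertices[o + 7] = ly;  vertices[o + 8] = lz;   // vertex 2: new left
            vertices[o + 9] = rx; vertices[o + 10] = ry; vertices[o + 11] = rz;  // vertex 3: new right
            head = (head + 1) % capacity;
            if (count < capacity) count++;
        }
        lastEdge[0] = lx; lastEdge[1] = ly; lastEdge[2] = lz;
        lastEdge[3] = rx; lastEdge[4] = ry; lastEdge[5] = rz;
        hasEdge = true;
    }

    int quadCount()  { return count; }
    int indexCount() { return count * 6; }  // how many indices a single indexed draw would use
}
```

Each frame you would push one new edge, upload the changed slot (or the whole array) and issue one indexed draw covering indexCount() indices. A fading trail would additionally need a per-vertex age or alpha, which this sketch leaves out.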

15 hours ago, Sollum said:

GL Immediate Mode

OpenGL's immediate mode has been deprecated for a very long time. Switch to using vertex buffers and glDrawArrays/glDrawElements, and your current performance problems will melt away...
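To illustrate what "switch to vertex buffers" means for a curved line, here is a rough sketch that packs a whole polyline into one triangle-strip vertex array. It is plain Java using only java.nio; the class and method names (RibbonStrip, build) are made up for this example, and for simplicity the ribbon is widened in the XY plane only, whereas a real trail would usually be widened relative to the camera. The GL call appears only as a comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Turns a polyline into triangle-strip vertices (hypothetical helper, not a LibGDX class). */
final class RibbonStrip {
    /**
     * @param points centerline as x,y,z triples
     * @param width  ribbon width in world units
     * @return a direct buffer holding 2 vertices (left, right) per input point
     */
    static FloatBuffer build(float[] points, float width) {
        int n = points.length / 3;
        float half = width * 0.5f;
        FloatBuffer out = ByteBuffer.allocateDirect(n * 2 * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int i = 0; i < n; i++) {
            int prev = Math.max(i - 1, 0), next = Math.min(i + 1, n - 1);
            // Direction of travel around this point, taken from its neighbours.
            float dx = points[next * 3] - points[prev * 3];
            float dy = points[next * 3 + 1] - points[prev * 3 + 1];
            float len = (float) Math.sqrt(dx * dx + dy * dy);
            // Perpendicular in the XY plane; fall back to +Y if the direction is degenerate.
            float nx = len > 1e-6f ? -dy / len : 0f;
            float ny = len > 1e-6f ? dx / len : 1f;
            float x = points[i * 3], y = points[i * 3 + 1], z = points[i * 3 + 2];
            out.put(x + nx * half).put(y + ny * half).put(z);  // left vertex
            out.put(x - nx * half).put(y - ny * half).put(z);  // right vertex
        }
        out.flip();
        // Upload once to a vertex buffer, then the whole curve is a single call such as
        // glDrawArrays(GL_TRIANGLE_STRIP, 0, n * 2) instead of one call per segment.
        return out;
    }
}
```

With 150 points on the curve this still produces one buffer of 300 vertices and one draw call.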

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

4 hours ago, Hodgman said:

How does your current solution work such that you need >100 draws?

I was experimenting with trails and tried to use GL Immediate Mode.

I haven't fully figured out the framework I am using. I was thinking of storing the quads in one model, but whenever a model is created, everything is stored in meshes, which are stored in VBOs. Each mesh is a separate draw call. If a model has 5 meshes, it will be 5 calls, and so on.

I could try updating an existing VBO with new vertices (according to the documentation, it has that option: https://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/graphics/glutils/VertexBufferObject.html#updateVertices-int-float:A-int-int-), but wouldn't it be expensive and taxing to interact with VRAM after data loading is finished?

Sorry, my thinking and questions might sound dumb, but I am used to storing everything in VRAM at load time and only using the GPU for drawing afterwards. It seems like a taxing operation to recreate or adjust VBOs on the fly.

Thanks for all the tips!

With old immediate mode stuff, you should just be able to use a single glBegin/glEnd pair and all of the glVertex calls in between them.

Streaming data via a VBO is not that bad -- you can send a few hundred megabytes per frame over the PCIe bus without a problem! Sending a few megabytes of vertex data is well below the theoretical maximums. In my game I stream 3.5MB of vertex data per frame to draw these vehicle trails and it hardly shows up at all on my performance profiler: http://www.22series.com/assets/images/screenshots/22_racing_series_video_game_future_physics_racer_03.jpg

In D3D this is pretty simple -- you just use the DISCARD flag to tell the driver to orphan the old buffer storage and allocate a new page from an internal ring -- but in GL it's a little trickier to tell the driver that this is your intention. Check out both of these for some different methods of streaming VBO data in GL:

https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf

https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming

Orphaning is the easy way to do it (as long as you follow one of the orphaning recipes in the above links), or if you really want to improve performance further (probably only required if you want to push it over 100MB per frame), you can use persistent mapping, and/or unsynchronized mapping (A.K.A. NO_OVERWRITE in D3D) to implement a circular/ring buffer, as mentioned above.
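The bookkeeping behind such a streaming ring can be sketched without touching GL at all. The class below (StreamRing, a name invented for this example) only tracks where the next write should go and when the buffer has to be restarted. The actual GL work is described in comments and is an assumption about how you would wire it up, so follow the linked recipes for the real calls and flags.

```java
/** Tracks write offsets inside one streaming vertex buffer (hypothetical helper). */
final class StreamRing {
    private final int capacityBytes;
    private int writeOffset = 0;   // first byte not yet handed out
    private int restarts = 0;      // how many times the buffer wrapped (and would be orphaned)

    StreamRing(int capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Reserves size bytes and returns the byte offset to write them at. */
    int reserve(int size) {
        if (size > capacityBytes) {
            throw new IllegalArgumentException("request larger than the whole buffer");
        }
        if (writeOffset + size > capacityBytes) {
            // Not enough room left, so start again at offset 0. In GL this is the point
            // where you would orphan the storage, e.g. glBufferData(target, capacityBytes,
            // null, GL_STREAM_DRAW), so the GPU can keep reading the old copy.
            writeOffset = 0;
            restarts++;
        }
        int offset = writeOffset;
        writeOffset += size;
        // Between restarts every reservation is a fresh, never-drawn range, which is what
        // makes unsynchronized writes (NO_OVERWRITE in D3D terms) safe.
        return offset;
    }

    int restartCount() { return restarts; }
}
```

Usage would be along the lines of: reserve the bytes for this frame's trail, write the vertices at the returned offset (glBufferSubData or a mapped range), then draw from that offset.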

Thanks a lot!
I will look into "orphaning", but i am afraid that mobile platforms are not that "powerful" for that :(

2 hours ago, Sollum said:

I will look into "orphaning", but i am afraid that mobile platforms are not that "powerful" for that :(

At the same time, you'll likely not have as many quads to push, since the screen sizes are smaller too. Also, until performance data tells you it's an issue, don't assume it will be.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

2 hours ago, Sollum said:

but i am afraid that mobile platforms are not that "powerful" for that

You might be surprised. Back when I worked on the Fire Phone, there were all sorts of chipset vendor-specific OpenGL extensions to speed up specific performance cases. The high-end mobile chipsets spend a lot of effort on rendering. But at the same time, there be dragons...


Specifically for mobile rendering, I'd recommend the documentation at https://developer.qualcomm.com/software/adreno-gpu-sdk/tools as a starting point (assuming Android/Snapdragon mobile chipset).

Admin for GameDev.net.

Thanks for all the tips!

I found a way to do it with the help of the framework I am using. It turns out I had to implement everything inside a batch; calling glDrawArrays outside of the batch bounds did not produce the results I wanted.
Anyways, I managed to achieve the result I wanted.

It's quite fun, though, to watch the Android logs and the profiler tool. In this test I allocate a 7000-element float array each frame and rebuild it. GC dump numbers are nice! :D

Edit: It was a "strange" experience and kind of counterintuitive for me. I have made 4 2D games and always tried to cache, reuse or preload data as much as I could get away with. And now I had to dynamically recreate arrays of data each frame.
Are all graphics effects based on that? What if I wanted to make dust particles coming from below the car tires? A blind guess would be to make at least ~10000 3-coordinate vectors, recalculate them each frame based on the movement vector and delta draw time, and then recreate all the buffer data all over again? Can mobile phones even handle that?

Anyways, here is the result!
[Two screenshots of the result]
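On the per-frame allocation and the dust particle question: one common way to keep the garbage collector quiet is to allocate the arrays once and update them in place every frame. The sketch below is plain Java with invented names (DustParticles, update, bytesPerUpload) and a deliberately trivial motion model. It is not taken from the thread, and whether a given phone keeps up is something only profiling will show.

```java
/** Particle positions kept in arrays that are allocated once and reused (hypothetical helper). */
final class DustParticles {
    final float[] positions;   // x, y, z per particle; this is what gets re-uploaded each frame
    final float[] velocities;  // x, y, z per particle, in units per second

    DustParticles(int count) {
        positions = new float[count * 3];
        velocities = new float[count * 3];
    }

    /** Moves every particle by velocity * dt without allocating anything. */
    void update(float dt) {
        for (int i = 0; i < positions.length; i++) {
            positions[i] += velocities[i] * dt;
        }
    }

    /** Size of the position data that one frame's upload would send. */
    int bytesPerUpload() {
        return positions.length * Float.BYTES;
    }
}
```

For 10000 particles, bytesPerUpload() comes to 120000 bytes, so about 120 KB of position data per frame. That is a small fraction of the 3.5MB per frame mentioned earlier for the vehicle trails, although that figure was measured on a different platform.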

