
"instancing" with DX9

Norman Barrows


The new PC has arrived (i7-6700K, GTX1080), and I'm starting on the final graphics for the game.

Better grass plants were one of the first things. The placeholder graphics were ultra-low-poly, just 3 or 4 quads. I could only draw them every 5 feet or so; any closer together was too many instances. They were scaled very wide and flat to cover the gaps. It looked OK en masse, but individual plants weren't that great.

So after watching YouTube tutorials, I made some better plant models. I was testing them out, but noticed a frame rate hit. 2500 instances was OK, but a bit of a performance hit. 10,000 was a big performance hit.

So I did some experimenting. I would track the number actually drawn, the number of polys in the mesh being used, the FPS, and the scale. At first it seemed the number of draw calls was having the biggest effect, followed by the number of tris in the mesh used, with mesh scale (big tri rasterization) having practically no effect (scale up the grass in your game to 20-30 feet tall - it looks wild!). But on closer inspection, I was drawing about the same number of quads with 10,000 one-quad meshes vs 2500 three-quad meshes, yet 10,000 was half as fast. It turned out that figuring out which of the 10,000 to draw was the performance hit.

So then it was time for some test code. The optimal way to do it would be to pre-calc the list of plants to draw, along with their world mats. Set the texture, material, mesh, clamp, cull, tex scale, etc. just once, then for each plant in the list, set the world mat and call DrawIndexedPrimitive (DIP).

I tried it with 2500 plants: a 100x100 foot area with one plant every 2 feet. Worked like a charm. No change in framerate, still 60 FPS. Then I tried 10,000 plants (a 200x200 foot area, one plant every 2 feet). Worked like a charm. No change in framerate.



The total meshes drawn is in addition to the 10,000 grass plant meshes. Those are the ground, trees, berry bushes, the huts, the teepee, the fire, people, etc.


This shows a closeup of the grass. Still needs work, but it's getting better.

And here's the test code:

    /*
    optimal:
    pre-calc all world mats
    set alphatest, cull, clamp, material, and tex scaling once.
    set each tex once.
    for each tex, set each mesh once.
    for each mesh, set world mat, call DIP
    */

    #define max_grass_list 10000

    D3DXMATRIX grass_list[max_grass_list];
    int num_grass_list;

    // build the world matrix for every plant once, up front
    void init_savanna_grass_list()
    {
        int x1,z1;
        float x,y,z,sx,sy,sz,ry,jitter,scale;
        ZeroMemory(grass_list,sizeof(D3DXMATRIX)*max_grass_list);
        num_grass_list=0;
        for (x1=0; x1<100; x1++)
        {
            for (z1=0; z1<100; z1++)
            {
                // one plant every 2 feet, nudged by a per-location jitter
                x=26200.0f+(float)x1*2.0f;
                z=13100.0f+(float)z1*2.0f;
                jitter=grass_offset((int)x,(int)z);
                jitter/=2.0f;
                x+=jitter;
                z+=jitter;
                y=heightmap(cm[cm0].mx,cm[cm0].mz,x,z);
                scale=(float)grass_scale((int)x,(int)z);
                sx=scale*0.05f;
                sy=scale*0.03f;
                sz=scale*0.05f;
                ry=grass_rotation((int)x,(int)z);
                Mstart();
                Mscale(sx,sy,sz);
                // MrotateRADS(0,rx);
                MrotateRADS(1,ry);
                // MrotateRADS(2,rz);
                Mmove(x,y,z);
                grass_list[num_grass_list]=Mmat;
                num_grass_list++;
            }
        }
    }

    // set all shared state once, then one SetTransform + DIP per plant
    void test_draw_savanna_grass()
    {
        int a;
        HRESULT result;
        trace[0]=0;
        Zsettex(454); // 6=ocean
        Zsetmesh(4);
        Zalphatest(1);
        Zcull(1);
        Zclamp(1);
        Zsetmaterial(7);
        Ztexture_scaling(0);
        for (a=0; a<num_grass_list; a++)
        {
            Zd3d_device_ptr->SetTransform(D3DTS_WORLD,&grass_list[a]);
            result=Zd3d_device_ptr->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,0,0,Znum_verts,0,Znum_primitives);
            trace[0]++;
            if (result != D3D_OK)
            {
                Zmsg2("test draw savanna grass error");
                exit(1);
            }
        }
    }

So it would seem that lots of draw calls is not necessarily a bottleneck, perhaps even less of a bottleneck than high-poly meshes. I tested plant meshes with as few as two triangles and as many as 64 triangles. As is often the case, it's visible surface determination that's the bottleneck here. In such cases, optimization in the form of DOD (data-oriented design) seems to be the best approach.


I turned off the test code to draw savanna grass, and turned generate_savanna_grass back on. I kicked it up from one plant every 3 feet to one plant every 2 feet (from 10,000 to 22,500 plants per 300x300 foot chunk). This required an increase in max_meshes for a terrain chunk (from 20,000 meshes to 25,000 meshes). But with a chunk cache of 90 chunks, that ran out of memory, so I knocked the cache size back down to its original 30 chunks.
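The plant counts quoted above can be checked with a little arithmetic. The sketch below is only an illustration: `plants_per_chunk` and `cache_slots` are hypothetical helper names, and it assumes a plain square grid and that every cached chunk reserves room for max_meshes entries, which the post does not actually state.

```cpp
// Hypothetical helper: plants in a square chunk when one plant is placed
// every spacing_ft feet along each axis (assumes a plain square grid).
constexpr int plants_per_chunk(int chunk_ft, int spacing_ft)
{
    return (chunk_ft / spacing_ft) * (chunk_ft / spacing_ft);
}

// Hypothetical helper: total mesh slots held by the chunk cache, ASSUMING
// each cached chunk reserves max_meshes entries up front.
constexpr long cache_slots(int cached_chunks, int max_meshes)
{
    return static_cast<long>(cached_chunks) * max_meshes;
}

// 300 ft / 3 ft = 100 plants per side, 300 ft / 2 ft = 150 plants per side
static_assert(plants_per_chunk(300, 3) == 10000, "3 foot spacing");
static_assert(plants_per_chunk(300, 2) == 22500, "2 foot spacing");
```

Under that assumption, 90 chunks at 25,000 meshes is 2,250,000 slots, while 30 chunks at 25,000 is 750,000, so dropping the cache back to 30 chunks cuts the reservation to a third.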

I'm now making 36,000+ draw calls per frame at 60 FPS - clip range in this shot is set to 400 feet for everything:


Since the data structure for visible surface determination is the key to all this, I'll provide a few details....

The entire graphics engine is built around a struct called Zdrawinfo. It contains all the basic parameters for a draw call, such as mesh ID, texture ID, material ID, and flags like cull, clamp, alphatest, etc.
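For readers who want something concrete, here is a guess at what such a struct might look like. Only the field names come from the description above; the types, the layout, and both helper functions are assumptions, and the real Zdrawinfo holds more than is shown (the "etc."). The values in `make_grass_drawinfo` mirror the IDs and flags set in the test code.

```cpp
// Hypothetical sketch of a draw-call descriptor. Field names follow the
// post; types and layout are assumptions, and other fields are omitted.
struct Zdrawinfo
{
    int  mesh_id;
    int  texture_id;
    int  material_id;
    bool cull;
    bool clamp;
    bool alphatest;
};

// Hypothetical helper: the grass setup used by the test code
// (texture 454, mesh 4, material 7, alphatest/cull/clamp all on).
inline Zdrawinfo make_grass_drawinfo()
{
    Zdrawinfo d;
    d.mesh_id     = 4;
    d.texture_id  = 454;
    d.material_id = 7;
    d.cull        = true;
    d.clamp       = true;
    d.alphatest   = true;
    return d;
}

// Hypothetical helper: true when two draws need no state change between them.
inline bool same_render_state(const Zdrawinfo& a, const Zdrawinfo& b)
{
    return a.mesh_id == b.mesh_id && a.texture_id == b.texture_id &&
           a.material_id == b.material_id && a.cull == b.cull &&
           a.clamp == b.clamp && a.alphatest == b.alphatest;
}
```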

When a terrain chunk is generated, data from the world map and plant maps determine the terrain type and placement of trees etc. A terrain chunk is a list of Zdrawinfo structs - all the meshes in the chunk. It also has an index, which lists the structs in texture, mesh order. When a mesh is added to the list, an in-order insertion is done to the index. So the chunk contains the list of Zdrawinfo structs, and an index that consists of a list of textures, and for each texture, a list of meshes, and for each mesh, a list of instances in the Zdrawinfo list that use that texture and mesh combo. The index is implemented using ints for storage, and stores texture IDs, mesh IDs, and Zdrawinfo list indices.

When rendering, the index is used to draw the meshes in texture, mesh order - to minimize state changes. The graphics engine includes state managers for all states such as cull, clamp, alpha test, etc. - so any redundant changes occurring there while rendering the instances of a texture/mesh combo will be automatically filtered out.

Chunks are generated on-demand in the foreground if necessary, as well as being generated in the background using a multi-pass look-ahead system to generate chunks around the player before they move into visible range. Generating 22,500 plants vs 10,000 in a single background pass is slightly noticeable. I'll probably need to split that up into two passes. Depending on the type of terrain in a chunk, a chunk can require up to 6 or 8 passes to generate in the background. Background chunk generation runs at 15Hz.

The game now features a user-defined variable frame rate limiter that scales update to the desired framerate. Vsync is still on to avoid tearing, but desired FPS values of 1 through 1000 are supported by the code - if you have the hardware to do it.
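The texture, mesh, instance index and the redundant-state filtering can be sketched roughly as follows. This is not the engine's code: the post says the real index is packed into ints, whereas this version uses nested `std::vector`s for readability, and every name here (`ChunkIndex`, `MeshGroup`, `TextureGroup`, `DrawStats`, `walk`, `CachedState`) is made up for the illustration.

```cpp
#include <cstddef>
#include <vector>

struct MeshGroup
{
    int mesh_id;
    std::vector<int> instances;      // indices into the chunk's Zdrawinfo list
};

struct TextureGroup
{
    int texture_id;
    std::vector<MeshGroup> meshes;   // kept sorted by mesh_id
};

struct ChunkIndex
{
    std::vector<TextureGroup> textures;   // kept sorted by texture_id

    // In-order insertion: find (or create) the texture entry, then the mesh
    // entry under it, then append the instance's slot in the Zdrawinfo list.
    void add(int texture_id, int mesh_id, int drawinfo_slot)
    {
        std::size_t t = 0;
        while (t < textures.size() && textures[t].texture_id < texture_id) ++t;
        if (t == textures.size() || textures[t].texture_id != texture_id)
            textures.insert(textures.begin() + t, TextureGroup{texture_id, {}});

        std::vector<MeshGroup>& meshes = textures[t].meshes;
        std::size_t m = 0;
        while (m < meshes.size() && meshes[m].mesh_id < mesh_id) ++m;
        if (m == meshes.size() || meshes[m].mesh_id != mesh_id)
            meshes.insert(meshes.begin() + m, MeshGroup{mesh_id, {}});

        meshes[m].instances.push_back(drawinfo_slot);
    }
};

struct DrawStats { int texture_sets = 0; int mesh_sets = 0; int draw_calls = 0; };

// Walk the index in texture, mesh order and count what a renderer would do:
// one texture set per texture, one mesh set per mesh, one draw per instance.
inline DrawStats walk(const ChunkIndex& index)
{
    DrawStats s;
    for (const TextureGroup& tg : index.textures) {
        ++s.texture_sets;
        for (const MeshGroup& mg : tg.meshes) {
            ++s.mesh_sets;
            s.draw_calls += static_cast<int>(mg.instances.size());
        }
    }
    return s;
}

// Redundant-change filter in the spirit of the state managers: a set() that
// repeats the current value is dropped instead of reaching the device.
struct CachedState
{
    int current = -1;   // -1 means "not set yet"
    int changes = 0;    // changes that would actually be sent to the device
    void set(int value)
    {
        if (value != current) { current = value; ++changes; }
    }
};
```

With four instances spread over two textures and three texture/mesh combos, `walk` reports two texture sets, three mesh sets and four draw calls, which is the point of sorting: state changes scale with the number of combos, not the number of instances.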

NOTE: Apologies for the somewhat dark images. These screenshots were taken at 6:30 AM game time under natural lighting conditions.



Recommended Comments

Thank you for sharing! I think that plants, and grass especially can have large impact on the player's idea of "good" graphics. No one enjoys just having a bundle of grass "pop" into the scene.





NOTE: Apologies for the somewhat dark images. these screenshots were taken at 6:30 AM game time under natural lighting conditions.

I like that you make note of this and stop questions early ;)
