



Taking advantage of T&L

This topic is 6929 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I've noticed it mentioned in a lot of places that for hardware T&L to be truly effective, the vertices need to be stored in local video memory. I was wondering if anyone knows how much that affects one's ability to implement various visibility solutions (BSPs, portals, octrees, whatever). As far as I can tell, you'd need to run your visibility code to figure out which polygons you need for the frame, upload them to the card, and then let the T&L engine transform them. Is that upload going to kill a lot of the benefit of using T&L, or is there a way around it that I can't see just yet?


Not just local video memory; you can also use AGP memory and get a good performance boost. AGP memory is quick for the CPU to write, but slow for it to read.

There is a way to use the above knowledge to get good performance out of hardware TnL, while keeping our vertices local in system memory.

We create a single vertex buffer of fixed size, say 1K vertices. We specify it as "write-only." The driver should put it in AGP memory. We initialize an index value to 0.

We start our HSR, portal, BSP, whatever.

When we have vertices to render, we check whether enough slots are left in the vertex buffer. If so, we lock with the "Do Not Overwrite" flag. If not, we reset the index to 0 and lock with the "discard contents" flag. We copy our vertices into the vertex buffer starting at index, then unlock. We then call DP() on the vertex buffer, with a start vertex of index and a vertex count equal to the number of vertices we just inserted. Finally, we advance index by that same count.
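The append-or-discard bookkeeping can be sketched without any graphics API at all. The following is a minimal simulation, not Direct3D code: `StreamingVertexBuffer`, `LockMode`, `DrawCall`, and `appendAndDraw` are hypothetical names invented for illustration, and the "lock" and "draw" steps are only recorded, not executed by a driver. It assumes a batch never exceeds the buffer capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two lock modes described in the post.
enum class LockMode { NoOverwrite, Discard };

struct Vertex { float x, y, z; };

// One simulated draw call: the buffer range that was drawn and how it was locked.
struct DrawCall { std::size_t start; std::size_t count; LockMode mode; };

class StreamingVertexBuffer {
public:
    explicit StreamingVertexBuffer(std::size_t capacity)
        : storage_(capacity), next_(0) {}

    // Copies `count` vertices into the buffer and records a draw of that range.
    // Returns the start index that was used for the draw.
    std::size_t appendAndDraw(const Vertex* verts, std::size_t count) {
        assert(count <= storage_.size());  // assumption: a batch always fits

        LockMode mode = LockMode::NoOverwrite;
        if (next_ + count > storage_.size()) {
            // Not enough free slots: start over and ask for fresh contents.
            next_ = 0;
            mode = LockMode::Discard;
        }

        // "Lock", write only into slots no earlier draw in this pass used, "unlock".
        for (std::size_t i = 0; i < count; ++i) {
            storage_[next_ + i] = verts[i];
        }

        // "Draw" the range we just wrote.
        const std::size_t start = next_;
        calls_.push_back(DrawCall{start, count, mode});

        // Advance by the number of vertices inserted, not by one.
        next_ += count;
        return start;
    }

    const std::vector<DrawCall>& calls() const { return calls_; }

private:
    std::vector<Vertex> storage_;
    std::size_t next_;
    std::vector<DrawCall> calls_;
};
```

With a capacity of 8 and batches of 3 vertices, the first two batches land at offsets 0 and 3 with the no-overwrite mode, and the third wraps back to offset 0 with the discard mode. In a real renderer the recorded lock mode and range would be passed to the actual lock and draw calls.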

Be sure to create a write-only VB! This lets the driver allocate the VB in AGP memory, which is quick for the CPU to write and quick for the card to read. If you don't specify this, the VB will be in video memory and the copying over the bus will kill your frame rate.

Be sure to lock with NOOVERWRITE or DISCARDCONTENTS, as described above, rather than with a plain lock. NOOVERWRITE just returns a cached pointer and takes about 50 cycles. If the driver is busy drawing the VB, we don't have to wait for it to finish, because we're promising not to touch memory in use by previous DP operations. Naturally, if you do touch memory used by previous DP operations, the behavior is undefined.

DISCARDCONTENTS swaps the VB for another one that is not in use. This is slower than NOOVERWRITE, but not by much, since we still don't have to wait for a DP operation to complete before the lock can be taken. The first few times it is called, the driver will allocate new vertex buffers, which is slow. Eventually it will have a pool of VBs and just round-robin between them.
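The pooling behaviour described for discard locks can be pictured with a toy model. This is only a sketch of the idea, not how any particular driver is implemented: `RenamingPool`, `discardLock`, and `gpuFinished` are hypothetical names, and "busy" simply means the card may still be reading that backing buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a driver swapping backing buffers on a discard lock.
class RenamingPool {
public:
    // Simulates a discard lock: returns the index of a backing buffer the
    // card is not using, allocating a new one only if every buffer is busy.
    std::size_t discardLock() {
        for (std::size_t n = 0; n < busy_.size(); ++n) {
            const std::size_t i = (cursor_ + n) % busy_.size();
            if (!busy_[i]) {
                busy_[i] = true;                   // in use until the card is done
                cursor_ = (i + 1) % busy_.size();  // continue round-robin from here
                return i;
            }
        }
        // Every existing buffer is still in flight: pay for a new allocation.
        busy_.push_back(true);
        ++allocations_;
        cursor_ = 0;
        return busy_.size() - 1;
    }

    // Simulates the card finishing all draws that read from buffer `i`.
    void gpuFinished(std::size_t i) { busy_[i] = false; }

    std::size_t allocations() const { return allocations_; }

private:
    std::vector<bool> busy_;
    std::size_t cursor_ = 0;
    std::size_t allocations_ = 0;
};
```

In this model the first discard locks each trigger an allocation, which is the slow start-up phase. Once the card starts retiring buffers, later locks reuse existing ones and the allocation count stops growing.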

The above techniques all require DX7; they won't work on DX6 and below.

Sameer Nene has given a good overview of this procedure several times on the DXDev mailing list. You may want to search the archives for further information.
