Real 2D game programming in D3D8

In DirectDraw7 you just blitted rects and it was ultra fast. In D3D8 you have to send vertex buffer batches of 2 triangles (a quad) to DrawPrimitive, which is obviously ultra slow, since you have to repeat the operation NTiles*NLayers times, and batching only 2 triangles just makes things worse. So, is there a way of doing real 2D game programming in Direct3D8, or are we f*cked?

---------
Non-viable "solutions":

1. Rendering a full 3D terrain. This is simply stupid; you obviously don't know that the main feature of DD7 is the ability to deliver high-quality, very artistic images. You certainly cannot do http://matux.hypermart.net/diablo2_screen009.jpg or http://matux.hypermart.net/diablo2_screen001.jpg using this technique.

2. Another "solution" was to add all the textures I could into a macro-texture, i.e. all the 64x32 tiles into a big 256x256 texture. Then you render in batches of multiple polygons and just change the tu,tv coordinates to point to the desired tile (see the sketch after this post). But this ends up being either a map-art eater or not faster at all, since you won't have all the textures needed to render a single scene inside the macro-texture. That means you have to use as few textures as possible, because it will still batch very few polys if you design highly detailed scenes like http://matux.hypermart.net/007.jpg or http://matux.hypermart.net/009.jpg, and very detailed scenes with high-quality pre-rendered images like those are the real point of doing 2D. And no, more than 256x256 is not an option, since there are still lots of Voodoos and TNT cards out there.

Edited by - MatuX on January 15, 2002 1:54:03 PM
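(For reference, a minimal sketch of the macro-texture idea from point 2; the names are hypothetical, assuming a 256x256 atlas that stores 64x32 tiles in a 4-wide grid, and this is not the engine's actual code:)

  // Hypothetical sketch: find the tu,tv rectangle of tile 'index'
  // inside a 256x256 atlas holding 64x32 tiles (4 per row, 8 rows).
  struct TileUV { float tu0, tv0, tu1, tv1; };

  TileUV AtlasUV(int index)
  {
      const float atlasW = 256.0f, atlasH = 256.0f;
      const float tileW  = 64.0f,  tileH  = 32.0f;
      const int   cols   = 4;                    // 256 / 64 tiles per row

      float x = (float)(index % cols) * tileW;
      float y = (float)(index / cols) * tileH;

      TileUV uv;
      uv.tu0 = x / atlasW;                       // left edge
      uv.tv0 = y / atlasH;                       // top edge
      uv.tu1 = (x + tileW) / atlasW;             // right edge
      uv.tv1 = (y + tileH) / atlasH;             // bottom edge
      return uv;
  }

Since every tile quad then shares the same texture, many tiles can go into one DrawPrimitive call; the drawback is exactly the one described above, that a whole scene's tiles have to fit inside the one atlas.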
What's wrong with the DirectX7 interfaces for 2D games?
hm...

If you REALLY want to use DirectX 8, use CopyRects() to build a (self-maintained) "texture cache", and then fire all rects with the same texture in one batch. Sorting will probably not work, though, because the draw order would change.
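A rough sketch of that CopyRects() idea in DX8 C++ (the cache layout, names, and slot position are all made up for illustration; note CopyRects requires both surfaces to be the same format):

  #include <d3d8.h>

  // Hypothetical sketch: copy one 64x32 sprite image into a slot of a
  // big cache texture, so later quads can share a single SetTexture().
  void CacheSprite(IDirect3DDevice8* pDevice,
                   IDirect3DTexture8* pSpriteTex,  // source sprite
                   IDirect3DTexture8* pCacheTex,   // big cache texture
                   LONG slotX, LONG slotY)         // free slot in the cache
  {
      IDirect3DSurface8* src = NULL;
      IDirect3DSurface8* dst = NULL;
      pSpriteTex->GetSurfaceLevel(0, &src);  // top mip level of the sprite
      pCacheTex->GetSurfaceLevel(0, &dst);   // top mip level of the cache

      RECT  srcRect  = { 0, 0, 64, 32 };     // the whole sprite image
      POINT dstPoint = { slotX, slotY };     // where it lands in the cache

      pDevice->CopyRects(src, &srcRect, 1, dst, &dstPoint);

      src->Release();                        // GetSurfaceLevel AddRef'd these
      dst->Release();
  }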

Another thing: is "one-rect" batch rendering really SO slow? The textures remain on the card, so there shouldn't be a problem.

Of course you will never get the performance of 2D blits when using 3D, so, as above, why not use D3D7 + DD7?

BTW, if you start game dev NOW, T&L hardware will be standard by the time you are done.
Have you actually tried 2D quads?

Yes, you may not be doing batching, but I have found that you can render thousands of quads in a given frame before the framerate drops to unacceptable levels (if you do it right). I'd argue with the "obviously ultra slow" part based on my own test results on a GF2Go laptop.

If you really want to, you could batch up several transformed quads on the CPU and send them all to the card at one time. You'll get slight performance gains (on the order of +1 fps in my tests). The gains may be better on lesser hardware. For my apps, I've never had to get fancier than the inefficient single quad technique.
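For the curious, a minimal sketch of that CPU-side batching (not CrazedGenius's actual test code; it assumes pre-transformed XYZRHW vertices and a dynamic vertex buffer, and all names are hypothetical):

  #include <d3d8.h>
  #include <string.h>

  // Hypothetical sketch: copy quadCount quads (already expanded to
  // 6 pre-transformed vertices each) into a dynamic vertex buffer,
  // then submit them all with a single DrawPrimitive call.
  struct QuadVertex
  {
      float x, y, z, rhw;   // screen-space position
      float tu, tv;         // texture coordinates
  };
  #define QUAD_FVF (D3DFVF_XYZRHW | D3DFVF_TEX1)

  void DrawQuadBatch(IDirect3DDevice8* dev, IDirect3DVertexBuffer8* vb,
                     const QuadVertex* verts, int quadCount)
  {
      // vb is assumed created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY
      BYTE* dst = NULL;
      vb->Lock(0, quadCount * 6 * sizeof(QuadVertex), &dst, D3DLOCK_DISCARD);
      memcpy(dst, verts, quadCount * 6 * sizeof(QuadVertex));
      vb->Unlock();

      dev->SetStreamSource(0, vb, sizeof(QuadVertex));
      dev->SetVertexShader(QUAD_FVF);    // DX8 FVF-style vertex "shader"
      dev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, quadCount * 2);
  }

All quads in one such batch must share the same texture and render states, which is why the texture cache idea above matters.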

Keep in mind, you are probably more fill limited than geometry limited.

I'm not even sure about the "ofcourse you will never get the performance of 2D blits when using 3D" part. Real 2D blits might beat 2D quads in very simple tests, but I have never seen anyone test them side by side on the same hardware. When you start to do more complex operations (blending, etc.) the 3D technique should outperform DD7.

Try it - create a simple DD7 app and the equivalent DX8 app. (This should only take an hour or so). Compare the results. If you would like, post the code here so that people can try it on multiple machines and make sure that one app is not more optimized than the other.

Edited by - CrazedGenius on January 15, 2002 2:44:06 PM
Author, "Real Time Rendering Tricks and Techniques in DirectX", "Focus on Curves and Surfaces", A third book on advanced lighting and materials
I'll surely end up doing that if no one can justify why everyone insists so much on using 3D APIs to do 2D.
I know your frustration. I've been away from game programming for about a year, and when I came back I took one look at DirectX 8 and said, WTF is this?? I read a whole bunch of articles and all of them said different things. I think the best solution, however, is to use the D3DXSPRITE interface. There's not a lot of documentation on it, but it's pretty easy to use once you understand it. This is the basic way to get something on the screen:
  //set up DXGraphics and windows the normal way
  //get a texture from a file
  //The render routine would look something like this
  void Render()
  {
      //Clear the backbuffer to a blue color
      dx8.pd3dDevice->Clear( 0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0,0,255), 1.0f, 0 );

      // Begin the scene
      dx8.pd3dDevice->BeginScene();

      //This is where D3DXSprite comes in
      dx8.pd3dxSprite->Begin();
      dx8.pd3dxSprite->Draw( pSrcTexture, pSrcRect, NULL, NULL, 0, &trans, 0xFFFFFFFF );
      dx8.pd3dxSprite->End();

      // End the scene
      dx8.pd3dDevice->EndScene();

      // Present the backbuffer contents to the display
      dx8.pd3dDevice->Present( NULL, NULL, NULL, NULL );
  }


pSrcTexture is similar to a surface. It holds the bitmap.
pSrcRect is a RECT that tells DX which area of the source texture you want to display.
trans is the location that you want to draw to on the screen.

So, if you wanted to move the image around the screen, you would change the values in the trans struct. If you wanted to do animation, you would change the values in pSrcRect.
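A tiny sketch of that (assuming trans is the D3DXVECTOR2 passed to Draw, srcRect is the RECT behind pSrcRect, and frame/numFrames track a strip of 64x64 animation frames; all of this is illustrative, not the poster's code):

  // Hypothetical sketch: scroll the sprite and step through animation
  // frames laid out left to right in the source texture.
  trans.x += 2.0f;                    // move 2 pixels right each frame
  frame = (frame + 1) % numFrames;    // advance to the next frame
  srcRect.left   = frame * 64;        // that frame's column in the strip
  srcRect.right  = srcRect.left + 64;
  srcRect.top    = 0;
  srcRect.bottom = 64;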


Hope that helps.
I think it was S1CA who pointed out that some drivers have been using the "3D way" to do DD under the hood for a while now.

I'm not sure I can justify it to you. Many of the points that seem obvious to you are not actually true (at least not on newer hardware). You may have to try it for yourself.

You and I are on opposite sides of the spectrum. Based on what I've seen, I can't see how anyone can justify using DirectDraw (on newer hardware). I think you might be surprised with the results when you actually get things going...
Author, "Real Time Rendering Tricks and Techniques in DirectX", "Focus on Curves and Surfaces", A third book on advanced lighting and materials
I can reach 80fps performing just 350 DrawPrimitive calls using batches of 2 polys (which is actually more or less what the billboarding example delivers when rendering 350 trees). I'm on a Duron 800@950 with a GF2MX. You may think that's perfectly fine! But the minimum spec we're trying to reach is a TNT1, and worse, I can't even test the engine on a TNT2...

CrazedGenius, you said you could deliver several *thousands* of quads in a frame without losing too many fps. Did you use some kind of batching, or did you actually make thousands of calls to DrawPrimitive?

The problem isn't the CPU. I've even tried underclocking my Duron from 950 to 600MHz and I only lost 5fps, which certainly is nothing and can be regained with enough code optimization (non-DX-API-related code optimization).

Removing alpha blending gives me a boost of 10fps; removing SetTexture (which is currently being called every frame) gives me a boost of 20fps. Those numbers are nothing compared to when I remove the actual DrawPrimitive line (everything else is still processed), where I get a boost of 400fps.
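A common way to cut down those SetTexture calls (not necessarily applicable to MatuX's engine, and note the earlier caveat that sorting can change draw order for overlapping quads) is to sort the draw list by texture; a rough sketch with hypothetical names:

  #include <algorithm>
  #include <vector>
  #include <d3d8.h>

  // Hypothetical sketch: keep quads using the same texture adjacent,
  // so each texture is bound once per frame instead of once per quad.
  struct Sprite
  {
      IDirect3DTexture8* texture;   // the texture this quad uses
      // ... position, source rect, etc.
  };

  static bool ByTexture(const Sprite& a, const Sprite& b)
  {
      return a.texture < b.texture; // any consistent ordering will do
  }

  void DrawSorted(IDirect3DDevice8* dev, std::vector<Sprite>& sprites)
  {
      std::sort(sprites.begin(), sprites.end(), ByTexture);

      IDirect3DTexture8* current = NULL;
      for (size_t i = 0; i < sprites.size(); ++i)
      {
          if (sprites[i].texture != current)   // texture changed?
          {
              current = sprites[i].texture;
              dev->SetTexture(0, current);     // bind it only on change
          }
          // ... draw sprites[i]'s quad here ...
      }
  }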

We think the game will be finished in a year or less, so maybe we can tweak our minimum specs... I have been trying stuff like abusing alpha blending (all the drawn quads are blended), even performing useless render state changes, and the fps won't change that much...

The bad thing here is that 350 DrawPrimitive calls is only ~30% of what we plan to deliver for in-game graphics.

D3DXSprite??! I started with that, and it's the slowest SLOWEST thing in the world
A couple of points about your post:
The TNT is going to be slower for this because it doesn't support 3D as well as later cards. Perhaps DD would be better in that specific case...

Removing DrawPrimitive should cause a huge increase, but not because you are saving geometry processing. It's because nothing is drawn: no drawing, depth testing, texturing, blending, etc. The vertex processing is the least of your worries.

Here are some specs and caveats:

I am rendering 1000 blended quads on the screen at 30 fps using code that is not at all optimized (see below) on a GF2Go (the same chip as your MX). Yes, 30 fps is not fast, but you would probably also never need to draw 1000 quads. The quads don't kill you, the pixels do. (I'm using the typical RTS as a benchmark: if you have 1000 units on the screen, you can't really hope for great frame rates.)
The performance increases greatly with lower quad counts, no blending, and smaller textures. This is the important bit: if I use very small textures, my framerate increases. If I use very large textures, the framerate will die. Keep this in mind...

About the code: the testing code is based on the "2D in DX8" article on this site. It is not optimized in any way. In fact, it's quite bad (the test code, not the article). I have a panel class that contains 4 vertices and its own texture. I create 1000 panels, meaning that I create 1000 little vertex buffers and 1000 copies of the same texture (obviously, I must be crazy). Each frame sets a random translation matrix (within the bounds of the screen), changes the vertex buffer, changes the texture, sets the transform, and renders the panels with blending. 1000 quads = ~30 fps. 500 quads = ~40 fps. 500 unblended quads = ~60 fps.

Some comparison numbers to DD: I use a 64x64 texture in this case. This means there are 4096 x 1000 x 30 = ~123 Mpixels/s, or ~500MB/s (check my math...). Compare those numbers to the pixel throughput of your DD apps (I don't know the DD numbers, but I *think* this should be much better; also, remember that these quads are blended).
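Spelling that arithmetic out (assuming 32-bit pixels at 4 bytes each):

  64 x 64 = 4,096 pixels per quad
  4,096 pixels x 1,000 quads x 30 fps = 122,880,000 pixels/s, i.e. ~123 Mpixels/s
  123 Mpixels/s x 4 bytes/pixel ~= 492 MB/s, so the "~500MB/s" figure checks out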

I don't use D3DXSprite - I prefer to reinvent the wheel for more control and optimization (although obviously not in this case!)


Edited by - CrazedGenius on January 15, 2002 4:19:52 PM
Author, "Real Time Rendering Tricks and Techniques in DirectX", "Focus on Curves and Surfaces", A third book on advanced lighting and materials
2D tiles may be a tad faster than 3D tiles (on some cards) when rendered normally. But there's so much you can do with 3D tiles, without a performance hit, that would cripple 2D tiles: lighting, fades, transparency, etc.

Besides, DirectX 8 still has all the objects for DX7.


Make games, it's fun
Hmm.. It seems you were right, Crazed. It's a fill rate issue. I was able to deliver 140fps with 550+ quads, and I reached that because I used 64x32 textures. Bigger textures make the difference; rendering hundreds of them will just kill your 2D app.
I tried some 128x128 and 256x256 quads and I got 84fps rendering only 56 quads.

So, doing some calculations (140fps x 550 quads x 2 triangles), I ended up with ~154,000 triangles per second, which isn't bad as I'm not using TnL... At least, that is what I think

I ran an nVidia benchmark that told me my GF2MX was able to handle 500,000 textured triangles without using TnL, so this means I could deliver at least a few thousand more triangles in my engine. So I came up with 2 solutions I would like to share with you guys; tell me what you think:
1. Will cropping the texture into small pieces, say, the 64x32 texture into 4 textures of 32x16, and rendering four times more triangles speed things up? The problem here is "balance": I need to find the balance between texture size and the number of triangles to render per frame.
2. I can't remember!!!! While writing the first solution I forgot the second one lol..

PS: Using TnL completely screws up the app, making all the quads constantly flash (normal/black/normal/black/etc.), and it won't even display all of the quads on the screen, just some of them. Plus it's much slower. I haven't worked on this bug yet.
PS2: Thanks for all your help guys, it is really worth it

