tokaplan

GeForce2 MX: unexpected texture stage behaviour

Hi everyone. I've noticed that although, according to the device caps, the GeForce2 MX supports all 8 DX texture stages, from the third stage onwards most operations either don't work the way they should or don't work at all and just darken the scene. My software must run on the GeForce2, and I wonder what the possible cause could be. I've checked and rechecked the documentation but didn't find any information, so I must be doing something wrong. What might it be?

I used a GeForce2 MX for a long time, and from what I noticed, there are 8 texture stages available, but you can actually sample textures in only 2 of them.

You can do as many other computations as you like in the other 6 stages.

What I say is deduced from my own experience and trials/errors, but I doubt I could be wrong on that.

This is right; the documentation says it supports only 2 simultaneous textures. But the problem is what happens when I do the following (this is just an example, I've tried everything):

Texture[0] = (NormalMap);
ColorOp[0] = DotProduct3;
ColorArg1[0] = Texture;
ColorArg2[0] = Diffuse;

TextureFactor = float4( 0.4, 0.4, 0.4, 1 );
Texture[1] = (BaseTexture);
ColorOp[1] = Lerp/* arg0*arg1 + (1-arg0)*arg2 */;
ColorArg1[1] = Current;
ColorArg2[1] = Texture;
ColorArg0[1] = TFactor;


ColorOp[2] = Add; /* arg1 + arg2 */
ColorArg1[2] = Current;
ColorArg2[2] = Specular;

In this sample the final stage (stage 2) has no effect at all, as if it weren't present. In other cases (with different operations) it may just darken the scene. When I switch to the reference rasterizer, it works fine.

I can say I use 3 stages fine on that chipset. However I've found that you may just have to tweak which stages go where to make it work.
On the other hand you may just be forgetting to set some render/texturestage state which the reference driver is doing automatically.

There's a page in the documentation that describes the known limitations of old generations of graphics hardware, but unfortunately I can't find it. Basically, the graphics card can perform a fixed equation (e.g. (A * B) + (C * D)) and must map your texture stage settings onto this equation for rendering. As such, there are really a lot of limitations - ISTR that a lot of devices don't support D3DTA_CURRENT on stages other than the first or second. Also, it's sometimes worth it to switch the texture and color arguments.
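To make the fixed-equation idea concrete, here is a minimal C++ sketch. It assumes a simplified single-channel combiner that can only compute A*B + C*D, clamped to [0, 1]; the names combine, opModulate, opAdd and opLerp are hypothetical and are not part of D3D or any driver. Real hardware works on colour vectors and has further restrictions on which inputs are allowed.

```cpp
#include <algorithm>

// One colour channel in [0, 1]. Real combiners operate on RGB(A) vectors.
using Channel = float;

// Hypothetical model of one fixed combiner: (A * B) + (C * D), clamped.
Channel combine(Channel a, Channel b, Channel c, Channel d) {
    return std::min(1.0f, std::max(0.0f, a * b + c * d));
}

// Each texture stage operation has to be expressed by choosing A, B, C, D.

// Modulate: arg1 * arg2 (the C*D half is unused).
Channel opModulate(Channel arg1, Channel arg2) {
    return combine(arg1, arg2, 0.0f, 0.0f);
}

// Add: arg1 + arg2, written as arg1*1 + arg2*1.
Channel opAdd(Channel arg1, Channel arg2) {
    return combine(arg1, 1.0f, arg2, 1.0f);
}

// Lerp: arg0*arg1 + (1 - arg0)*arg2. This uses all four inputs.
Channel opLerp(Channel arg0, Channel arg1, Channel arg2) {
    return combine(arg0, arg1, 1.0f - arg0, arg2);
}
```

In this simplified model a Lerp already occupies every input of one combiner, so a following Add cannot be folded into the same unit and needs another one. That is one way to picture why some stage setups have no hardware translation.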

Sorry if this isn't very helpful, I'm hazy on these matters. Perhaps S1CA will chime in and clarify things a bit [smile]

Quote:
Perhaps S1CA will chime in and clarify things a bit


I'll try [smile] It's been a good few years since I did anything with a GeForce2.


1. As Coder says, the chip is hardwired to perform a fixed equation for each of the texture units (aka "register combiners"). The inputs to the equation, and a few of its operators, are reasonably (but not completely) flexible; it's these inputs and operators you're selecting when you call SetTextureStageState() for a particular stage.



2. The SetTextureStageState() API is conceptual rather than a direct representation of how the stages are physically connected in the hardware. The settings you ask for with SetTextureStageState() get translated from the conceptual D3D model into a setup for the fixed register combiner(s) by the nVidia device driver at draw time.

For all 2-stage D3D texture setups, the driver can always translate your SetTextureStageState() calls into something the hardware can use.

For 3- and 4-stage D3D texture setups, the driver has known translations for a handful of setups, but not many. ISTR there are only two 3-stage setups and one 4-stage setup actually possible with the available GeForce 256/2 drivers (one of them is for emboss bump mapping, another is a lightmapping variant). Over time (during the GF256 to GF2 timeframe) the driver writers did add new translations when people requested them, so a few more may be available in the latest drivers.



3. The document here describes the equation(s) and inputs available on the GeForce256 and GeForce2 range. Although it's slightly OpenGL-centric, it should give you a much better idea of what the D3D SetTextureStageState() calls get translated into, and what things are/aren't physically possible with the hardware:
http://developer.nvidia.com/object/registercombiners.html



4. The GeForce 256/GeForce 2/GeForce 2MX/GeForce 4MX does NOT really have 8 usable stages. The 8 stages reported by the driver are there for a *hack* that allows direct access to the single register combiner available on the Riva TNT/TNT2, whereby an invalid combination of D3D SetTextureStageState() settings is used to signal the combiner configuration to the driver.

The hack had use in the days of the TNT before triadic operations were added to SetTextureStageState(); but since it can only expose a single combiner (physical hardware texture unit), it's much less useful (but still exposed) on GeForce256/2/4MX.

The 8-stages aren't usable for anything other than the 8-stage-combiner-hack; there are only really 4 usable stages (as mentioned above - and even then there's only one combination that will work with those 4 stages)



5. The MaxSimultaneousTextures device cap is the most important one to look at. In the case of the GeForce 256/2, this tells you how many physical texture combiners are present, and so the number of **unique** textures you can use (so if the value of that cap was 2, any 3-texture operation would only work if one texture was used in 2 different stages).
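The unique-texture rule can be checked on the application side before a setup is even tried. The following is a minimal sketch, assuming textures are identified by plain integer ids with 0 meaning "no texture bound"; fitsTextureCap is a hypothetical helper, and in a real application the limit would come from the MaxSimultaneousTextures member of D3DCAPS9.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// stageTextures[i] is the (hypothetical) id of the texture bound to stage i,
// or 0 if that stage samples no texture.
// Returns true if the number of *unique* textures fits within the cap.
bool fitsTextureCap(const std::vector<int>& stageTextures,
                    std::size_t maxSimultaneousTextures) {
    std::set<int> unique;
    for (int id : stageTextures) {
        if (id != 0) {
            unique.insert(id);  // the same texture in two stages counts once
        }
    }
    return unique.size() <= maxSimultaneousTextures;
}
```

Passing this check only means the texture count is acceptable; the driver may still fail to translate the operations themselves.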



6. Because the number of possible combinations of SetTextureStageState() setups would be too large for each combination to have its own device cap, MS added IDirect3DDevice9::ValidateDevice(). ValidateDevice() asks the part of the driver which is responsible for translating D3D's conceptual texture stages: "can you find a valid translation for this combination of states, and if not, why not?"

When trying to work out why a SetTextureStageState() combination doesn't work, put a ValidateDevice() just before your DrawPrimitive() call [don't leave it in after development, it's an expensive call]. That will return errors such as D3DERR_TOOMANYOPERATIONS which will give you some clue about what the driver doesn't like.

It should be quite easy to write an automated "which 3-stage SetTextureStageState combinations will work on this card" tester using ValidateDevice().

The documentation for ValidateDevice() also gives a few tips on which types of things can cause failure.
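A rough sketch of such an automated tester follows. To keep it self-contained it does not touch D3D at all: the Op enum is a small hypothetical subset of colour operations, and the validate callback stands in for the real check. In an actual application the callback would apply the three stages with SetTextureStageState() and return whether ValidateDevice() succeeded.

```cpp
#include <array>
#include <functional>
#include <vector>

// Hypothetical subset of colour operations to try in each stage.
enum class Op { Modulate, Add, DotProduct3, Lerp };

// One candidate setup: the colour operation for stages 0, 1 and 2.
using Setup = std::array<Op, 3>;

// Tries every 3-stage combination of the given ops and collects those the
// validator accepts. In a real tester the callback would set the stage
// states on the device and return
//   SUCCEEDED(device->ValidateDevice(&numPasses)).
std::vector<Setup> findWorkingSetups(
        const std::vector<Op>& ops,
        const std::function<bool(const Setup&)>& validate) {
    std::vector<Setup> working;
    for (Op s0 : ops) {
        for (Op s1 : ops) {
            for (Op s2 : ops) {
                Setup candidate = {{s0, s1, s2}};
                if (validate(candidate)) {
                    working.push_back(candidate);
                }
            }
        }
    }
    return working;
}
```

A fuller version would also vary the colour arguments, alpha operations and texture formats, since those can affect the result too. Logging the failing HRESULT for each rejected setup would show what the driver objected to.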



7. If what you're trying to achieve doesn't have an equivalent mapping on the [now] limited GeForce2MX hardware, then unfortunately you'll have to either:

a) sacrifice visual quality and drop the feature.


b) sacrifice performance (slightly) and implement the technique using multiple passes.


c) find some other cunning way of achieving the same result; a few ideas/tips:

- write your whole texture blending setup out as an equation; it's often possible to find alternative combinations of operations which will achieve the same effect - or at least something similar.

- complex texture ops usually perform more than one job so are often very handy for re-arranging the terms of your texture blending equation.

- pre-multiplied alpha saves at least a multiply in places you can use it.

- fog and specular are performed in a separate specialised/non-general combiner. If any of your operations involves interpolation with a constant, particularly a distance-based one, then fog can be abused. And if any part involves addition of a per-vertex value or global constant and iteration across the polygon, then specular can be abused. The MATERIALSOURCE render states give you a few more possibilities.

- The D3DTA_ALPHAREPLICATE and D3DTA_COMPLEMENT modifiers can be useful on some hardware and bad on other hardware.

- If either of your 2 textures isn't using its alpha channel, then you have a handy per-pixel scalar which is perfect for monochrome lightmaps when used with ALPHAREPLICATE or one of the other more complex colour/alpha operations.

- Sometimes the texture format you use DOES matter to the driver when it's translating from D3D's conceptual stages to the real ones (and when ValidateDevice is trying the same). Some formats can require extra work for the hardware (on some hardware..) to make their values available to the texture combiner, so complex formats and types can mean less chance of getting that 3 or 4-stage operation to work. Be particularly wary of cube maps and trilinear filtering.

- if the effect you're trying to achieve gets written with some form of frame buffer blending (eg SRCALPHA:INVSRCALPHA), bear that in mind when finding things to re-organise in your blending equation - the frame buffer blend can often save you a final multiplication or addition.
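As a small worked example of writing the blend out as an equation, the sketch below compares a standard SRCALPHA:INVSRCALPHA frame buffer blend with its pre-multiplied alpha form (source factor ONE, destination factor INVSRCALPHA). It works on a single channel, and the function names are hypothetical; the point is only that both forms give the same result while the second needs one multiply fewer at blend time.

```cpp
// Standard alpha blend (SRCALPHA : INVSRCALPHA):
//   result = src * alpha + dst * (1 - alpha)
float blendSrcAlpha(float src, float alpha, float dst) {
    return src * alpha + dst * (1.0f - alpha);
}

// Pre-multiplied alpha blend (ONE : INVSRCALPHA):
//   srcPremultiplied already holds src * alpha (baked in at author time),
//   so the blend itself no longer multiplies the source.
float blendPremultiplied(float srcPremultiplied, float alpha, float dst) {
    return srcPremultiplied + dst * (1.0f - alpha);
}
```

If the last step of a texture cascade is a multiply by alpha or a lerp against what is already in the frame buffer, moving it into the frame buffer blend like this can free up a stage.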

Quote:
Original post by tokaplan
Thanks a lot, S1CA, that was great!!
I just have one question left: how bad is the performance penalty for multipass rendering? Is it the same as rendering twice?
Thank you a lot!


In terms of the API calls to D3D, and depending on the cost of your scene manager, it can be as bad as rendering twice. If your app is CPU bound, you might not want to incur that cost, or at least not for everything (scalable shader-LOD is a good idea: drop some part of your lighting for far away objects when running on older hardware).

If you have a lot of static stuff which shares the same state/shaders you can pre-transform it into world space at load/author time to significantly reduce API and driver call overhead.

In terms of the actual rendering, the price of the second pass can be pretty cheap since the first pass filled in the Z buffer for you so the second pass only renders pixels that are truly visible (i.e. the 2nd pass has 0 overdraw for all opaque geometry). In some situations, particularly with complex pixel shaders/texture blending it can even make sense to "lay down Z" first and do 3 passes where the first pass only writes to the Z buffer (the COLORWRITE render state set accordingly). That way complex (slow) pixel processing is only performed for pixels that actually need to be visible.
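The overdraw argument can be illustrated with a toy model of a single pixel. This is only a sketch: it assumes opaque fragments with depths in [0, 1], a LESSEQUAL depth test, and hypothetical function names. In real D3D code the depth-only pass would be set up through the D3DRS_COLORWRITEENABLE render state.

```cpp
#include <algorithm>
#include <vector>

// Counts how many fragments get (expensively) shaded at one pixel when the
// geometry is drawn in the given order with a normal depth test.
int shadedWithoutPrepass(const std::vector<float>& depths) {
    float zBuffer = 1.0f;  // cleared to the far plane
    int shaded = 0;
    for (float d : depths) {
        if (d <= zBuffer) {  // passes the depth test: shade and write Z
            zBuffer = d;
            ++shaded;
        }
    }
    return shaded;
}

// Same pixel, but a cheap depth-only pass fills the Z buffer first.
int shadedWithPrepass(const std::vector<float>& depths) {
    float zBuffer = 1.0f;
    for (float d : depths) {
        zBuffer = std::min(zBuffer, d);  // pass 1: Z only, no colour writes
    }
    int shaded = 0;
    for (float d : depths) {
        if (d <= zBuffer) {  // pass 2: only the nearest fragment survives
            ++shaded;
        }
    }
    return shaded;
}
```

With fragments drawn back to front, the version without a prepass shades every one of them, while the prepass version shades only the visible one. How much that saves in practice depends on how expensive the per-pixel work is compared with the extra geometry pass.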


Quote:
Actually, this is not specular, this is the whole lighting - diffuse + ambient. I do it in a shader.


Even so, Sim (who knows more about the inner workings of nVidia chips than I do [wink]) has a point which I touched on briefly (but not so well): you can add the per-vertex Gouraud interpolated specular on for free... that's *something* extra at the end of your pipeline that you may be able to squash your lighting equation into. It doesn't have to be specular either; just remember you have extra operations that occur after the traditional texture blend cascade.


BTW: if you're looking for stuff to drop when reorganising/simplifying lighting, per-vertex diffuse looks pretty good unless you have really low-polygon models; per-vertex specular, on the other hand, looks pretty bad unless you have really high tessellation. So if you're stuck and want to do per-pixel lighting, do the specular per pixel and the diffuse per vertex (it looks better than the other way around).

Also bear in mind that per-pixel diffuse lighting without moving objects or moving light sources is effectively just an expensive form of detail texturing - use detail texturing if things don't move (same doesn't work for specular, though some forms of emboss will give ok-ish results)...

BTW2: Sim's journal here is well worth a read if you want to aim for high end effects on lower end cards.
