GeForce2 MX: texture stages unexpected behaviour

Started by tokaplan November 04, 2005 06:26 AM

122

Author

November 04, 2005 06:26 AM

Hi everyone. I've noticed that although the device caps say the GeForce2 MX supports all 8 DX texture stages, from the third stage onward most operations either don't work the way they should or don't work at all and just darken the scene. My software must run on the GeForce2, and I wonder what the cause could be. I've checked and rechecked the documentation but didn't find any information, so I must be doing something wrong. What might it be?

deffer

755

November 04, 2005 06:32 AM

I used a GeForce2 MX for a long time, and from what I noticed there are 8 texture stages available, but you can actually sample textures in only 2 of them.

You can do as many other computations as you want in the other 6 stages.

This is deduced from my own experience and trial and error, but I doubt I'm wrong on that.

tokaplan

122

Author

November 04, 2005 06:42 AM

That's right, the documentation says it supports only 2 textures simultaneously. But the problem is what happens when I do the following (this is just an example, I've tried everything):

Texture[0] = (NormalMap);
ColorOp[0] = DotProduct3;
ColorArg1[0] = Texture;
ColorArg2[0] = Diffuse;

TextureFactor = float4( 0.4, 0.4, 0.4, 1 );
Texture[1] = (BaseTexture);
ColorOp[1] = Lerp/* arg0*arg1 + (1-arg0)*arg2 */;
ColorArg1[1] = Current;
ColorArg2[1] = Texture;
ColorArg0[1] = TFactor;

ColorOp[2] = Add; /* arg1 + arg2 */
ColorArg1[2] = Current;
ColorArg2[2] = Specular;

In this sample the final stage (stage 2) has no effect at all, as if it weren't present. In other cases (with different operations) it may just darken the scene. When I switch to the reference rasterizer, it works fine.
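For reference, here is a rough CPU-side sketch of what the three stages above are asking for per pixel, which is roughly what the reference rasterizer evaluates. It is only an illustration: the `Color` struct and all function names are made up, and the formulas are the documented D3DTOP_DOTPRODUCT3, D3DTOP_LERP and D3DTOP_ADD equations with results clamped to 0..1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical RGB colour with channels in 0..1 (alpha left out for brevity).
struct Color { float r, g, b; };

static float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// D3DTOP_DOTPRODUCT3: treat both arguments as signed vectors (0..1 -> -1..1),
// take the dot product and replicate it to every channel.
Color dotProduct3(Color a, Color b) {
    float d = (2 * a.r - 1) * (2 * b.r - 1)
            + (2 * a.g - 1) * (2 * b.g - 1)
            + (2 * a.b - 1) * (2 * b.b - 1);
    d = saturate(d);
    return { d, d, d };
}

// D3DTOP_LERP: arg0*arg1 + (1 - arg0)*arg2
Color lerpOp(Color arg0, Color arg1, Color arg2) {
    return { saturate(arg0.r * arg1.r + (1 - arg0.r) * arg2.r),
             saturate(arg0.g * arg1.g + (1 - arg0.g) * arg2.g),
             saturate(arg0.b * arg1.b + (1 - arg0.b) * arg2.b) };
}

// D3DTOP_ADD: arg1 + arg2
Color addOp(Color a, Color b) {
    return { saturate(a.r + b.r), saturate(a.g + b.g), saturate(a.b + b.b) };
}

// The three-stage cascade from the post.
Color shadePixel(Color normalTexel, Color diffuse, Color baseTexel,
                 Color tfactor, Color specular) {
    Color current = dotProduct3(normalTexel, diffuse);  // stage 0
    current = lerpOp(tfactor, current, baseTexel);      // stage 1
    return addOp(current, specular);                    // stage 2
}
```

Comparing a value computed this way against what the card actually draws is one way to tell whether stage 2 is being dropped or merely contributing very little.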

d000hg

1,208

November 04, 2005 06:45 AM

I can say I use 3 stages fine on that chipset. However, I've found that you may just have to tweak which stages go where to make it work.
On the other hand, you may just be forgetting to set some render state or texture stage state that the reference driver sets automatically.

tokaplan

122

Author

November 04, 2005 07:38 AM

Which ones?))

Muhammad Haggag

1,358

November 04, 2005 02:47 PM

There's a page in the documentation that describes the known limitations of old generations of graphics hardware, but unfortunately I can't find it. Basically, the graphics card can perform a fixed equation (e.g. (A * B) + (C * D)) and must map your texture stage settings onto this equation for rendering. As such, there are really a lot of limitations - ISTR that a lot of devices don't support D3DTA_CURRENT on stages other than the first or second. Also, it's sometimes worth it to switch the texture and color arguments.
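To make the "fixed equation" idea concrete, here is a small sketch of how three common D3D operations could be mapped onto (A * B) + (C * D). The mappings are an assumption for illustration only (scalar values, hypothetical function names); what a real driver does is not documented in this thread.

```cpp
#include <cassert>
#include <cmath>

// The fixed equation from the post: out = (A * B) + (C * D).
float combiner(float A, float B, float C, float D) { return A * B + C * D; }

// Hypothetical mappings of D3D texture ops onto the combiner inputs.

// MODULATE: arg1 * arg2, the second product is unused.
float mapModulate(float arg1, float arg2) {
    return combiner(arg1, arg2, 0.0f, 0.0f);
}

// ADD: arg1 + arg2, each argument multiplied by a constant 1.
float mapAdd(float arg1, float arg2) {
    return combiner(arg1, 1.0f, arg2, 1.0f);
}

// LERP: arg0*arg1 + (1 - arg0)*arg2, using the complement of arg0 as input C.
float mapLerp(float arg0, float arg1, float arg2) {
    return combiner(arg0, arg1, 1.0f - arg0, arg2);
}
```

Each op uses up the whole equation on its own, which hints at why chaining several of them in later stages can fail to map.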

Sorry if this isn't very helpful, I'm hazy on these matters. Perhaps S1CA will chime in and clarify things a bit [smile]

S1CA

1,418

November 06, 2005 04:57 PM

Quote:Perhaps S1CA will chime in and clarify things a bit

I'll try [smile] It's been a good few years since I did anything with a GeForce2.

1. As Coder says, the chip is hardwired to perform a fixed equation for each of the texture units (aka "register combiners"). The inputs to the equation, and a few of its operators, are reasonably (but not completely) flexible; it's these inputs and operators you're selecting when you call SetTextureStageState() for a particular stage.

2. The SetTextureStageState() API is conceptual rather than a direct representation of how the stages are physically connected in the hardware. The settings you ask for with SetTextureStageState() get translated from the conceptual D3D model into a setup for the fixed register combiner(s) by the nVidia device driver at draw time.

For all 2-stage D3D texture setups, the driver can always translate your SetTextureStageState() calls into something the hardware can use.

For 3-stage and 4-stage D3D texture setups, the driver has known translations for a handful of setups, but not many. ISTR there are only two 3-stage setups and one 4-stage setup actually possible with the available GeForce 256/2 drivers (one of them is for emboss bump mapping, another is a lightmapping variant); over time (during the GF256 to GF2 timeframe) the driver writers did add new translations when people requested them, so a few more may be available in the latest drivers.

3. The document here describes the equation(s) and inputs available on the GeForce256 and GeForce2 range. Although it's slightly OpenGL-centric, it should give you a much better idea of what the D3D SetTextureStageState() calls get translated into, and what is and isn't physically possible with the hardware:
http://developer.nvidia.com/object/registercombiners.html

4. The GeForce 256/GeForce 2/GeForce 2MX/GeForce 4MX does NOT really have 8 usable stages. The 8 stages reported by the driver are there for a *hack* that allows direct access to the single register combiner available on the Riva TNT/TNT2, whereby an invalid combination of D3D SetTextureStageState()s is used to signal combiner configuration to the driver.

The hack had use in the days of the TNT before triadic operations were added to SetTextureStageState(); but since it can only expose a single combiner (physical hardware texture unit), it's much less useful (but still exposed) on GeForce256/2/4MX.

The 8 stages aren't usable for anything other than the 8-stage combiner hack; there are only really 4 usable stages (as mentioned above, and even then there's only one combination that will work with those 4 stages).

5. The MaxSimultaneousTextures device cap is the most important one to look at. In the case of the GeForce 256/2, this tells you how many physical texture combiners are present, and so the number of **unique** textures you can use (so if the value of that cap was 2, any 3-texture operation would only work if one texture was used in 2 different stages).
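A minimal sketch of that rule as a pre-check, assuming you keep your own list of the texture pointers bound to each stage. The helper names are hypothetical; in a real program the cap would come from D3DCAPS9::MaxSimultaneousTextures via GetDeviceCaps(). Passing this check is necessary but not sufficient, since the driver still has to find a translation for the stage setup.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Count the distinct non-null textures bound across the active stages.
// Stages without a texture are represented by a null pointer.
std::size_t countUniqueTextures(const std::vector<const void*>& stageTextures) {
    std::set<const void*> unique;
    for (const void* texture : stageTextures) {
        if (texture != nullptr) {
            unique.insert(texture);
        }
    }
    return unique.size();
}

// True if the setup stays within the MaxSimultaneousTextures cap.
bool fitsTextureCap(const std::vector<const void*>& stageTextures,
                    unsigned maxSimultaneousTextures) {
    return countUniqueTextures(stageTextures) <= maxSimultaneousTextures;
}
```

With a cap of 2, three stages that reuse one texture pass the check, while three stages with three different textures do not.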

6. Because the number of possible combinations of SetTextureStageState() setups would be too large for each combination to have its own device cap, MS added IDirect3DDevice9::ValidateDevice(). ValidateDevice asks the part of the driver responsible for translating D3D's conceptual texture stages: "can you find a valid translation for this combination of states, and if not, why not?"

When trying to work out why a SetTextureStageState() combination doesn't work, put a ValidateDevice() just before your DrawPrimitive() call [don't leave it in after development, it's an expensive call]. That will return errors such as D3DERR_TOOMANYOPERATIONS which will give you some clue about what the driver doesn't like.

It should be quite easy to write an automated "which 3-stage SetTextureStageState combinations will work on this card" tester using ValidateDevice().

The documentation for ValidateDevice() also gives a few tips on which types of things can cause failure.
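A rough sketch of such a tester follows. To keep it compilable without the DirectX SDK, the device is a template parameter and `HResult`/`PassCount` stand in for HRESULT/DWORD; the only real API assumed is a `ValidateDevice(DWORD*)` method like the one on IDirect3DDevice9. `StageSetup`, `findWorkingSetups` and `MockDevice` are hypothetical names, and the mock's "two stages only" rule is invented purely for the demo.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-ins for the Windows types so the sketch builds without d3d9.h.
using HResult = long;             // HRESULT
using PassCount = unsigned long;  // DWORD

// Hypothetical description of one candidate stage configuration.
struct StageSetup {
    std::string name;
    std::function<void()> apply;  // would issue the SetTextureStageState() calls
};

// Applies each candidate and keeps the names the driver says it can translate.
// Device is anything exposing HRESULT ValidateDevice(DWORD*).
template <typename Device>
std::vector<std::string> findWorkingSetups(Device& device,
                                           const std::vector<StageSetup>& candidates) {
    std::vector<std::string> working;
    for (const StageSetup& setup : candidates) {
        setup.apply();
        PassCount passes = 0;
        HResult hr = device.ValidateDevice(&passes);
        if (hr >= 0) {  // same test as the SUCCEEDED() macro
            working.push_back(setup.name);
        }
    }
    return working;
}

// Mock driver for the demo: pretends only setups with up to 2 stages translate.
struct MockDevice {
    int activeStages = 0;
    HResult ValidateDevice(PassCount* passes) {
        *passes = 1;
        return activeStages <= 2 ? 0 : -1;
    }
};
```

Against a real device you would pass `*pDevice`, have each `apply` set the actual stage states, and log the failing HRESULT (for example D3DERR_TOOMANYOPERATIONS) alongside the name.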

7. If what you're trying to achieve doesn't have an equivalent mapping on the [now] limited GeForce2MX hardware, then unfortunately you'll have to either:

a) sacrifice visual quality and drop the feature.

b) sacrifice performance (slightly) and implement the technique using multiple passes.

c) find some other cunning way of achieving the same result; a few ideas/tips:

- write your whole texture blending setup out as an equation; it's often possible to find alternative combinations of operations which will achieve the same effect - or at least something similar.

- complex texture ops usually perform more than one job so are often very handy for re-arranging the terms of your texture blending equation.

- pre-multiplied alpha saves at least a multiply in places you can use it.

- fog and specular are performed in a separate specialised/non-general combiner - if any of your operations involves interpolation with a constant, particularly a distance-based one, then fog can be abused. And if any part involves addition of a per-vertex value or global constant and iteration across the polygon, then specular can be abused. The MATERIALSOURCE render states give you a few more possibilities.

- The D3DTA_ALPHAREPLICATE and D3DTA_COMPLEMENT modifiers can be useful on some hardware and bad on other hardware.

- If either of your 2 textures isn't using its alpha channel, then you have a handy per-pixel scalar which is perfect for monochrome lightmaps when used with ALPHAREPLICATE or one of the other more complex colour/alpha operations.

- Sometimes the texture format you use DOES matter to the driver when it's translating from D3D's conceptual stages to the real ones (and when ValidateDevice is trying the same). Some formats can require extra work for the hardware (on some hardware..) to make their values available to the texture combiner - complex formats and types can mean less chance of getting that 3 or 4-stage operation to work. Be particularly wary of cube maps and trilinear filtering.

- if the effect you're trying to achieve gets written with some form of frame buffer blending (eg SRCALPHA:INVSRCALPHA), bear that in mind when finding things to re-organise in your blending equation - the frame buffer blend can often save you a final multiplication or addition.
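As a sketch of option b) combined with the frame buffer blending tip, the snippet below checks on the CPU that a lerp stage followed by an add stage gives the same value as two passes where the second pass is blended additively (SRCBLEND = ONE, DESTBLEND = ONE). It works on one scalar channel, uses hypothetical function names, and ignores the 8-bit quantisation of a real frame buffer, so treat it as an illustration of the algebra only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

static float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Single pass: a lerp stage followed by an add stage.
float singlePass(float t, float current, float base, float spec) {
    float lerped = saturate(t * current + (1.0f - t) * base);
    return saturate(lerped + spec);
}

// Two passes: pass 1 writes the lerp result with blending off,
// pass 2 outputs only the specular term and blends it ONE:ONE on top.
float twoPass(float t, float current, float base, float spec) {
    float framebuffer = saturate(t * current + (1.0f - t) * base);  // pass 1
    float src = saturate(spec);                                     // pass 2 output
    framebuffer = saturate(src * 1.0f + framebuffer * 1.0f);        // additive blend
    return framebuffer;
}
```

Here the final addition moves out of the texture stages entirely, leaving a 2-stage setup per pass.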

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing

tokaplan

122

Author

November 07, 2005 08:13 AM

Thanks a lot, S1CA, that was great!!
I just have one question left: how bad is the performance penalty for multipass rendering? Is it the same as rendering twice?
Thank you a lot!