yk_cadcg

[solved] How to save a 4-channel texture as 2-channel?


yk_cadcg    100
Thanks to all! :) In summary, I'm trying to save both video memory and readback (copy-back) time, and for now that's hard.
----------------------------------
Hi, [DX10, VS2005, GeForce 8800]
I'm doing blending with render-to-texture. I only need two channels for the blending: one is a color channel (any of R/G/B, BlendOp = MIN), the other is alpha (BlendOp = ADD; effectively an increment, since my pixel shader outputs (x, x, x, 1)). But the hardware only gives me full 4-channel textures for this (the 2-channel formats have no alpha channel), which wastes readback time. How can I save the 4-channel result as 2 channels? Thanks!

[Edited by - yk_cadcg on October 12, 2007 11:36:59 PM]
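To make the setup concrete, here is a small CPU model of the blend state described above. It is only a sketch: it assumes the render target is cleared to (+inf, +inf, +inf, 0), that the alpha ADD uses blend factors ONE/ONE, and the helper names blend_texel and accumulate are hypothetical, not D3D10 API. It shows that after blending, R, G and B all hold the same minimum and A holds the fragment count, so only two of the four channels carry information.

```python
import math

def blend_texel(dst, src):
    """Apply color BlendOp = MIN to R, G, B and alpha BlendOp = ADD to A.

    dst, src: (r, g, b, a) tuples. MIN ignores the blend factors; for ADD
    this sketch assumes SrcBlendAlpha = DestBlendAlpha = ONE.
    """
    r = min(dst[0], src[0])
    g = min(dst[1], src[1])
    b = min(dst[2], src[2])
    a = dst[3] + src[3]
    return (r, g, b, a)

def accumulate(fragment_values):
    # Assumed clear value: +inf so the first MIN always wins, 0 so the count starts at 0.
    texel = (math.inf, math.inf, math.inf, 0.0)
    for x in fragment_values:
        # The pixel shader outputs (x, x, x, 1), so alpha works as a counter.
        texel = blend_texel(texel, (x, x, x, 1.0))
    return texel

texel = accumulate([0.7, 0.2, 0.5])
print(texel[0], texel[3])  # prints "0.2 3.0": the minimum and the fragment count
```

G and B simply end up as copies of R, which is exactly the redundancy the question is about.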

sirob    1181
Are you sure this actually poses a performance issue? Have you compared blending with 2 channels against 4?

yk_cadcg    100
Thanks, sirob.
1. This is not a critical performance issue; readback is very fast. For a 1M 4-channel texture it only costs < 100 ms.
But it's a waste of resources, since only 2 channels are used.
2. I can't test with a 2-channel texture: there is no 2-channel format that contains an alpha channel (something like R32A32). I have to use one color channel and one alpha channel because there are 2 different BlendOps, and I can only assign them to color and alpha separately. Two color channels, as in R32G32, have to share one BlendOp.
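Some rough numbers for point 1, as a back-of-the-envelope sketch. It assumes "1M" means 1024 x 1024 texels, that each channel is a 32-bit float (as the R32A32 / R32G32 names suggest), and that readback time scales linearly with the bytes copied; none of that is measured in this thread. The name readback_bytes is hypothetical.

```python
# Assumptions (not from the thread): 1M = 1024 x 1024 texels, 32-bit float channels.
TEXELS = 1024 * 1024
BYTES_PER_CHANNEL = 4  # one 32-bit float

def readback_bytes(channels, texels=TEXELS):
    """Bytes copied GPU -> CPU for one readback of the whole texture."""
    return texels * channels * BYTES_PER_CHANNEL

four = readback_bytes(4)
two = readback_bytes(2)
print(four // 2**20, "MiB vs", two // 2**20, "MiB")  # prints "16 MiB vs 8 MiB"

# The 4-channel readback was reported as < 100 ms. If copy time is linear in
# bytes (an assumption), dropping to 2 channels saves at most half of that.
measured_ms = 100
print("saving at most", measured_ms * (four - two) / four, "ms")  # 50.0 ms
```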

Quote:
Original post by sirob
Are you sure this actually poses a performance issue? Have you compared blending with 2 channels compared to 4?


jollyjeffers    1570
You might be able to get an implicit conversion through casting by using one of the copy-resource functions. I'm not at my D3D10 dev machine right now, so I don't have the specs/docs to confirm this.

Regardless of whether the API can do it, I'd imagine you'll be best off implementing some sort of staging resource and a GPU-based working-to-staging conversion. For example, do a simple render-to-texture pass from the 4-channel texture to a 2-channel one and do the conversion in the pixel shader.

However, I suspect the extra cost of performing this conversion will largely offset the gain from halving the GPU<->CPU bandwidth. You may end up making your software a lot more complex for a relatively minor performance improvement.
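A minimal CPU model of the conversion pass suggested above, assuming the working texture holds (min, min, min, count) per texel and the 2-channel target keeps only R and A. convert_ps and conversion_pass are hypothetical names standing in for the pixel shader and the full-screen draw; this is a sketch of the data flow, not D3D10 code.

```python
def convert_ps(rgba):
    """Per-texel "pixel shader": keep the MIN channel (R) and the count (A)."""
    r, _g, _b, a = rgba
    return (r, a)

def conversion_pass(working):
    # One output texel per input texel, like drawing a full-screen quad.
    return [convert_ps(texel) for texel in working]

working = [(0.2, 0.2, 0.2, 3.0), (0.9, 0.9, 0.9, 1.0)]
staging = conversion_pass(working)
print(staging)  # [(0.2, 3.0), (0.9, 1.0)]
```

On D3D10 the 2-channel target (for example DXGI_FORMAT_R32G32_FLOAT) would have to be a default-usage render target, since staging resources can't be bound to the pipeline; it would then be copied to a staging texture with CopyResource and mapped. The readback is halved, but the extra draw and copy are the cost being weighed above.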

I was reading about NVPerfHUD 5 yesterday. Drill into your app with this sort of tool before you start over-complicating your code based on theoretical assumptions.

hth
Jack

Zipster    2359
If you want to use alpha blending, you're stuck with four channels, so there's not much you can do in terms of saving bandwidth. Disabling color writes on the channels you don't need should speed up rendering, though.

If it's that much of an issue, try triple buffering your algorithm and doing the alpha blending manually. You'll only need two-channel textures then, at the cost of an extra texture and the manual blending ops.
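One way to read the manual-blending suggestion, sketched on the CPU: because a shader can't sample the texture it is currently rendering to, two 2-channel (min, count) textures are ping-ponged, each pass reading one and writing the other. This interpretation of "triple buffering" is an assumption, as are the hypothetical names manual_blend_ps and run_passes and the simplification that every pass covers every texel.

```python
import math

def manual_blend_ps(prev, x):
    """Per-texel shader: prev is the (min, count) texel read from the other
    texture, x is the new fragment's value. Returns the blended texel."""
    m, n = prev
    return (min(m, x), n + 1.0)

def run_passes(layers, width):
    # Two 2-channel textures, both cleared to (+inf, 0).
    src = [(math.inf, 0.0)] * width
    dst = [(math.inf, 0.0)] * width
    for layer in layers:  # simplification: each layer covers every texel once
        for i, x in enumerate(layer):
            dst[i] = manual_blend_ps(src[i], x)
        src, dst = dst, src  # ping-pong: this pass's output is the next pass's input
    return src

result = run_passes([[0.5, 0.8], [0.3, 0.9], [0.6, 0.1]], width=2)
print(result)  # [(0.3, 3.0), (0.1, 3.0)]
```

Each pass writes only two channels, but it costs the extra texture plus a texture read per fragment, and in a real implementation texels that a pass doesn't cover would also have to be carried forward into the destination texture.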

BTW, there is a duplicate of this thread further below. I originally posted this reply there, but the duplicate should probably be deleted.

Matias Goldberg    9577
I'm no shader expert, but my guess is that it will only "save" bandwidth, since shader units use parallelized (SIMD) instructions. I believe computing 4 floats takes the same time as computing 2; with only 2 the unit just wouldn't be running at full capacity.
As for bandwidth, your R component would later need to be copied to the G and B components for the final result, since that's how video cards work (OK, OK... unless you're using YUV front buffers or something similar), so you'll lose the performance gain somewhere.


I hope I'm right :P
Dark Sylinc

PS: What I'm trying to say is that I don't think it's worth optimizing.
