

#4422418 HLSL ps_2_0 running shader twice

Posted by on 18 March 2009 - 01:57 AM

Original post by woytah
Well, as I can see, it has no effect. Maybe because it is working on the same sampler and therefore the second pass outputs the same result as the first one.
Quite possible - you'd want to use render-to-texture and a technique most people refer to as "ping-ponging". In the first pass you render from A to B, and in the second you render from B to A, so the second pass gets to see the output of the first. It does require intervention by the application to manage the render targets, though.
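The application-side bookkeeping is just a buffer swap per pass. A minimal CPU-side sketch in Python (the `blur_pass` stand-in is hypothetical, not D3D code):

```python
# Sketch of the "ping pong" pattern: two buffers alternate as source
# and destination, so pass N+1 reads the output of pass N.

def blur_pass(src):
    # Stand-in "render pass": a 3-tap average over a 1D list,
    # clamping indices at the edges.
    n = len(src)
    return [(src[max(i - 1, 0)] + src[i] + src[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def run_passes(initial, num_passes):
    a = list(initial)          # buffer A holds the current source
    for _ in range(num_passes):
        b = blur_pass(a)       # render from A into B...
        a, b = b, a            # ...then swap so the next pass reads B's output
    return a

result = run_passes([0.0, 0.0, 9.0, 0.0, 0.0], 2)
```

After two passes the single bright texel has been spread twice, which is only possible because each pass reads the previous pass's output rather than the original source.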

Original post by woytah
If the second pass would write something to the frame buffer, I could then take what is already in the frame buffer and use it in the second pass in the shader... but is that possible?
No, you can't do this. Direct3D is quite strict about the read/write permissions such that you can't read from a source whilst you're also wanting to write to it. It's due to this restriction that the aforementioned 'ping pong' technique exists.

You should be able to write a blur shader in a single pass in ps_2_0. A lot of people use a two-pass Gaussian filter since it's separable and more efficient, but you can still do it in a single pass if necessary. The main limitation here is that ps_2_0 only allows 32 tex2D() calls, which for a regular grid limits you to a 5x5 kernel.

There was a paper by ATI from several years ago that allowed you to double your effective sample count via clever placement of sampling coordinates. This along with a sparser sample grid should allow you to blur a pixel with much more source data.

In particular, for a linearly filterable texture (basically anything except FP formats in D3D9) you can place the sample point in the middle of a pair or quad of pixels and the result is the average of all underlying pixels.

+---+---+
| x | x |
+---+---+
| x | x |
+---+---+

+-------+
|       |
|   x   |
|       |
+-------+

In the top diagram you take four samples (the x's) and then average them in your own code. In the bottom diagram you take one sample (the x in the middle) and the value returned is already the average of all four texels - basically saving you a bunch of TMU and ALU operations that you can then invest elsewhere [grin]
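The arithmetic behind that trick is easy to check on the CPU. A small Python sketch (texel values are arbitrary test data) showing that one bilinear fetch at the shared corner of a 2x2 block returns the average of all four texels:

```python
# Bilinear filtering at the exact midpoint of a 2x2 block of texels
# returns the average of all four - one fetch doing the work of four.

def bilerp(t00, t10, t01, t11, fx, fy):
    # Standard bilinear weights for fractional position (fx, fy).
    top = t00 * (1.0 - fx) + t10 * fx
    bottom = t01 * (1.0 - fx) + t11 * fx
    return top * (1.0 - fy) + bottom * fy

texels = (0.2, 0.4, 0.6, 0.8)   # hypothetical height values
center = bilerp(*texels, fx=0.5, fy=0.5)
average = sum(texels) / 4.0
```

With fx = fy = 0.5 every weight collapses to 0.25, so `center` and `average` agree to floating-point precision.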


#4117038 Generate Normal map from heightmap algorithm

Posted by on 09 December 2007 - 02:00 AM

Original post by Zipster
I've never implemented one myself, but I believe that many normal map generators are based on finding the gradients of the image, a la edge detection. For instance, the Sobel operator is a classic example, even though it might not be the most accurate. The implementation would be pretty straightforward from there, I would think.
The Sobel filter is pretty straightforward to implement and the quality is OK. For reference, here's my SM4 code implementing three approaches to fetching a per-pixel normal:
float3 FetchNormalVector( float2 tc, uniform bool readFromTexture, uniform bool useSobelFilter )
{
	if( readFromTexture )
	{
		// Use the simple pre-computed look-up
		float3 n = texNormalMap.Sample( DefaultSampler, tc ).rgb;
		return normalize( n * 2.0f - 1.0f );
	}

	if( useSobelFilter )
	{
		// Coordinates are laid out as follows:
		//
		//    0,0 | 1,0 | 2,0
		//    ----+-----+----
		//    0,1 | 1,1 | 2,1
		//    ----+-----+----
		//    0,2 | 1,2 | 2,2

		// Compute the necessary offsets:
		float2 o00 = tc + float2( -vPixelSize.x, -vPixelSize.y );
		float2 o10 = tc + float2(          0.0f, -vPixelSize.y );
		float2 o20 = tc + float2(  vPixelSize.x, -vPixelSize.y );

		float2 o01 = tc + float2( -vPixelSize.x, 0.0f );
		float2 o21 = tc + float2(  vPixelSize.x, 0.0f );

		float2 o02 = tc + float2( -vPixelSize.x, vPixelSize.y );
		float2 o12 = tc + float2(          0.0f, vPixelSize.y );
		float2 o22 = tc + float2(  vPixelSize.x, vPixelSize.y );

		// Use of the Sobel filter requires the eight samples
		// surrounding the current pixel:
		float h00 = texHeightMap.Sample( DefaultSampler, o00 ).r;
		float h10 = texHeightMap.Sample( DefaultSampler, o10 ).r;
		float h20 = texHeightMap.Sample( DefaultSampler, o20 ).r;

		float h01 = texHeightMap.Sample( DefaultSampler, o01 ).r;
		float h21 = texHeightMap.Sample( DefaultSampler, o21 ).r;

		float h02 = texHeightMap.Sample( DefaultSampler, o02 ).r;
		float h12 = texHeightMap.Sample( DefaultSampler, o12 ).r;
		float h22 = texHeightMap.Sample( DefaultSampler, o22 ).r;

		// The Sobel X kernel is:
		//   [ 1.0  0.0  -1.0 ]
		//   [ 2.0  0.0  -2.0 ]
		//   [ 1.0  0.0  -1.0 ]
		float Gx = h00 - h20 + 2.0f * h01 - 2.0f * h21 + h02 - h22;

		// The Sobel Y kernel is:
		//   [  1.0   2.0   1.0 ]
		//   [  0.0   0.0   0.0 ]
		//   [ -1.0  -2.0  -1.0 ]
		float Gy = h00 + 2.0f * h10 + h20 - h02 - 2.0f * h12 - h22;

		// Generate the missing Z component - tangent
		// space normals are +Z, which makes things easier.
		// The 0.5f leading coefficient can be used to control
		// how pronounced the bumps are: less than 1.0 enhances
		// and greater than 1.0 smooths. The max() guards against
		// a negative argument to sqrt() on very steep slopes.
		float Gz = 0.5f * sqrt( max( 0.0f, 1.0f - Gx * Gx - Gy * Gy ) );

		// Make sure the returned normal is of unit length
		return normalize( float3( 2.0f * Gx, 2.0f * Gy, Gz ) );
	}
	else
	{
		// Determine the offsets
		float2 o1 = float2( vPixelSize.x, 0.0f );
		float2 o2 = float2( 0.0f, vPixelSize.y );

		// Take three samples to determine two vectors that can be
		// used to generate the normal at this pixel
		float h0 = texHeightMap.Sample( DefaultSampler, tc ).r;
		float h1 = texHeightMap.Sample( DefaultSampler, tc + o1 ).r;
		float h2 = texHeightMap.Sample( DefaultSampler, tc + o2 ).r;

		float3 v01 = float3( o1, h1 - h0 );
		float3 v02 = float3( o2, h2 - h0 );

		float3 n = cross( v01, v02 );

		// It can be useful to scale the Z component to tweak how
		// much the bumps show up: less than 1.0 will make them
		// more apparent, greater than 1.0 will smooth them out.
		n.z *= 0.5f;

		return normalize( n );
	}
}
It should be pretty obvious which bits are relevant. In general the pre-generated normal map from a file (first branch) was highest quality, with Sobel second and the three-sample version worst (but still not terrible). Performance was acceptable in all cases, but obviously the TMU usage for Sobel makes it a little slower.
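The Sobel branch is easy to sanity-check on the CPU. A Python transcription of the same math (the 3x3 heightmap windows below are made-up test data, not from any real texture):

```python
import math

# CPU version of the shader's Sobel branch: builds a tangent-space
# normal from a 3x3 heightmap window, indexed h[row][col] exactly as
# in the shader's coordinate-layout comment.

def sobel_normal(h):
    # Sobel X and Y gradients, matching the kernels in the shader.
    gx = (h[0][0] - h[0][2]
          + 2.0 * h[1][0] - 2.0 * h[1][2]
          + h[2][0] - h[2][2])
    gy = (h[0][0] + 2.0 * h[0][1] + h[0][2]
          - h[2][0] - 2.0 * h[2][1] - h[2][2])
    # Reconstructed Z, clamped so sqrt() never sees a negative.
    gz = 0.5 * math.sqrt(max(0.0, 1.0 - gx * gx - gy * gy))
    x, y, z = 2.0 * gx, 2.0 * gy, gz
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

# A perfectly flat heightmap must give the straight-up normal (0, 0, 1).
flat = sobel_normal([[0.5] * 3 for _ in range(3)])

# A ramp rising toward +X should tilt the normal away from +X.
ramp = sobel_normal([[0.0, 0.1, 0.2]] * 3)
```

The flat case is the important sanity check: both gradients vanish, so only the reconstructed Z term survives and normalization returns exactly (0, 0, 1).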


#4114146 Dynamic Branching in HLSL

Posted by on 04 December 2007 - 10:09 PM

To elaborate a bit, you'll often find it boils down to the cmp instruction once compiled.

So instead of something like:

	if( condition )
		result = some_complex_function();
	else
		result = some_other_complex_function();

where you'd expect only one of the branches to execute, you'll get:

	result_a = some_complex_function();
	result_b = some_other_complex_function();
	result = cmp( condition, result_a, result_b );

In this situation there is no dynamic branching and you execute BOTH branches.
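The same flattening shows up anywhere a select replaces a branch. A tiny Python analogue (the two "complex" functions are hypothetical stand-ins) where both sides have already run by the time the condition picks one:

```python
# Branch-free "cmp"-style select: both expressions are fully evaluated
# before the condition chooses between them, exactly like the
# flattened shader code.

calls = []

def some_complex_function():
    calls.append("a")   # record that this side executed
    return 1.0

def some_other_complex_function():
    calls.append("b")   # record that this side executed
    return 2.0

def cmp_select(condition, a, b):
    # Picks 'a' when the condition holds, 'b' otherwise - but note
    # that both arguments were evaluated before we ever got here.
    return a if condition else b

result = cmp_select(True, some_complex_function(),
                    some_other_complex_function())
```

Even though the condition is true and only `some_complex_function`'s value is used, the call log shows both functions ran - the cost of both branches is always paid.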


#4113524 Dynamic Branching in HLSL

Posted by on 04 December 2007 - 12:39 AM

Original post by StarStudded
hmmm, what exactly do you mean by batches? Does this refer to areas on the screen or (in my case) the receiving surfaces of different objects?
A batch, as I understand it, is a grouping of pixels that are in flight at the same time - their start/finish is synchronized to some degree. Exactly what size the batch is and how it is defined (e.g. a 2x3 area of screen space or a 16x16 area..) seems to vary across architectures and isn't something I know in great detail unfortunately. It would appear to have some optimal ratio to the number of pixel shading units, ROPs and TMUs a GPU has.

Original post by StarStudded
If this is the case, how could I modify my code to take advantage of batching?
I've not seen anything to suggest that you can except for using this knowledge to inform your shader design. That is, if you expect the conditional in a branch to vary on a per-pixel or per-every-other-pixel basis then it might not be worth putting the branch in. However, if the conditional might only change every 100 pixels then it becomes a more clear cut case.

Maybe some sort of thresholding would help - instead of "if( all_in_shadow )" try "if( most_in_shadow )"? e.g. using a lt/gt instead of eq operator.

Original post by StarStudded
Oh, hey, here's an idea. What if there was a way to sense in the program (roughly or precisely) which objects were receiving shadows or shadow edges. Those objects that were not occluded would not have to perform such high PCF.
Nice idea, but I would suspect (feel free to prove me wrong [wink]) that the amount of work to detect this situation would outweigh the advantage you'd get by reducing the workload on those few pixels...


#3952900 SlimDX -- A Prototype MDX Replacement Library

Posted by on 01 May 2007 - 10:09 AM

Original post by Demirug
As I prefer not to move too far away from the original concepts of both APIs, writing a multi-API application would be as complex as with C++.
I agree - just because .NET is a higher level language doesn't magically make this sort of thing easier.

It was difficult to make my idea clear, but basically I meant having the same design philosophy for MD3D9 and MD3D10 rather than making them source-code compatible or making some trivial "auto-porting" API.

Say a D3D9 developer picks up Promit's MD3D9 API and later wants to check out D3D10, so moves over to Ralf's MD3D10 implementation. If they were somehow 'aligned' then it'd make this transition a whole lot easier, rather than having to go back to square one and re-learn a whole different way of interacting with what is, under the covers, a fundamentally similar API.

Anyway... I'm going to stick this thread for a bit to encourage some further discussion. I get the distinct impression there are various members of the community who want to do something about Managed DirectX yet there seem to be a number of blocking factors involved in actually getting it moving forward. Maybe putting this topic in the spotlight will generate the right sort of interest to get things rolling...?


#3952395 SlimDX -- A Prototype MDX Replacement Library

Posted by on 30 April 2007 - 10:01 PM

Whilst it wouldn't be easy, I think there would be an enormous amount of value in getting any MD3D9 and MD3D10 interfaces similar.

Obviously they can't be identical - but using the same design guidelines, rules and so on could smooth out transitions as well as help those cross-targeting 9 and 10 (or would that explode on dependencies?).


#3941334 [hlsl] Cook-Torrance lighting

Posted by on 17 April 2007 - 11:23 AM

Original post by Lifepower
Jack, could you share your Cook-Torrance code for D3D9? [smile]
hmm, well maybe... just maybe...

float4 psCookTorrance( in VS_LIGHTING_OUTPUT v ) : COLOR
{
	// Sample the textures
	float3 Normal = normalize( ( 2.0f * tex2D( sampNormMap, v.TexCoord ).xyz ) - 1.0f );
	float3 Specular = tex2D( sampSpecular, v.TexCoord ).rgb;
	float3 Diffuse = tex2D( sampDiffuse, v.TexCoord ).rgb;
	float2 Roughness = tex2D( sampRoughness, v.TexCoord ).rg;

	Roughness.r *= 3.0f;

	// Correct the input and compute aliases
	float3 ViewDir = normalize( v.ViewDir );
	float3 LightDir = normalize( v.LightDir );
	float3 vHalf = normalize( LightDir + ViewDir );
	float NormalDotHalf = dot( Normal, vHalf );
	float ViewDotHalf = dot( vHalf, ViewDir );
	float NormalDotView = dot( Normal, ViewDir );
	float NormalDotLight = dot( Normal, LightDir );

	// Compute the geometric term
	float G1 = ( 2.0f * NormalDotHalf * NormalDotView ) / ViewDotHalf;
	float G2 = ( 2.0f * NormalDotHalf * NormalDotLight ) / ViewDotHalf;
	float G = min( 1.0f, max( 0.0f, min( G1, G2 ) ) );

	// Compute the fresnel term
	float F = Roughness.g + ( 1.0f - Roughness.g ) * pow( 1.0f - NormalDotView, 5.0f );

	// Compute the roughness term
	float R_2 = Roughness.r * Roughness.r;
	float NDotH_2 = NormalDotHalf * NormalDotHalf;
	float A = 1.0f / ( 4.0f * R_2 * NDotH_2 * NDotH_2 );
	float B = exp( -( 1.0f - NDotH_2 ) / ( R_2 * NDotH_2 ) );
	float R = A * B;

	// Compute the final term
	float3 S = Specular * ( ( G * F * R ) / ( NormalDotLight * NormalDotView ) );
	float3 Final = cLightColour.rgb * max( 0.0f, NormalDotLight ) * ( Diffuse + S );

	return float4( Final, 1.0f );
}

The above is a straight copy-and-paste from my final-year dissertation at the University of Nottingham. From empirical testing it appears to be correct, but I can't say I've exhaustively tested all scenarios.
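The individual terms are easy to spot-check numerically. A scalar Python transcription of the G, F and R terms (the roughness `m` and Fresnel `f0` values are arbitrary test inputs), evaluated for the degenerate head-on case where N, L, V and H all align:

```python
import math

# Scalar transcription of the shader's Cook-Torrance terms. In the
# head-on case every dot product is 1, which makes the expected value
# easy to work out by hand.

def cook_torrance_specular(n_dot_h, n_dot_v, n_dot_l, v_dot_h, m, f0):
    # Geometric attenuation term, clamped to [0, 1].
    g1 = (2.0 * n_dot_h * n_dot_v) / v_dot_h
    g2 = (2.0 * n_dot_h * n_dot_l) / v_dot_h
    g = min(1.0, max(0.0, min(g1, g2)))
    # Schlick's Fresnel approximation.
    f = f0 + (1.0 - f0) * (1.0 - n_dot_v) ** 5
    # Beckmann roughness term.
    m2, ndh2 = m * m, n_dot_h * n_dot_h
    r = (1.0 / (4.0 * m2 * ndh2 * ndh2)) * math.exp(-(1.0 - ndh2) / (m2 * ndh2))
    return (g * f * r) / (n_dot_l * n_dot_v)

# Head-on: G clamps to 1, Fresnel collapses to f0, and with m = 0.5
# the roughness term is exactly 1/(4 * 0.25) = 1, so the result is f0.
s = cook_torrance_specular(1.0, 1.0, 1.0, 1.0, m=0.5, f0=0.05)
```

Checks like this won't prove the shader right, but they catch sign and clamping mistakes in the individual terms quickly.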


#3613624 Vertex cache

Posted by on 22 May 2006 - 10:57 PM

Yeah, it is pretty much as simple as you said [smile]

I recently put together a vertex cache demo. Read the main article and the follow up for details. The second part has updated code with full examples.

Bear in mind that the cache size changes between different GPUs. Nvidia's earlier chips had 16 entries, the more recent ones 24. ATI's appear to have 14 entries - but I don't have any solid evidence for that (damn ATI for not supporting VCache queries [flaming]).

Original post by ClementLuminy
Do you know of a little library which sorts any IB in order to be used with the vertex cache?
As sirob suggested, you've got the Optimize methods for ID3DXMesh... but you've also got D3DXOptimizeFaces() and D3DXOptimizeVertices() for non-mesh geometry.
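The effect of the post-transform cache is easy to model on the CPU. A small FIFO-cache simulator in Python (the cache size, index buffer and FIFO replacement policy are all assumptions - real GPU policies vary):

```python
from collections import deque

# Simple FIFO model of the post-transform vertex cache: an index found
# in the cache is a free reuse; a miss costs a vertex transform and
# evicts the oldest entry once the cache is full.

def count_transforms(indices, cache_size):
    cache = deque(maxlen=cache_size)  # oldest entry falls off the left
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses

# Two triangles sharing an edge: vertices 1 and 2 come straight from
# the cache on the second triangle, so only 4 of 6 indices need a
# transform.
shared_quad = [0, 1, 2, 2, 1, 3]
misses = count_transforms(shared_quad, cache_size=16)
```

Running index buffers before and after an Optimize call through a model like this is a cheap way to see what the reordering bought you (the ACMR is just `misses` divided by the triangle count).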


#362915 [C++] D3D/PIX profiling helper

Posted by on 08 December 2005 - 05:48 AM

Evening all, I've been doing some work on one of my projects and came up with a nifty little trick that I thought I'd share with you guys. Maybe you'll find it useful...

I would hope that everyone who's using a reasonably up-to-date version of Direct3D 9 has experimented with PIX for Windows. If not, why not? [smile] A lesser-known set of features are the D3DPERF_BeginEvent(), D3DPERF_SetMarker() and D3DPERF_EndEvent() API calls. I covered them a while back in my developer journal.

What I've written is hardly rocket science, but it's one of those (I think) more useful uses of the C/C++ preprocessor. Using the following bits of code you can add a PROFILE_BLOCK in your code and it'll do the rest for you - including making sure that it cleans up correctly.

D3DUtils.h (Download directly from here)
#include "dxstdafx.h"

#ifndef INC_D3DUTILS_H
#define INC_D3DUTILS_H

// These first two macros are taken from the
// VStudio help files - necessary to convert the
// __FUNCTION__ symbol from char to wchar_t.
#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)

// Only the first of these macros should be used. The _INTERNAL
// ones are so that the sp##id part generates "sp1234" type identifiers
// instead of always "sp__LINE__" - the extra level of indirection
// forces __LINE__ to be expanded before the token pasting happens.
#define PROFILE_BLOCK PROFILE_BLOCK_INTERNAL( __LINE__ )
#define PROFILE_BLOCK_INTERNAL(id) PROFILE_BLOCK_INTERNAL2( id )
#define PROFILE_BLOCK_INTERNAL2(id) D3DUtils::ScopeProfiler sp##id ( WIDEN(__FUNCTION__), __LINE__ );

// To avoid polluting the global namespace,
// all D3D utility functions/classes are wrapped
// up in the D3DUtils namespace.
namespace D3DUtils
{
	class ScopeProfiler
	{
		public:
			ScopeProfiler( WCHAR *Name, int Line );
			~ScopeProfiler( );

		private:
			// Hidden so the profiler can't be created without
			// a name and line number.
			ScopeProfiler( );

			// Buffer for the composed event name.
			WCHAR wc[ MAX_PATH ];
	};
}

#endif // INC_D3DUTILS_H
D3DUtils.cpp (Download directly from here)
#include "dxstdafx.h"
#include "D3DUtils.h"

#include <time.h>

namespace D3DUtils
{
	// Class constructor. Takes the necessary information and
	// composes a string that will appear in PIXfW.
	ScopeProfiler::ScopeProfiler( WCHAR* Name, int Line )
	{
		// Seed the generator once so each event gets a random colour.
		static bool bSeeded = false;
		if( !bSeeded )
		{
			srand( static_cast< unsigned >( time( NULL ) ) );
			bSeeded = true;
		}

		StringCchPrintf( wc, MAX_PATH, L"%s @ Line %d.", Name, Line );
		D3DPERF_BeginEvent( D3DCOLOR_XRGB( rand() % 255, rand() % 255, rand() % 255 ), wc );
	}

	// Makes sure that the BeginEvent() has a matching EndEvent();
	// if used via the macro in D3DUtils.h this will be called when
	// the variable goes out of scope.
	ScopeProfiler::~ScopeProfiler( )
	{
		D3DPERF_EndEvent( );
	}
}
A few notes:
  1. Just do a #include "D3DUtils.h" in the code you want to use it
  2. A random colour is created for each event, but PIXfW doesn't currently make use of this.
  3. I've used the dxstdafx.h PCH file that you find in the SDK. If you're not using this, then make sure you replace it with d3dx9.h, windows.h and math.h.
  4. PIXfW only monitors Direct3D calls, so there's no point in using this code to watch sections that don't contain any D3DX/D3D calls!
The usage is pretty simple. It creates an instance of ScopeProfiler on the stack so that its destructor automagically gets called when it goes out of scope. The destructor contains a D3DPERF_EndEvent(), making sure that the D3DPERF_BeginEvent() is correctly matched...
// Watch an entire function:
void CALLBACK OnFrameRender( IDirect3DDevice9* pd3dDevice, double fTime, float fElapsedTime, void* pUserContext )
{
    PROFILE_BLOCK

    // other code goes here
}

// Watch a specific subset:
{
    PROFILE_BLOCK

    V( g_HUD.OnRender( fElapsedTime ) );
}

// Use the class directly to override the default-generated
// name. Note that the instance must be named - an unnamed
// temporary would be destroyed (and the event ended) immediately:
if( g_SettingsDlg.IsActive() )
{
    D3DUtils::ScopeProfiler sp( L"OnFrameRender() - Setting Dialog Rendering", __LINE__ );
    g_SettingsDlg.OnRender( fElapsedTime );
}
The result, when you run a PIX Full Call Stream Capture:
(The events added by the program are highlighted in pink)
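The begin/end pairing guarantee is the same trick as any scope guard. A Python analogue using a context manager (the `events` list stands in for PIX's Begin/End calls; names are hypothetical):

```python
# Scope-guard analogue of ScopeProfiler: entering the 'with' block
# plays the role of D3DPERF_BeginEvent(), and leaving it - even via
# an exception - guarantees the matching D3DPERF_EndEvent().

events = []

class ScopeEvent:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append("begin:" + self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("end:" + self.name)
        return False  # never swallow exceptions

try:
    with ScopeEvent("OnFrameRender"):
        raise RuntimeError("draw call failed")  # hypothetical failure
except RuntimeError:
    pass
```

Even though the block exits via an exception, the end event is still recorded - the same reason the C++ destructor approach never leaves a BeginEvent() unmatched.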
Feel free to do whatever you want with the code. Use it and abuse it - at your own risk, of course [wink] Cheers, Jack