
Member Since 16 Aug 2004
Last Active Apr 02 2013 08:28 PM

Topics I've Started

Dynamic runtime linking of DirectX 11

15 August 2011 - 08:03 PM

Is there any way to link dynamically against the DirectX 11 DLLs in an error-tolerant way, i.e. so that I can detect whether d3d11.dll exists on a system and fail gracefully if it doesn't?

I'm assuming d3d11.dll won't exist if you don't have a DirectX 11 gfx card/driver installed? I can use delay loading to mitigate the problem, but I'll still get a runtime error when I try to call D3D11CreateDevice and it doesn't exist.

For a simple C API I can use LoadLibrary and GetProcAddress to bind functions at run time, e.g. for imagehlp.dll:
 // Signature of SymGetLineFromAddr as declared in imagehlp.h
 typedef BOOL (WINAPI *tSGLFA)( HANDLE, DWORD, PDWORD, PIMAGEHLP_LINE );

 HMODULE hImagehlpDll = LoadLibraryA( "imagehlp.dll" );
 if ( hImagehlpDll == NULL )
   return 0; // DLL not found: exit gracefully

 // Bind the export to a function pointer that can be called like a statically linked C function
 tSGLFA pSGLFA = (tSGLFA) GetProcAddress( hImagehlpDll, "SymGetLineFromAddr" );
 if ( pSGLFA == NULL )
   return 0; // export not found: exit gracefully

 // Execute the function
 pSGLFA( hProcess, (DWORD)addrPC, &offsetFromSymbol, &Line );

But how do I do this for a complicated C++ API like DirectX 11?
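D3D11CreateDevice is a plain C-style export of d3d11.dll, and the rest of the API is reached through COM interface pointers (virtual function tables), which need no import-library linkage. So the LoadLibrary/GetProcAddress pattern above should carry over to Direct3D 11 by resolving only the entry point and casting it to the PFN_D3D11_CREATE_DEVICE typedef from d3d11.h. The sketch below is untested against real hardware; DynamicLibrary and TryCreateD3D11Device are hypothetical names, and the POSIX dlopen branch exists only so the graceful-failure path can be exercised off Windows.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <d3d11.h>   // for PFN_D3D11_CREATE_DEVICE; no need to link d3d11.lib
#else
#include <dlfcn.h>   // POSIX stand-in so the failure path can be tested off Windows
#endif

// Hypothetical RAII wrapper: owns a module handle and frees it on destruction.
class DynamicLibrary {
public:
    explicit DynamicLibrary(const char* name) {
#ifdef _WIN32
        handle_ = ::LoadLibraryA(name);
#else
        handle_ = ::dlopen(name, RTLD_NOW);
#endif
    }
    ~DynamicLibrary() {
        if (!handle_) return;
#ifdef _WIN32
        ::FreeLibrary(static_cast<HMODULE>(handle_));
#else
        ::dlclose(handle_);
#endif
    }
    DynamicLibrary(const DynamicLibrary&) = delete;
    DynamicLibrary& operator=(const DynamicLibrary&) = delete;

    bool loaded() const { return handle_ != nullptr; }

    // Looks up an exported C symbol; returns nullptr if the library or symbol is missing.
    template <typename Fn>
    Fn symbol(const char* name) const {
        if (!handle_) return nullptr;
#ifdef _WIN32
        return reinterpret_cast<Fn>(::GetProcAddress(static_cast<HMODULE>(handle_), name));
#else
        return reinterpret_cast<Fn>(::dlsym(handle_, name));
#endif
    }

private:
    void* handle_ = nullptr;
};

#ifdef _WIN32
// Hypothetical helper: creates a hardware device only if d3d11.dll and its
// D3D11CreateDevice export can be found. `d3d11` must outlive the device.
bool TryCreateD3D11Device(const DynamicLibrary& d3d11,
                          ID3D11Device** device,
                          ID3D11DeviceContext** context,
                          D3D_FEATURE_LEVEL* level) {
    auto create = d3d11.symbol<PFN_D3D11_CREATE_DEVICE>("D3D11CreateDevice");
    if (!create) return false;  // runtime missing: fail gracefully
    // A null feature-level array lets the runtime fall back to 10.x/9.x levels,
    // so callers that need 11_0 should check *level.
    HRESULT hr = create(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0, nullptr, 0,
                        D3D11_SDK_VERSION, device, level, context);
    return SUCCEEDED(hr);
}
#endif
```

Usage would be along the lines of constructing DynamicLibrary d3d11("d3d11.dll"), checking d3d11.loaded(), then calling TryCreateD3D11Device. The library object should stay alive as long as the device does, since FreeLibrary would unload the code behind the interface vtables.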

FXC /Fh option doesn't seem to work

04 August 2011 - 09:25 PM

I'm trying to precompile my FX file to avoid long compile times at runtime, and to embed the compiled output in a C header file (this would make loading MUCH easier, as I wouldn't have to mess around with loading asset files at runtime).

According to the FXC docs, the /Fh option should do this (at least as I read them):

/Fh <file>    Output header file containing object code.

But for me this has exactly the same result as /Fc, i.e. it outputs an assembly code listing.

My command line looks like this (PS is the entry point to my pixel shader):

fxc /FhFoo.h /EPS /Tps_4_0 Foo.fx

Any ideas? Has anyone else got the /Fh option to output a C-friendly header file? Or am I just misreading the docs?
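One thing that may be worth checking: in the fxc versions I'm aware of, the /Fh header starts with the assembly listing commented out inside an #if 0 ... #endif block, and the byte array itself (named g_ plus the entry point by default, so probably g_PS here, or whatever /Vn specifies) only appears at the end of the file. That makes it easy to mistake for /Fc output if only the top of the file is inspected. If a different layout is needed, a small converter from the binary written by /Fo (e.g. fxc /EPS /Tps_4_0 /FoFoo.cso Foo.fx) to a header is straightforward; the sketch below is a hypothetical helper, BytesToCHeader, not part of any SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats a compiled shader blob (e.g. the file written by
// fxc /Fo) as a C array that can be #included into the application.
std::string BytesToCHeader(const std::string& varName,
                           const std::vector<unsigned char>& bytes) {
    std::ostringstream out;
    out << "const unsigned char " << varName << "[] =\n{\n";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i % 12 == 0) out << "    ";              // indent each row of 12 bytes
        out << static_cast<unsigned>(bytes[i]);       // print as a number, not a char
        if (i + 1 < bytes.size()) out << ",";
        out << ((i % 12 == 11 || i + 1 == bytes.size()) ? "\n" : " ");
    }
    out << "};\n";
    return out.str();
}
```

With the generated header included, the array can be handed directly to ID3D11Device::CreatePixelShader(g_PS, sizeof(g_PS), NULL, &pPixelShader), so no shader file needs to be loaded at runtime.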

Weird results from using a D3D11_QUERY_EVENT Query

10 June 2011 - 12:24 PM

I am trying to use the query system in DirectX 11 to monitor when DX commands have completed, but I get weird results: when I wait on an event, it appears the context always waits until ALL the currently submitted commands have finished, even if the commands in question were submitted AFTER the event I'm waiting on.

Basically I do something like this: I run a small number of DX commands, enqueue an event, run a large number of DX commands, and then enqueue a second event. No matter how much more work I do between the two events, it always waits a long time for the first event and almost no time at all for the second. This implies to me that it's waiting for ALL the commands to finish when I wait on the first event, not just the ones that were submitted before it. Is that the expected behaviour, or am I doing something wrong? Is there a way to make DX wait only for the commands that were submitted before the event in question?

	// Make sure all previous commands have finished
	g_pImmediateContext->End( g_pEventQuery0 );
	while( g_pImmediateContext->GetData( g_pEventQuery0, NULL, 0, 0 ) == S_FALSE ) {}
	double time0 = GetTimeMilliSecs();

	// Run a small number of DX commands
	// ...
	g_pImmediateContext->End( g_pEventQuery1 ); // enqueue first event

	// Run a large number of DX commands
	// ...
	g_pImmediateContext->End( g_pEventQuery2 ); // enqueue second event

	// Wait for first event
	while( g_pImmediateContext->GetData( g_pEventQuery1, NULL, 0, 0 ) == S_FALSE ) {}
	double time1 = GetTimeMilliSecs();

	// Wait for second event
	while( g_pImmediateContext->GetData( g_pEventQuery2, NULL, 0, 0 ) == S_FALSE ) {}
	double time2 = GetTimeMilliSecs();

	double t0 = time1 - time0;
	double t1 = time2 - time1;

	printf( "%f %f\n", t0, t1 );

In this example the wait times (in milliseconds) for the two events are:
t0 = 0.69156749546527863
t1 = 0.00030016526579856873

I create my events like this:
	D3D11_QUERY_DESC queryDesc;
	queryDesc.Query = D3D11_QUERY_EVENT;
	queryDesc.MiscFlags = 0;

	g_pd3dDevice->CreateQuery( &queryDesc, &g_pEventQuery0 );
	g_pd3dDevice->CreateQuery( &queryDesc, &g_pEventQuery1 );
	g_pd3dDevice->CreateQuery( &queryDesc, &g_pEventQuery2 );
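GetTimeMilliSecs isn't shown in the post. As a sketch, here is a portable stand-in based on std::chrono::steady_clock, plus a hypothetical SpinWaitMs helper that wraps the while( GetData(...) == S_FALSE ) {} pattern and returns only the time spent inside the wait. In the D3D code the predicate would be something like [&]{ return g_pImmediateContext->GetData( g_pEventQuery1, NULL, 0, 0 ) != S_FALSE; }, which, like the original loops, also stops on an error HRESULT.

```cpp
#include <cassert>
#include <chrono>

// Portable stand-in for the post's GetTimeMilliSecs(): milliseconds from a
// monotonic clock. Only differences between two calls are meaningful.
double GetTimeMilliSecs() {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: busy-polls `isDone` until it returns true and returns
// the time spent waiting, in milliseconds.
template <typename Predicate>
double SpinWaitMs(Predicate isDone) {
    const double start = GetTimeMilliSecs();
    while (!isDone()) {
        // spin, like the original while( GetData(...) == S_FALSE ) {} loops
    }
    return GetTimeMilliSecs() - start;
}
```

Timing each wait on its own is a deliberate choice: in the code above, time0 is taken before either batch is issued, so t0 also includes the CPU time spent submitting both the small and the large batch, not just the GPU wait for the first event.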

DirectX GPU upload/download bandwidth

26 May 2011 - 06:23 PM

Hi all....

I'm trying to implement a complicated GPGPU operation using DirectX 11 (I tried DirectCompute and CUDA with limited success). I'm making good progress, but the bottleneck is uploading the data to the GPU and downloading the results from it. Basically my operation involves:
  • Upload a 128x64 float4 texture to the GPU
  • Render a 64x64 full-screen quad with a trivial vertex/pixel shader
  • Download the resulting 64x64 float4 frame buffer
I've tried various high-spec cards (currently a GeForce GTX 480) and various ways of parallelizing the operation (so that more than one "job" runs concurrently), but the fastest I can get this operation to run is about 1 ms. If I remove the upload and download steps (and just wait for the quad to render), the operation takes around 0.15 ms, so 85% of my time is spent in upload/download.

It seems like I should be able to get much better bandwidth than this, given the published bandwidth numbers for this kind of card (1 ms per job means I can only perform 16 "jobs" in a 16 ms frame :( ). Am I being too optimistic about how long this kind of thing should take? Or am I doing something dumb in my DirectX code?
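A quick back-of-the-envelope check of the figures above, assuming 16 bytes per float4 (R32G32B32A32_FLOAT) texel and treating the transfer time as the 1 ms job time minus the 0.15 ms render-only time. The names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative names; figures taken from the post.
constexpr std::size_t kBytesPerFloat4 = 16;  // 4 x 32-bit float per texel

constexpr std::size_t TextureBytes(std::size_t width, std::size_t height) {
    return width * height * kBytesPerFloat4;
}

constexpr std::size_t kUploadBytes   = TextureBytes(128, 64);  // input texture
constexpr std::size_t kDownloadBytes = TextureBytes(64, 64);   // render target read back
constexpr double kJobMs        = 1.0;   // measured time per job with transfers
constexpr double kRenderOnlyMs = 0.15;  // measured time without transfers

// Effective transfer rate in bytes per second for `bytes` moved in `ms` milliseconds.
constexpr double EffectiveBytesPerSec(std::size_t bytes, double ms) {
    return static_cast<double>(bytes) / (ms / 1000.0);
}
```

That is 131072 bytes up plus 65536 bytes down, 192 KiB per job, moved in about 0.85 ms, or roughly 0.23 GB/s. A PCIe 2.0 x16 link is nominally about 8 GB/s in each direction, so the measured rate is a few percent of that, which suggests the time is going into per-transfer latency and CPU/GPU synchronization rather than raw bandwidth.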

I'm uploading my input data as a 64x128 DXGI_FORMAT_R32G32B32A32_FLOAT texture, created with usage D3D11_USAGE_DYNAMIC and CPUAccessFlags D3D11_CPU_ACCESS_WRITE. To write my data I map the texture (with map type D3D11_MAP_WRITE_DISCARD) and do a memcpy.

To download my result I create a 64x64 render-target texture (also DXGI_FORMAT_R32G32B32A32_FLOAT) with usage D3D11_USAGE_DEFAULT and CPUAccessFlags 0. I use CopyResource to copy the RT to a second, staging texture (usage D3D11_USAGE_STAGING, CPUAccessFlags D3D11_CPU_ACCESS_READ), and then map the staging texture.
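One detail worth checking in the Map + memcpy steps described above: D3D11_MAPPED_SUBRESOURCE::RowPitch may be larger than width x 16 bytes, so a single memcpy of the whole texture is only correct when the pitch happens to equal the tightly packed row size. A minimal portable sketch of a row-by-row copy that works in both directions; CopyRows is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `rows` rows of `rowBytes` bytes each between
// buffers whose rows start `srcPitch` / `dstPitch` bytes apart. With D3D11,
// pass mapped.RowPitch on the mapped side and rowBytes on the packed side.
void CopyRows(void* dst, std::size_t dstPitch,
              const void* src, std::size_t srcPitch,
              std::size_t rowBytes, std::size_t rows) {
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    if (dstPitch == rowBytes && srcPitch == rowBytes) {
        std::memcpy(d, s, rowBytes * rows);  // both tightly packed: one memcpy is fine
        return;
    }
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(d + r * dstPitch, s + r * srcPitch, rowBytes);
    }
}
```

For the upload this would be used as CopyRows( mapped.pData, mapped.RowPitch, src, width * 16, width * 16, height ) after Map( pTex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped ), and with the roles swapped for the staging-texture readback.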

In order to get some parallelism (and hopefully some kind of pipelining, where one operation is uploading while another is downloading) I tried several things. I tried having several sets of input/output/RT textures, issuing the draw (and other) commands on all of them, and then doing the RT map/unmap. I also tried the same thing but using deferred contexts to do the upload via a command list built in a different thread. But I can't average better than 1 ms per job, no matter how many concurrent jobs I try to run.
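A commonly used variation, which may or may not differ from what was tried above, is to cycle through N staging textures and map only the one that was copied N-1 submissions earlier, giving the GPU time to finish it before the CPU asks for it (optionally polling with D3D11_MAP_FLAG_DO_NOT_WAIT, which makes Map return DXGI_ERROR_WAS_STILL_DRAWING instead of blocking). The sketch below models only the slot bookkeeping; ReadbackRing is a hypothetical name and the D3D calls are left as comments.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for N-deep readback pipelining: submission k writes
// staging slot k % N and, once the ring is full, reads back the slot that was
// written N-1 submissions earlier.
class ReadbackRing {
public:
    explicit ReadbackRing(std::size_t slots) : slots_(slots) {}

    // Slot to CopyResource the render target into for this submission.
    std::size_t WriteSlot() const { return submitted_ % slots_; }

    // Records a submission. Returns true and sets `readSlot` when an older
    // slot is due to be mapped (Map / copy out / Unmap would happen here).
    bool Submit(std::size_t& readSlot) {
        ++submitted_;
        if (submitted_ < slots_) return false;      // ring not full yet
        readSlot = (submitted_ - slots_) % slots_;  // oldest outstanding slot
        return true;
    }

private:
    std::size_t slots_;
    std::size_t submitted_ = 0;
};
```

The trade-off is N-1 jobs of extra latency per result, and the last N-1 slots still have to be drained once submission stops; with N = 1 this degenerates to mapping each result immediately after its copy.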

Any ideas? I can post more source code if anyone is interested.

Thanks all