Community Reputation: 125 Neutral

About griffin77

  1. OpenGL Library for ray-tracing

    OpenRL is a good option if you are familiar with the OpenGL API, as it is heavily based on OpenGL. There is also a *fairly* responsive support forum:
  2. Dynamic runtime linking of DirectX 11

    [quote name='yckx' timestamp='1313461136' post='4849667'] Apparently the same way, according to [url=""]this AppHub thread[/url]. [/quote] Thanks for that. There is an interesting link from MS in that thread that talks about a DLL from Microsoft that does a lot of this (but it's a separate DLL you'd need to distribute with your app): [url=""]http://msdn.microsof...y/ee416644.aspx[/url] I actually came up with a pretty simple solution to this. I use LoadLibrary to check whether d3d11.dll exists, but I never actually use the handle; I just exit gracefully if it's not found, without calling any DX functions. Because the DX DLL is delay-loaded this works fine: if DirectX 11 doesn't exist on the machine my code runs on, none of the delay-loaded functions are ever executed, so I never hit an error.
  3. Is there any way to dynamically link against the DirectX 11 DLLs in a way that's error tolerant? i.e. can I detect whether d3d11.dll exists on a system and fail gracefully? I'm assuming d3d11.dll won't exist if you don't have a DirectX 11 gfx card/driver installed? I can use delayed loading to mitigate the problem, but I'll still get a runtime error when I call D3D11CreateDevice and it doesn't exist. For a simple C API I could use LoadLibrary to dynamically bind APIs at run time, e.g. for imagehlp.dll: [code]
hImagehlpDll = LoadLibraryA( "imagehlp.dll" );
if ( hImagehlpDll == NULL )
    return 0; // DLL not found, gracefully exit

// Bind functions as C callback functions that can then be used
// as if they were statically linked C functions
pSGLFA = (tSGLFA) GetProcAddress( hImagehlpDll, "SymGetLineFromAddr" );

// Execute the function
pSGLFA( hProcess, (DWORD)addrPC, &offsetFromSymbol, &Line );
[/code] But how do I do this for a complicated C++ API like DirectX 11?
  4. That is my understanding too: there is magic going on behind the scenes when you compile a pixel shader that converts the pixel shader code into low-level GPU instructions that use shared memory and the like (basically everything you have to do yourself when you write CS or CUDA code). I have used DeviceMemoryBarrier() in a pixel shader, but the documentation is VERY sketchy. As I understand it, this is basically a hint to tell the compiler that all the GPU threads in the current block should finish accessing global memory before continuing. Used correctly, this should reduce the memory access overhead associated with different threads accessing global memory. But without a coherent description of exactly what this means in the context of a pixel shader, it's difficult to know if I'm using it correctly. Does anyone know of a good description of what this function means in the context of a pixel shader?
  5. [DX10] FX File Deployment

    [quote name='NeonLibra' timestamp='1312516107' post='4844854'] Since I'm obviously new to this whole world of programming and skimming through various tutorial libraries I don't see a lot of discussion on this, I have another question. Is it possible to contain this within the binary? Keep everything nice and self contained, etc. [/quote] The approach I describe above (where the shader binary is embedded in a C header file) will do exactly this.
  6. [DX10] FX File Deployment

    As demonstrated by [url=""]my rather dumb post[/url] I am embedding my shader files directly in my C headers. This avoids any runtime loading/compiling, which makes life easier (of course you usually need runtime loading for other game assets; shaders are a weird intermediate step between code and assets). Of course runtime loading/compiling can be a godsend if you want to edit shaders on the fly and see the results in real time.
  7. FXC /Fh option doesn't seem to work

    Derp... user error. It WAS outputting a C-friendly header, but it was embedding the assembly listing at the top, surrounded by #if 0...#endif
  8. FXC /Fh option doesn't seem to work

    Sorry about the double post. Not sure what happened there. If the mods can delete one of them, that would be cool.
  9. I'm trying to precompile my FX file to avoid long compile times at runtime, and to embed the compiled output in a C header file (this would make the loading process MUCH easier, as I wouldn't have to mess around with loading asset files at runtime). According to the FXC docs the /Fh option should do this (at least as I read them): [quote]/Fh <[i]file[/i]> Output header file containing object code.[/quote] But for me this has exactly the same result as /Fc, i.e. it outputs an assembly code listing. My command line looks like this (PS is the entry point to my pixel shader): fxc /FhFoo.h /EPS /Tps_4_0 Foo.fx Any ideas? Has anyone else got the /Fh option to output a C-friendly header file? Or am I just misreading the docs?
  10. [quote name='shiqiu1105' timestamp='1307717351' post='4821728'] I read the book [url=""]Advanced Lighting and Materials with Shaders[/url] recently, which I found really fascinating! It gave me huge insight into Spherical Harmonics Lighting and ray tracing. So I'd really like to further investigate the source code. But I can't find it on Google or anywhere else? Can anyone give me a direction? [/quote] I do have it, but I don't think I can post it online. You could also try this paper, which covers a lot of the same material and includes source code:
  11. I am trying to use the Query system in DirectX 11 to monitor when DX commands have completed, but I get weird results: when I wait on an event it appears the context will always wait until ALL the currently submitted commands have finished, even if the commands in question were submitted AFTER the event I'm waiting on. Basically I do something like this: I run a small number of DX commands, enqueue an event, then run a large number of DX commands, and then enqueue a second event. No matter how much more work I do between the two events, it will always wait a long time for the first event, and almost no time at all for the second event. This implies to me that it's waiting for ALL the commands to finish when I wait on the first event, not just the ones that were submitted before it. Is that the expected behaviour, or am I doing something wrong? Is there a way to make DX wait for just the commands that were submitted before the event in question? [code]
//Make sure all previous commands have finished
g_pImmediateContext->End(g_pEventQuery0);
while( g_pImmediateContext->GetData( g_pEventQuery0, NULL, 0, 0 ) == S_FALSE ) {}
double time0 = GetTimeMilliSecs();

//Run a small number of DX commands
doSmallStuff();
g_pImmediateContext->End(g_pEventQuery1);

//Run a large number of DX commands
doBigStuff();
g_pImmediateContext->End(g_pEventQuery2);

//Wait for first event
while( g_pImmediateContext->GetData( g_pEventQuery1, NULL, 0, 0 ) == S_FALSE ) {}
double time1 = GetTimeMilliSecs();

//Wait for second event
while( g_pImmediateContext->GetData( g_pEventQuery2, NULL, 0, 0 ) == S_FALSE ) {}
double time2 = GetTimeMilliSecs();

double t0 = time1 - time0;
double t1 = time2 - time1;
printf("%f %f\n", t0, t1);
[/code] In this example the wait times for the two events are: t0 = 0.69156749546527863, t1 = 0.00030016526579856873. I create my events like this: [code]
D3D11_QUERY_DESC pQueryDesc;
pQueryDesc.Query = D3D11_QUERY_EVENT;
pQueryDesc.MiscFlags = 0;
g_pd3dDevice->CreateQuery( &pQueryDesc, &g_pEventQuery0 );
g_pd3dDevice->CreateQuery( &pQueryDesc, &g_pEventQuery1 );
g_pd3dDevice->CreateQuery( &pQueryDesc, &g_pEventQuery2 );
[/code]
  12. DirectX GPU upload/download bandwidth

    [quote name='mhagain' timestamp='1306498190' post='4816401'] [quote name='Matias Goldberg' timestamp='1306462759' post='4816281'][b]Edit:[/b] Also looks to me you're stalling the GPU by breaking async transfers. I strongly suggest you [url=""]read this[/url] (performance considerations section).[/quote] This can't be emphasised enough. Bandwidth is only one consideration for this kind of use case, and is most often not the most relevant one. You can have all the bandwidth in the world, you can have minimal latency, but if you need to stall the GPU you're still going to suffer. [/quote] Yeah, that's why I run multiple concurrent jobs. I basically kick off a ton of jobs at the same time, and only once they are all churning away do I start waiting for results. While I'm stalled waiting for the Map to get the results of the first job, the rest can still be running. It seems to be working pretty well. There just seems to be some weird warm-up cost the first time you do a read or write to a GPU buffer; once I do a dummy run to get rid of that, bandwidth/latency are not a problem. Now that I'm down to 0.1ms per job, I think GPU compute will become my bottleneck (once I implement a non-trivial pixel shader to do the work I want to do).
  13. DirectX GPU upload/download bandwidth

    You were right about the 480, BTW. That 0.1ms figure is for my 580; my 480 is half that speed (0.2ms).
  14. DirectX GPU upload/download bandwidth

  14. OK, so I think I figured this out. Thanks for all the advice. My issue appears to be that the GPU really likes to be "warmed up" before doing any real computation. Obviously something happens behind the scenes the first time you do a Map or UpdateSubresource that makes the first run really slow. My loop looks something like this: [code]
for(int i=0;i<NUM_THREADS;i++)
{
    g_pD3DContext->UpdateSubresource(g_inputTexture[i], 0, NULL, g_inputData[i],
        g_jobDim*16*g_qwordsPerInputItem, g_jobDim*16 );
    g_pD3DContext->PSSetShaderResources( 0, 1, &g_inputTextureRV[i] );
    g_pD3DContext->PSSetSamplers( 0, 1, &g_pSamplerLinear );
    g_pD3DContext->OMSetRenderTargets( 1, &g_renderTargetView[i], NULL );
    g_pD3DContext->Draw( 4, 0 );
    g_pD3DContext->CopyResource( g_stagingBuffer[i], g_renderTargetTexture[i] );
}
for(int i=0;i<NUM_THREADS;i++)
{
    D3D11_MAPPED_SUBRESOURCE mappedResource;
    g_pD3DContext->Map( g_stagingBuffer[i], 0, D3D11_MAP_READ, 0, &mappedResource );
    float *outputDataGPU = (float*)(mappedResource.pData);
    memcpy(outputDataCPU[n*NUM_THREADS + i], outputDataGPU, g_outputDataSize);
    g_pD3DContext->Unmap( g_stagingBuffer[i], 0 );
}
[/code] Basically, if I just run this a couple of times (passing dummy input data and ignoring the result of the final Map) before I run it "in anger", then after that I can run my jobs in less than 0.1ms.
  15. DirectX GPU upload/download bandwidth

    [quote name='Matias Goldberg' timestamp='1306462759' post='4816281'] Oh just realized. You're using Geforce 480. That card [url=""]SUCKS[/url] [url=""]BAD[/url] [url=""]in[/url] [url=""]GPU->CPU[/url] transfers. It's a common problem in the 400 series (except Quadro version, pretty lame if you ask me). Find another card and try again. Or find a way not to use GPU->CPU transfers that much [/quote] Ahh, that is good to know, though I've also tried on my GTX 580 with similar results (will double check tonight though). Does that have the same problem? I also have a Quadro 4000 somewhere I can try (but when I was attempting to do this in CUDA/OpenCL it actually got much worse performance than my GTX 480).