- Viewing Profile: Reputation: pcmaster
Community Stats
- Group Members
- Active Posts 140
- Profile Views 1,414
- Member Title Member
- Age Unknown
- Birthday Unknown
- Gender Male
- Location Prague, Czech Republic
#4981650 Why are most games not using hardware tessellation?
Posted by pcmaster
on 19 September 2012 - 06:05 AM
#4971379 f32tof16 confusion
Posted by pcmaster
on 20 August 2012 - 02:21 AM
float2 toBeQuantised = float2(333.333, 666.666);
uint half1 = f32tof16(toBeQuantised.x); // half bits end up in the low 16 bits
uint half2 = f32tof16(toBeQuantised.y);
uint twoHalfs = half1 | (half2 << 16);
But this doesn't make much sense and isn't of much use, in addition to what Kauna said :-)
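For reference, the same packing can be tried out on the CPU. This is only a minimal C++ sketch, assuming a simplified float-to-half conversion (mantissa truncated, values too small for a normal half flushed to zero, overflow and NaN collapsed to infinity); it is not claimed to match the exact rounding of f32tof16, and floatToHalfBits and packHalf2 are hypothetical helper names:

```cpp
#include <cstdint>
#include <cstring>

// Simplified float -> half bit pattern (assumption: truncation, no denormals,
// overflow and NaN both become infinity). Illustration only.
std::uint16_t floatToHalfBits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret the float's bits
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    // Re-bias the exponent from float (127) to half (15).
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = (bits >> 13) & 0x3FFu;  // keep top 10 bits
    if (exponent <= 0) {
        return static_cast<std::uint16_t>(sign);            // flush to zero
    }
    if (exponent >= 31) {
        return static_cast<std::uint16_t>(sign | 0x7C00u);  // infinity
    }
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exponent) << 10) | mantissa);
}

// Same layout as in the post: first half in the low 16 bits, second in the high 16.
std::uint32_t packHalf2(float x, float y) {
    const std::uint32_t lo = floatToHalfBits(x);
    const std::uint32_t hi = floatToHalfBits(y);
    return lo | (hi << 16);
}
```

Calling packHalf2(1.0f, 2.0f) puts the half pattern of 1.0 into the low word and that of 2.0 into the high word.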
#4946972 Structured buffer float compression
Posted by pcmaster
on 07 June 2012 - 01:13 AM
#4946733 Structured buffer float compression
Posted by pcmaster
on 06 June 2012 - 06:16 AM
Regarding HLSL, you just write your float, float2, float3, whatever, and depending on the format of the bound target view (RTV, DSV, UAV?), the conversion happens automatically. There is no need for a dedicated "half" storage type in HLSL (as far as I know, the half keyword is simply treated as float on this shader model).
There is no point in packing data between the shader stages (such as from vertex shader to hull shader). You can happily feed e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders automatically see whatever type you declare (such as float). The same goes for the output.
I'd split your struct into separate streams of positions, normals, temperatures etc., with formats such as R32G32B32_FLOAT, R11G11B10_FLOAT and R8_UNORM, for example.
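To make the "separate streams" idea concrete on the CPU side, here is a minimal C++ sketch, assuming a temperature already normalised to [0, 1] so that it fits an R8_UNORM stream. ParticleStreams, appendParticle, encodeUnorm8 and decodeUnorm8 are hypothetical names, and the normal stream is left out for brevity; on the GPU the decode step is what the hardware does for you when the shader reads the buffer as float:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Float in [0, 1] -> 8-bit unsigned normalised value (R8_UNORM-style).
std::uint8_t encodeUnorm8(float v) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}

// What a shader would see after the automatic conversion back to float.
float decodeUnorm8(std::uint8_t stored) {
    return static_cast<float>(stored) / 255.0f;
}

// One tightly packed stream per attribute instead of one fat struct.
struct ParticleStreams {
    std::vector<float> positions;            // 3 floats each, R32G32B32_FLOAT-like
    std::vector<std::uint8_t> temperatures;  // 1 byte each, R8_UNORM-like
};

void appendParticle(ParticleStreams& streams,
                    float x, float y, float z, float temperature01) {
    streams.positions.push_back(x);
    streams.positions.push_back(y);
    streams.positions.push_back(z);
    streams.temperatures.push_back(encodeUnorm8(temperature01));
}
```

Each stream can then be uploaded as its own buffer with the matching format, and the temperature costs 1 byte instead of 4.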
#4940427 [SOLVED] Disabling interpolation of vertex attributes causes error
Posted by pcmaster
on 15 May 2012 - 09:06 AM
Do
"float lightRange : RANGE"
and
"nointerpolation float lightRange : RANGE"
look the same to you? They don't. For the linker they don't either :-)
I recommend using the very same struct for both the GS output and the PS input; you'll save yourself trouble. Otherwise you'll have to add the "nointerpolation" keyword to the matching GS_OUT members, too.
#4936754 Speed up shader compilation (HLSL)
Posted by pcmaster
on 02 May 2012 - 07:53 AM
- manually unroll loops (in terms of compilation time this works better than [unroll], [fastopt] or any other compiler hint)
- especially true for nested loops!
- the deeper the called function, the worse
- look for redundant texture sampling which could be hoisted out of loops or functions - at run time the repeated samples will hit the cache anyway, but they make the shader compile longer
What doesn't help (neither compilation speed nor performance):
- trying to manually optimise ALU operations
I guess most of this will be true for DX9, too.
#4928755 FBO and RBO and how they are used
Posted by pcmaster
on 06 April 2012 - 06:25 AM
The OpenGL terminology actually is way more complicated than what I've just presented, study it thoroughly here:
http://www.opengl.or...mebuffer_Object
http://www.songho.ca...ngl/gl_fbo.html
A short answer to the difference between a GL FBO and an RBO:
There is one active FBO that is the target of all rendering output and it might "contain" several target textures - a colour texture, another colour texture, maybe yet another texture to store anything auxiliary, a depth texture (all these are called FB attachments)... You can attach basically "any" number of textures or RBOs to an FBO at once (up to the limits of the implementation).
An RBO is a single image (think of a texture that cannot be sampled) and is one of the attachments of an FBO. RBO content can be modified exclusively by rendering to it while it is attached to an FBO (possibly together with other RBOs or textures). RBO content can then be copied to another texture (e.g. with a framebuffer blit). An RBO doesn't have mip-maps. An RBO cannot be pre-initialised with any pixel data. I'd use an RBO as a depth buffer (Z-buffer).
An ordinary OpenGL texture can have mip-maps and any of its mip-slices can indeed serve the very same purpose as an RBO, that is, serve as a render target.
Also, ordinary textures can serve as "sources" of data in your shaders (actual surface-modifying colour data, normals or anything at all). RBOs are "destination-only". And FBOs, again, encapsulate various textures and/or RBOs and as such don't possess any data of their own.
Complicated, huh?
#4925315 Path Tracing BSDF
Posted by pcmaster
on 26 March 2012 - 05:59 AM
Regarding SSS, you'll have to read some papers on that, I'm afraid. I could help only with realtime rasterised SSS (mostly for skin, which is quite fake but nice and fast).
#4882151 GLSL get vec4 component
Posted by pcmaster
on 09 November 2011 - 09:48 AM
Just in case, since it isn't obvious what you're asking: as far as I know, desktop GLSL does let you index the components of the built-in vector/matrix types with [], both by literals and by variables (such as vec3 v; float x = v[2]). Some restricted profiles may limit non-constant indices, though.
#4877209 [Dx11] InterlockedAdd on floats in Pixel Shader - Workaround?
Posted by pcmaster
on 26 October 2011 - 08:57 AM
[loop] do // critical section enter (alias mutex::lock())
{
    uint orig;
    InterlockedCompareExchange(mutex[x,y], 0, 1, orig);
    if (orig == 0) // this means the exchange succeeded! you own the "mutex"
        break; // mutex[x,y] now equals 1
} while (1);
Then tamper with the float4 texture at [x,y]: read it, modify the value, write it back. Nobody else will touch it in the meantime. After you're done, call
InterlockedCompareExchange(mutex[x,y], 1, 0, dummy); // critical section leave (alias mutex::unlock())
Since we made sure that mutex[x,y]==1, this will exchange its value to 0. This is a signal for the other threads waiting in the loop for this location that the mutex is "free" and one of them can enter the critical section. I claim this is actually the same serialisation that the GPU thread scheduler (or whatever its name is) would do anyway -- if many threads want to access the same critical location, they have to queue up.
I have not done this before, I mean not with DX11 (I did something similar with OpenCL). I have mixed experience with such "complex" shaders and DX11 (fxc.exe), so I have no idea whether this will actually work, but to me it now seems legit :-) I'm NOOOOOOT sure whether this will work in a Pixel Shader, but in a Compute Shader (or OpenCL or CUDA) this really should work. The main problem might be the infinite loop, which is something the optimiser doesn't seem to like at all.
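The locking scheme can be tried out on the CPU first. Here is a minimal C++ sketch of the same compare-exchange protocol, assuming std::atomic stands in for InterlockedCompareExchange; SpinMutex and lockedAdd are hypothetical names, and whether a GPU compiler and scheduler accept the equivalent shader loop is a separate question:

```cpp
#include <atomic>
#include <cstdint>

// 0 = free, 1 = owned, exactly like the mutex texture in the post.
struct SpinMutex {
    std::atomic<std::uint32_t> state{0};

    // Counterpart of InterlockedCompareExchange(mutex, 0, 1, orig) with orig == 0.
    bool tryLock() {
        std::uint32_t expected = 0;
        return state.compare_exchange_strong(expected, 1u,
                                             std::memory_order_acquire);
    }

    // Critical section enter: spin until the exchange succeeds.
    void lock() {
        while (!tryLock()) {
            // busy-wait; somebody else owns the mutex
        }
    }

    // Critical section leave: counterpart of InterlockedCompareExchange(mutex, 1, 0, dummy).
    void unlock() {
        std::uint32_t expected = 1;
        state.compare_exchange_strong(expected, 0u, std::memory_order_release);
    }
};

// Read-modify-write of a float that has no native atomic add.
void lockedAdd(SpinMutex& mutex, float& target, float amount) {
    mutex.lock();
    target += amount;  // nobody else touches target in the meantime
    mutex.unlock();
}
```

With one SpinMutex per texel, lockedAdd plays the role of the missing InterlockedAdd on floats.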
#4874988 Gamma correction in OpenGL
Posted by pcmaster
on 21 October 2011 - 03:27 AM
#4861520 geometry shader discard count stream out
Posted by pcmaster
on 14 September 2011 - 06:33 AM
One way is to create a query with ID3D11Device::CreateQuery() and D3D11_QUERY_SO_STATISTICS_STREAM0, issue the draw call with stream-out between ID3D11DeviceContext::Begin() and End() on that query, and finally call ID3D11DeviceContext::GetData() and look into D3D11_QUERY_DATA_SO_STATISTICS::NumPrimitivesWritten. The other option is to use ID3D11DeviceContext::DrawAuto(), which will automatically determine the amount of data in a buffer that was previously used for stream-out (you'll bind this buffer to the input assembler stage).
A query might inflict a performance penalty, as the driver will have to finish some things and might make the CPU wait. On the other hand, DrawAuto will not tell the CPU how much was rendered/generated. They are two completely different things, but I thought they might have something to do with your question.
#4856198 OpenGL 1,2,3,4 general question
Posted by pcmaster
on 01 September 2011 - 02:47 AM
The truth is that new hardware features get into OpenGL quicker, through extensions (with Direct3D you have to wait until Microsoft make up their minds to expose anything the new cards support!!!).
Khronos don't change the whole API as much as Microsoft do every time, fortunately. New features (functions) become available, some are deprecated, some are finally removed. The whole concept persists. The same goes for OpenCL.
Start learning OpenGL 4 directly. No matter what, do NOT look at OpenGL 1.x, ever :-) That, unfortunately, disqualifies most of the famous NeHe tutorials, for example, hehe. Start with desktops, learn the basics and don't touch OpenGL ES (mobile) much before that, if you plan to.
#4823592 Rectangle spreading blur (not only for DoF)
Posted by pcmaster
on 15 June 2011 - 07:30 AM
I wonder if any of you have read the 2009-2010 papers from Kosloff and Barsky on rectangle spreading. I'm having problems with some small details in the "Depth of Field Postprocessing For Layered Scenes Using Constant-Time Rectangle Spreading" paper (http://www.cs.berkel...lur/kosloff.pdf). Concretely, I'm struggling with Fig 3 bottom, which represents the normalisation table, and therefore also with variable per-pixel blur radii (e.g. coming from the CoC), and then in general with arbitrary PSFs (but that's another story).
I need to understand why the normalisation image is a pixel wider (in each direction) than the original input image, how this will change if a smaller or larger kernel is used, and ultimately what will happen with these extra pixels, which are in fact outside the input image, when variable blur is used (Fig 3 has a constant 3x3 PSF "kernel").
I'm unable to find any implementation of any spreading (scattering) blur algorithm, including their DX10 implementation, which they mention (DX, GL, C++, Matlab, ... anything would be helpful).
Anyone feeling like reading the paper and helping me out by discussing it here?
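To have something concrete to discuss, here is a minimal 1D C++ sketch of constant-time spreading as I understand the general idea, assuming a fixed radius and a box PSF: every input pixel writes only two deltas, and a running sum turns them into the spread result. This is not the authors' implementation, and spread1D is a hypothetical name. Note that the output is padded by the radius on each side, because border pixels spread outside the input:

```cpp
#include <cstddef>
#include <vector>

// Spreads every input[i] uniformly over positions [i - radius, i + radius].
// The result has input.size() + 2 * radius entries; output index j corresponds
// to input position j - radius.
std::vector<float> spread1D(const std::vector<float>& input, std::size_t radius) {
    const std::size_t n = input.size();
    // One extra slot holds the closing delta of the last rectangle.
    std::vector<float> delta(n + 2 * radius + 1, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        // In padded coordinates the rectangle of pixel i covers [i, i + 2 * radius].
        delta[i] += input[i];                   // rectangle starts here
        delta[i + 2 * radius + 1] -= input[i];  // and ends just before here
    }
    // Integration pass: the running sum of the deltas gives the spread image.
    std::vector<float> output(n + 2 * radius);
    float running = 0.0f;
    for (std::size_t j = 0; j < output.size(); ++j) {
        running += delta[j];
        output[j] = running;
    }
    return output;
}
```

Feeding it an input of all ones counts how many rectangles cover each position, which is one way to look at a normalisation table; in this sketch the padding grows with the radius.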