OpenGL DX11 - Shaders - Writable Textures

This topic is 1577 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi guys,

 

I'm researching techniques for scene voxelization, and so far only a few papers have turned up, but I found a great one! Now I'm in the process of writing the shader, but a problem struck me: the paper is based on OpenGL (the theory is the same), and OpenGL has fully writable textures in the shader, like so:

imageStore(voxelColor, voxelPos, outColor);

// voxelColor -> 3D texture
// voxelPos   -> xyz position within the texture
// outColor   -> the data to be written

How can this be achieved with DirectX 11 and HLSL? (Maybe I forgot something, as usual.)

 

Thanks, as usual!

-MIGI0027


Thanks for making me aware of that, but I thought you could only output data from the compute shader. Is it possible to write to it in the pixel shader?


Damn, sorry, I didn't fully read your comment. Sorry about that...

 

EDIT: Not solved yet!

Edited by Migi0027


Sorry to annoy you further, but I have one last question on this topic.

 

Q: Do you bind the 3D texture (in this case) just like a read-only shader resource [DeviceContext->XSSetShaderResources(...)], or does it require a different method?

 

-MIGI0027


You don't use a shader resource view, you need to create an unordered access view (UAV) for the texture that you want to write to. To bind it to a compute shader you use CSSetUnorderedAccessViews, and to a pixel shader you use OMSetRenderTargetsAndUnorderedAccessViews.
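A minimal sketch of what that can look like on the C++ side, assuming a 3D texture that was created with the D3D11_BIND_UNORDERED_ACCESS flag (all names here are placeholders, not from the thread):

```cpp
#include <d3d11.h>

// Sketch: create a UAV for an existing 3D texture and bind it alongside one
// render target for a pixel shader pass. Assumes voxelTex was created with
// D3D11_BIND_UNORDERED_ACCESS and a matching typed format.
void BindVoxelUAV(ID3D11Device* device, ID3D11DeviceContext* context,
                  ID3D11Texture3D* voxelTex, UINT depth,
                  ID3D11RenderTargetView* rtv, ID3D11DepthStencilView* dsv)
{
    D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
    uavDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;   // must match the texture format
    uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE3D;
    uavDesc.Texture3D.MipSlice = 0;
    uavDesc.Texture3D.FirstWSlice = 0;
    uavDesc.Texture3D.WSize = depth;               // number of depth slices to expose

    ID3D11UnorderedAccessView* uav = nullptr;
    device->CreateUnorderedAccessView(voxelTex, &uavDesc, &uav);

    // Compute shader: context->CSSetUnorderedAccessViews(slot, 1, &uav, nullptr);
    // Pixel shader: UAV slots share the name space with render targets, so
    // with one RTV bound the first UAV goes to slot 1 (register u1 in HLSL).
    context->OMSetRenderTargetsAndUnorderedAccessViews(
        1, &rtv, dsv,
        1 /*UAVStartSlot*/, 1 /*NumUAVs*/, &uav, nullptr);

    uav->Release();  // the pipeline holds its own reference while bound
}
```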


Hi again, and thanks btw!

 

Another question...

 

How would one send such a resource to the GPU for reading in the pixel shader, without setting any render targets? Here are my questions:

  • When calling OMSetRenderTargetsAndUnorderedAccessViews, if you set the back buffer, NumRTVs, and RTVs to NULL, does it set the render targets to NULL, or does it skip setting the render targets?
  • Is there a way of treating the contents of a UAV as an SRV?

Thanks

-MIGI0027


When calling OMSetRenderTargetsAndUnorderedAccessViews, if you set the back buffer, NumRTVs, and RTVs to NULL, does it set the render targets to NULL, or does it skip setting the render targets?

Not that I have tried it (I've never used UAVs with pixel shaders), but the docs are fairly clear:

Specify NULL to set none.

If that doesn't work, use an array with explicit NULLs.
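For what it's worth, d3d11.h also defines a sentinel for exactly this case: passing D3D11_KEEP_RENDER_TARGETS_AND_DEPTH_STENCIL as NumRTVs tells the call not to touch the currently bound render targets at all. A sketch of both variants (untested on my side, since I haven't used this path; names are placeholders):

```cpp
#include <d3d11.h>

// Sketch: bind a UAV and leave the currently bound render targets and
// depth-stencil view untouched, using the NumRTVs sentinel value.
void BindUAVKeepRTVs(ID3D11DeviceContext* context,
                     UINT slot, ID3D11UnorderedAccessView* uav)
{
    context->OMSetRenderTargetsAndUnorderedAccessViews(
        D3D11_KEEP_RENDER_TARGETS_AND_DEPTH_STENCIL,  // don't modify RTVs/DSV
        nullptr, nullptr,
        slot, 1, &uav, nullptr);
}

// Whereas NumRTVs = 0 with NULL views actively unbinds all render targets:
void BindUAVUnbindRTVs(ID3D11DeviceContext* context,
                       UINT slot, ID3D11UnorderedAccessView* uav)
{
    context->OMSetRenderTargetsAndUnorderedAccessViews(
        0, nullptr, nullptr,
        slot, 1, &uav, nullptr);
}
```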
 

Is there a way of treating the contents of a UAV as an SRV?


Nope, it's a different view; you have to create one. IIRC you can create an arbitrary number of views for any resource. You only need to make sure the corresponding bind flags were set when creating the resource, and that you don't have read/write hazards. So in this case, don't bind both the UAV and the SRV at once. You won't need to anyway, since UAVs are both read and write. That's not to say it's always a good idea: sometimes a ping-pong approach with two resources is better, or even the only possibility.
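To make the bind-flag point concrete, here's a rough sketch (names hypothetical) of creating a 3D texture that can carry both view types; the two views would then be bound in separate passes, never simultaneously:

```cpp
#include <d3d11.h>

// Sketch: one 3D texture with both an SRV and a UAV. The bind flags at
// creation time are what allow both views to exist; avoiding read/write
// hazards (never binding both at once) is up to the application.
ID3D11Texture3D* CreateVoxelVolume(ID3D11Device* device, UINT size,
                                   ID3D11ShaderResourceView** outSrv,
                                   ID3D11UnorderedAccessView** outUav)
{
    D3D11_TEXTURE3D_DESC desc = {};
    desc.Width = size;
    desc.Height = size;
    desc.Depth = size;
    desc.MipLevels = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;

    ID3D11Texture3D* tex = nullptr;
    device->CreateTexture3D(&desc, nullptr, &tex);

    // NULL descs: let D3D derive both views from the (typed) resource.
    device->CreateShaderResourceView(tex, nullptr, outSrv);
    device->CreateUnorderedAccessView(tex, nullptr, outUav);
    return tex;
}
```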

By the way: you made me curious. Would you mind posting links to those papers?
Edited by unbird


Btw, why didn't they make a separate function for this? They waste processing power by checking whether the number of render targets is 0.

 

I know it's such a small operation, but everything counts.

 

Is there any specific reason for this?

 

-MIGI0027

Ah yeah, Crassin et al., thanks. You're in for a challenge then if you really want to go with octrees (I for one haven't done hierarchical structures on the GPU yet). Good luck.
 
I've got additional food for thought for you. The GPU Pro 4 book also has a chapter about voxelization, but you can save yourself the money: the Ph.D. thesis from Mr. Gaitatzes on the topic is available on his site. Nice, since papers are usually a bit... brief.


An error has come up which I am for now unable to understand (I understand what it says, I just don't know why).

 

Error:

maximum ps_5_0 UAV register index (8) exceeded - note that the minimum index is 5.

Relevant shader code (some resources are repeated or have odd names, because I'm moving stuff around):

RWTexture3D<float> t_gi0 : register(t0); // GD: These are all faces of a cube map
RWTexture3D<float> t_gi1 : register(t1); // Though i could construct them to one final cube map
RWTexture3D<float> t_gi2 : register(t2); // But I'm too lazy to do that now...
RWTexture3D<float> t_gi3 : register(t3); // After it works I might change it
RWTexture3D<float> t_gi4 : register(t4); // These are only for reading
RWTexture3D<float> t_gi5 : register(t5); // They have already been written to...

// pos and dir are in texture space
// dir is the direction from the center of the voxel outward
// dir should be normalized
float4 voxelFetch(float3 pos, float3 dir, float lod)
{
	float4 sampleX =
		dir.x < 0.0
		? t_gi0[pos]
		: t_gi1[pos];
	
	float4 sampleY =
		dir.y < 0.0
		? t_gi2[pos]
		: t_gi3[pos];
	
	float4 sampleZ =
		dir.z < 0.0
		? t_gi4[pos]
		: t_gi5[pos];
	
	float3 sampleWeights = abs(dir);
	float invSampleMag = 1.0 / (sampleWeights.x + sampleWeights.y + sampleWeights.z + .0001);
	sampleWeights *= invSampleMag;
	
	float4 filtered = 
		sampleX * sampleWeights.x
		+ sampleY * sampleWeights.y
		+ sampleZ * sampleWeights.z;
	
	return filtered;
}

// origin, dir, and maxDist are in texture space
// dir should be normalized
// coneRatio is the cone diameter to height ratio (2.0 for 90-degree cone)
float4 voxelTraceCone(float3 origin, float3 dir, float coneRatio, float maxDist)
{
	float3 samplePos = origin;
	float4 accum = float4(0, 0, 0, 0);

	// the starting sample diameter
	float minDiameter = minVoxelDiameter;

	// push out the starting point to avoid self-intersection
	float startDist = minDiameter;
	
	float dist = startDist;

	[loop]
    [allow_uav_condition]
	while (dist <= maxDist && accum.w < 1.0)
	{
		// ensure the sample diameter is no smaller than the min
		// desired diameter for this cone (ensuring we always
		// step at least minDiameter each iteration, even for tiny
		// cones - otherwise lots of overlapped samples)
		float sampleDiameter = max(minDiameter, coneRatio * dist);
		
		// convert diameter to LOD
		// for example:
		// log2(1/256 * 256) = 0
		// log2(1/128 * 256) = 1
		// log2(1/64 * 256) = 2
		float sampleLOD = log2(sampleDiameter * minVoxelDiameterInv);
		
		float3 samplePos = origin + dir * dist;
		
		float4 sampleValue = voxelFetch(samplePos, -dir, sampleLOD);
		
		float sampleWeight = (1.0 - accum.w);
		accum += sampleValue * sampleWeight;
		
		dist += sampleDiameter;
	}
	
	// decompress color range to decode limited HDR
	accum.xyz *= 2.0;
	
	return accum;
}

#endif

Texture2D t_alphamap : register(t0);
Texture2D t_dffalpha : register(t1);
Texture2D gBuffer_Shadows : register(t2);
Texture2D t_rFront : register(t3);
Texture2D t_rBack : register(t4);

#if COMPILENORMALMAP == 1
Texture2D t_norm : register(t5);
#endif
#if COMPILESPECULARMAP == 1
Texture2D t_spec : register(t6);
#endif
#if COMPILEMASKMAP == 1
Texture2D t_mask : register(t7);
#endif

SamplerState ss;

Now what on earth did I do wrong?

-MIGI0027


You can't tell me it's impossible to bind more than 6 UAVs; maybe I'm setting them to the wrong slots, with the wrong range.

 

OpenGL GLSL for what I'm trying to achieve:

layout(binding = 3) uniform sampler3D voxelTex[6];

HLSL:

RWTexture3D<float> t_gi[6];


Nope, for SM5 it should be 8 UAVs. I checked with the compiler, and the funny thing you are doing (at least two posts ago) is binding a read-write resource to a t# register, which would actually be an SRV. The compiler doesn't complain; it puts them into the UAV slots anyway. Furthermore, a non-RW resource goes into the SRV slots even when marked with a u# register. Again, no compiler warning. It helps to look at the assembly. Example:

Buffer<float> In1 : register(u0);  // WRONG: It should be t#
RWBuffer<float> Out1: register(t0); // AGAIN WRONG: It should be u#
RWBuffer<float> Out2: register(t1);
RWBuffer<float> Out3: register(t2);
RWBuffer<float> Out4: register(t3);
RWBuffer<float> Out5: register(t4);
RWBuffer<float> Out6: register(t5);
RWBuffer<float> Out7: register(t6);
RWBuffer<float> Out8: register(t7);

[numthreads(4,4,4)]
void main(uint3 tid : SV_DispatchThreadID) 
{
    uint index = dot(tid, uint3(1, 8, 256)); // whatever
    Out1[index] = In1[index] + 
        Out2[index] +
        Out3[index] +
        Out4[index] +
        Out5[index] +
        Out6[index] +
        Out7[index] +
        Out8[index];        
}
 
This is where the slots end up:

// Generated by Microsoft (R) HLSL Shader Compiler 9.30.960.8229
//
//
//   fxc /Zi /T cs_5_0 /Fo stub.fxo /Fx stub.asm stub.fx
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// In1                               texture   float         buf    0        1
// Out1                                  UAV   float         buf    0        1
// Out2                                  UAV   float         buf    1        1
// Out3                                  UAV   float         buf    2        1
// Out4                                  UAV   float         buf    3        1
// Out5                                  UAV   float         buf    4        1
// Out6                                  UAV   float         buf    5        1
// Out7                                  UAV   float         buf    6        1
// Out8                                  UAV   float         buf    7        1
(Resource type "texture" means a t# register, hence the t.) The same happens if one doesn't use the register keyword at all.

So I expect there are other RW resources on your side, ones you probably only read from. If so, don't use RW.


Thanks, first time using UAVs.

 

Though I have a problem. The u registers solved some issues, but the pixel shader outputs to 5 different render targets (this could be boiled down to 3-4). So the UAVs are expected to start at an index of at least u5, yet the maximum UAV register index is 8 and I need six slots, so the numbers don't add up.
 

UAV registers live in the same name space as outputs, so they must be bound to at least u5, manual bind to slot u0 failed

What would be your suggestion?

Edited by Migi0027


Ah-ha, there's the culprit. Learned something new then; I haven't used UAVs in pixel shaders so far.

 

Seems you have to tailor your setup accordingly. Multiple passes? Combine two or more resources into one (say, use a bigger texture and "atlas"/split manually)? How's it done in OpenGL? Is there no such limit?

 

For multiple passes, look into both stream out (geometry shader) and append/consume buffers (special UAVs; hopefully they work in pixel shaders). Maybe they can help you split the problem.

 

Admittedly, these are hints from a bird's eye view. This really sounds like a challenging problem. Look into alternatives: if something gets too cumbersome, another approach is likely better than shoehorning it into an unsuitable API. That thesis I linked looks feasible with D3D11 (the first variant, using geometry shaders and three passes).


Actually, I might just be able to boil the number of render targets down to 3, but that's the minimum; the thing is that I'm using a deferred rendering approach, which consumes some render targets.

 

Btw, how on earth do you combine six 3D cube map faces into one whole 3D cube map? I've never heard of this being done with 3D textures; maybe it's not possible yet...


Well, this is what I do:

 

...

Do the voxelization (done, kinda)

...

Render the scene with the voxel data, outputting to different RTs (deferred rendering).

 

So I'm not actually writing to the UAVs anymore; that has already been done. Can I bind the UAVs without being in this writing stage?


I guess I could have a separate pass where only one color is set in the pixel shader, the GI-only color, and then blend it on top of the scene.


You can create a shader resource view for your UAV resource and use that in the shader stage as a texture input (Texture3D, etc.). There's no need to use the UAVs at this point anymore. :)
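A sketch of that read-only pass, assuming the volume already holds the voxel data; voxelTex and the slot numbers are placeholders, and the resource must have been created with D3D11_BIND_SHADER_RESOURCE in addition to the UAV bind flag:

```cpp
#include <d3d11.h>

// Sketch: after the voxelization pass, drop the UAV and read the volume
// through an SRV like any other texture. In HLSL this would pair with e.g.
//   Texture3D<float4> voxelColor : register(t0);
void BindVoxelsForReading(ID3D11Device* device, ID3D11DeviceContext* context,
                          ID3D11Texture3D* voxelTex)
{
    // Unbind any UAV left over from the write pass to avoid a read/write
    // hazard (D3D11 would otherwise unbind it for us, with a debug warning).
    ID3D11UnorderedAccessView* nullUav = nullptr;
    context->OMSetRenderTargetsAndUnorderedAccessViews(
        D3D11_KEEP_RENDER_TARGETS_AND_DEPTH_STENCIL, nullptr, nullptr,
        1 /*placeholder slot*/, 1, &nullUav, nullptr);

    ID3D11ShaderResourceView* srv = nullptr;
    device->CreateShaderResourceView(voxelTex, nullptr, &srv);  // derive desc
    context->PSSetShaderResources(0, 1, &srv);                  // register t0
    srv->Release();  // the pipeline holds its own reference while bound
}
```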

