
Max size of Matrix4x4 Array in HLSL


Hi all, I use ID3DXEffect in DX9 for my shaders.

 

I have a skinned mesh and wrote my own .fx shader, but I have a problem: some of my models have 64+ bones, and if I declare a float4x4 array of 64 elements or more, the shader doesn't work.

 

There's no error loading the .fx file, my models just won't appear on screen.

 

Doesn't work:

float4x4 Bones[64] : WORLDMATRIXARRAY;

Works:

float4x4 Bones[59] : WORLDMATRIXARRAY;

I declare some other variables in the shader as well, but not many.

 

I declare vs_2_0 and ps_2_0 in the technique part of the .fx file. (Maybe there's a size limit for the .fx because of this lower vertex shader version?)

 

Does anyone have a suggestion on how to resolve this problem?

Edited by ProgrammerDX


If I recall correctly, vs_2_0 only guarantees a minimum of 256 float4 constant registers. Each float4x4 uses 4 constant registers, so you could safely use at most 64 bone matrices if you had no other constants in your shader. Since you mentioned that you have some other variables, I'm guessing those take up some registers, and that's why you need to use fewer than 64 matrices.
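To make the register arithmetic above concrete, here is a minimal C++ sketch. The function name `MaxBoneCount` and the figure of 20 registers for "other constants" are assumptions for illustration only; the real number depends on what else the shader declares.

```cpp
#include <cassert>

// Returns how many bone matrices fit into the constant register file.
// totalRegisters:   device limit (256 is the vs_2_0 minimum mentioned above)
// otherRegisters:   registers taken by all other shader constants (hypothetical input)
// registersPerBone: 4 for a float4x4, 3 for a 4x3 matrix
inline int MaxBoneCount(int totalRegisters, int otherRegisters, int registersPerBone)
{
    assert(registersPerBone > 0);
    assert(otherRegisters >= 0 && otherRegisters <= totalRegisters);
    return (totalRegisters - otherRegisters) / registersPerBone;
}
```

With 256 registers and nothing else in the shader, `MaxBoneCount(256, 0, 4)` gives 64. If the other constants happened to take 20 registers, the result would drop to 59.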

 

FYI, the reason that the compiler doesn't complain about you going over the limit is because it's actually a runtime issue. D3D9 lets the driver specify the actual maximum number of vertex shader constants, which you can query via D3DCAPS9.MaxVertexShaderConst. However in practice, I'm pretty sure that almost every driver just sets that to 256.

 

If you want to get around that limit and you're targeting semi-recent (DX10 era or newer) hardware, then you can read your bone matrices from a texture instead of using shader constants.
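One possible way to lay out bone matrices in a texture is sketched below, assuming a four-channel float texture in which every matrix row occupies one texel and each bone takes four consecutive texels. The names `TexelCoord`, `BoneRowTexel` and `TexelCenter` are hypothetical; the shader would need to mirror the same addressing when it fetches the rows.

```cpp
#include <cassert>
#include <cstdint>

struct TexelCoord
{
    std::uint32_t x;
    std::uint32_t y;
};

// Assumption: one RGBA float texel per matrix row, four rows per bone.
constexpr std::uint32_t kTexelsPerBone = 4;

// Maps (bone, row) to an integer texel position in a texture of the given width.
inline TexelCoord BoneRowTexel(std::uint32_t boneIndex, std::uint32_t row,
                               std::uint32_t textureWidth)
{
    assert(row < kTexelsPerBone);
    // Keep all rows of a bone on one texture line.
    assert(textureWidth >= kTexelsPerBone && textureWidth % kTexelsPerBone == 0);
    const std::uint32_t linear = boneIndex * kTexelsPerBone + row;
    return TexelCoord{ linear % textureWidth, linear / textureWidth };
}

// Converts an integer texel position to a normalized coordinate at the texel center.
inline float TexelCenter(std::uint32_t texel, std::uint32_t size)
{
    assert(size > 0 && texel < size);
    return (static_cast<float>(texel) + 0.5f) / static_cast<float>(size);
}
```

For example, in a 256-texel-wide texture, row 1 of bone 64 lands on texel (1, 1), because the first 64 bones fill the first line exactly.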


Yes, you can use it on D3D9 as long as the GPU supports vertex texture fetch.

 

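By the way, the matrices used for skinning are actually 4x3 ones. Each then takes 3 constant registers instead of 4, so you can fit more like 80 of them (256 / 3 = 85 at most, less whatever your other constants use) if you rearrange how the data is stored.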
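A rough CPU-side sketch of that rearrangement follows. It assumes row-major matrices with the translation in the fourth row and a last column of (0, 0, 0, 1), which is the usual D3DX layout; `Matrix4x4`, `PackedBone`, `PackBone` and `TransformPoint` are made-up names for illustration. The packed data could then be uploaded as plain floats.

```cpp
#include <array>
#include <cassert>

using Matrix4x4  = std::array<std::array<float, 4>, 4>; // m[row][col], translation in row 3
using PackedBone = std::array<std::array<float, 4>, 3>; // three float4 registers per bone

// Stores the first three columns of m as three float4 values (a transposed 4x3).
// Assumes an affine matrix whose last column is (0, 0, 0, 1), so nothing is lost.
inline PackedBone PackBone(const Matrix4x4& m)
{
    PackedBone packed{};
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 4; ++row)
            packed[col][row] = m[row][col];
    return packed;
}

// Mirrors what the vertex shader would do: one 4-component dot product per register,
// with the point's implicit w component equal to 1.
inline std::array<float, 3> TransformPoint(const PackedBone& bone,
                                           const std::array<float, 3>& p)
{
    std::array<float, 3> out{};
    for (int i = 0; i < 3; ++i)
        out[i] = p[0] * bone[i][0] + p[1] * bone[i][1] + p[2] * bone[i][2] + bone[i][3];
    return out;
}
```

The point of the transposed layout is that each output component needs only one register, so the fourth register of every bone can be dropped.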

Ah ok.

Isn't it a performance hit to update a texture buffer every frame with bone data?

And is it a good idea to keep a texture for each model with bones?


Try compiling for vs_2_x or vs_3_0 instead. I don't remember if they'll give you higher register counts but it's worth a shot.


"Isn't it a performance hit to update a texture buffer every frame with bone data?"

 

You can request a dynamic texture from the driver, which will be optimized for the case where the CPU frequently updates the contents of the texture.

 

 


"And is it a good idea to keep a texture for each model with bones?"

 

It should be fine if you do it that way. There will definitely be some driver overhead every time that you need to update a texture, since internally the driver will use buffer renaming techniques which will require allocating you a new region of memory.

Edited by MJP


I thought I'd use this thread to ask a relevant question, but for DX11.

 

HLSL:

tbuffer SkinnedMatrices : register(t2)
{
	float4x4 Matrices[2097152];
};

When trying to compile this (SM4.0), the compiler gives the following error:

"vertexshader.hlsl(33,11-27): error X3059: array dimension must be between 1 and 65536"

 

So the maximum tbuffer size is 65536 x 16 x 4 = 4194304 Bytes, or 4MB.

 

But a 2048x2048 texture using 4 bytes per pixel is 16 MB large. So why can't tbuffers be as large as textures?
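The size comparison above can be checked with a few lines of C++. This is only the arithmetic from the post; `TBufferBytes` and `TextureBytes` are hypothetical helper names, and 4 bytes per float is assumed.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerFloat = 4; // assumption: 32-bit floats

// Size of an array of elements made of 32-bit floats (16 floats for a float4x4).
inline std::uint64_t TBufferBytes(std::uint64_t elements, std::uint64_t floatsPerElement)
{
    assert(floatsPerElement > 0);
    return elements * floatsPerElement * kBytesPerFloat;
}

// Size of a 2D texture with a fixed number of bytes per pixel, ignoring mip levels.
inline std::uint64_t TextureBytes(std::uint64_t width, std::uint64_t height,
                                  std::uint64_t bytesPerPixel)
{
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}
```

`TBufferBytes(65536, 16)` is 4194304 bytes (4 MB) and `TextureBytes(2048, 2048, 4)` is 16777216 bytes (16 MB). The requested array of 2097152 matrices would need 134217728 bytes (128 MB).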

You have a model with that many bones?

By the way, changing to vs_3_0 didn't work.

float4x3 was causing issues because I am using SetMatrixArray of ID3DXEffect, which apparently doesn't like float4x3s.

Either way, I will try it with texture buffers later and update this topic about it.

Thanks


"You have a model with that many bones?"

 

No, I am trying to write a skinned vertex shader which can index up to 16384 instances, where each instance may have up to 128 bones each. 16384 x 128 = 2097152.

tbuffer SkinnedMatrices : register(t2)
{
	float4x4 Matrices[2097152];
};

cbuffer InstanceIndices : register(b1)
{
	uint4 InstIdxArray[4096];
};

uint ReadInstanceIndex(uint instID)
{
	return InstIdxArray[instID >> 2][instID & 3];
}

VSOut VShader(in VSIn Input)
{
	VSOut output;
	uint Bone0 = Input.BoneIndices & 255;
	uint Bone1 = (Input.BoneIndices >> 8) & 255;
	uint Bone2 = (Input.BoneIndices >> 16) & 255;
	uint Bone3 = (Input.BoneIndices >> 24) & 255;
	uint InstID = ReadInstanceIndex(Input.instanceID);

	float4x4 MatWorld = Matrices[InstID];
	float4x4 MatWorldViewProj = Matrices[InstID + 1];
	float4x4 MatBone0 = Matrices[InstID + 2 + Bone0];
	float4x4 MatBone1 = Matrices[InstID + 2 + Bone1];
	float4x4 MatBone2 = Matrices[InstID + 2 + Bone2];
	float4x4 MatBone3 = Matrices[InstID + 2 + Bone3];

	...
	// Transform position and normal
	...

	return output;
}
Edited by Tispe
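For reference, the `InstIdxArray[instID >> 2][instID & 3]` lookup implies that the CPU packs four indices into each `uint4`. A minimal sketch of the matching CPU side is shown below, with hypothetical names (`UInt4`, `PackInstanceIndices`) and a C++ mirror of the shader's `ReadInstanceIndex` so the round trip can be checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt4 = std::array<std::uint32_t, 4>; // stand-in for an HLSL uint4

// Packs a flat list of indices four per element, padding the last element with zeros.
inline std::vector<UInt4> PackInstanceIndices(const std::vector<std::uint32_t>& flat)
{
    std::vector<UInt4> packed((flat.size() + 3) / 4, UInt4{});
    for (std::size_t i = 0; i < flat.size(); ++i)
        packed[i >> 2][i & 3] = flat[i];
    return packed;
}

// CPU mirror of the shader function: element instID / 4, component instID % 4.
inline std::uint32_t ReadInstanceIndex(const std::vector<UInt4>& packed,
                                       std::uint32_t instID)
{
    assert((instID >> 2) < packed.size());
    return packed[instID >> 2][instID & 3];
}
```

With this packing, 4096 `uint4` elements hold the 16384 instance indices mentioned above.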

@ProgrammerDX: Use ID3DXEffect::SetRawValue or ID3DXBaseEffect::SetFloatArray. Also, vertex texture fetch (you need to use tex2Dlod, by the way) is an SM3 feature.

@Tispe: Use structured buffers:
StructuredBuffer<matrix> Matrices : register(t4);
No more need to specify the array size. I also recommend 4x3 matrices. According to the D3D11 resource limits this should give you 128M elements. Plenty.

Also, why the manual decode?
uint Bone0 = Input.BoneIndices & 255;
uint Bone1 = (Input.BoneIndices >> 8) & 255;
uint Bone2 = (Input.BoneIndices >> 16) & 255;
uint Bone3 = (Input.BoneIndices >> 24) & 255;
Use DXGI_FORMAT_R8G8B8A8_UINT in your input layout. Your VS signature can then look this way directly:

struct VSInput
{
	...
	uint4 BoneIndices : BONEINDICES;
	...
};
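To see what the manual decode does, and why the input assembler can replace it, here is a small C++ sketch of the same bit layout. `PackBoneIndices` and `DecodeBoneIndex` are hypothetical names. The sketch assumes the four index bytes are stored lowest byte first, as the shifts in the shader imply.

```cpp
#include <cassert>
#include <cstdint>

// Packs four 8-bit bone indices into one 32-bit value, first index in the low byte.
inline std::uint32_t PackBoneIndices(std::uint8_t b0, std::uint8_t b1,
                                     std::uint8_t b2, std::uint8_t b3)
{
    return  static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// Same shift-and-mask as the shader code: component 0 is the low byte.
inline std::uint32_t DecodeBoneIndex(std::uint32_t packed, unsigned component)
{
    assert(component < 4);
    return (packed >> (8u * component)) & 255u;
}
```

With DXGI_FORMAT_R8G8B8A8_UINT in the input layout, each of those four bytes arrives in the shader as its own component of the `uint4`, so none of this shifting is needed there.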

Erm, no, sorry. I own Practical Rendering and Computation, which talks a bit about them (which means you could grab the Hieroglyph3 source and see how they are set up and used). Then again, I just experimented with buffer and SRV creation and closely watched the debug layer.
 
Buffer creation needs a non-zero StructureByteStride and D3D11_RESOURCE_MISC_BUFFER_STRUCTURED in the misc flags. The SRV needs D3D11_SRV_DIMENSION_BUFFEREX as ViewDimension and the corresponding D3D11_BUFFEREX_SRV filled.



"Buffer creation needs a non-zero StructureByteStride and D3D11_RESOURCE_MISC_BUFFER_STRUCTURED in the misc flags. The SRV needs D3D11_SRV_DIMENSION_BUFFEREX as ViewDimension and the corresponding D3D11_BUFFEREX_SRV filled."

 

I'm getting a run-time error:

 

Error Code: E_INVALIDARG (0x80070057)

Calling: m_pDevice->CreateShaderResourceView(pBuffer, &rd, &pShaderResourceView)

CComPtr<ID3D11ShaderResourceView> DXDevice::CreateStructuredBufferResource(const void* pDataSrc, UINT BufferSize)
{
	CComPtr<ID3D11ShaderResourceView> pShaderResourceView{ nullptr };
	CComPtr<ID3D11Buffer> pBuffer = CreateBufferResource(pDataSrc, BufferSize, D3D11_BIND_SHADER_RESOURCE, D3D11_USAGE_DEFAULT, D3D11_RESOURCE_MISC_BUFFER_STRUCTURED);

	if (pBuffer == nullptr)
		return nullptr;

	try
	{
		D3D11_SHADER_RESOURCE_VIEW_DESC rd;
		ZeroMemory(&rd, sizeof(rd));
		rd.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
		rd.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
		rd.BufferEx.NumElements = BufferSize / sizeof(DirectX::XMFLOAT4X4A);

		HR(m_pDevice->CreateShaderResourceView(pBuffer, &rd, &pShaderResourceView));
	}
	catch (std::exception &e)
	{
		WriteFile("error.log", e.what());
		return nullptr;
	}

	return pShaderResourceView;
}

CComPtr<ID3D11Buffer> DXDevice::CreateBufferResource(const void* pDataSrc, UINT BufferSize, UINT BindFlags, D3D11_USAGE Usage, UINT MiscFlags)
{
	CComPtr<ID3D11Buffer> pBuffer = nullptr;

	try
	{
		if (BufferSize == 0)
			throw std::exception("The requested buffer resource is of size 0");

		D3D11_SUBRESOURCE_DATA sd;
		ZeroMemory(&sd, sizeof(sd));
		sd.pSysMem = pDataSrc;

		D3D11_BUFFER_DESC bd;
		ZeroMemory(&bd, sizeof(bd));
		bd.Usage = Usage;
		bd.ByteWidth = BufferSize;
		bd.BindFlags = BindFlags;
		bd.MiscFlags = MiscFlags;
		if (MiscFlags == D3D11_RESOURCE_MISC_BUFFER_STRUCTURED)
			bd.StructureByteStride = sizeof(DirectX::XMFLOAT4X4A);

		HR(m_pDevice->CreateBuffer(&bd, pDataSrc ? &sd : nullptr, &pBuffer));
	}
	catch (std::exception &e)
	{
		WriteFile("error.log", e.what());
		return nullptr;
	}

	return pBuffer;
}

void DXDevice::SetMatrices(std::vector<DirectX::XMFLOAT4X4A> &Matrices)
{
	auto pBuf = CreateStructuredBufferResource(Matrices.data(), Matrices.size() * sizeof(DirectX::XMFLOAT4X4A));
	if (pBuf == nullptr)
		throw std::exception("CreateStructuredBufferResource failed");
	
	m_pImmediateContext->VSSetShaderResources(0, 1, &pBuf.p);
}

HLSL:

StructuredBuffer<float4x4> Matrices : register(t0);

You have the raw view flag set in your SRV description, which is not allowed with structured buffers:

[4452] D3D11: ERROR: ID3D11Device::CreateShaderResourceView: When the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED BindFlag is specified, the SRV Flag D3D11_BUFFEREX_SRV_FLAG_RAW cannot be specified. [ STATE_CREATION ERROR #127: CREATESHADERRESOURCEVIEW_INVALIDFORMAT ]
 
If you want to use raw views, you need to specify D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS at buffer creation (this is for ByteAddressBuffer in HLSL, by the way).

Edit: The latter also needs DXGI_FORMAT_R32_TYPELESS in the SRV description. I honestly keep forgetting what works and what doesn't. For that reason I wrote some convenience functions/classes which take care of that. I recommend doing the same.
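As a rough idea of what such a convenience helper might encode, the sketch below collects the rules stated in this thread into a plain C++ struct. It deliberately does not call D3D11; `BufferKind`, `BufferSettings`, `MakeSettings` and `IsValid` are hypothetical names, and the booleans merely stand in for the real flags named in the comments.

```cpp
#include <cassert>
#include <cstdint>

enum class BufferKind { Structured, Raw };

// API-free summary of the settings discussed above (all fields are stand-ins).
struct BufferSettings
{
    bool miscStructured;               // D3D11_RESOURCE_MISC_BUFFER_STRUCTURED
    bool miscAllowRawViews;            // D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS
    std::uint32_t structureByteStride; // must be non-zero for structured buffers
    bool srvRawFlag;                   // D3D11_BUFFEREX_SRV_FLAG_RAW
    bool srvFormatR32Typeless;         // DXGI_FORMAT_R32_TYPELESS in the SRV description
};

// Produces a consistent combination for the requested kind of buffer.
inline BufferSettings MakeSettings(BufferKind kind, std::uint32_t elementSize)
{
    if (kind == BufferKind::Structured)
    {
        assert(elementSize > 0);
        return BufferSettings{ true, false, elementSize, false, false };
    }
    return BufferSettings{ false, true, 0, true, true };
}

// Checks only the combinations mentioned in this thread.
inline bool IsValid(const BufferSettings& s)
{
    if (s.miscStructured)
        return s.structureByteStride != 0 && !s.srvRawFlag;
    if (s.srvRawFlag)
        return s.miscAllowRawViews && s.srvFormatR32Typeless;
    return true; // other buffer kinds are outside the scope of this sketch
}
```

A real helper would fill D3D11_BUFFER_DESC and D3D11_SHADER_RESOURCE_VIEW_DESC from these settings. The combination that caused the E_INVALIDARG above (structured buffer plus raw SRV flag) is rejected by `IsValid`.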


Ah, I took it out and it works fine now. I got confused by this:

"D3D11_BUFFEREX_SRV_FLAG-typed value that identifies view options for the buffer. Currently, the only option is to identify a raw view of the buffer. For more info about raw viewing of buffers, see Raw Views of Buffers."

//rd.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;

"Use DXGI_FORMAT_R8G8B8A8_UINT in your input layout. Your VS signature can then look this way directly:"

Question:

If a vertex structure looks like this:

struct Vertex
{
	float px, py, pz;		// Position
	float nx, ny, nz;		// Normal
	float tu, tv;			// Texture UV
	BYTE bn1, bn2, bn3, bn4;
};

{ "BLENDINDICES", 0, DXGI_FORMAT_R8G8B8A8_UINT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },

uint4 BoneIndices : BLENDINDICES;

Does the HLSL uint4 have four ints totaling 16 Bytes? The vertex on the cpu side has only 4 bytes for bone indices. Will each BYTE on the CPU translate to a uint on the GPU?


It will, as I alluded to with this format. This can be confusing, since for constant buffers (or structured buffers) one needs a one-to-one (binary) match, whereas for data coming from the input assembler only the type needs to match (not the bit size; you don't have e.g. byte types in HLSL anyway).

Edit: Wait, that's not fully correct. E.g. R8_UNORM translates a byte to a float in the 0..1 range. There's no such built-in type in C++.
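The two conversions just described can be mimicked on the CPU as follows. This is only an illustration of the value mapping (a UINT format widens each byte to a 32-bit unsigned integer, a UNORM format maps 0..255 to 0..1); `ByteVec4`, `ReadAsUint` and `ReadAsUnorm` are hypothetical names, not part of any API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ByteVec4 = std::array<std::uint8_t, 4>; // four bytes as stored in the vertex

// What a *_UINT format delivers: the byte value, widened to 32 bits.
inline std::uint32_t ReadAsUint(const ByteVec4& bytes, unsigned component)
{
    assert(component < 4);
    return bytes[component];
}

// What a *_UNORM format delivers: the byte value divided by 255, as a float in [0, 1].
inline float ReadAsUnorm(const ByteVec4& bytes, unsigned component)
{
    assert(component < 4);
    return static_cast<float>(bytes[component]) / 255.0f;
}
```

So the vertex keeps its 4 bytes on the CPU side, while the shader sees four full-width components.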

It's easier to do data compression with the input assembler, but not impossible otherwise (thanks to "reinterpret_cast"-like intrinsics such as asint() and asfloat(), or the bit hackery you did).

What happens internally on the GPU is another question. To my limited knowledge you do indeed have four-component 32-bit registers (either float or int). It likely depends on the hardware. Maybe it can make sense to pack data going from one shader stage to the next (if interpolation doesn't hinder you); I don't know. I wouldn't be surprised, though.
