Max size of Matrix4x4 Array in HLSL

You have a model with that many bones?

By the way, changing to vs_3_0 didn't work.

float4x3 was causing issues because I am using SetMatrixArray of ID3DXEffect, which apparently doesn't like float4x3s.

Either way, I will try it with texture buffers later and update this topic.

Thanks

You have a model with that many bones?

No, I am trying to write a skinned vertex shader which can index up to 16384 instances, where each instance may have up to 128 bones. 16384 x 128 = 2097152.


tbuffer SkinnedMatrices : register(t2)
{
	float4x4 Matrices[2097152];
};

cbuffer InstanceIndices : register(b1)
{
	uint4 InstIdxArray[4096];
};

uint ReadInstanceIndex(uint instID)
{
	return InstIdxArray[instID >> 2][instID & 3];
}

VSOut VShader(in VSIn Input)
{
    VSOut output;
	uint Bone0 = Input.BoneIndices & 255;
	uint Bone1 = (Input.BoneIndices >> 8) & 255;
	uint Bone2 = (Input.BoneIndices >> 16) & 255;
	uint Bone3 = (Input.BoneIndices >> 24) & 255;
	uint InstID = ReadInstanceIndex(Input.instanceID);

	float4x4 MatWorld = Matrices[InstID];
	float4x4 MatWorldViewProj = Matrices[InstID + 1];
	float4x4 MatBone0 = Matrices[InstID + 2 + Bone0];
	float4x4 MatBone1 = Matrices[InstID + 2 + Bone1];
	float4x4 MatBone2 = Matrices[InstID + 2 + Bone2];
	float4x4 MatBone3 = Matrices[InstID + 2 + Bone3];

...
// Transform position and normal
...


    return output;
}
@ProgrammerDX: Use ID3DXEffect::SetRawValue or ID3DXBaseEffect::SetFloatArray. Also, vertex texture fetch (you need to use tex2Dlod, by the way) is an SM3 feature.
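
For illustration, a minimal SM3-style sketch of vertex texture fetch, assuming the bone matrices are packed as three float4 rows per bone into row 0 of a floating-point texture (the sampler name and texel-size constant are hypothetical, not part of the original setup):

// Hypothetical layout: three float4 rows per bone on row 0 of the texture.
// InvTexSize is 1 / texture dimensions, set by the application.
sampler2D BoneSampler : register(s0);
float2 InvTexSize;

float3x4 FetchBoneMatrix(int boneIndex)
{
	// Three horizontally adjacent texels per bone; sample texel centres.
	// tex2Dlod is required here, plain tex2D is not allowed in a vertex shader.
	float v = 0.5f * InvTexSize.y;
	float u = (boneIndex * 3 + 0.5f) * InvTexSize.x;
	float4 row0 = tex2Dlod(BoneSampler, float4(u,                    v, 0, 0));
	float4 row1 = tex2Dlod(BoneSampler, float4(u + InvTexSize.x,     v, 0, 0));
	float4 row2 = tex2Dlod(BoneSampler, float4(u + 2 * InvTexSize.x, v, 0, 0));
	return float3x4(row0, row1, row2);
}

// Usage: float3 skinnedPos = mul(FetchBoneMatrix(bone0), float4(position, 1.0f));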

@Tispe: Use structured buffers:

StructuredBuffer<matrix> Matrices : register(t4);
No more need to specify the array size ;). I also recommend 4x3 matrices. According to the D3D11 resource limits this should give you 128M elements. Plenty.
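
To sketch the 4x3 variant (buffer slot, function and parameter names are just assumptions, not your layout): skinning a position against a StructuredBuffer, with no array size in the declaration.

// Assumed layout: a block of 4x3 matrices per instance starting at matrixBase.
StructuredBuffer<float4x3> Matrices : register(t4);

float3 SkinPosition(float3 position, uint matrixBase, uint4 bones, float4 weights)
{
	float3 result = 0;
	[unroll]
	for (uint i = 0; i < 4; ++i)
		result += weights[i] * mul(float4(position, 1.0f), Matrices[matrixBase + bones[i]]);
	return result;
}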

Also, why this manual decode?

uint Bone0 = Input.BoneIndices & 255;
uint Bone1 = (Input.BoneIndices >> 8) & 255;
uint Bone2 = (Input.BoneIndices >> 16) & 255;
uint Bone3 = (Input.BoneIndices >> 24) & 255;
Use DXGI_FORMAT_R8G8B8A8_UINT in your input layout. Your VS signature can then look like this directly:


struct VSInput
{
	...
	uint4 BoneIndices : BONEINDICES;
	...
};

I've googled for days trying to find documentation/tutorials on structured buffers. Do you know of any?

Erm, no, sorry. I own Practical Rendering and Computation, which talks a bit about them (which means you could grab the Hieroglyph3 source and see how they are set up/used). Then again, I just experimented with buffer and SRV creation and closely watched the debug layer.

Buffer creation needs a non-zero StructureByteStride and D3D11_RESOURCE_MISC_BUFFER_STRUCTURED in the misc flags. The SRV needs D3D11_SRV_DIMENSION_BUFFEREX as ViewDimension and the corresponding D3D11_BUFFEREX_SRV filled.
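
To make that concrete, a minimal sketch of those settings (function and variable names are mine, element type assumed to be XMFLOAT4X4): the structured misc flag plus a non-zero stride on the buffer, and the BUFFEREX dimension with DXGI_FORMAT_UNKNOWN on the SRV.

#include <d3d11.h>
#include <DirectXMath.h>

HRESULT CreateMatrixStructuredBuffer(ID3D11Device* device,
                                     const DirectX::XMFLOAT4X4* data,
                                     UINT elementCount,
                                     ID3D11Buffer** outBuffer,
                                     ID3D11ShaderResourceView** outSRV)
{
	D3D11_BUFFER_DESC bd = {};
	bd.ByteWidth           = elementCount * sizeof(DirectX::XMFLOAT4X4);
	bd.Usage               = D3D11_USAGE_DEFAULT;
	bd.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
	bd.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
	bd.StructureByteStride = sizeof(DirectX::XMFLOAT4X4);	// must be non-zero

	D3D11_SUBRESOURCE_DATA init = {};
	init.pSysMem = data;

	HRESULT hr = device->CreateBuffer(&bd, data ? &init : nullptr, outBuffer);
	if (FAILED(hr))
		return hr;

	D3D11_SHADER_RESOURCE_VIEW_DESC srvd = {};
	srvd.Format                = DXGI_FORMAT_UNKNOWN;		// required for structured buffers
	srvd.ViewDimension         = D3D11_SRV_DIMENSION_BUFFEREX;
	srvd.BufferEx.FirstElement = 0;
	srvd.BufferEx.NumElements  = elementCount;
	// No D3D11_BUFFEREX_SRV_FLAG_RAW here - that flag is only for raw views.

	return device->CreateShaderResourceView(*outBuffer, &srvd, outSRV);
}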


Buffer creation needs a non-zero StructureByteStride and D3D11_RESOURCE_MISC_BUFFER_STRUCTURED in the misc flags. The SRV needs D3D11_SRV_DIMENSION_BUFFEREX as ViewDimension and the corresponding D3D11_BUFFEREX_SRV filled.

I'm getting a run-time error:

Error Code: E_INVALIDARG (0x80070057)

Calling: m_pDevice->CreateShaderResourceView(pBuffer, &rd, &pShaderResourceView)

CComPtr<ID3D11ShaderResourceView> DXDevice::CreateStructuredBufferResource(const void* pDataSrc, UINT BufferSize)
{
	CComPtr<ID3D11ShaderResourceView> pShaderResourceView{ nullptr };
	CComPtr<ID3D11Buffer> pBuffer = CreateBufferResource(pDataSrc, BufferSize, D3D11_BIND_SHADER_RESOURCE, D3D11_USAGE_DEFAULT, D3D11_RESOURCE_MISC_BUFFER_STRUCTURED);

	if (pBuffer == nullptr)
		return nullptr;

	try
	{
		D3D11_SHADER_RESOURCE_VIEW_DESC rd;
		ZeroMemory(&rd, sizeof(rd));
		rd.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
		rd.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
		rd.BufferEx.NumElements = BufferSize / sizeof(DirectX::XMFLOAT4X4A);

		HR(m_pDevice->CreateShaderResourceView(pBuffer, &rd, &pShaderResourceView));
	}
	catch (std::exception &e)
	{
		WriteFile("error.log", e.what());
		return nullptr;
	}

	return pShaderResourceView;
}

CComPtr<ID3D11Buffer> DXDevice::CreateBufferResource(const void* pDataSrc, UINT BufferSize, UINT BindFlags, D3D11_USAGE Usage, UINT MiscFlags)
{
	CComPtr<ID3D11Buffer> pBuffer = nullptr;

	try
	{
		if (BufferSize == 0)
			throw std::exception("The requested buffer resource is of size 0");

		D3D11_SUBRESOURCE_DATA sd;
		ZeroMemory(&sd, sizeof(sd));
		sd.pSysMem = pDataSrc;

		D3D11_BUFFER_DESC bd;
		ZeroMemory(&bd, sizeof(bd));
		bd.Usage = Usage;
		bd.ByteWidth = BufferSize;
		bd.BindFlags = BindFlags;
		bd.MiscFlags = MiscFlags;
		if (MiscFlags == D3D11_RESOURCE_MISC_BUFFER_STRUCTURED)
			bd.StructureByteStride = sizeof(DirectX::XMFLOAT4X4A);

		HR(m_pDevice->CreateBuffer(&bd, pDataSrc ? &sd : nullptr, &pBuffer));
	}
	catch (std::exception &e)
	{
		WriteFile("error.log", e.what());
		return nullptr;
	}

	return pBuffer;
}

void DXDevice::SetMatrices(std::vector<DirectX::XMFLOAT4X4A> &Matrices)
{
	auto pBuf = CreateStructuredBufferResource(Matrices.data(), Matrices.size() * sizeof(DirectX::XMFLOAT4X4A));
	if (pBuf == nullptr)
		throw std::exception("CreateStructuredBufferResource failed");
	
	m_pImmediateContext->VSSetShaderResources(0, 1, &pBuf.p);
}

StructuredBuffer<float4x4> Matrices : register(t0);
You've got the raw view flag enabled in your SRV description; this is not allowed with structured buffers:

[4452] D3D11: ERROR: ID3D11Device::CreateShaderResourceView: When the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED BindFlag is specified, the SRV Flag D3D11_BUFFEREX_SRV_FLAG_RAW cannot be specified. [ STATE_CREATION ERROR #127: CREATESHADERRESOURCEVIEW_INVALIDFORMAT ]

If you want to use raw views you need to use D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS at buffer creation (this is for ByteAddressBuffer in HLSL, by the way).

Edit: The latter also needs DXGI_FORMAT_R32_TYPELESS in the SRV description. I honestly keep forgetting what works and what doesn't; for that reason I wrote some convenience functions/classes which take care of it. I recommend doing the same ;)
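
For contrast, a hedged sketch of that raw-view path (ByteAddressBuffer in HLSL); names are mine, and the data is assumed to be uploaded separately:

HRESULT CreateRawBufferSRV(ID3D11Device* device, UINT byteWidth,
                           ID3D11Buffer** outBuffer,
                           ID3D11ShaderResourceView** outSRV)
{
	D3D11_BUFFER_DESC bd = {};
	bd.ByteWidth = byteWidth;					// multiple of 4
	bd.Usage     = D3D11_USAGE_DEFAULT;
	bd.BindFlags = D3D11_BIND_SHADER_RESOURCE;
	bd.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;

	HRESULT hr = device->CreateBuffer(&bd, nullptr, outBuffer);
	if (FAILED(hr))
		return hr;

	D3D11_SHADER_RESOURCE_VIEW_DESC srvd = {};
	srvd.Format               = DXGI_FORMAT_R32_TYPELESS;		// mandatory for raw views
	srvd.ViewDimension        = D3D11_SRV_DIMENSION_BUFFEREX;
	srvd.BufferEx.Flags       = D3D11_BUFFEREX_SRV_FLAG_RAW;
	srvd.BufferEx.NumElements = byteWidth / 4;			// counted in 32-bit words

	return device->CreateShaderResourceView(*outBuffer, &srvd, outSRV);
}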

Ah, I took it out and it works fine now. I got confused by this:

A D3D11_BUFFEREX_SRV_FLAG-typed value that identifies view options for the buffer. Currently, the only option is to identify a raw view of the buffer. For more info about raw viewing of buffers, see Raw Views of Buffers.

//rd.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;


Use DXGI_FORMAT_R8G8B8A8_UINT in your input layout. Your VS signature can then look this way directly:

Question:

If a vertex structure looks like this:


struct Vertex
{
	float px, py, pz;		// Position
	float nx, ny, nz;		// Normal
	float tu, tv;			// Texture UV
	BYTE bn1, bn2, bn3, bn4;
};

{ "BLENDINDICES", 0, DXGI_FORMAT_R8G8B8A8_UINT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 }, 

uint4 BoneIndices : BLENDINDICES; 

Does the HLSL uint4 hold four 32-bit uints, totaling 16 bytes? The vertex on the CPU side has only 4 bytes for bone indices. Will each BYTE on the CPU translate to a uint on the GPU?

It will, as I alluded to with that format. This can be confusing, since for constant buffers (or structured buffers) one needs a one-to-one (binary) match, whereas for data coming from the input assembler only the type needs to match, not the bit size (you don't have e.g. byte types in HLSL anyway).

Edit: Wait, that's not fully correct. E.g. R8_UNorm translates a byte to a float in the 0..1 range. There's no such built-in type in C++ :P

It's easier to do data compression with the input assembler, but not impossible otherwise (thanks to the "reinterpret_cast"-like asint() and asfloat(), or the bit hackery you did).
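
As a tiny HLSL illustration of that route (function names hypothetical): unpacking four byte-sized indices from a single 32-bit value, with asuint() to reinterpret the bits if the value arrived as a float.

uint4 UnpackBoneIndices(uint packed)
{
	return uint4( packed        & 0xFF,
	             (packed >> 8)  & 0xFF,
	             (packed >> 16) & 0xFF,
	             (packed >> 24) & 0xFF);
}

// If the packed value came in as a float attribute, reinterpret the bits
// instead of converting the value:
uint4 UnpackBoneIndicesFromFloat(float packed)
{
	return UnpackBoneIndices(asuint(packed));
}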

What happens internally on the GPU is another question. To my limited knowledge you do indeed have four-component 32-bit registers (either float or int); it likely depends on the hardware. Maybe it can make sense to pack data from one shader stage to the next (if interpolation doesn't get in the way), I don't know. I wouldn't be surprised though :D

This topic is closed to new replies.
