Increasing terrain performance (loading + drawing)

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
33 replies to this topic

#21 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 05:10 AM

I'll see what I can do about the compression (does it help a lot or just a little?) and about splitting the vertex buffer.
I implemented your suggestion on the textures. I'm not only drawing terrain, I draw a full world with models, etc.
I can see that when I have 20 draws of models with the same texture, setting it only once and then drawing all of them will boost the FPS.
But I have to set the textures for the terrain every frame, because I need to set the textures for all models too, right? So I don't really see how I can save calls here.

A question: How do I set an array in my shader, just with the usual constBuffer?
Are the vertices processed by the shader in the order you save them into the vertexBuffer?

Edited by gnomgrol, 28 July 2012 - 05:21 AM.


#22 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 05:36 AM

It is very simple.
As I said, make wrappers for all of those functions.
An appropriate class name might be CDirectX11.

This class keeps local copies of the last values sent to every DirectX 11 function (or at least a lot of them).
For example:

		/** Textures ready to be sent to the shaders (public). */
		static ID3D11ShaderResourceView *			m_psrvActiveTextures[LSG_MAX_TEXTURE_UNITS];

		/** Last textures (private). */
		static ID3D11ShaderResourceView *			m_psrvLastActiveTextures[LSG_MAX_TEXTURE_UNITS];

When your CTexture class wants to be activated into a slot (you do have a wrapper for your DirectX 11 textures, right? It is standard practice), it simply does this:

	/**
	 * Activate this texture in a given slot.
	 * \param _ui32Slot Slot in which to place this texture.
	 * \return Returns true if the texture is activated successfully.
	 */
	LSBOOL LSE_CALL CDirectX11StandardTexture::Activate( LSUINT32 _ui32Slot ) {
		CDirectX11::m_psrvActiveTextures[_ui32Slot] = m_psrvShaderView;
		return true;
	}

Notice how nothing has been sent to DirectX 11 yet.
Textures, constant buffers, samplers, etc. only need to be sent when it is actually time to draw.
So there should be another function that is called just before an actual render:

	/**
	 * Called just before rendering to allow performing of any final tasks.
	 */
	LSVOID LSE_CALL CDirectX11::PreRender() {
		LSUINT32 ui32Index = 0UL, ui32Total = 0UL;
		LSUINT32 ui32Max = CStd::Min<LSUINT32>( LSG_MAX_TEXTURE_UNITS, CFndBase::m_mMetrics.ui32MaxTexSlot );
		for ( LSUINT32 I = 0UL; I < ui32Max; ++I ) {
			if ( m_psrvActiveTextures[I] != m_psrvLastActiveTextures[I] ) {
				// This slot changed: extend the current run of changed slots.
				++ui32Total;
				m_psrvLastActiveTextures[I] = m_psrvActiveTextures[I];
			}
			else {
				// This slot is unchanged: send the run gathered so far, if any.
				if ( ui32Total ) {
					m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
				}
				ui32Index = I + 1UL;
				ui32Total = 0UL;
			}
		}
		if ( ui32Total ) {
			m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
		}
	}

It is really not that complicated. All it is doing is checking for the fewest possible calls it can make to PSSetShaderResources() on each render call by comparing the currently active textures with the textures active on the last render. This is the only location where it is valid to call PSSetShaderResources(), so the local record of the last textures sent to DirectX 11 is accurate.
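
The same run-gathering idea can be tried out without any DirectX 11 objects. The following is only a minimal sketch, assuming plain values stand in for the resource-view pointers; `CollectDirtyRanges` is a hypothetical helper, not part of the engine code quoted above. It returns the (first slot, count) pairs that would each become one call to PSSetShaderResources().

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the quoted engine): compares the values that
// are pending for each slot against the values last sent, records every run of
// consecutive changed slots as (firstSlot, count), and marks those slots as sent.
template <typename T>
std::vector<std::pair<std::uint32_t, std::uint32_t>>
CollectDirtyRanges(const T* active, T* last, std::uint32_t slotCount) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges;
    std::uint32_t first = 0, total = 0;
    for (std::uint32_t i = 0; i < slotCount; ++i) {
        if (active[i] != last[i]) {
            if (total == 0) { first = i; }  // a new run starts here
            ++total;
            last[i] = active[i];
        } else if (total != 0) {
            ranges.emplace_back(first, total);  // run ended at the previous slot
            total = 0;
        }
    }
    if (total != 0) { ranges.emplace_back(first, total); }
    return ranges;
}
```

With five slots where only slots 1, 2 and 4 differ from the last frame, this yields two ranges, (1, 2) and (4, 1). Calling it again without changing anything yields none, which is the whole point of the redundancy check.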

You basically need a similar system in place for everything. Samplers, textures, constant buffers, etc.
And then you need to make this actually useful by implementing a render queue to maximize the number of times the same texture, shader, etc. are used in repeated render calls.
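
As a rough illustration of what such a render queue buys you, here is a small sketch that sorts queued draws by shader and then by texture, so that identical state ends up in consecutive draws. The `DrawItem` layout, the integer IDs and the function names are assumptions made for this example; a real queue would carry much more state.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical queue entry: IDs stand in for real shader/texture/mesh objects.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Shader in the high bits, texture in the low bits: sorting by this key
// groups draws by shader first and by texture within each shader.
inline std::uint64_t SortKey(const DrawItem& d) {
    return (static_cast<std::uint64_t>(d.shaderId) << 32) | d.textureId;
}

inline void SortQueue(std::vector<DrawItem>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return SortKey(a) < SortKey(b);
                     });
}

// Counts how many times the texture would have to be (re)bound when the
// queue is drawn front to back, assuming redundant binds are skipped.
inline std::size_t CountTextureBinds(const std::vector<DrawItem>& queue) {
    std::size_t binds = 0;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0 || queue[i].textureId != queue[i - 1].textureId) { ++binds; }
    }
    return binds;
}
```

Five draws that alternate between two textures need five binds in submission order, but only two after sorting.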

L. Spiro

Edited by L. Spiro, 28 July 2012 - 06:08 AM.

#23 Erik Rufelt   Crossbones+   -  Reputation: 5709


Posted 28 July 2012 - 06:04 AM

I would like to ask again, what hardware are you running on?
If this is integrated laptop graphics or similar that explains the low FPS, and vertex count can become more important.

I certainly don't mean to argue against the good points about state changes, but considering the code posted they will probably have close to zero impact in this case. All chunks in one buffer is probably better, but if you get bad FPS from drawing 9 chunks of 256x256, and the terrain drawing is actually the bottleneck, then you'd need like a thousand state changes per chunk to notice a major difference.

Just to set a performance baseline, create a small test program that draws a 1024x1024 or similar terrain with a single vertexbuffer and a single draw call and absolutely nothing else, to determine what your computer is capable of.

Edited by Erik Rufelt, 28 July 2012 - 06:06 AM.

#24 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 06:16 AM

I'm using my gaming notebook, which can even run BF3, so that shouldn't be the problem.

I'm very new to DirectX, and everything I can do I learned from books and tutorials; wrappers were never mentioned in any of them.
I simply create them with D3DX11CreateShaderResourceViewFromFile( d3d11Device, "tt1.jpg",NULL, NULL, &slopeTexture, NULL );
and set them later as in my code above.

EDIT: I think I have an idea where the low FPS could come from: The function which is called every frame to look where new chunks are needed, the terrain::update
Let me post it, I'm sure you will find tons of things to be fixed:

What it does is:
If the player is inside the bounds of the terrain, check each chunk position around him to see whether a chunk already exists there. If not, create a new one and pass it its part of the height map and shadow map. Then check whether there are more chunks than I want to hold in memory and erase the first of them that are not visible.
Could the checking be time-consuming and therefore be slowing everything down? (I'm only using one thread at the moment!)

void terrain::Update(float playerXin, float playerZin){

	int playerChunkX = ((int)playerXin/(chunkSize-1))*(chunkSize-1);
	int playerChunkZ = ((int)playerZin/(chunkSize-1))*(chunkSize-1);
	if(playerXin > startX && playerZin > startZ && playerXin < startX+terrainWidth && playerZin < startZ+terrainHeight){
		for(int z =0;z<loadPerSide;z++){
			for(int x=0;x<loadPerSide;x++){
				int xIndex, zIndex;
				xIndex = (x-((loadPerSide-1)/2))*(chunkSize-1);
				zIndex = (z-((loadPerSide-1)/2))*(chunkSize-1);

				chunkNum = chunkList.size();
				bool isThereAChunk = false;

				for(int i=0;i<chunkNum;i++){
					if(playerXin+xIndex > chunkList[i]->startX && playerXin+xIndex < chunkList[i]->startX+chunkList[i]->width && playerZin+zIndex > chunkList[i]->startZ && playerZin+zIndex < chunkList[i]->startZ+chunkList[i]->height){
						isThereAChunk = true;
					}
				}
				// test if there is a chunk in this zone to load
				if(isThereAChunk == false && playerXin+xIndex > 0 && playerZin+zIndex > 0 && playerXin+xIndex < startX+terrainWidth && playerZin+zIndex < startZ+terrainHeight-chunkSize){
					//get heights from map
					for(int z=0;z<chunkSize;z++){
						for(int x=0;x<chunkSize;x++){
							heightsToPass[z*chunkSize+x] = heightMap[(z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*(chunkSize-1)*terrainWidth];
						}
					}
					//get normals from map
					for(int z=0;z<chunkSize;z++){
						for(int x=0;x<chunkSize;x++){
							normalsToPass[z*chunkSize+x] = normalMap[(z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1)];
						}
					}

					int k;
					k = 0;
					//get shadows from map
					for(int z=0;z<chunkSize;z++){
						for(int x=0;x<chunkSize;x++){
							lightsToPass[k] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1)) * 3];
							lightsToPass[k+1] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1))*3+1];
							lightsToPass[k+2] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1))*3+2];
							k += 3;
						}
					}

					int timebegin = timeGetTime();
					int toLoad = -1;
					for(int i=0;i<chunkCache;i++){
						if(bufferInUse[i] == false){
							toLoad = i;
							bufferInUse[i] = true;
						}
					}
					if(toLoad == -1){     // NOT FINISHED!!!!
						printf("chunkCache buffers full!\n");
					}

					chunkList.push_back(new chunk());
					chunkList[chunkNum]->preInit(d3d11Device, d3d11DevCon, chunkSize, chunkSize, playerChunkX+xIndex, playerChunkZ+zIndex, chunkAddIndex, vBuffers[toLoad], iBuffers[toLoad]);
					chunkList[chunkNum]->Init(heightsToPass, indices, normalsToPass, lightsToPass, verticesToLock);
					int timeend = timeGetTime();
					//printf("init chunk took %d ms\n", timeend-timebegin);
				}
			} // X end!
		}//  Z end!
	}//if player > 0 end !

	if(chunkNum > chunkCache){
		int toEraseNum = chunkNum-chunkCache;
		for(int i=0;i<toEraseNum;i++){
			if(chunkList[i]->isVisible == false){
				chunkNum = chunkList.size();
			}
		}
	}

	visibleIndex = 0;

	for(int i = 0; i<chunkList.size(); i++){ // calculate visible chunks
		if(FCD->CheckRectangle(chunkList[i]->CenterX, chunkList[i]->CenterY, chunkList[i]->CenterZ, chunkList[i]->width, 256.0f, chunkList[i]->height) == true){
			chunkList[i]->isVisible = true;
			currentlyVisible[visibleIndex] = i;
		}else{
			chunkList[i]->isVisible = false;
		}
	}
	playerX = playerXin;
	playerZ = playerZin;
	for(int i=0;i<chunkList.size();i++){
		chunkList[i]->Update(playerX, playerZ);
	}
}


chunk::Update() is only saving the playerX and playerZ to the chunk.
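
For readers following the arithmetic at the top of terrain::Update: the first two lines snap the player position down to the origin of the chunk that contains it, using a stride of chunkSize-1 because neighbouring chunks share their edge vertices. A minimal standalone sketch of that step follows; the function name is made up for this example, and the chunk size of 257 is taken from the vertex counts mentioned later in the thread.

```cpp
// Hypothetical standalone version of the snapping done at the top of
// terrain::Update(): truncate the world coordinate to an int, then round it
// down to a multiple of (chunkSize - 1), the distance between chunk origins.
inline int SnapToChunkOrigin(float worldCoord, int chunkSize) {
    const int stride = chunkSize - 1;
    return (static_cast<int>(worldCoord) / stride) * stride;
}
```

With a chunk size of 257, a player at x = 300 lands in the chunk whose origin is 256, and anything from 0 up to just below 256 lands in the chunk at origin 0. Note that integer division truncates toward zero, so small negative coordinates also map to origin 0; the code above only runs this for positions inside the terrain, so that case may never come up.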

Edited by gnomgrol, 28 July 2012 - 06:38 AM.

#25 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 06:33 AM

Wrappers are only the next step up so it is nothing you can’t handle.

But Erik Rufelt is also correct. These redundancy optimizations are important, especially for the long run, but right now it seems clear that you have bigger issues at hand.

Put all of those chunks into one buffer and draw with one call. If the FPS remains mostly similar, it means you have a bandwidth problem, and the 16-bit vertex data optimization should be a large help.

Also, draw your terrain normally, but move the camera out so that it is only a small part of the screen.
If the FPS increases dramatically, you have a fill-rate problem related to your pixel shader. You could then start examining that.

L. Spiro

#26 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 06:47 AM

When I move the camera out, the FPS remains the same.
The FPS is about the same with a single vertex buffer and a single draw call.
The 16-bit optimization could really help, but I don't have the slightest idea how to do that.

According to your blog, I have to find the size of my vertex buffer and then pad it to 16, 32 or 64, but how do I pad a vertex buffer?
Is it the size of each element of the vertex structure, e.g. (3 x sizeof(float) + 2 x sizeof(float) + ... + padding) = 32?

Edited by gnomgrol, 28 July 2012 - 06:54 AM.

#27 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 07:05 AM

To be clear, are you saying that zooming out so far that the terrain occupies only about 100-500 total pixels on the screen results in a similar framerate?
And you are sure that the rest of whatever you are drawing is not causing this slow framerate?

If so, you definitely have a bus-transfer problem, and the 2 main optimizations would be to use 16-bit vertices and a second stream for the Y, and compressed textures.
Compressed textures are the easiest to implement so start there.
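
A "stream" here is simply a second vertex buffer bound alongside the first one (in Direct3D 11, several buffers can be bound at once with IASetVertexBuffers, and each element of the input layout names its buffer through the InputSlot field of D3D11_INPUT_ELEMENT_DESC). The sketch below only illustrates the memory side of the idea: the X/Z grid is identical for every chunk, so it can be stored once with 16-bit components, while each chunk keeps just its own heights. The struct and function names are assumptions for this example, and only position data is counted.

```cpp
#include <cstddef>
#include <cstdint>

// Stream 0 (hypothetical layout): grid coordinates inside a chunk, shared by
// every chunk because the X/Z pattern never changes. 257 fits easily in 16 bits.
struct GridVertexXZ {
    std::uint16_t x;
    std::uint16_t z;
};

// Stream 1 (hypothetical layout): one height per vertex, unique to each chunk.
struct HeightVertex {
    float y;
};

static_assert(sizeof(GridVertexXZ) == 4, "two 16-bit components");
static_assert(sizeof(HeightVertex) == 4, "one 32-bit float");

constexpr std::size_t kVertsPerChunk = 257u * 257u;

// Position data when every chunk stores full float x, y, z per vertex.
constexpr std::size_t FullPositionBytes(std::size_t chunks) {
    return chunks * kVertsPerChunk * 3 * sizeof(float);
}

// Position data with one shared X/Z stream plus one height stream per chunk.
constexpr std::size_t SplitPositionBytes(std::size_t chunks) {
    return kVertsPerChunk * sizeof(GridVertexXZ) +
           chunks * kVertsPerChunk * sizeof(HeightVertex);
}
```

For nine chunks this comes to 7,133,292 bytes of position data the plain way against 2,641,960 bytes with the split streams.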

When my site talks about padding, yes, it is as in your example. Add some fake bytes so that the next element in the buffer is 32 bytes after the previous element.
But while this will help, it is not going to give you results you will find acceptable.
This and redundancy checks should be put on hold while you address your most major issue: bandwidth.
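
As a small illustration of the padding described above, here is a sketch of a vertex with a position and a texture coordinate padded out to a 32-byte stride. The struct and its field names are hypothetical; the real vertex in this thread carries more data.

```cpp
#include <cstddef>

// Hypothetical vertex: 12 bytes of position + 8 bytes of texture coordinate
// would give a 20-byte stride, so 12 unused bytes are added to reach 32.
struct PaddedVertex {
    float position[3];  // 12 bytes
    float texcoord[2];  //  8 bytes
    float padding[3];   // 12 bytes of filler, never read by the shader
};

// Fails to compile if the compiler lays the struct out differently.
static_assert(sizeof(PaddedVertex) == 32, "vertex stride should be 32 bytes");
```

The stride passed when binding the vertex buffer is then sizeof(PaddedVertex), and the input layout simply has no element that refers to the filler bytes.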

L. Spiro

#28 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 07:26 AM

Yes, when I zoom out very far, I just get ~10+ FPS. I commented out everything but the terrain, so it's the only thing to mess with right now.

By bandwidth problem, do you mean that the data I pass from the CPU to the GPU is far too large?
So I split the vertex buffer and update only the Y (using a constant buffer? I'm not sure what you mean by a stream).
I'll start with the compressed textures right now.
EDIT: Compressing the textures (using BC2) gave me a slight FPS increase of 5-10.
I added a screenshot of it to my first post!

Edited by gnomgrol, 28 July 2012 - 07:40 AM.

#29 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 07:38 AM

Bandwidth = the transfer from the CPU to the GPU.
It means the total amount you send to the GPU is too large. This includes textures, index buffers, vertex buffers, etc.
That is why using compressed textures can help.
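
To put rough numbers on that, here is a small sketch comparing the size of a single mip level stored as uncompressed 32-bit RGBA with the same level in a block-compressed format. The 8 and 16 bytes per 4x4 block are the standard sizes for BC1 and for BC2/BC3; the function names and the 1024x1024 example size are assumptions for illustration, not figures from this thread.

```cpp
#include <cstddef>

constexpr std::size_t kBC1BytesPerBlock = 8;   // BC1: 8 bytes per 4x4 block
constexpr std::size_t kBC2BytesPerBlock = 16;  // BC2 and BC3: 16 bytes per 4x4 block

// One mip level at 4 bytes per pixel (8 bits each for R, G, B, A).
constexpr std::size_t UncompressedBytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

// One mip level in a block-compressed format; partial blocks at the edges
// still occupy a whole block, hence the rounding up.
constexpr std::size_t BlockCompressedBytes(std::size_t width, std::size_t height,
                                           std::size_t bytesPerBlock) {
    return ((width + 3) / 4) * ((height + 3) / 4) * bytesPerBlock;
}
```

For a 1024x1024 texture that is 4 MiB uncompressed, 1 MiB as BC2 and 512 KiB as BC1, before counting any further mip levels.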

How many elements are in your vertex buffer?
How many bits in your index buffer?

L. Spiro

#30 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 07:45 AM



D3DXVECTOR3 normal;
D3DXVECTOR2 texcoord;
D3DXVECTOR4 color;
D3DXVECTOR4 shadowColor;

// 257*257 vertices per chunk
// index buffer: unsigned long, 256*256*6
// as optimised as I was able to get it

#31 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 07:55 AM

Looks like a bandwidth problem after all.

Your vertex buffer is huge. Why do you need so much data? Why shadow color?
Why is your index buffer * 6? You should be using a triangle strip, not a triangle list.
Since you are bandwidth limited, this is one of the major issues you need to handle.
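If you switch to triangle strips, your index buffer should shrink to roughly a third of its current size: about 2 * 257 indices per row of quads, plus a few extra indices to stitch the rows together.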
That in itself is much smaller but restricting your index buffers to 16 bits (while increasing the number of draw calls) often proves worth the extra draw calls.
But before reducing your index buffer to 16 bits, start by using a triangle strip. You should see a noticeable gain in performance.
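
The exact index count for a strip depends on how the rows are stitched together. As a rough sketch, assuming one strip per row of quads with two repeated indices (degenerate triangles) joining each row to the next, the counts for a grid can be compared like this; the function names are made up for the example.

```cpp
#include <cstddef>

// Triangle list: every quad becomes two triangles with three indices each.
constexpr std::size_t ListIndexCount(std::size_t vertsWide, std::size_t vertsHigh) {
    return (vertsWide - 1) * (vertsHigh - 1) * 6;
}

// Triangle strip (assumed layout): each row of quads needs two indices per
// vertex column, and each junction between rows needs two extra indices.
// Requires vertsHigh >= 2.
constexpr std::size_t StripIndexCount(std::size_t vertsWide, std::size_t vertsHigh) {
    return (vertsHigh - 1) * (2 * vertsWide) + (vertsHigh - 2) * 2;
}
```

For a 257 x 257 chunk that is 393,216 indices as a list and 132,094 as a strip. At 32 bits per index this is about 1.5 MiB against roughly 516 KiB per chunk.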

L. Spiro

#32 gnomgrol   Members   -  Reputation: 698


Posted 28 July 2012 - 08:22 AM

I can't really see how to make the vertex buffer any smaller. I could cut the shadowColor and just use a bool for shadowed/not shadowed, but that would be all.
I'm using a triangle list because I was told that doing so reduces the number of vertices while increasing the number of indices, which they said is a good thing. I didn't know that a triangle strip performs better.

Can a 16-bit index buffer simply be created by using unsigned short instead of unsigned long, and calling IASetIndexBuffer( indexBuffer, DXGI_FORMAT_R16_UINT, 0);?
To switch to a triangle strip, I have to rebuild the entire terrain code, which could take a while. But if you say that it is worth it, I'll go for that.
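
One thing worth checking before switching the index type: a 16-bit index can only address vertices 0 to 65535, and a 257 x 257 chunk has 66,049 vertices. This is presumably why 16-bit indices were described as increasing the number of draw calls, since each chunk would have to be split into smaller pieces. A minimal check, with a hypothetical function name:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// True if every vertex of a buffer with this many vertices can be referenced
// by a 16-bit index, i.e. the highest index is at most 65535.
constexpr bool FitsIn16BitIndices(std::size_t vertexCount) {
    return vertexCount <=
           static_cast<std::size_t>(std::numeric_limits<std::uint16_t>::max()) + 1;
}
```

A 257 x 257 chunk does not fit, while a 129 x 129 piece (16,641 vertices) does.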

Thanks for your intense help!

In 2007 they said we should use triangle lists! Has so much changed since then?

Edited by gnomgrol, 28 July 2012 - 12:00 PM.

#33 L. Spiro   Crossbones+   -  Reputation: 23959


Posted 28 July 2012 - 08:00 PM

Cache-friendly vertex buffers are another issue worth investigating.
By the size of your vertex buffer it appears you have repeating vertex data. If you are using an index buffer, your vertex buffer should be much smaller (257 × 257).
Are you eliminating duplicate vertices? If not, this would be the first thing to do.

L. Spiro

#34 gnomgrol   Members   -  Reputation: 698


Posted 29 July 2012 - 01:09 AM

You got me wrong, I think. My vertex buffer is only 257*257. There are no duplicate vertices.
Because I'm using a triangle list, I need more indices. Most tutorials and posts said that would be worth it by far.

The problem I'm currently facing is that I can't figure out how to stream the Y values to the shader properly.
Another thing I just noticed is that when I pass the Y values per chunk to the shader, I have to do the same thing for shadows and normals. Will this still be a performance increase?

Edited by gnomgrol, 29 July 2012 - 05:04 AM.
