Hey guys,
I've got a pretty nice terrain engine by now.
Then I split it up into chunks wherever there isn't a chunk already (passing each chunk its part of the two maps) and draw them (each with its own set of vertex buffers for the different LODs).
The chunks are stored in a vector (push_back(new chunk(...))). Every frame I check whether more chunks are loaded than the Max_chunks I want to keep, and the surplus is removed with vector.erase (the first one that isn't visible).
So that's all running fine, and thanks to the LOD levels at 100-150 fps when I don't move.
But loading/unloading and creating/destroying chunks seems to take too much time: 3-9 ms currently, and I have to load/unload about 100 or more every frame. That makes the program freeze a little when you move around, and now I have to get rid of this.

One thing that was already suggested to me in this forum is memory pools: I grab a block of raw memory when the program starts, and then use new(myMemory) chunk(...) to allocate from it. But that turned out to be pretty awkward to implement because of the way I create the chunks, and I need access to this memory inside each chunk to allocate the memory the LOD levels need (they delete[] it right after the vertex buffer is set anyway).

So, are there any other ways to improve the performance of new/delete, and how can I pass raw memory through a function properly?
Any ways to increase the draw performance even more are welcome too.

Feel free to ask for code, if it's needed anywhere.

--gnomgrol

http://imageshack.us/photo/my-images/215/terrainla.png/

Do you recreate the vertex buffers every frame also?
You should probably have a static number of vertex buffers always created and then just refill them with new data when a chunk is switched out and another takes its place. How big is the heightmap?
Perhaps you can store all the vertices at least in RAM all the time, and just update vertex buffers when changes occur.
If your heightmap isn't very very large then you can probably even store all the vertices in a vertex-buffer statically, and just use different index buffers to draw different LOD levels to improve performance. If your heightmap is so large that you run out of memory, then look into reusing the same memory for a new chunk instead of reallocating things.
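The "one vertex buffer, different index buffers per LOD" idea can be sketched in plain C++ as follows. This is only an illustration under assumptions: `buildLodIndices` is a hypothetical helper, the winding order is arbitrary, and cracks between neighbouring chunks at different LODs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for one LOD of a square chunk.
// verticesPerSide: vertices along one edge (65 in this thread).
// lod: 0 = full detail; each further level doubles the vertex spacing.
// Hypothetical helper, not taken from the original poster's code.
std::vector<std::uint32_t> buildLodIndices(std::uint32_t verticesPerSide,
                                           std::uint32_t lod) {
    const std::uint32_t step = 1u << lod;
    // The grid has to divide evenly, otherwise the last row/column is lost.
    assert(verticesPerSide > 1 && (verticesPerSide - 1) % step == 0);

    const std::uint32_t quadsPerSide = (verticesPerSide - 1) / step;
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadsPerSide) * quadsPerSide * 6);

    for (std::uint32_t z = 0; z + step < verticesPerSide; z += step) {
        for (std::uint32_t x = 0; x + step < verticesPerSide; x += step) {
            const std::uint32_t i0 = z * verticesPerSide + x;           // top-left
            const std::uint32_t i1 = i0 + step;                         // top-right
            const std::uint32_t i2 = (z + step) * verticesPerSide + x;  // bottom-left
            const std::uint32_t i3 = i2 + step;                         // bottom-right
            // Two triangles per quad; all LODs index the same vertex data.
            indices.push_back(i0); indices.push_back(i2); indices.push_back(i1);
            indices.push_back(i1); indices.push_back(i2); indices.push_back(i3);
        }
    }
    return indices;
}
```

Each LOD's index list would be uploaded once at startup; at draw time only the index buffer changes, while the vertex buffer stays the same.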

Every frame? You should reconsider the size of your tiles. It seems to me that your chosen size is too small compared to your view distance.

memorypools

You should use a cache, e.g. LRU.

Best to load and process the tiles in a separate thread, choose a decent size, not too small or too large and some kind of cache.
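A minimal sketch of the LRU cache idea, assuming tiles are identified by integer grid coordinates. `TileKey`, `makeTileKey` and `TileLruCache` are hypothetical names for illustration; a real cache would also own the tile data and would typically be filled from the loader thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical tile key: both grid coordinates packed into one integer.
using TileKey = std::uint64_t;

inline TileKey makeTileKey(int tileX, int tileZ) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(tileX)) << 32) |
           static_cast<std::uint32_t>(tileZ);
}

class TileLruCache {
public:
    explicit TileLruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Marks a tile as used. Returns true if adding it pushed the least
    // recently used tile out; that tile's key is written to 'evicted'.
    bool touch(TileKey key, TileKey& evicted) {
        auto it = lookup_.find(key);
        if (it != lookup_.end()) {
            // Already cached: move it to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return false;
        }
        order_.push_front(key);
        lookup_[key] = order_.begin();
        if (order_.size() <= capacity_) return false;
        evicted = order_.back();
        lookup_.erase(evicted);
        order_.pop_back();
        return true;
    }

    bool contains(TileKey key) const { return lookup_.count(key) != 0; }
    std::size_t size() const { return order_.size(); }

private:
    std::size_t capacity_;
    std::list<TileKey> order_;  // front = most recent, back = least recent
    std::unordered_map<TileKey, std::list<TileKey>::iterator> lookup_;
};
```

When `touch` reports an eviction, the buffers of the evicted tile can be handed to the new tile instead of being freed.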

You are considering using faster memory allocators (etc.) to solve your problem, when the real issue is actually the concept of what you are doing itself.
Since the FPS is high when you are not moving, we can assume that your terrain system itself is overall fine.

When you move, it drops, and the only thing that happens when you move is that there is deallocation and allocation.
The solution is not to make deallocation and allocation faster, but to simply eliminate them from happening at all.

You should be reusing memory as much as possible for starters, but aside from that you should have a system in place that simply does not need that much memory reallocation or at least spread it out over a longer duration. Of course there are some worlds in which there is no possible way to keep everything necessary in memory, but when memory operations are needed they are put on hold until a certain major event happens, not done every frame.

This issue is not about efficiency, it is about planning. You need to reconceptualize what you are doing so that memory allocations are as little a part of your plans as possible.

L. Spiro

@Erik Rufelt
Yes, I'm creating a new one for each chunk. 4096x4096 heightmap right now, but I would like to go bigger.
Refilling seems to be a good idea, I'll see what I can do with it.

@ Ashaman73
Chunk size is 65x65 vertices; if I go larger, the freezes get longer. I'll look into the cache stuff.
Multithreading appeared to be difficult, because I can't use non-static members in a thread function, which would be necessary since terrain is its own class, as is chunk.

@ Spiro
So basically, what you are suggesting is to rework everything so that I don't need to 'new' and 'delete' all the time, starting with reusing memory.
Since the terrain class controls and holds the chunks, what I could do is allocate a 'chunkVerticesNum * maxNumChunks' array at startup and then pass a part of it to each chunk to fill. That should work out fine, I'll try it as soon as possible.

Why is it that memory de/allocs are taking so long anyway?

Thanks for the quick reply!

65*65 is still far far too small for the loaded data.

I worked on an open world game and our 'chunk' sizes were much bigger than that.
The system had a 3x3 high-resolution chunk grid around the player, meaning we had 9 high-resolution chunks loaded at any given time (as well as a number of lower-resolution chunks beyond that). The view distance was 330 meters, and we started loading a new chunk when its bounding box was viewdistance + 20% away from the player, to give us some time to stream in the world.

We also pre-allocated the chunk buffers, so we could maintain a freelist of them and just load directly into it.

So trying to load and drop 64*64 'chunks' is just too much work - you also won't be able to stream without some form of multi-threading otherwise you'll just end up stalling the thread while you copy memory about.
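The pre-allocated buffers with a freelist mentioned above could look roughly like this. It is a sketch under assumptions: `ChunkVertex` and `ChunkBufferPool` are hypothetical, and plain system memory stands in for whatever buffers the engine actually streams into.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ChunkVertex { float x, y, z; };  // hypothetical vertex layout

// All chunk storage is allocated once; chunks borrow and return slots.
class ChunkBufferPool {
public:
    ChunkBufferPool(std::size_t bufferCount, std::size_t verticesPerChunk)
        : verticesPerChunk_(verticesPerChunk),
          storage_(bufferCount * verticesPerChunk) {
        freeSlots_.reserve(bufferCount);
        // Push in reverse so slot 0 is handed out first.
        for (std::size_t i = bufferCount; i > 0; --i) freeSlots_.push_back(i - 1);
    }

    // Returns a slot index, or -1 when every buffer is in use.
    int acquire() {
        if (freeSlots_.empty()) return -1;
        const int slot = static_cast<int>(freeSlots_.back());
        freeSlots_.pop_back();
        return slot;
    }

    // Hands a slot back; no memory is freed, it is just marked reusable.
    void release(int slot) {
        assert(slot >= 0);
        freeSlots_.push_back(static_cast<std::size_t>(slot));
    }

    // Start of the vertex range that belongs to the given slot.
    ChunkVertex* data(int slot) {
        return storage_.data() + static_cast<std::size_t>(slot) * verticesPerChunk_;
    }

    std::size_t freeCount() const { return freeSlots_.size(); }

private:
    std::size_t verticesPerChunk_;
    std::vector<ChunkVertex> storage_;   // one big allocation made at startup
    std::vector<std::size_t> freeSlots_; // the freelist
};
```

After construction, acquiring and releasing a slot is just a push or pop on a vector that never grows, so nothing is allocated or freed while the player moves.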

So, I managed to set up everything as you mentioned above. It helped a little, but it's still pretty laggy. I'm pretty sure that's because I put all chunks in a vector using push_back(new chunk(...)), and erase the old ones using chunkList.erase(chunkList.begin()). I tried some other containers, like maps and deques, but vector was the only one that really worked; the rest brought it down to 5 fps.

So I need a better way to manage the chunks, any suggestions?

I would say maybe you have to work on threading. In my C# engine I added an update worker thread for chunk updating and double buffering. So in your case I suppose you have to make an array with two vectors inside, so the update thread can update vector1 while the main thread draws from vector2. After updating you swap them, so it continues updating vector2 and drawing from vector1...

Are you putting things in a vector using push_back at runtime? And also clearing the vector at runtime? And has this vector got any memory reserved or does it need to reallocate all the time? This is definitely not an optimal use case, but at the same time it should not be giving the kind of trouble you're experiencing. 100 objects per frame is nothing - or at least it should be nothing. Have you a constructor for the objects you're creating, and - if so - what's going on in the constructor?

Yes, I'm using push_back and erase at runtime, as well as allocating at runtime. I'm not calling delete; someone said erase does this for me.
The constructor takes a few things like the size, the d3d11Device, etc. Then I call their init(...) function, which mainly creates their vertex buffer.

erase is not calling delete for you, you need to do that.
However, if you're still calling new every frame then you haven't fixed much. Especially if you recreate vertex-buffers. They should probably only be created at the start of the program with the device, and never again.
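To illustrate the point about erase: a vector of raw pointers only removes the pointer, not the object. One common way to tie the object's lifetime to the container is `std::unique_ptr`. The sketch below assumes a stand-in `Chunk` type with an instance counter added purely for demonstration; it is not the poster's class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in chunk type; liveCount only exists to show when objects die.
struct Chunk {
    static int liveCount;
    bool isVisible;
    Chunk() : isVisible(false) { ++liveCount; }
    ~Chunk() { --liveCount; }
};
int Chunk::liveCount = 0;

using ChunkList = std::vector<std::unique_ptr<Chunk>>;

// Removes every invisible chunk. Because the vector owns the chunks through
// unique_ptr, erasing an element also destroys the chunk it points to.
std::size_t removeInvisible(ChunkList& chunks) {
    const std::size_t before = chunks.size();
    chunks.erase(std::remove_if(chunks.begin(), chunks.end(),
                                [](const std::unique_ptr<Chunk>& c) {
                                    return !c->isVisible;
                                }),
                 chunks.end());
    return before - chunks.size();
}
```

This only fixes the leak. It does not make per-frame creation cheap, which is the larger issue raised in this thread.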

Also, as others have already pointed out, you shouldn't need to do anything every frame.
If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD, and keep 9 of those loaded at full LOD at a time, the ones around the current player position. Then only ever unload/reload chunks when you cross a chunk boundary, that is when the player enters a new chunk, not every time you move.

When that's working correctly and you want to make it even better, look into loading a little bit at a time rather than many chunks at once, and don't unload old chunks exactly when you cross the boundary; add some safety distance so things aren't reloaded constantly when moving back and forth across a boundary.
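The boundary-crossing check with a safety distance can be sketched like this. `ChunkTracker`, the chunk size and the margin are assumptions for illustration; the caller would reload the 3x3 neighbourhood only when `update` returns true.

```cpp
#include <cassert>
#include <cmath>

struct ChunkCoord { int x; int z; };

inline bool operator==(ChunkCoord a, ChunkCoord b) {
    return a.x == b.x && a.z == b.z;
}

// Tracks which chunk the player is in and reports a change only after the
// player has moved 'margin' world units past the current chunk's border.
class ChunkTracker {
public:
    ChunkTracker(float chunkWorldSize, float margin)
        : size_(chunkWorldSize), margin_(margin), current_{0, 0},
          initialized_(false) {}

    // Returns true when the set of loaded chunks should be refreshed.
    bool update(float playerX, float playerZ) {
        if (!initialized_) {
            current_ = coordOf(playerX, playerZ);
            initialized_ = true;
            return true;
        }
        // Bounds of the current chunk, grown by the safety margin.
        const float minX = current_.x * size_ - margin_;
        const float maxX = (current_.x + 1) * size_ + margin_;
        const float minZ = current_.z * size_ - margin_;
        const float maxZ = (current_.z + 1) * size_ + margin_;
        if (playerX >= minX && playerX <= maxX &&
            playerZ >= minZ && playerZ <= maxZ) {
            return false;  // still inside (or just outside): nothing to do
        }
        current_ = coordOf(playerX, playerZ);
        return true;
    }

    ChunkCoord current() const { return current_; }

private:
    ChunkCoord coordOf(float x, float z) const {
        ChunkCoord c = { static_cast<int>(std::floor(x / size_)),
                         static_cast<int>(std::floor(z / size_)) };
        return c;
    }

    float size_;
    float margin_;
    ChunkCoord current_;
    bool initialized_;
};
```

With a chunk size of 256 and a margin of 16, walking back and forth across x = 256 triggers no reloads until the player is more than 16 units past the border.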

It's basically a form of death by 1000 cuts. An alternative approach would be to pre-create a pool of objects at startup and draw from that pool rather than re-initializing everything every frame. Either way, and especially with D3D11 (where object creation is documented as being so expensive) you do need to move away from run-time creation.

EDIT: I just figured out that you can update EVERY ID3D11Buffer with UpdateSubresource. I didn't know that, so now I can try just updating the buffers instead of creating new ones. I'll try this and reply if it works fine.

----- ----- -----
I get that CreateBuffer should only be called at the beginning. The problem is, I have to call it when I want to create a new chunk, because I am filling it there. Maybe I'm getting something wrong; at the moment this is what I'm calling every time I create a chunk:

I only create chunks if they are needed when I move, of course.

    // in chunk.h
    ID3D11Buffer* vertexBuffer;
    D3D11_BUFFER_DESC vertexBufferDesc;

    // in chunk.cpp
    ZeroMemory( &vertexBufferDesc, sizeof(vertexBufferDesc) );
    vertexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
    vertexBufferDesc.ByteWidth = sizeof(VertexPosNormalTexColorColor2) * (lowwidth*lowheight);
    vertexBufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    vertexBufferDesc.CPUAccessFlags = 0;
    vertexBufferDesc.MiscFlags = 0;

    D3D11_SUBRESOURCE_DATA vertexBufferData;
    ZeroMemory( &vertexBufferData, sizeof(vertexBufferData) );
    vertexBufferData.pSysMem = verticesToLock;
    vertexBufferData.SysMemPitch = 0;
    vertexBufferData.SysMemSlicePitch = 0;

    d3d11Device->CreateBuffer( &vertexBufferDesc, &vertexBufferData, &vertexBufferIn);

How can I get rid of this? I have to call CreateBuffer, because the verticesToLock data is created in every chunk's Init() function.

Why would you need to create vertex buffers at all, except in the beginning when you load your height map?

When you move you just decide for every frame which vertex buffers to use, depending on LoD (and set it with a call to IASetVertexBuffers).

If you can't load all buffers at load time, you should probably, as others have pointed out, let a worker thread carry that out in advance.

If your map is 4096x4096, try for example 16x16 chunks of size 256x256 at full LOD

Just to come back to this point: many, many terrain tutorials out there are old or based on old tutorials from a time when GPUs weren't as powerful, vertex processing was slower or done on the CPU, and spending CPU time to reduce the number of vertices drawn and processed was worthwhile.

However, technology has progressed. GPUs need to be fed with large chunks of work to get parallelism with the CPU, and CPU cost (and the associated memory accesses) is the bottleneck in many, many cases, so don't be afraid to throw large patches at the GPU beyond what old wisdom used to say was the norm.

So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too, then reused. That works fine, but when I set the chunkSize to 257, the problem is that, when only 9 chunks are there, FPS already drop to ~55. I haven't reimplemented LOD yet, but those 9 chunks shall be set to full detail anyway.

So what I need to do now is increase the FPS.
(I already have frustum culling and LOD)

There are many general-purpose techniques for increasing framerate.

For one, sorting by render-state/shaders/textures is one of the most important.
3D Performance Tips

Since each chunk is drawn with the same shader and many—if not all—of the same textures, you will see a large boost from just this (unless you are already doing it).

You should also be able to share vertex buffers. Switching vertex buffers is also costly, and terrain provides you with many ways to share vertex data.
For example, X and Z in one buffer that is shared across all chunks and let the Y be in a separate buffer, with only that buffer being swapped between calls.
This also allows you to heavily compress the X and Z values into 16-bit values each, which saves bandwidth.
You should also be using only one index buffer for all chunks.

Draw near-to-far to reduce overdraw.

Use compressed textures, especially on terrain.

L. Spiro
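As a rough CPU-side illustration of the shared X/Z buffer with 16-bit values: the grid coordinates are identical for every chunk, so they can be built once, and only the heights differ per chunk. `GridXZ`, `buildSharedGrid` and `reconstruct` are hypothetical names; in a real renderer the reconstruction would happen in the vertex shader using a per-chunk origin constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One shared vertex: grid position inside a chunk, 4 bytes in total.
struct GridXZ { std::uint16_t x; std::uint16_t z; };

struct WorldPos { float x, y, z; };

// Builds the X/Z grid once; every chunk of the same size reuses it.
std::vector<GridXZ> buildSharedGrid(std::uint16_t verticesPerSide) {
    std::vector<GridXZ> grid;
    grid.reserve(static_cast<std::size_t>(verticesPerSide) * verticesPerSide);
    for (std::uint16_t z = 0; z < verticesPerSide; ++z) {
        for (std::uint16_t x = 0; x < verticesPerSide; ++x) {
            GridXZ g = { x, z };
            grid.push_back(g);
        }
    }
    return grid;
}

// Rebuilds a world position from the shared grid entry, the per-chunk
// height value, and the chunk's world-space origin and vertex spacing.
WorldPos reconstruct(GridXZ g, float height,
                     float originX, float originZ, float spacing) {
    WorldPos p = { originX + g.x * spacing, height, originZ + g.z * spacing };
    return p;
}
```

Per chunk, only the height stream (and the origin constant) changes between draw calls; the grid stream stays bound.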

So, I managed to do everything you mentioned here. Memory is only allocated at the start, and buffers are only created there too, then reused. That works fine, but when I set the chunkSize to 257, the problem is that, when only 9 chunks are there, FPS already drop to ~55. I haven't reimplemented LOD yet, but those 9 chunks shall be set to full detail anyway.

What GPU etc. are you running this on?
What happens if you decrease the size of your chunks to similar of what you had before, but still use these other techniques to not reload every frame?

Also I think we need to see the code for drawing one chunk in order to give more tips.

What do you mean by compressed textures? Using them in .jpg format?
You suggest splitting the vertex buffer, setting one part only once (x + z), then passing a y-buffer for every chunk?

I only set one set of textures once per frame for the terrain at the moment.

It seems to run smoother with smaller chunks, but I'm pretty sure that's only subjective; the FPS is the same.
Here's the code for drawing:

    // in terrain.cpp ( called once per frame )
    d3d11DevCon->VSSetShader(VS, 0, 0);
    d3d11DevCon->PSSetShader(PS, 0, 0);
    d3d11DevCon->OMSetBlendState(0, 0, 0xffffffff);

    LightBufferType* dataPtr2;
    D3D11_MAPPED_SUBRESOURCE mappedResource;
    d3d11DevCon->Map(lightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);

    // Get a pointer to the data in the constant buffer.
    dataPtr2 = (LightBufferType*)mappedResource.pData;

    // Copy the lighting variables into the constant buffer.
    dataPtr2->ambientColor = D3DXVECTOR4(0.3f, 0.3f, 0.3f, 1.0f); //Everythingcolor
    dataPtr2->diffuseColor = D3DXVECTOR4(1.0f, 1.0f, 1.0f, 1.0f); //Lightcolor
    dataPtr2->lightDirection = D3DXVECTOR3(0.5f, -0.5f, 0.5f);
    dataPtr2->padding = 0.0f; // Just Filler

    // Unlock the constant buffer.
    d3d11DevCon->Unmap(lightBuffer, 0);

    // Finally set the light constant buffer in the pixel shader with the updated values.
    d3d11DevCon->PSSetConstantBuffers(0, 1, &lightBuffer);

    // For Texture
    d3d11DevCon->PSSetShaderResources(0, 1, &slopeTexture);
    d3d11DevCon->PSSetShaderResources(1, 1, &rockTexture);
    d3d11DevCon->PSSetShaderResources(2, 1, &grassTexture);
    d3d11DevCon->PSSetSamplers( 0, 1, &SamplerState );

    d3d11DevCon->IASetInputLayout( vertLayout );
    d3d11DevCon->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
    d3d11DevCon->UpdateSubresource( cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0 );
    d3d11DevCon->VSSetConstantBuffers( 0, 1, &cbPerObjectBuffer );
    d3d11DevCon->RSSetState(RSCullNormal);

    int numVisChunks;
    numVisChunks = 0;
    for(int i=0;i<chunkList.size();i++){
        if(chunkList[i]->isVisible == true){
            chunkList[i]->Draw(d3d11DevCon);
            numVisChunks++;
        }
    }

    // chunk.cpp Draw() ( called once per chunk )
    d3d11DevCon->IASetIndexBuffer( indexBuffer, DXGI_FORMAT_R32_UINT, 0); // I need different indexBuffers for different LODs, right?
    d3d11DevCon->IASetVertexBuffers( 0, 1, &vertexBuffer, &stride, &offset );
    d3d11DevCon->DrawIndexed(numIndices, 0, 0 );

What do you mean by compressed textures? Using them in .jpg format?

That is a disk compression format.
I am talking about run-time compression formats such as DXT, BCn, etc.
http://wiki.polycount.com/DXT
http://msdn.microsof...531(VS.85).aspx

You suggest splitting the vertex buffer, setting one part only once (x + z), then passing a y-buffer for every chunk?

I do. In addition to compressing the X and Z values to 16 bits.

I only set one set of textures once per frame for the terrain at the moment.

The way you are setting them is not efficient.
Why not:
    ID3D11ShaderResourceView * psrvViews[] = { slopeTexture, rockTexture, grassTexture };
    d3d11DevCon->PSSetShaderResources( 0, 3, psrvViews );

And furthermore, if terrain is the only thing you are drawing, those textures will already be set, so setting them again and again is just wasting time.
You need to make wrappers for basically all of the DirectX 11 calls and check the last values you sent to them, and when matching, don’t call the DirectX 11 function.
Sorting by render states, textures, and shaders only has meaning if you are doing this.

L. Spiro

I'll see what I can do with the compression thing (does it help much or just a little?) and with splitting the vertex buffer.
I implemented your suggestion on the textures. I'm not only drawing terrain, I draw a full world with models, etc.
I can see that when I have 20 draws of models with the same texture, setting it only once and then drawing all of them will boost the FPS.
But I have to set the textures for terrain every frame, because I need to set the textures for all models too, right? So I don't really see how I can save calls here.

A question: How do I set an array in my shader, just with the usual constant buffer?
Are the vertices processed by the shader in the order you save them into the vertex buffer?

It is very simple.
As I said, make wrappers for all of those functions.
An appropriate class name might be CDirectX11.

This class keeps local copies of the last values sent to every DirectX 11 function (or at least a lot of them).
For example.

    /** Textures ready to be sent to the shaders (public). */
    static ID3D11ShaderResourceView * m_psrvActiveTextures[LSG_MAX_TEXTURE_UNITS];

    /** Last textures (private). */
    static ID3D11ShaderResourceView * m_psrvLastActiveTextures[LSG_MAX_TEXTURE_UNITS];

When your CTexture class wants to be activated into a slot (you do have a wrapper for your DirectX 11 textures, right? It is standard practice), it simply does this:

    /**
     * Activate this texture in a given slot.
     *
     * \param _ui32Slot Slot in which to place this texture.
     * \return Returns true if the texture is activated successfully.
     */
    LSBOOL LSE_CALL CDirectX11StandardTexture::Activate( LSUINT32 _ui32Slot ) {
        CDirectX11::m_psrvActiveTextures[_ui32Slot] = m_psrvShaderView;
        return true;
    }

Notice how nothing has been sent to DirectX 11 yet.
Textures, constant buffers, samplers, etc. only need to be sent when it is actually time to draw.
So there should be another function that is called just before an actual render:

    /**
     * Called just before rendering to allow performing of any final tasks.
     */
    LSVOID LSE_CALL CDirectX11::PreRender() {
        LSUINT32 ui32Index = 0UL, ui32Total = 0UL;
        LSUINT32 ui32Max = CStd::Min<LSUINT32>( LSG_MAX_TEXTURE_UNITS, CFndBase::m_mMetrics.ui32MaxTexSlot );
        for ( LSUINT32 I = 0UL; I < ui32Max; ++I ) {
            if ( m_psrvActiveTextures[I] != m_psrvLastActiveTextures[I] ) {
                ++ui32Total;
                m_psrvLastActiveTextures[I] = m_psrvActiveTextures[I];
            }
            else {
                if ( ui32Total ) {
                    m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
                }
                ui32Index = I + 1UL;
                ui32Total = 0UL;
            }
        }
        if ( ui32Total ) {
            m_pdDevice->PSSetShaderResources( ui32Index, ui32Total, &m_psrvActiveTextures[ui32Index] );
        }
    }

It is really not that complicated. All it is doing is checking for the fewest possible calls it can make to PSSetShaderResources() on each render call by comparing the currently active textures with the textures active on the last render. This is the only location where it is valid to call PSSetShaderResources(), so the local record of the last textures sent to DirectX 11 is accurate.

You basically need a similar system in place for everything. Samplers, textures, constant buffers, etc.
And then you need to make this actually useful by implementing a render queue to maximize the number of times the same texture, shader, etc. are used in repeated render calls.

L. Spiro
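A render queue in its simplest form is just a sortable list of draw items. The sketch below assumes shaders and textures are identified by small integer IDs and sorts by shader, then texture, then depth (near to far); `DrawItem`, `sortRenderQueue` and `countStateChanges` are hypothetical names, and real engines often pack these fields into a single integer sort key instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One queued draw call, described by the state it needs.
struct DrawItem {
    std::uint16_t shaderId;
    std::uint16_t textureId;
    float depth;     // distance from the camera
    int objectId;    // whatever identifies the thing to draw
};

// Groups items that share a shader and texture, nearest first in each group.
void sortRenderQueue(std::vector<DrawItem>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const DrawItem& a, const DrawItem& b) -> bool {
                  if (a.shaderId != b.shaderId) return a.shaderId < b.shaderId;
                  if (a.textureId != b.textureId) return a.textureId < b.textureId;
                  return a.depth < b.depth;
              });
}

// Counts how often the (shader, texture) pair changes when the queue is
// drawn in order; the first item always counts as one change.
std::size_t countStateChanges(const std::vector<DrawItem>& queue) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0 || queue[i].shaderId != queue[i - 1].shaderId ||
            queue[i].textureId != queue[i - 1].textureId) {
            ++changes;
        }
    }
    return changes;
}
```

Combined with redundant-state filtering as in PreRender() above, a sorted queue means each shader and texture is sent to the API once per group rather than once per object.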

I would like to ask again, what hardware are you running on?
If this is integrated laptop graphics or similar that explains the low FPS, and vertex count can become more important.

I certainly don't mean to argue against the good points about state changes, but considering the code posted they will probably have close to zero impact in this case. All chunks in one buffer is probably better, but if you get bad FPS from drawing 9 chunks of 256x256, and the terrain drawing is actually the bottleneck, then you'd need like a thousand state changes per chunk to notice a major difference.

Just to set a performance baseline, create a small test program that draws a 1024x1024 or similar terrain with a single vertexbuffer and a single draw call and absolutely nothing else, to determine what your computer is capable of.

I'm using my gaming notebook which can run even BF3. So that shouldn't be the problem.

I'm very new to DirectX and everything I can do I learned from books and tutorials; wrappers were never mentioned anywhere.
I simply create them with D3DX11CreateShaderResourceViewFromFile( d3d11Device, "tt1.jpg",NULL, NULL, &slopeTexture, NULL );
and set them later as in my code above.

EDIT: I think I have an idea where the low FPS could come from: The function which is called every frame to look where new chunks are needed, the terrain::update
Let me post it, I'm sure you will find tons of things to be fixed:

What it does is:
If the player is within the bounds of the terrain, check for each chunk position around him whether a chunk already exists there. If not, create a new one and pass it its part of the height map and shadow map. Then check whether there are more chunks than I want to hold in memory and erase the first ones that are not visible.
Could it be that the checking is time-consuming and therefore slowing everything down? (I'm just using one thread at the moment!)

    void terrain::Update(float playerXin, float playerZin){
        int playerChunkX = ((int)playerXin/(chunkSize-1))*(chunkSize-1);
        int playerChunkZ = ((int)playerZin/(chunkSize-1))*(chunkSize-1);

        // LOAD NEW
        // UNLOAD OLD CHUNKS!!
        if(playerXin > startX && playerZin > startZ && playerXin < startX+terrainWidth && playerZin < startZ+terrainHeight){
            for(int z =0;z<loadPerSide;z++){
                for(int x=0;x<loadPerSide;x++){
                    int xIndex, zIndex;
                    xIndex = (x-((loadPerSide-1)/2))*(chunkSize-1);
                    zIndex = (z-((loadPerSide-1)/2))*(chunkSize-1);

                    chunkNum = chunkList.size();
                    bool isThereAChunk = false;

                    //ADD NEW CHUNKS
                    for(int i=0;i<chunkNum;i++){
                        if(playerXin+xIndex > chunkList[i]->startX && playerXin+xIndex < chunkList[i]->startX+chunkList[i]->width &&
                           playerZin+zIndex > chunkList[i]->startZ && playerZin+zIndex < chunkList[i]->startZ+chunkList[i]->height){
                            isThereAChunk = true;
                            continue;
                        }
                    }

                    // test if there is a chunk in this zone to load
                    if(isThereAChunk == false && playerXin+xIndex > 0 && playerZin+zIndex > 0 && playerXin+xIndex < startX+terrainWidth && playerZin+zIndex < startZ+terrainHeight-chunkSize){
                        //get heights from map
                        for(int z=0;z<chunkSize;z++){
                            for(int x=0;x<chunkSize;x++){
                                heightsToPass[z*chunkSize+x] = heightMap[(z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*(chunkSize-1)*terrainWidth];
                            }
                        }

                        //get normals from map
                        for(int z=0;z<chunkSize;z++){
                            for(int x=0;x<chunkSize;x++){
                                normalsToPass[z*chunkSize+x] = normalMap[(z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1)];
                            }
                        }

                        int k;
                        k = 0;
                        //get shadows from map
                        for(int z=0;z<chunkSize;z++){
                            for(int x=0;x<chunkSize;x++){
                                lightsToPass[k] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1)) * 3];
                                lightsToPass[k+1] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1))*3+1];
                                lightsToPass[k+2] = lightMapImage[((z*terrainWidth+x) + ((int)playerXin+xIndex)/(chunkSize-1)*(chunkSize-1) + ((int)playerZin+zIndex)/(chunkSize-1)*terrainWidth*(chunkSize-1))*3+2];
                                k += 3;
                            }
                        }

                        int timebegin = timeGetTime();
                        chunkAddIndex++;

                        int toLoad = -1;
                        for(int i=0;i<chunkCache;i++){
                            if(bufferInUse[i] == false){
                                toLoad = i;
                                bufferInUse[i] = true;
                                break;
                            }
                        }
                        if(toLoad == -1){
                            // NOT FINISHED!!!!
                            printf("chunkCache buffers full!\n");
                        }

                        chunkList.push_back(new chunk());
                        chunkList[chunkNum]->preInit(d3d11Device, d3d11DevCon, chunkSize, chunkSize, playerChunkX+xIndex, playerChunkZ+zIndex, chunkAddIndex, vBuffers[toLoad], iBuffers[toLoad]);
                        chunkList[chunkNum]->Init(heightsToPass, indices, normalsToPass, lightsToPass, verticesToLock);

                        int timeend = timeGetTime();
                        //printf("init chunk took %d ms\n", timeend-timebegin);
                    }
                } // X end!
            }// Z end!
        }//if player > 0 end !

        if(chunkNum > chunkCache){
            int toEraseNum = chunkNum-chunkCache;
            for(int i=0;i<toEraseNum;i++){
                if(chunkList[i]->isVisible == false){
                    chunkList[i]->CleanUp();
                    chunkList.erase(chunkList.begin()+i);
                    chunkNum = chunkList.size();
                }else{
                    toEraseNum--;
                }
            }
        }

        visibleIndex = 0;
        for(int i = 0; i<chunkList.size(); i++){
            // calculate visible chunks
            if(FCD->CheckRectangle(chunkList[i]->CenterX, chunkList[i]->CenterY, chunkList[i]->CenterZ, chunkList[i]->width, 256.0f, chunkList[i]->height) == true){
                chunkList[i]->isVisible = true;
                currentlyVisible[visibleIndex] = i;
                visibleIndex++;
            }else{
                chunkList[i]->isVisible = false;
            }
        }

        playerX = playerXin;
        playerZ = playerZin;

        for(int i=0;i<chunkList.size();i++){
            chunkList[i]->Update(playerX, playerZ);
        }
    }

chunk::Update() only saves playerX and playerZ to the chunk.

Wrappers are only the next step up so it is nothing you can’t handle.

But Erik Rufelt is also correct. These redundancy optimizations are important, especially for the long run, but right now it seems clear that you have bigger issues at hand.

Put all of those chunks into one buffer and draw with one call. If the FPS remains mostly similar, it means you have a bandwidth problem, and the 16-bit vertex data optimization should be a large help.

Also, draw your terrain normally, but move the camera out so that it is only a small part of the screen.
If the FPS increases dramatically, you have a fill-rate problem related to your pixel shader. You could then start examining that.

L. Spiro