I had profiled the Release build back when the STL vector usage was still unoptimized (63 FPS in Debug), and it ran at 75 FPS.
I just tried the Release build again with the optimized code, as many of you suggested, and it now runs at around 200 FPS! The only optimizations I made were to the STL vector usage. Summary of the results:
Before: Debug 63 FPS/Release 75 FPS
After: Debug 133 FPS/Release 200 FPS
This leads me to conclude that a Debug build can indeed be used for preliminary profiling: if FPS goes up in Debug mode, it will also go up in Release mode (though not necessarily in the same proportion, as the results above show).
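The "not in the same proportion" part is easier to see when the FPS figures are converted to frame times. Here is a minimal sketch of that arithmetic; the helper names (frameTimeMs, savedMs, speedup) are made up for illustration and are not part of the renderer.

```cpp
#include <cassert>

// Convert a frame rate (frames per second) to milliseconds per frame.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the frame rate changes from fpsBefore to fpsAfter.
double savedMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}

// Relative speedup, e.g. 2.0 means twice as many frames per second.
double speedup(double fpsBefore, double fpsAfter)
{
    assert(fpsBefore > 0.0);
    return fpsAfter / fpsBefore;
}
```

With the numbers above, speedup(63, 133) is about 2.1 for Debug and speedup(75, 200) is about 2.7 for Release, so the ratios differ. In absolute terms, savedMs(63, 133) and savedMs(75, 200) both come out at roughly 8.3 ms per frame. That would be consistent with the change removing a similar fixed cost in both builds, though two measurements are not enough to prove it.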
In case you are interested, here's a concrete example of an optimization to performance-critical code that bumped the frame rate from 122 to 133 FPS in Debug mode. The BSP tree for the test map has 172 I-nodes and 173 leaves (well balanced), so this function is called 345 times per frame.
tree->m_nodes is a std::vector<BSPNode>.
Before:
void Renderer::RecursiveRender(BSPTree *tree, int node)
{
    if(tree->m_nodes[node].m_isLeaf)
    {
        std::vector<Surface_t>::iterator iter;
        for(iter = tree->m_nodes[node].m_surfaces.begin(); iter != tree->m_nodes[node].m_surfaces.end(); iter++)
        {
            glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
            glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);
        }
        return;
    }
    RecursiveRender(tree, tree->m_nodes[node].m_iFrontChild);
    RecursiveRender(tree, tree->m_nodes[node].m_iBackChild);
}
After:
void Renderer::RecursiveRender(BSPTree *tree, int node)
{
    BSPNode *pNode = &tree->m_nodes[node]; // Store pointer to the node and use it - much faster
    if(pNode->m_isLeaf)
    {
        std::vector<Surface_t>::iterator iter;
        for(iter = pNode->m_surfaces.begin(); iter != pNode->m_surfaces.end(); iter++)
        {
            glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
            glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);
        }
        return;
    }
    RecursiveRender(tree, pNode->m_iFrontChild);
    RecursiveRender(tree, pNode->m_iBackChild);
}
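For anyone who wants to try the same idea without an OpenGL context, here is a self-contained sketch of the "After" version. It is not my actual renderer: the struct layouts are guessed from the member names used above, the material pointer is left out, DrawSurface is a hypothetical stand-in for the glBindTexture/glDrawArrays pair that only counts calls, and it uses a const reference with a C++11 range-based for loop instead of a pointer and an explicit iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real types; only the members used here are included.
struct Surface_t
{
    int firstVert;
    int count;
};

struct BSPNode
{
    bool m_isLeaf;
    int m_iFrontChild;
    int m_iBackChild;
    std::vector<Surface_t> m_surfaces;
};

struct BSPTree
{
    std::vector<BSPNode> m_nodes;
};

// Placeholder for the GL calls: counts how many surfaces would be drawn.
int g_drawCalls = 0;

void DrawSurface(const Surface_t &)
{
    ++g_drawCalls;
}

void RecursiveRender(const BSPTree &tree, int node)
{
    assert(node >= 0 && static_cast<std::size_t>(node) < tree.m_nodes.size());

    // Index into the vector once and reuse the reference afterwards.
    const BSPNode &n = tree.m_nodes[node];

    if(n.m_isLeaf)
    {
        for(const Surface_t &s : n.m_surfaces)
        {
            DrawSurface(s);
        }
        return;
    }
    RecursiveRender(tree, n.m_iFrontChild);
    RecursiveRender(tree, n.m_iBackChild);
}
```

The reference stays valid only as long as nothing resizes m_nodes during the traversal, which holds here because rendering never modifies the tree.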