This topic is now archived and is closed to further replies.


need to optimize this code...badly


Recommended Posts

I am working on a program whose main function is to read in a large (possibly VERY large) array of floating-point data and then create a bitmap from it so that it can be displayed on screen. Unfortunately, my code is SLOW. One of the files has over 16 million floating-point values, and converting all of these to bytes for a bitmap is time-consuming indeed. Anyway, here is what I'm currently doing:

1) Read the floating-point data from file into rawData.
2) Calculate the max and min values in rawData.
3) Using max and min, determine the slope and y-intercept of the line that maps min to 0, max to 255, and all the other points in between (i.e., byteData[ i ] = rawData[ i ] * slope + b).
4) In a loop, apply the above formula to every single piece of data in rawData to generate the corresponding byteData equivalent.
5) Create a bitmap from byteData so that the user can view a graphical representation of the original rawData.

One thing to keep in mind is that I have no control over the format in which the original data is stored (it is always in float format). Also, in my loop, rawData[ i ], slope, and b are all in floating-point format, which is computationally expensive. I have used a Linux version of a similar program, and it can load and display the same 16-million-point file practically instantaneously. My program currently takes on the order of 2 or 3 minutes on my 1.8 GHz machine. My program is fine as far as what it does, but horrible in terms of speed and efficiency. Any suggestions on ways to speed this up are appreciated.

[edited by - StonieJ on June 8, 2004 8:15:29 AM]
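For concreteness, steps 2 through 4 above work out to something like the following sketch (function and variable names are my own, not from the actual program):

```cpp
#include <cstddef>

// Map [min, max] linearly onto [0, 255]:
//   slope = 255 / (max - min), b = -min * slope,
// so byteData[i] = rawData[i] * slope + b.
void FloatsToBytes(const float* rawData, unsigned char* byteData,
                   std::size_t count, float min, float max)
{
    float slope = 255.0f / (max - min);
    float b = -min * slope;
    for (std::size_t i = 0; i < count; ++i)
        byteData[i] = static_cast<unsigned char>(rawData[i] * slope + b);
}
```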

Well, the first thing is to profile your code to find out where the bottlenecks are.

Is your image 256 pixels high and 16 million pixels wide ????

Well anyway :

- do you use the full optimization options ?

If it takes that much time, I doubt it. 180 seconds at 1.8 billion cycles per second is about 324 billion cycles for 16 million iterations, roughly 20,000 cycles per iteration. A loop like this should need only a few dozen cycles per iteration, so it should be orders of magnitude faster.

- make sure the compiler does not call ftoi(), which is one of the most renowned perf killers on any compiler. Normally the optimizer replaces it with much faster code: the big-number trick or a single asm instruction.

As for "rawData[ i ], slope, and b are all floating point format, which is very computationally expensive":
floating-point math is actually very fast on any machine since 1995; you have no better option short of SIMD.

Your potential performance leaks come from, sorted by importance :
- ftoi
- debug info, killing optimizations.
- no loop unrolling
- cache misses (which I doubt)
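For reference, the "big number trick" mentioned above can be sketched as follows. It assumes IEEE 754 floats, the default round-to-nearest mode, and inputs well inside the +/-2^22 range; note that it rounds to nearest rather than truncating like a C cast:

```cpp
#include <cstdint>
#include <cstring>

// Float-to-int via the "magic number" trick: adding 1.5 * 2^23 forces
// the integer part of x into the mantissa bits, which we then read back
// out of the raw bit pattern. Valid for |x| well below 2^22.
inline int32_t fast_ftoi(float x)
{
    float biased = x + 12582912.0f;           // 12582912 = 1.5 * 2^23
    int32_t bits;
    std::memcpy(&bits, &biased, sizeof bits); // well-defined type pun
    return bits - 0x4B400000;                 // bit pattern of 12582912.0f
}
```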

Next, what you can do is unroll by 4 and accumulate four bytes into a 32-bit int with ORs and shifts, then write all 4 bytes at once. Storing bytes one by one is a known perf killer too.
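The unroll-and-pack idea might be sketched like this (names are illustrative; it assumes the element count is a multiple of 4 and a little-endian target so that b0 lands at out[i]):

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Unroll by 4: convert four floats per iteration and emit the four
// result bytes with one 4-byte store instead of four 1-byte stores.
void ConvertPacked(const float* in, unsigned char* out, std::size_t count,
                   float slope, float b)
{
    for (std::size_t i = 0; i < count; i += 4)
    {
        uint32_t b0 = static_cast<uint32_t>(in[i]     * slope + b) & 0xFF;
        uint32_t b1 = static_cast<uint32_t>(in[i + 1] * slope + b) & 0xFF;
        uint32_t b2 = static_cast<uint32_t>(in[i + 2] * slope + b) & 0xFF;
        uint32_t b3 = static_cast<uint32_t>(in[i + 3] * slope + b) & 0xFF;
        uint32_t packed = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        std::memcpy(out + i, &packed, sizeof packed); // one 4-byte store
    }
}
```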

Hey all, thanks for the replies. Give me about an hour or so and I'll get back with you on the questions. Need to get somewhere right quick though.

Do you really need to calculate the full array at once ?
Since you display a bitmap, you could calculate only the part that is displayed. At a resolution of 800x600, that is only 480,000 pieces of data instead of 16 million.

Do you really need to calculate the min and max for the full map ?
- Yes, if each datum is mapped to a color representing an absolute level (like a geographical map with colors indicating sea level). (1)
- No, if each datum only gives a relative indication of levels. (2)

(1)- You calculate the min and max for the full map, and each time you render a bitmap, you convert only the chunk you see.
float* DataArray //full array size
uchar* DispArray //full array size
Conv2Byte(DispArray, Starting Coord, Width, Height, DataArray)
RenderBitmap(DispArray, Starting Coord, Width, Height)

(2)- You calculate the min and max for the local area and each time you render a bitmap you convert on the fly.
float* DataArray //full array size
uchar* DispArray //bitmap size only
GetMinMax(Starting Coord, Width, Height, DataArray)
Conv2Bitmap(DispArray, Starting Coord, Width, Height, DataArray)
If the area is scrollable, there are just a few more things to do, like checking whether the newly scrolled area is still within the min and max values. If min and max are redefined, reconvert the full bitmap area.
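Option (2) might look roughly like this in code (a sketch with made-up names, not the poster's actual functions; `stride` is the width of the full data array):

```cpp
#include <algorithm>
#include <cstddef>

// Scan only the visible rectangle of the full float array for min/max,
// then convert just that rectangle to bytes for display.
void ConvertViewport(const float* data, unsigned char* bitmap,
                     std::size_t stride, std::size_t x0, std::size_t y0,
                     std::size_t width, std::size_t height)
{
    // Pass 1: local min/max over the viewport only.
    float lo = data[y0 * stride + x0], hi = lo;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
        {
            float v = data[(y0 + y) * stride + (x0 + x)];
            lo = std::min(lo, v);
            hi = std::max(hi, v);
        }

    // Pass 2: map [lo, hi] onto [0, 255] for the viewport.
    float slope = (hi > lo) ? 255.0f / (hi - lo) : 0.0f;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            bitmap[y * width + x] = static_cast<unsigned char>(
                (data[(y0 + y) * stride + (x0 + x)] - lo) * slope);
}
```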

Hope that helps.

Ghostly yours,

[edited by - Red Ghost on June 8, 2004 8:52:29 AM]

I threw this together in a few minutes while waiting on a long build, to simulate the described problem. It's barely optimized at all, and the transformation is quite quick (under 2 seconds on a 1.7 GHz Pentium 3). In this case the output is:

Time to generate data 64 seconds
Time to read data 48 seconds
Time to calc min/max of data 0 seconds
Time to convert data to bytes 1 seconds

In this code the bottleneck is definitely the I/O. I strongly suspect that the OP's problem is I/O as well.

#include <algorithm>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdio>
#include <cstdlib>
#include <ctime>

typedef unsigned char byte;

void GenerateData( std::string filename, int numberOfFloats )
{
    std::remove( filename.c_str() );  // portable replacement for unlink()

    std::ofstream output( filename.c_str() );

    // First number in the file is the number of floats
    output << numberOfFloats << "\n";

    // Write random numbers in [0, 1] to the file
    for ( int i = 0; i < numberOfFloats; i++ )
    {
        float value = rand() / (float)RAND_MAX;
        output << value << "\n";
    }
}

std::vector<float>* ReadData( std::string filename )
{
    std::vector<float>* data = new std::vector<float>();

    std::ifstream input( filename.c_str() );
    int numberOfFloats = 0;

    input >> numberOfFloats;
    data->reserve( numberOfFloats );

    float value = 0.0F;
    for ( int i = 0; i < numberOfFloats; i++ )
    {
        input >> value;
        data->push_back( value );
    }
    return data;
}

int main()
{
    // Generate
    time_t timeStart = time( 0 );
    GenerateData( "Data.dat", 16000000 );
    std::cout << "Time to generate data " << (time(0) - timeStart) << " seconds\n";

    // Read
    timeStart = time( 0 );
    std::vector<float>* floatData = ReadData( "Data.dat" );
    std::cout << "Time to read data " << (time(0) - timeStart) << " seconds\n";

    // Min/Max
    timeStart = time( 0 );
    float dataMin = *std::min_element( floatData->begin(), floatData->end() );
    float dataMax = *std::max_element( floatData->begin(), floatData->end() );
    std::cout << "Time to calc min/max of data " << (time(0) - timeStart) << " seconds\n";

    // Transform floats to bytes; divide by the full range, not just the max
    timeStart = time( 0 );
    std::vector<byte> byteData;
    byteData.reserve( floatData->size() );
    for ( std::vector<float>::iterator it = floatData->begin();
          it != floatData->end();
          ++it )
    {
        byte value = (byte)(255.F * (*it - dataMin) / (dataMax - dataMin));
        byteData.push_back( value );
    }
    std::cout << "Time to convert data to bytes " << (time(0) - timeStart) << " seconds\n";

    delete floatData;
    return 0;
}

Has the data been saved in binary format? I'm not sure if it would make any difference on performance, but try it out.
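For what it's worth, a binary read might be sketched like this (assumes the file was written as a 32-bit count followed by raw IEEE floats; names are illustrative). One bulk read() replaces millions of formatted extractions:

```cpp
#include <fstream>
#include <vector>
#include <cstdint>

// Read a file that starts with a 32-bit count followed by raw floats.
std::vector<float> ReadBinary(const char* filename)
{
    std::ifstream input(filename, std::ios::binary);
    int32_t count = 0;
    input.read(reinterpret_cast<char*>(&count), sizeof count);

    std::vector<float> data(count);
    input.read(reinterpret_cast<char*>(data.data()),
               count * sizeof(float));
    return data;
}
```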

Thanks guys, I appreciate all the feedback. Now to answer some questions.

@ Charles B:

I had the program's optimization set to Full Debug, i.e., no optimizations at all. However, even after rebuilding in Release mode (i.e., for speed), I see no change in performance.

@ Red Ghost:

I think that only doing calculations for the currently viewable area is the way to go. Originally, the program did exactly that; however, it was too slow for some reason. Every time the user scrolled the image, the new coordinates were sent to a handler function that converted the appropriate area of float data to byte data and then generated the bitmap from it. This took too long, though, and the result was very laggy: it took at least a full second to convert just a 500 by 500 area of float data to byte data. Therefore, I decided to calculate all the byte data and create the entire bitmap offscreen beforehand; then I could just blit this offscreen bitmap to the screen at whatever coordinates were necessary.

@ MauMan

That's a pretty impressive float->byte conversion rate. For some reason it is taking me 0.8 seconds just to convert 500 x 500. Let me try using your way of converting to decrease the time and see what happens.
