Difference In Performance

8 comments, last by Nitage 18 years, 1 month ago
I have an app. It runs with 30% cpu util on a Pentium M 1.73 with 512 MB. The same app runs at 4% on a P4 3.39 with 2 gig. Is the difference in magnitude of performance standard? Is this what one would expect? Thanks.
No, the difference in performance would be in how much it could get done at 100% CPU utilization. In your case, something else is keeping the process from using all the CPU time, and it's a bit hard for us to tell what it is. It *might* be that it just stops once it's done what it needed to (but that's unlikely, because there's not such a big difference in performance between the two CPUs), or, well, something else is determining the CPU usage. Could be anything.

If you want a useful answer, try giving us some info. Which program would be a good start. What does it do? How? And how do you measure the CPU usage, and when?
The app I'm using is a small throwaway app that I wrote. It uses an algorithm that will be used inside my OpenGL game. Basically, it is just crunching numbers without drawing anything to the screen. The algorithm takes pixel values stored in arrays, averages them, and places the result into a result table of arrays. I don't have access to any good benchmarking tools. If you know of any good free ones, please suggest them. I use the Task Manager window. I know, kind of lame. I'm trying to determine if it's the code itself. I don't see why the same code would cause such a difference. My guess would be another app in the background? Thanks for the reply.
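As a rough alternative to watching Task Manager, the number crunching can be timed directly. This is only a sketch: `crunch` is a hypothetical stand-in for the real averaging code, and `std::chrono` assumes a C++11 or later compiler.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real algorithm: visits every byte once.
unsigned long long crunch(const std::vector<unsigned char>& pixels) {
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < pixels.size(); ++i) sum += pixels[i];
    return sum;
}

// Returns the wall-clock milliseconds spent in one call to crunch().
// The result is passed back so the compiler cannot drop the work.
double time_crunch_ms(const std::vector<unsigned char>& pixels,
                      unsigned long long& result) {
    const auto start = std::chrono::steady_clock::now();
    result = crunch(pixels);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing the elapsed time of the same workload on both machines says more about relative speed than a utilization percentage does.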
Probably heavily blocked by memory IO. How big are these arrays of pixels that it's processing, and how much work does it do on them? How is the code accessing the pixels? For maximum speed and cache usage your loops should be looping through them contiguously. Hopefully you aren't doing a lot of random access.
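To illustrate what "contiguously" means here, a minimal sketch, assuming the image is stored row by row in one flat array (the function names and layout are made up for the example, not taken from the original code):

```cpp
#include <cstddef>
#include <vector>

// Cache-friendly: the inner loop walks x, so consecutive iterations
// touch adjacent bytes of the same row.
unsigned long long sum_row_order(const std::vector<unsigned char>& img,
                                 std::size_t width, std::size_t height) {
    unsigned long long sum = 0;
    for (std::size_t y = 0; y < height; ++y) {
        const unsigned char* row = img.data() + y * width;
        for (std::size_t x = 0; x < width; ++x) sum += row[x];
    }
    return sum;
}

// Same result, but the inner loop walks y, so every step jumps
// `width` bytes ahead and tends to miss the cache on large images.
unsigned long long sum_column_order(const std::vector<unsigned char>& img,
                                    std::size_t width, std::size_t height) {
    unsigned long long sum = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            sum += img[y * width + x];
    return sum;
}
```

Both functions compute the same total; only the order in which memory is touched differs.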
Quote:Original post by DrEvil
Probably heavily blocked by memory IO. How big are these arrays of pixels that it's processing, and how much work does it do on them? How is the code accessing the pixels? For maximum speed and cache usage your loops should be looping through them contiguously. Hopefully you aren't doing a lot of random access.


True. On the Pentium M, memory I/O performance depends heavily on the cache. Because the memory is (relatively) slow on the Pentium M, you have to utilize the huge 2MB cache in order to make up for it.
My table contains 2.5 million unsigned chars total. To do my calculations, I copy 3 rows at the point of processing (3 rows to get top, bottom, right, left, pixels; think of it like grid paper; the middle row is my row of interest). I copy the result into a result array which is the length of a single row in the table. Then the drawing routine normally would take the result array and plot to the back buffer. For the most part, I loop contiguously unless the image I'm processing is clipped. In this case, I might do table[x][y] to table[x->N][y->N] instead of table[x][y] to table[x->MAX][y->MAX].

To etothex, I'm not sure what you mean by "utilize the huge 2MB cache in order to make up for it." Could you please explain further? Thanks.
Whoa.

Can you get away without copying the data to a different array? Can you do the calculations and plot them together without the copy of the array? Avoiding the stores of the temporaries would likely help.

I think he means you should be extra conscious of cache misses on a Pentium M due to the slower memory, since each miss could cost more, time-wise.

Also, this sounds like something well suited for doing in parallel with SSE and such.
Have you tried prefetching the memory, so that the data you will need next can be loaded into the cache while the CPU performs the computation?

Quote:
To do my calculations, I copy 3 rows at the point of processing (3 rows to get top, bottom, right, left, pixels; think of it like grid paper; the middle row is my row of interest).

Why do you copy the data? Won't three pointers for the input and one for the output work?
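Something along these lines, as a sketch only. The exact averaging formula was not posted, so this assumes a plain four-neighbour average (top, bottom, left, right) and copies the edge columns through unchanged; `average_row` and its parameters are made-up names:

```cpp
#include <cstddef>

// Reads the three input rows in place through pointers and writes one
// output row. No temporary copies of the input rows are made.
void average_row(const unsigned char* above, const unsigned char* middle,
                 const unsigned char* below, unsigned char* out,
                 std::size_t width) {
    if (width == 0) return;
    // Assumption: edge pixels have no left/right neighbour, so keep them.
    out[0] = middle[0];
    out[width - 1] = middle[width - 1];
    for (std::size_t x = 1; x + 1 < width; ++x) {
        const unsigned sum = above[x] + below[x] + middle[x - 1] + middle[x + 1];
        out[x] = static_cast<unsigned char>(sum / 4);  // integer average, truncated
    }
}
```

With a flat row-by-row table, the three input pointers for row y would be `table + (y - 1) * width`, `table + y * width` and `table + (y + 1) * width`.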
Thanks for the replies.

>>Also, this sounds like something well suited for doing in parallel with SSE and such.

Others have suggested the same thing. Will it help much? I only have a single CPU. Will the work really be done in parallel? Or will SSE simply create more efficient assembly? At any rate, I'll look into it further.

As for why I copy, the table is actually inside a data model object. I get data by doing MyDataModel->GetPixelTable()->GetRawData(char *LocalRow, int start, int end). I guess I can add a member func to return a pointer to the actual char table.

>>Have you tried prefetching the memory, so that the data you will need next can be loaded into the cache while the cpu performs the computation?

I feel stupid for asking, but I'm not sure what you mean. Could you please explain? Thanks.

VC++
SSE doesn't require a multiple CPU machine - SSE is a SIMD instruction set (Single Instruction Multiple Data).

Basically, an SSE instruction does 4 single-precision floating point operations in parallel (or 2 double-precision ones with SSE2).
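For a feel of what that looks like in code, here is a minimal sketch that adds two float arrays four elements at a time with the `_mm_add_ps` intrinsic, falling back to a scalar loop for the leftovers (and for targets without SSE). The function name and the `HAVE_SSE` macro are made up for the example:

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE
    // One _mm_add_ps performs four float additions on a single core.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail: handles the remaining 0 to 3 elements (or everything without SSE).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

For unsigned char pixels, the SSE2 integer intrinsics in `<emmintrin.h>` are probably the closer fit; `_mm_avg_epu8`, for instance, computes a rounded average of 16 byte pairs at once.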

edit: spelling of parallel

This topic is closed to new replies.
