Difference In Performance

General and Gameplay Programming

Started by fathom88 March 01, 2006 03:01 PM

8 comments, last by Nitage 18 years, 1 month ago

fathom88

180

Author

March 01, 2006 03:01 PM

I have an app. It runs with 30% cpu util on a Pentium M 1.73 with 512 MB. The same app runs at 4% on a P4 3.39 with 2 gig. Is the difference in magnitude of performance standard? Is this what one would expect? Thanks.

Spoonbender

1,258

March 01, 2006 03:07 PM

No, the difference in performance would be in how much it could get done at 100% CPU utilization. In your case, something else is keeping the process from using all the CPU time, and it's a bit hard for us to tell what it is. It *might* be that it just stops once it's done what it needed to (but that's unlikely, because there's not such a big difference in performance between the two CPUs), or, well, something else is determining the CPU usage. Could be anything.

If you want a useful answer, try giving us some info. Which program would be a good start. What does it do? How? And how do you measure the CPU usage, and when?
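To illustrate the point about comparing work done rather than utilization: one way to compare the two machines is to time a fixed amount of work on each. This is only a rough sketch. `process` is a hypothetical stand-in for the real algorithm, and `time_workload` is a name made up for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real workload: sums every byte of the buffer.
unsigned long long process(const std::vector<unsigned char>& pixels) {
    unsigned long long sum = 0;
    for (unsigned char p : pixels) sum += p;
    return sum;
}

// Runs the workload `iterations` times and returns elapsed wall-clock seconds.
// The checksum is handed back so the compiler cannot discard the work.
double time_workload(const std::vector<unsigned char>& pixels,
                     int iterations,
                     unsigned long long& checksum) {
    const auto start = std::chrono::steady_clock::now();
    checksum = 0;
    for (int i = 0; i < iterations; ++i) checksum += process(pixels);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Running the same call with the same buffer size and iteration count on both machines gives two elapsed times that can be compared directly, which Task Manager percentages cannot.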

fathom88

180

Author

March 02, 2006 06:22 AM

The app I'm using is a small throwaway app that I wrote. It uses an algorithm that will be used inside my OpenGL game. Basically, it is just crunching numbers without drawing anything to the screen. The algorithm takes pixel values stored in arrays, averages them, and places the result into a result table of arrays. I don't have access to any good benchmarking tools. If you know of any good free ones, please suggest them. I use the Task Manager window. I know, kind of lame. I'm trying to determine if it's the code itself. I don't see why the same code would cause such a difference. My guess would be another app in the background? Thanks for the reply.

DrEvil

1,151

March 02, 2006 06:28 AM

Probably heavily blocked by memory IO. How big are these arrays of pixels that it's processing, and how much work does it do on them? How is the code accessing the pixels? For maximum speed and cache usage your loops should be looping through them contiguously; hopefully you aren't doing a lot of random access.
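As a rough illustration of contiguous versus non-contiguous looping, here is a sketch that walks the same flat, row-major buffer in two orders. The layout (pixel (x, y) at `y * width + x`) and the function names are assumptions for this example, not the poster's actual code.

```cpp
#include <cstddef>
#include <vector>

// Row-major walk: the inner loop touches consecutive addresses,
// so each cache line that gets loaded is fully used.
unsigned long long sum_contiguous(const std::vector<unsigned char>& img,
                                  std::size_t width, std::size_t height) {
    unsigned long long sum = 0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sum += img[y * width + x];
    return sum;
}

// Column-major walk over the same buffer: each inner step jumps `width`
// bytes, so on a large image most accesses land on a different cache line.
unsigned long long sum_strided(const std::vector<unsigned char>& img,
                               std::size_t width, std::size_t height) {
    unsigned long long sum = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            sum += img[y * width + x];
    return sum;
}
```

Both functions return the same value; only the memory access order differs, which is what tends to matter once the buffer is larger than the cache.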

Omni-Bot: My game/mod independent AI bot framework.

etothex

728

March 02, 2006 06:32 AM

Quote:Original post by DrEvil
Probably heavily blocked by memory IO. How big are these arrays of pixels that it's processing, and how much work does it do on them? How is the code accessing the pixels? For maximum speed and cache usage your loops should be looping through them contiguously; hopefully you aren't doing a lot of random access.

True. On the Pentium M, everything to do with I/O performance is related to the cache. Because the memory is (relatively) slow on the Pentium M, you have to utilize the huge 2MB cache in order to make up for it.

fathom88

180

Author

March 02, 2006 07:19 AM

My table contains 2.5 million unsigned chars total. To do my calculations, I copy 3 rows at the point of processing (3 rows to get top, bottom, right, left, pixels; think of it like grid paper; the middle row is my row of interest). I copy the result into a result array which is the length of a single row in the table. Then the drawing routine normally would take the result array and plot to the back buffer. For the most part, I loop contiguously unless the image I'm processing is clipped. In this case, I might do table[x][y] to table[x->N][y->N] instead of table[x][y] to table[x->MAX][y->MAX].

To etothex, I'm not sure what you mean by "utilize the huge 2MB cache in order to make up for it." Could you please explain further? Thanks.

DrEvil

1,151

March 02, 2006 07:26 AM

Whoa.

Can you get away without copying the data to a different array? Can you do the calculations and plot them together, without copying the array? Avoiding the store of the temporaries would likely help.

I think he means you should be extra conscious of cache misses on a Pentium M due to the slower memory, since they could cost more, time-wise.

Also, this sounds like something well suited for doing in parallel with SSE and such.

Omni-Bot: My game/mod independent AI bot framework.

nmi

978

March 02, 2006 07:27 AM

Have you tried prefetching the memory, so that the data you will need next can be loaded into the cache while the CPU performs the computation?
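A minimal sketch of what software prefetching could look like with the `_mm_prefetch` intrinsic from `<xmmintrin.h>`. The function name, the 64-byte step and the 256-byte look-ahead distance are guesses for illustration and would need tuning by measurement. For a plain sequential walk like this one the hardware prefetcher may already do the job, so any gain is not guaranteed.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#define HAVE_PREFETCH 1
#else
#define HAVE_PREFETCH 0  // non-x86 build: the hint is simply skipped
#endif

// Sums a buffer while hinting the CPU to start loading data a fixed
// distance ahead of the byte currently being processed.
unsigned long long sum_with_prefetch(const unsigned char* data, std::size_t n) {
    const std::size_t ahead = 256;  // look-ahead in bytes (assumed value)
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
#if HAVE_PREFETCH
        // Issue one hint per 64 bytes, and never point past the buffer.
        if (i % 64 == 0 && i + ahead < n)
            _mm_prefetch(reinterpret_cast<const char*>(data + i + ahead),
                         _MM_HINT_T0);
#endif
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint: the result is identical with or without it, and only the timing can change.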

Quote:
To do my calculations, I copy 3 rows at the point of processing (3 rows to get top, bottom, right, left, pixels; think of it like grid paper; the middle row is my row of interest).

Why do you copy the data? Won't three pointers for the input and one for the output work?
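A sketch of the three-pointer idea, under some assumptions: the table is one flat, row-major block of `unsigned char`, and the "average" is taken over a pixel and its top, bottom, left and right neighbours. The exact formula is not given in the thread, and `average_row` is a name made up for this example.

```cpp
#include <cstddef>

// Averages each interior pixel of row `y` with its four neighbours, reading
// straight from the table through three row pointers (no row copies).
// Assumes 1 <= y <= height - 2; border columns are passed through unchanged.
void average_row(const unsigned char* table, std::size_t width, std::size_t y,
                 unsigned char* out) {
    if (width == 0) return;
    const unsigned char* above = table + (y - 1) * width;
    const unsigned char* mid   = table + y * width;
    const unsigned char* below = table + (y + 1) * width;

    out[0] = mid[0];
    out[width - 1] = mid[width - 1];
    for (std::size_t x = 1; x + 1 < width; ++x) {
        // Five-sample average; the divisor is an assumption of this sketch.
        const unsigned sum =
            above[x] + below[x] + mid[x - 1] + mid[x + 1] + mid[x];
        out[x] = static_cast<unsigned char>(sum / 5);
    }
}
```

The only write is into the one-row output buffer; the three input rows are read in place.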

fathom88

180

Author

March 02, 2006 07:43 AM

Thanks for the replies.

>>Also, this sounds like something well suited for doing in parallel with SSE and such.

Others have suggested the same thing. Will it help much? I only have a single CPU. Will the work really be done in parallel? Or will SSE simply create more efficient assembly? At any rate, I'll look into it further.

As for why I copy, the table is actually inside a data model object. I get data by doing MyDataModel->GetPixelTable()->GetRawData(char *LocalRow, int start, int end). I guess I can add a member func to return a pointer to the actual char table.

>>Have you tried prefetching the memory, so that the data you will need next can be loaded into the cache while the CPU performs the computation?

I feel stupid for asking, but I'm not sure what you mean. Could you please explain? Thanks.

VC++

Nitage

1,107

March 02, 2006 08:31 AM

SSE doesn't require a multi-CPU machine - SSE is a SIMD instruction set (Single Instruction, Multiple Data).

Basically, a single SSE instruction performs 4 single-precision floating point operations in parallel (or, with SSE2, 2 double-precision ones).
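For byte pixels specifically, SSE2 also has integer instructions that work on 16 unsigned chars at a time. Here is a rough sketch using the `_mm_avg_epu8` intrinsic to average two rows; `average_bytes` is a name made up for this example, and a plain scalar loop handles the tail (and non-SSE2 builds). Note that `_mm_avg_epu8` rounds up, so the scalar loop does the same to match it.

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 integer intrinsics
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0  // fall back to the scalar loop only
#endif

// out[i] = rounded-up average of a[i] and b[i], for n unsigned bytes.
void average_bytes(const unsigned char* a, const unsigned char* b,
                   unsigned char* out, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE2
    // 16 pixels per iteration; unaligned loads/stores keep the sketch simple.
    for (; i + 16 <= n; i += 16) {
        const __m128i va =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_avg_epu8(va, vb));
    }
#endif
    // Remaining pixels, one at a time, with the same round-up rule.
    for (; i < n; ++i)
        out[i] = static_cast<unsigned char>((a[i] + b[i] + 1) / 2);
}
```

All of this runs on a single core: the parallelism is inside one instruction, not across CPUs.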

edit: spelling of parallel
