vector is very slow

Started by
29 comments, last by Nitage 17 years, 11 months ago

std::vector<DWORD> pixellist(1024 * 1024);
D3DLOCKED_RECT lockrect;

destsurface->LockRect(&lockrect, 0, 0);

DWORD* dword = (DWORD*)lockrect.pBits;

for (int i = 0; i < desc.Height; i++)
{
    for (int j = 0; j < desc.Width; j++)
    {
        int index = i * lockrect.Pitch / 4 + j;

        pixellist[index] = dword[index];
    }
}

pixellist[index] is accessed many times, and it is very slow. If I use a plain DWORD array (DWORD pixellist[1024 * 1024]) instead, it is very quick. Where is the problem?
Well, if it's "very" slow, then chances are you're working in Debug mode, which means that std::vector::operator[] will be checking your bounds and doing many other things to ensure that you don't do something stupid, like access outside of the bounds of the vector.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Why don't you use new to allocate the DWORDs and then use a series of memcpy() calls to fill it in row by row? If you're lucky, the compiler will inline the memcpy calls, because it can treat them as intrinsic functions.

On the plus side, you remove one loop and I think the compiler will move up to 4 bytes at a time.

Edit: Also, if you align the starting address of your pixel list, you can gain a little more performance because every write will be on an aligned address.
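
Something along these lines is what I mean (untested sketch, not drop-in code): it reuses the destsurface, desc and lockrect names from the original snippet, assumes a 32-bit pixel format, needs <cstring> for memcpy, and packs the rows tightly instead of keeping the pitch spacing the original indexing had.

    // Sketch of the new[] + per-row memcpy idea (untested).
    DWORD* pixellist = new DWORD[desc.Width * desc.Height];

    D3DLOCKED_RECT lockrect;
    destsurface->LockRect(&lockrect, 0, 0);

    const BYTE* src = static_cast<const BYTE*>(lockrect.pBits);
    DWORD* dst = pixellist;

    for (UINT i = 0; i < desc.Height; ++i)
    {
        // One memcpy per row; the pitch may be larger than Width * 4.
        memcpy(dst, src, desc.Width * sizeof(DWORD));
        src += lockrect.Pitch;
        dst += desc.Width;
    }

    destsurface->UnlockRect();
    // ... use pixellist, then delete[] pixellist ...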
In general, using the STL with optimizations off (especially inlining), as in debug mode, carries considerable overhead, and your program will run really, really slowly even doing the simplest things. Turn optimizations on and most of the overhead goes away.

Quote:Original post by Washu
std::vector::operator[] will be checking your bounds and doing many other things to ensure that you don't do something stupid, like access outside of the bounds of the vector.


Depends on the implementation. The C++ standard only requires vector<T>::at() to do bounds checking, not vector<T>::operator[]. Accessing v[i] with i >= v.size() is undefined behavior.

While some STL implementations just make [] the same function as .at(), others skip the check, in which case [] is just as efficient as an unchecked C array.
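
A minimal illustration of the difference (how the commented-out unchecked access actually misbehaves depends on the implementation and build mode):

    #include <vector>
    #include <stdexcept>
    #include <cstdio>

    int main()
    {
        std::vector<int> v(4);

        // at() is required by the standard to check the index and
        // throw std::out_of_range on failure.
        try { v.at(10) = 1; }
        catch (const std::out_of_range&) { std::puts("at() threw"); }

        // operator[] is not required to check anything: this would be
        // undefined behaviour, though a debug-mode STL will typically assert.
        // v[10] = 1;

        return 0;
    }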
It would be a pretty lame implementation if it didn't do bounds checking in operator[] in debug mode. And since the implementations that come with both VC++ and GCC do it, it's pretty safe to assume his does too.
It's clearly the division you've got inside your index calculation; move it outside of your loops. It's a complete waste of CPU cycles to calculate it over and over again.
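
A quick sketch of that hoisting, using the variables from the original snippet (an optimizing compiler will usually do this for you anyway):

    // Pitch-to-DWORDs conversion hoisted out of the loops.
    const int pitchInDwords = lockrect.Pitch / 4;

    for (int i = 0; i < desc.Height; i++)
    {
        const int rowStart = i * pitchInDwords;
        for (int j = 0; j < desc.Width; j++)
        {
            pixellist[rowStart + j] = dword[rowStart + j];
        }
    }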
Quote:Original post by Anonymous Poster
It's clearly the division you've got inside your index calculation; move it outside of your loops. It's a complete waste of CPU cycles to calculate it over and over again.


A division by 4 is likely to be optimized to a bitshift, which is unlikely to be a great cause of performance loss (one cycle for every iteration).
Quote:Original post by Thelo
Quote:Original post by Anonymous Poster
It's clearly the division you've got inside your index calculation; move it outside of your loops. It's a complete waste of CPU cycles to calculate it over and over again.


A division by 4 is likely to be optimized to a bitshift, which is unlikely to be a great cause of performance loss (one cycle for every iteration).
Calculating something that is loop invariant more than a single time would be a waste of CPU time. Using a compiler that doesn't optimize such things and/or optimizing it manually would be a waste of programmer time. Trying to optimize without profiling would also be a complete waste of programmer time, and worrying about details such as whether it's implemented as a multiply, a shift, a load effective address, or anything else is an even worse waste of programmer time.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Is that run every frame? In that case, you might want to make the vector static... you will allocate new memory every time the function is run.
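
Something like this is what I mean; the function name and signature are made up, since we can't see the surrounding code, and it assumes a D3D9 surface:

    // Sketch: keep the buffer alive across calls instead of constructing
    // (and value-initializing) a ~4 MB vector every frame.
    // CopyPixels and its parameters are hypothetical.
    void CopyPixels(IDirect3DSurface9* destsurface, const D3DSURFACE_DESC& desc)
    {
        static std::vector<DWORD> pixellist;
        pixellist.resize(desc.Width * desc.Height);  // cheap once the capacity is there

        // ... LockRect and copy into pixellist as before ...
    }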
I've done some benchmarks using similar code to the above, and the numbers come out roughly in these proportions:

Using the original method: 460~480
Making the vector static: 38~46
Using a pure array: 26~28

Putting the division inside the loop or outside it doesn't affect anything.

Given that this is a very large list, that this code snippet will probably be used pretty often and could be a performance bottleneck (as implied by the names referring to pixels and surfaces), and that the size of the vector/array is known at creation and fixed for the duration of the operation (again, it's a list of the pixels of a given surface), I'd simply recommend using a pure array in this situation.
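
For what it's worth, you can likely get close to the pure-array numbers while keeping std::vector, by doing the copy through a raw pointer into its contiguous storage instead of calling operator[] per element. A sketch, again assuming a 32-bit format and the lockrect/desc variables from the original snippet:

    // One allocation, then per-row memcpy into the vector's storage.
    // &pixellist[0] is a raw pointer to the contiguous buffer, so the
    // debug-mode per-element checks are bypassed.
    std::vector<DWORD> pixellist(desc.Width * desc.Height);

    const BYTE* src = static_cast<const BYTE*>(lockrect.pBits);
    DWORD* dst = &pixellist[0];

    for (UINT i = 0; i < desc.Height; ++i)
    {
        memcpy(dst + i * desc.Width, src + i * lockrect.Pitch, desc.Width * sizeof(DWORD));
    }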

This topic is closed to new replies.
