Make function faster (RGB-YUV)

NewUser13    100
Hello, I have a function which converts pixels from an RGB surface to a YUV surface. What the fuck. Where are the code tags.

    if (m_overlay_back_surface->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL) == DD_OK)
    {
        if (m_offscreen_surface->Lock(NULL, &ddsd_offscreen, DDLOCK_WAIT, NULL) == DD_OK)
        {
            CopyRGBSurfaceToYUVSurface(&ddsd_offscreen, &ddsd, UYVY);
            m_offscreen_surface->Unlock(NULL);
        }
        m_overlay_back_surface->Unlock(NULL);
    }

    bool overlay_renderer_t::CopyRGBSurfaceToYUVSurface(
        LPDDSURFACEDESC2 pddsd1,
        LPDDSURFACEDESC2 pddsd2,
        fourcc_enum eOverlayFormat)
    {
        if (pddsd1->dwWidth != pddsd2->dwWidth) return false;
        if (pddsd1->dwHeight != pddsd2->dwHeight) return false;

        DWORD w = pddsd1->dwWidth;
        DWORD h = pddsd1->dwHeight;
        LONG pitch1 = pddsd1->lPitch;
        LONG pitch2 = pddsd2->lPitch;
        unsigned __int32 *pPixels1 = (unsigned __int32 *)pddsd1->lpSurface;
        unsigned __int32 *pPixels2 = (unsigned __int32 *)pddsd2->lpSurface;
        unsigned __int32 color1;
        LONG offset1 = 0;
        LONG offset2 = 0;
        unsigned int R, G, B, i1, i2, i3, i4;
        BYTE yuv[4];

        if (eOverlayFormat == UYVY) // U Y V Y
        {
            i1 = 1; i2 = 0; i3 = 3; i4 = 2;
        }
        else // Y U Y 2
        {
            i1 = 0; i2 = 1; i3 = 2; i4 = 3;
        }

        // Go through the image 2 pixels at a time and convert to YUV
        for (unsigned int y = 0; y < h; y++)
        {
            offset1 = y*pitch1/4;
            offset2 = y*pitch2/4;
            for (unsigned int x = 0; x < w; x += 2)
            {
                color1 = pPixels1[offset1++];
                B = (color1) & 0xFF;
                G = (color1 >> 8) & 0xFF;
                R = (color1 >> 16) & 0xFF;
                yuv[i1] = (( 66*R + 129*G +  25*B + 128)>>8)+ 16;
                yuv[i2] = ((-38*R -  74*G + 112*B + 128)>>8)+128;

                color1 = pPixels1[offset1++];
                B = (color1) & 0xFF;
                G = (color1 >> 8) & 0xFF;
                R = (color1 >> 16) & 0xFF;
                yuv[i3] = (( 66*R + 129*G +  25*B + 128)>>8)+ 16;
                yuv[i4] = ((112*R -  94*G -  18*B + 128)>>8)+128;

                pPixels2[offset2++] = *((unsigned __int32 *)yuv);
            }
        }
        return true;
    }

This function is too slow for my needs (depending on the screen size, up to a second is required to convert all pixels to YUV). Are there any tricks to make it faster?

Hinch    244
The first thing you need to do is put timing code around the different parts of your function to see which parts are slowest. You might find the Locks are actually a lot slower than the conversion process itself, or that one of the Locks is much slower than the other.
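As a rough sketch of what that timing code could look like: the helper below is hypothetical (the name `time_ms` is not from the original post) and uses only the standard `std::chrono::steady_clock`. The commented usage assumes the surfaces and variables from the question.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of the original code): runs `fn` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible usage with the names from the question (assumed to be in scope):
//
//   HRESULT hr;
//   double lockOverlayMs = time_ms([&] {
//       hr = m_overlay_back_surface->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL);
//   });
//   double lockOffscreenMs = time_ms([&] {
//       hr = m_offscreen_surface->Lock(NULL, &ddsd_offscreen, DDLOCK_WAIT, NULL);
//   });
//   double convertMs = time_ms([&] {
//       CopyRGBSurfaceToYUVSurface(&ddsd_offscreen, &ddsd, UYVY);
//   });
```

Timing each Lock separately from the conversion shows whether the pixel math is worth optimizing at all. Averaging over several frames should give steadier numbers than a single measurement.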

Doing the conversion in a shader will speed up the conversion part, but if you need to recover the converted buffer back to system memory then you'll still have the cost of the Locks to contend with. If you don't need the converted buffer in system memory, you should definitely use a shader, otherwise you're copying from GPU to system memory, doing the conversion, then copying from system memory back to GPU memory again, which is nuts from a performance perspective.

Sneftel    1788
Break the swizzling into its own function. Writing to those non-constant indices is going to slow you down a lot. Oh, and you can move that +16 and +128 inside the left-hand side of the shift (as 16<<8 and 128<<8 added before shifting). Finally, marking pPixels1/2 as __restrict could possibly help, though it isn't a slam dunk here.
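One way these suggestions might fit together is sketched below. All function names are made up for illustration. It assumes a little-endian CPU (as the original cast through `BYTE yuv[4]` did), 32-bit XRGB source pixels, and a compiler that accepts `__restrict` (MSVC, GCC and Clang do). Each format gets its own packing function with fixed byte positions, and the +16 / +128 offsets are folded into the rounding constant before the shift.

```cpp
#include <cstddef>
#include <cstdint>

// Offsets folded into the rounding term: 128 + (16 << 8) and 128 + (128 << 8).
// With the offset added first, the value being shifted is never negative.
inline std::uint32_t luma(std::uint32_t c)
{
    const int B = c & 0xFF, G = (c >> 8) & 0xFF, R = (c >> 16) & 0xFF;
    return static_cast<std::uint32_t>((66 * R + 129 * G + 25 * B + 128 + (16 << 8)) >> 8);
}

inline std::uint32_t chroma_u(std::uint32_t c)
{
    const int B = c & 0xFF, G = (c >> 8) & 0xFF, R = (c >> 16) & 0xFF;
    return static_cast<std::uint32_t>((-38 * R - 74 * G + 112 * B + 128 + (128 << 8)) >> 8);
}

inline std::uint32_t chroma_v(std::uint32_t c)
{
    const int B = c & 0xFF, G = (c >> 8) & 0xFF, R = (c >> 16) & 0xFF;
    return static_cast<std::uint32_t>((112 * R - 94 * G - 18 * B + 128 + (128 << 8)) >> 8);
}

// As in the original loop, U comes from the first pixel and V from the second.
// Memory byte order U Y0 V Y1.
inline std::uint32_t pack_uyvy(std::uint32_t p0, std::uint32_t p1)
{
    return chroma_u(p0) | (luma(p0) << 8) | (chroma_v(p1) << 16) | (luma(p1) << 24);
}

// Memory byte order Y0 U Y1 V.
inline std::uint32_t pack_yuy2(std::uint32_t p0, std::uint32_t p1)
{
    return luma(p0) | (chroma_u(p0) << 8) | (luma(p1) << 16) | (chroma_v(p1) << 24);
}

// The packing function is a template parameter, so the format is fixed at
// compile time and the inner loop contains no format branch or index table.
// A trailing odd pixel, if any, is skipped rather than read out of bounds.
template <std::uint32_t (*Pack)(std::uint32_t, std::uint32_t)>
void convert_row(const std::uint32_t* __restrict src,
                 std::uint32_t* __restrict dst,
                 std::size_t width)
{
    for (std::size_t x = 0; x + 1 < width; x += 2)
        *dst++ = Pack(src[x], src[x + 1]);
}
```

The caller would pick `convert_row<pack_uyvy>` or `convert_row<pack_yuy2>` once, outside the loops, and call it per row with the row pointers computed from the pitch. Whether this is measurably faster depends on the compiler and on how much of the time is actually spent in the Locks, so it is worth timing before and after.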
