Archived

This topic is now archived and is closed to further replies.

dr_slash_uh

Shifting vs multiplication

Recommended Posts

I tried to use << for my multiplication in my game and for some reason, I get lower performance. I read that its faster to shift(3 years ago). Is that true. Instead of me multiplying x * 32; I am shifting x << 5;

Share this post


Link to post
Share on other sites
On modern compilers the performance will be the same. It is true that a shift is faster, but the compiler will optimze a multiplication by 2^n to a << n. So use whatever makes more sense in your code.

Share this post


Link to post
Share on other sites
Bitshifting is MUCH faster than conventional multiplication even if you''re running on a 3 GHz P4 or an AMD64. When you know you''re gonna be multiplying/dividing by powers of 2, use the bitshift operator because you can''t always expect the compiler to do all these optimizations for you (it might, it might not but I''d rather not leave things to chance).

Share this post


Link to post
Share on other sites
quote:
Original post by Odoacer
..because you can''t always expect the compiler to do all these optimizations for you


It really ought to. That''s one of the simplest and easiest optimisation techniques around, and if your compiler doesn''t do this one then perhaps you should consider changing compilers.

Share this post


Link to post
Share on other sites
I''m doing a tutorial, and it defines RGB16(r,g,b) as:

((b%32) + ((g%64) << 6) + ((r%32) << 11))

Can someone please explain how this gives you a color? Start from the ground up (probably necessary).

I tried plugging in some pretend values for r,g and b. (255,0,0) and got the answer of 71712 (doing the math myself). Is that how it''s supposed to work? What does 71712 mean?

Share this post


Link to post
Share on other sites
quote:
Original post by True Edge
I''m doing a tutorial, and it defines RGB16(r,g,b) as:

((b%32) + ((g%64) << 6) + ((r%32) << 11))

Can someone please explain how this gives you a color? Start from the ground up (probably necessary).

I tried plugging in some pretend values for r,g and b. (255,0,0) and got the answer of 71712 (doing the math myself). Is that how it''s supposed to work? What does 71712 mean?


This formula is designed to "pack" your r,g,b values into a 16-bit value: rrrrrggggggbbbbb. The result should thus be in the range 0-65535. The lowest-order bits of each input value are used, because of the modulus values.

Bit-shifting is used here instead of multiplication when writing the expression, because the purpose is indeed to shift the bits around. I would write the modulos using bitwise AND, personally; that should again be faster (but again an optimization the compiler should be able to do) and clearer about the intent. The +''s could even be bitwise ORs instead, since the intent is that each input (r,g,b) has its own ''field'' in the 16-bit value with no overlap. So we can instead write

((b & 31) | ((g & 63) << 6) | ((r & 32) << 11))

Common 16-bit colour models assign 5 bits of precision to the red level, 6 to green and 5 to blue. I couldn''t tell you why green is picked to be favoured, but I assume it has something to do with typical display characteristics.

I will work through the example with your values of (255,0,0).

1) b % 32 (b & 31) = 0.
2) g % 64 (g & 63) = 0.
3) r % 32 (r & 31) = 31. The low-order 5 bits are kept.
4) The arithmetic shift results in 31 * 2^11 = 31 * 2048 = 63488.
5) The addition (or logical OR) makes no change. I have no idea how you came up with 71712.

Showing the individual bits:

The function takes in r: 111[11111] g: 00[000000] b: 000[00000]
and squishes the bracketed parts together: [11111][000000][00000].

HTH!



Share this post


Link to post
Share on other sites
Thanks a lot. Now, I have another problem. I know it looked like I was further along, asking what the RGB16 was. But the truth is I''m still trying to initialize DirectDraw, and it keeps on hanging on me giving me this error:

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

I have no idea what it''s talking about, here''s my code:

#define INITGUID
#define WIN32_LEAN_AND_MEAN
#define CLASS_NAME "DirectX Game"
#define WINDOW_NAME "DirectX Game"
#define SCREEN_WIDTH 640
#define SCREEN_HEIGHT 480
#define SCREEN_BPP 16
#define _RGB16BIT565(r,g,b) ((b%32) + ((g%64) << 6) + ((r%32) << 11))

#include <ddraw.h>
#include <windows.h>
#include <windowsx.h>
#include "MainFctns.h"

//Direct X globals
LPDIRECTDRAW lpDD = NULL;
LPDIRECTDRAW lpDD4 = NULL;
LPDIRECTDRAWSURFACE7 lpDDsPrimary = NULL;
LPDIRECTDRAWSURFACE7 lpDDsBack = NULL;
DDSURFACEDESC2 ddsd;

//Windows Storage Variables
WNDCLASSEX winclass;
HWND hwnd;
MSG msg;

bool flagInit = false;
bool flagGameInit = false;

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wparam, LPARAM lparam)
{
PAINTSTRUCT ps;
HDC hdc;

switch(msg)
{

case WM_CREATE:
{
return(0);
} break;

case WM_PAINT:
{
hdc = BeginPaint(hwnd, &ps);
EndPaint(hwnd, &ps);
return(0);
} break;

case WM_DESTROY:
{
Game_Shutdown();
PostQuitMessage(0);
return(0);
} break;
}
return (DefWindowProc(hwnd, msg, wparam, lparam));
}

int WINAPI WinMain(HINSTANCE hinstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int ncmdshow)
{

//set up class
winclass.cbSize = sizeof(WNDCLASSEX);
winclass.cbClsExtra = 0;
winclass.cbWndExtra = 0;
winclass.hbrBackground = (HBRUSH)(GetStockObject(BLACK_BRUSH));
winclass.hIcon = LoadIcon(hinstance, IDI_APPLICATION);
winclass.hCursor = LoadCursor(hinstance, IDC_ARROW);
winclass.hIconSm = LoadIcon(hinstance, IDI_APPLICATION);
winclass.hInstance = hinstance;
winclass.lpfnWndProc = WndProc;
winclass.lpszClassName = CLASS_NAME;
winclass.lpszMenuName = NULL;
winclass.style = CS_HREDRAW | CS_VREDRAW | CS_DBLCLKS | CS_OWNDC;

//register class
RegisterClassEx(&winclass);

//create window
hwnd = CreateWindowEx(NULL,
CLASS_NAME,
WINDOW_NAME,
WS_POPUP | WS_VISIBLE,
0,0, //window position
640,480, //window size
NULL, //parent
NULL, //menu
hinstance,
NULL); //extra
ShowWindow(hwnd, ncmdshow);

flagInit = true;

while(flagInit == false)
{;}
Game_Init();
while(flagGameInit == false)
{;}

while(1)
{
if (PeekMessage(&msg,NULL,0,0,PM_REMOVE))
{
// test if this is a quit
if (msg.message == WM_QUIT)
break;

// translate any accelerator keys
TranslateMessage(&msg);
// send the message to the window proc
DispatchMessage(&msg);
}
Game_Main();
} //end while

return(0);
}

void Game_Init()
{
if(FAILED(DirectDrawCreate(NULL, &lpDD, NULL)))
{
MessageBox(hwnd, "Can''t create direct draw object", "Error", MB_OK);
}

if(FAILED(lpDD->QueryInterface(IID_IDirectDraw4, (LPVOID *) &lpDD4)))
{
MessageBox(hwnd, "Can''t upgrade direct draw object", "Error", MB_OK);
}

if(FAILED(lpDD4->SetCooperativeLevel(hwnd, DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN | DDSCL_ALLOWREBOOT | DDSCL_ALLOWMODEX)))
{
MessageBox(hwnd, "Could not set cooperative level.","Error",MB_OK);
}

if(FAILED(lpDD4->SetDisplayMode(SCREEN_WIDTH, SCREEN_HEIGHT, SCREEN_BPP)))
{
MessageBox(hwnd, "Could not set display mode","Error",MB_OK);
}

memset(&ddsd,0,sizeof(ddsd));
ddsd.dwSize = sizeof(ddsd);

ddsd.dwFlags = DDSD_CAPS | DDSD_BACKBUFFERCOUNT;
ddsd.dwBackBufferCount = 1;

ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE | DDSCAPS_FLIP | DDSCAPS_COMPLEX;


if(FAILED(lpDD4->CreateSurface((LPDDSURFACEDESC)&ddsd,(LPDIRECTDRAWSURFACE *)&lpDDsPrimary,NULL)))
{
MessageBox(hwnd,"Could not create primary surface","Error",MB_OK);
}

flagGameInit = true;
}

void Game_Shutdown()
{
if(lpDD4)
lpDD4->Release();

if(lpDDsPrimary)
lpDDsPrimary->Release();
}

void Game_Main()
{
//does nothing right now
}

Share this post


Link to post
Share on other sites
I can''t help you on the error thing (sorry, still a newbie myself) but i can clarify some things...

((b%32) + ((g%64) << 6) + ((r%32) << 11))

doesn''t work. Try using 128,0,0; and you''ll end up with 0. (128%32) << 11 = 0). Not sure about the bitwise AND and OR operators, not familiar enough with those.
This is how DirectX''s solution works (for 32bit color), maybe that helps:
D3DCOLOR_ARGB(a,r,g,b) \
((D3DCOLOR)((((a)&0xff)<<24)|(((r)&0xff)<<16)|(((g)&0xff)<<8)|((b)&0xff)))

There''s plenty of 16bit formats including D3DFMT_R5G6B5, D3DFMT_X1R5G5B5, D3DFMT_A1R5G5B5, and D3DFMT_A4R4G4B4.
Also, the reason 6 bits are used for green and only 5 for red and blue (in R5G6B5) is because studies have proven the human eyes are more sensitive to green (or so i''ve read... or heard... yea).

Share this post


Link to post
Share on other sites
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.



“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
— Brian W. Kernighan (C programming language co-inventor)

Share this post


Link to post
Share on other sites
quote:
Original post by Fruny
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.


Now why did they go and do that?

Share this post


Link to post
Share on other sites
quote:
Original post by Fruny
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.


"Barrel" shifters?

Share this post


Link to post
Share on other sites
A barrel shifter is a small digital circuit inside the CPU that can shifts its input left or right in a single cycle regardless of the distance shifted. A conventional shifter would look like:

for (i = shift_amount ; i ; --i)
{
value >>= 1; // or <<=, and there may be wrapping as well
}

where each iteration of the loop would be one clock tick (you may get several clock ticks per CPU cycle). The main disadvantage to the barrel shifter is that it takes up much more space on the silicon die.

On the P4, the instruction pipeline is much longer so even though the traditional method is used, the extra time to compute the answer can be negated by good use of the integer pipelines (there''s two working simultaneously don''t forget). Of course, you''d need a P4 aware optimisming C/C++ compiler to get this. Thus, the space saved on the silicon die by using the traditional method would allow for other improvements to be added.

Skizz

Share this post


Link to post
Share on other sites