#### Archived

This topic is now archived and is closed to further replies.

# Shifting vs multiplication

## Recommended Posts

dr_slash_uh    122
I tried to use << for my multiplication in my game and for some reason, I get lower performance. I read that its faster to shift(3 years ago). Is that true. Instead of me multiplying x * 32; I am shifting x << 5;

##### Share on other sites
twanvl    512
On modern compilers the performance will be the same. It is true that a shift is faster, but the compiler will optimze a multiplication by 2^n to a << n. So use whatever makes more sense in your code.

##### Share on other sites
Odoacer    140
Bitshifting is MUCH faster than conventional multiplication even if you''re running on a 3 GHz P4 or an AMD64. When you know you''re gonna be multiplying/dividing by powers of 2, use the bitshift operator because you can''t always expect the compiler to do all these optimizations for you (it might, it might not but I''d rather not leave things to chance).

##### Share on other sites
Doc    586
quote:
Original post by Odoacer
..because you can''t always expect the compiler to do all these optimizations for you

It really ought to. That''s one of the simplest and easiest optimisation techniques around, and if your compiler doesn''t do this one then perhaps you should consider changing compilers.

##### Share on other sites
dr_slash_uh    122
I have a xp2600 and I fet about 350 frames with regualr multiplicationa and with shifting it drops to around 300. Maybe It my compiler options?

Bless

##### Share on other sites
True Edge    122
I''m doing a tutorial, and it defines RGB16(r,g,b) as:

((b%32) + ((g%64) << 6) + ((r%32) << 11))

Can someone please explain how this gives you a color? Start from the ground up (probably necessary).

I tried plugging in some pretend values for r,g and b. (255,0,0) and got the answer of 71712 (doing the math myself). Is that how it''s supposed to work? What does 71712 mean?

##### Share on other sites
Zahlman    1682
quote:
Original post by True Edge
I''m doing a tutorial, and it defines RGB16(r,g,b) as:

((b%32) + ((g%64) << 6) + ((r%32) << 11))

Can someone please explain how this gives you a color? Start from the ground up (probably necessary).

I tried plugging in some pretend values for r,g and b. (255,0,0) and got the answer of 71712 (doing the math myself). Is that how it''s supposed to work? What does 71712 mean?

This formula is designed to "pack" your r,g,b values into a 16-bit value: rrrrrggggggbbbbb. The result should thus be in the range 0-65535. The lowest-order bits of each input value are used, because of the modulus values.

Bit-shifting is used here instead of multiplication when writing the expression, because the purpose is indeed to shift the bits around. I would write the modulos using bitwise AND, personally; that should again be faster (but again an optimization the compiler should be able to do) and clearer about the intent. The +''s could even be bitwise ORs instead, since the intent is that each input (r,g,b) has its own ''field'' in the 16-bit value with no overlap. So we can instead write

((b & 31) | ((g & 63) << 6) | ((r & 32) << 11))

Common 16-bit colour models assign 5 bits of precision to the red level, 6 to green and 5 to blue. I couldn''t tell you why green is picked to be favoured, but I assume it has something to do with typical display characteristics.

I will work through the example with your values of (255,0,0).

1) b % 32 (b & 31) = 0.
2) g % 64 (g & 63) = 0.
3) r % 32 (r & 31) = 31. The low-order 5 bits are kept.
4) The arithmetic shift results in 31 * 2^11 = 31 * 2048 = 63488.
5) The addition (or logical OR) makes no change. I have no idea how you came up with 71712.

Showing the individual bits:

The function takes in r: 111[11111] g: 00[000000] b: 000[00000]
and squishes the bracketed parts together: [11111][000000][00000].

HTH!

##### Share on other sites
True Edge    122
Thanks a lot. Now, I have another problem. I know it looked like I was further along, asking what the RGB16 was. But the truth is I''m still trying to initialize DirectDraw, and it keeps on hanging on me giving me this error:

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

I have no idea what it''s talking about, here''s my code:

#define INITGUID
#define WIN32_LEAN_AND_MEAN
#define CLASS_NAME "DirectX Game"
#define WINDOW_NAME "DirectX Game"
#define SCREEN_WIDTH 640
#define SCREEN_HEIGHT 480
#define SCREEN_BPP 16
#define _RGB16BIT565(r,g,b) ((b%32) + ((g%64) << 6) + ((r%32) << 11))
#include <ddraw.h>#include <windows.h>#include <windowsx.h>#include "MainFctns.h"//Direct X globalsLPDIRECTDRAW lpDD = NULL;LPDIRECTDRAW lpDD4 = NULL;LPDIRECTDRAWSURFACE7 lpDDsPrimary = NULL;LPDIRECTDRAWSURFACE7 lpDDsBack = NULL;DDSURFACEDESC2 ddsd;//Windows Storage VariablesWNDCLASSEX winclass;HWND hwnd;MSG msg;bool flagInit = false;bool flagGameInit = false;LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wparam, LPARAM lparam){	PAINTSTRUCT ps;	HDC hdc;	switch(msg)	{	case WM_CREATE:		{			return(0);		} break;	case WM_PAINT:		{			hdc = BeginPaint(hwnd, &ps);			EndPaint(hwnd, &ps);			return(0);		} break;	case WM_DESTROY:		{			Game_Shutdown();			PostQuitMessage(0);			return(0);		} break;	}	return (DefWindowProc(hwnd, msg, wparam, lparam));}int WINAPI WinMain(HINSTANCE hinstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int ncmdshow){		//set up class	winclass.cbSize = sizeof(WNDCLASSEX);	winclass.cbClsExtra = 0;	winclass.cbWndExtra = 0;	winclass.hbrBackground = (HBRUSH)(GetStockObject(BLACK_BRUSH));	winclass.hIcon = LoadIcon(hinstance, IDI_APPLICATION);	winclass.hCursor = LoadCursor(hinstance, IDC_ARROW);	winclass.hIconSm = LoadIcon(hinstance, IDI_APPLICATION);	winclass.hInstance = hinstance;	winclass.lpfnWndProc = WndProc;	winclass.lpszClassName = CLASS_NAME;	winclass.lpszMenuName = NULL;	winclass.style = CS_HREDRAW | CS_VREDRAW | CS_DBLCLKS | CS_OWNDC;		//register class	RegisterClassEx(&winclass);	//create window	hwnd = CreateWindowEx(NULL,										 CLASS_NAME,										 WINDOW_NAME,										 WS_POPUP | WS_VISIBLE,										 0,0, //window position										 640,480, //window size										 NULL, //parent										 NULL, //menu										 hinstance,										 NULL); //extra	ShowWindow(hwnd, ncmdshow);		flagInit = true;		while(flagInit == false)	{;}	Game_Init();	while(flagGameInit == false)	{;}	while(1)	{		if (PeekMessage(&msg,NULL,0,0,PM_REMOVE))		{			// test if this is a quit			if (msg.message == WM_QUIT)				break;			// translate any accelerator keys			TranslateMessage(&msg);			// send the message to the window proc			DispatchMessage(&msg);		} 		Game_Main();	} //end while	return(0);}void Game_Init(){	if(FAILED(DirectDrawCreate(NULL, &lpDD, NULL)))	{		MessageBox(hwnd, "Can''t create direct draw object", "Error", MB_OK);	}	if(FAILED(lpDD->QueryInterface(IID_IDirectDraw4, (LPVOID *) &lpDD4)))	{		MessageBox(hwnd, "Can''t upgrade direct draw object", "Error", MB_OK);	}	if(FAILED(lpDD4->SetCooperativeLevel(hwnd, DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN | DDSCL_ALLOWREBOOT | DDSCL_ALLOWMODEX)))	{		MessageBox(hwnd, "Could not set cooperative level.","Error",MB_OK);	}	if(FAILED(lpDD4->SetDisplayMode(SCREEN_WIDTH, SCREEN_HEIGHT, SCREEN_BPP)))	{		MessageBox(hwnd, "Could not set display mode","Error",MB_OK);	}	memset(&ddsd,0,sizeof(ddsd));	ddsd.dwSize = sizeof(ddsd);	ddsd.dwFlags = DDSD_CAPS | DDSD_BACKBUFFERCOUNT;	ddsd.dwBackBufferCount = 1;	ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE | DDSCAPS_FLIP | DDSCAPS_COMPLEX;		if(FAILED(lpDD4->CreateSurface((LPDDSURFACEDESC)&ddsd,(LPDIRECTDRAWSURFACE *)&lpDDsPrimary,NULL)))	{		MessageBox(hwnd,"Could not create primary surface","Error",MB_OK);	}	flagGameInit = true;}void Game_Shutdown(){	if(lpDD4)		lpDD4->Release();	if(lpDDsPrimary)		lpDDsPrimary->Release();}void Game_Main(){	//does nothing right now}

##### Share on other sites
True Edge    122
Oh I missed another error that says Direct3D9: (ERROR) :Invalid flags

I guess i should check my flags...

##### Share on other sites
kVandaele    138
I can''t help you on the error thing (sorry, still a newbie myself) but i can clarify some things...

((b%32) + ((g%64) << 6) + ((r%32) << 11))

doesn''t work. Try using 128,0,0; and you''ll end up with 0. (128%32) << 11 = 0). Not sure about the bitwise AND and OR operators, not familiar enough with those.
This is how DirectX''s solution works (for 32bit color), maybe that helps:
D3DCOLOR_ARGB(a,r,g,b) \
((D3DCOLOR)((((a)&0xff)<<24)|(((r)&0xff)<<16)|(((g)&0xff)<<8)|((b)&0xff)))

There''s plenty of 16bit formats including D3DFMT_R5G6B5, D3DFMT_X1R5G5B5, D3DFMT_A1R5G5B5, and D3DFMT_A4R4G4B4.
Also, the reason 6 bits are used for green and only 5 for red and blue (in R5G6B5) is because studies have proven the human eyes are more sensitive to green (or so i''ve read... or heard... yea).

quasar3d    814

My Site

##### Share on other sites
dr_slash_uh    122
don''t woory bout it, I am learning something.

##### Share on other sites
kVandaele    138
actually the question "Is that true" (which is the only question unless i''m mistaken) was answered with the very first reply...

##### Share on other sites
Fruny    1658
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
— Brian W. Kernighan (C programming language co-inventor)

##### Share on other sites
Doc    586
quote:
Original post by Fruny
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.

Now why did they go and do that?

##### Share on other sites
Zahlman    1682
quote:
Original post by Fruny
Watch out, Pentium 4s don''t have barrel shifters. A shift operation that would take 1 cycle on a P3 ends up taking 4-7 cycles on a P4, cancelling any advantage shifts had over multiplies.

"Barrel" shifters?

##### Share on other sites
Skizz    794
A barrel shifter is a small digital circuit inside the CPU that can shifts its input left or right in a single cycle regardless of the distance shifted. A conventional shifter would look like:
for (i = shift_amount ; i ; --i){    value >>= 1; // or <<=, and there may be wrapping as well}

where each iteration of the loop would be one clock tick (you may get several clock ticks per CPU cycle). The main disadvantage to the barrel shifter is that it takes up much more space on the silicon die.

On the P4, the instruction pipeline is much longer so even though the traditional method is used, the extra time to compute the answer can be negated by good use of the integer pipelines (there''s two working simultaneously don''t forget). Of course, you''d need a P4 aware optimisming C/C++ compiler to get this. Thus, the space saved on the silicon die by using the traditional method would allow for other improvements to be added.

Skizz