

Community Reputation

113 Neutral

About bitshifter

  1. Fast double word toupper

    Maybe just using brute force and unrolling the loop is fast enough?

        static char gch[4] = {'a','A','z','Z'};

        void StepSimulation()
        {
            __asm cmp byte ptr[gch+0],'a'
            __asm jl  L1
            __asm cmp byte ptr[gch+0],'z'
            __asm jg  L1
            __asm sub byte ptr[gch+0],32
        L1:
            __asm cmp byte ptr[gch+1],'a'
            __asm jl  L2
            __asm cmp byte ptr[gch+1],'z'
            __asm jg  L2
            __asm sub byte ptr[gch+1],32
        L2:
            __asm cmp byte ptr[gch+2],'a'
            __asm jl  L3
            __asm cmp byte ptr[gch+2],'z'
            __asm jg  L3
            __asm sub byte ptr[gch+2],32
        L3:
            __asm cmp byte ptr[gch+3],'a'
            __asm jl  L4
            __asm cmp byte ptr[gch+3],'z'
            __asm jg  L4
            __asm sub byte ptr[gch+3],32
        L4: ; // I hate this stupid crap!
        }

    On my Pentium 4, the best case is 28 clocks and the worst case is 44 clocks, including the procedure overhead. But there are many tricks still. Here are a few I have derived in the past:

        is_upper:
            sub eax,'A'
            sub eax,26
            sbb eax,eax

        is_lower:
            sub eax,'a'
            sub eax,26
            sbb eax,eax

    When TRUE: EAX = -1. When FALSE: EAX = 0. The second sub sets the carry flag exactly when the character is in range, so by using the flags you could do a jcc or cmov. Just a little food for your imagination...
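The sub/sub/sbb trick above can also be written in C++. This is only a rough sketch of the same idea, not the original poster's code: the names IsLowerMask, IsUpperMask and ToUpper4 are made up for illustration, and whether the compiler actually emits branch-free code depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstdint>

// Returns 0xFFFFFFFF when c is in 'a'..'z' and 0 otherwise, the same
// -1 / 0 result that the sub/sub/sbb sequence leaves in EAX.
inline std::uint32_t IsLowerMask(std::uint32_t c) {
    // c - 'a' wraps to a large unsigned value when c < 'a', so a single
    // unsigned compare against 26 checks both bounds at once.
    return (c - static_cast<std::uint32_t>('a') < 26u) ? 0xFFFFFFFFu : 0u;
}

// Same range check for 'A'..'Z'.
inline std::uint32_t IsUpperMask(std::uint32_t c) {
    return (c - static_cast<std::uint32_t>('A') < 26u) ? 0xFFFFFFFFu : 0u;
}

// Upper-cases four ASCII bytes in place (hypothetical helper).
inline void ToUpper4(unsigned char* s) {
    assert(s != nullptr);
    for (int i = 0; i < 4; ++i) {
        // The mask selects 32 for lower-case letters and 0 for anything else.
        s[i] = static_cast<unsigned char>(s[i] - (IsLowerMask(s[i]) & 32u));
    }
}
```

Calling ToUpper4 on the four bytes 'a','A','z','Z' from the post leaves 'A','A','Z','Z'.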
  2. Win32 Hello World App problems...

    If you want the same source to build with either multi-byte or wide (Unicode) chars, include <TCHAR.H>. The Windows headers use macros to select between different versions of some routines:

        #ifdef UNICODE
        #define MessageBox MessageBoxW
        #else
        #define MessageBox MessageBoxA
        #endif

    Then use the _T or _TEXT macro for strings:

        MessageBox(0,_T("RegisterClass - Failed"),0,0);
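To see the mechanism without the Windows headers, here is a small stand-alone sketch of the same pattern. Everything in it is hypothetical (SHOW_UNICODE, ShowTextA, ShowTextW, ShowText and SHOW_T are invented stand-ins, not Windows names); it only mimics how one macro picks the narrow or wide variant and another adapts the string literal to match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical narrow/wide pair standing in for an A/W API pair.
// Each returns the text length so the call has a visible result.
inline std::size_t ShowTextA(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

inline std::size_t ShowTextW(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// One switch selects both the routine and the literal prefix, so the
// calling code is written once and compiles either way.
#ifdef SHOW_UNICODE
#define ShowText ShowTextW
#define SHOW_T(x) L##x
#else
#define ShowText ShowTextA
#define SHOW_T(x) x
#endif
```

A call such as ShowText(SHOW_T("RegisterClass - Failed")) then compiles whether or not SHOW_UNICODE is defined. Passing a plain "..." literal to the wide variant would not compile, which is why the string macro matters.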
  3. OpenGL Weird Behavior Scaling

    I share some code :)

        void PerspectiveMatrix(
            float matrix[16],
            float fovy,
            int viewportWidth,
            int viewportHeight,
            float znear,
            float zfar)
        {
            float aspect = float(viewportWidth) / float(viewportHeight);
            float fovy2r = fovy * MATH_PI_360;
            float cotan = cosf(fovy2r) / sinf(fovy2r);

            matrix[0] = cotan / aspect;
            matrix[1] = 0.0F;
            matrix[2] = 0.0F;
            matrix[3] = 0.0F;

            matrix[4] = 0.0F;
            matrix[5] = cotan;
            matrix[6] = 0.0F;
            matrix[7] = 0.0F;

            matrix[8] = 0.0F;
            matrix[9] = 0.0F;
            matrix[10] = (zfar + znear) / (znear - zfar);
            matrix[11] = -1.0F;

            matrix[12] = 0.0F;
            matrix[13] = 0.0F;
            matrix[14] = (2.0F * znear * zfar) / (znear - zfar);
            matrix[15] = 0.0F;
        }

        void ScaleMatrix(float m[16], const float v[3])
        {
            m[0] *= v[0];
            m[1] *= v[0];
            m[2] *= v[0];
            m[3] *= v[0];

            m[4] *= v[1];
            m[5] *= v[1];
            m[6] *= v[1];
            m[7] *= v[1];

            m[8] *= v[2];
            m[9] *= v[2];
            m[10] *= v[2];
            m[11] *= v[2];
        }
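One way to sanity-check a projection matrix like the one above is to push points on the near and far planes through it and look at the depth after the perspective divide. The sketch below assumes that MATH_PI_360 means pi/360 (the post does not define it) and that the matrix is column-major as OpenGL expects. BuildPerspective, TransformPoint and NdcDepth are illustrative names, not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Column-major 4x4 matrix times column vector: out = m * in.
inline void TransformPoint(const float m[16], const float in[4], float out[4]) {
    for (int row = 0; row < 4; ++row) {
        out[row] = m[row] * in[0] + m[4 + row] * in[1] +
                   m[8 + row] * in[2] + m[12 + row] * in[3];
    }
}

// Same formulas as the posted PerspectiveMatrix, with the aspect ratio
// passed directly and pi/360 written out (an assumption, see above).
inline void BuildPerspective(float m[16], float fovyDegrees, float aspect,
                             float znear, float zfar) {
    assert(znear > 0.0f && zfar > znear && aspect > 0.0f);
    const float halfAngle = fovyDegrees * 3.14159265358979f / 360.0f;
    const float cotan = std::cos(halfAngle) / std::sin(halfAngle);
    for (int i = 0; i < 16; ++i) m[i] = 0.0f;
    m[0] = cotan / aspect;
    m[5] = cotan;
    m[10] = (zfar + znear) / (znear - zfar);
    m[11] = -1.0f;
    m[14] = (2.0f * znear * zfar) / (znear - zfar);
}

// Depth in normalized device coordinates for a point on the view axis
// at eye-space depth eyeZ (negative in front of the camera).
inline float NdcDepth(const float m[16], float eyeZ) {
    const float in[4] = {0.0f, 0.0f, eyeZ, 1.0f};
    float clip[4];
    TransformPoint(m, in, clip);
    return clip[2] / clip[3];
}
```

With znear = 1 and zfar = 100, a point at eye-space z = -1 should come out at depth -1 and a point at z = -100 at depth +1, up to float rounding.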
  4. Calling SetCursorPos within WM_MOUSEMOVE is causing the problem, since moving the cursor itself generates further WM_MOUSEMOVE messages. Instead, use OnMouseMove only to record the position:

        void OnMouseMove(int x, int y)
        {
            _mouseX = x;
            _mouseY = y;
        }

    Then call SetCursorPos during your update proc:

        void UpdateScene()
        {
            int diffX = _mouseX - _centerX;
            int diffY = _mouseY - _centerY;
            SetCursorPos(_centerX,_centerY);
            ...
        }
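The record-then-recenter pattern can be tried out without any Windows code. In this sketch the MouseLook struct and its warpCursor callback are hypothetical; the callback stands in for SetCursorPos. In a real Win32 program, note that WM_MOUSEMOVE reports client coordinates while SetCursorPos expects screen coordinates, so the center would need converting (for example with ClientToScreen).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: the message handler only records the position,
// and the per-frame update computes the delta and recenters the cursor.
struct MouseLook {
    int centerX;
    int centerY;
    int mouseX;
    int mouseY;
    std::function<void(int, int)> warpCursor;  // stand-in for SetCursorPos

    // Called from the mouse-move handler: no cursor warping here.
    void OnMouseMove(int x, int y) {
        mouseX = x;
        mouseY = y;
    }

    // Called once per frame: returns the movement since the last recenter.
    std::pair<int, int> Update() {
        assert(static_cast<bool>(warpCursor));
        const int diffX = mouseX - centerX;
        const int diffY = mouseY - centerY;
        if (diffX != 0 || diffY != 0) {
            warpCursor(centerX, centerY);
            // Treat the cursor as centered until the next move arrives.
            mouseX = centerX;
            mouseY = centerY;
        }
        return {diffX, diffY};
    }
};
```

Skipping the warp when the delta is zero is a design choice of this sketch, not something the post prescribes; it keeps an idle mouse from triggering a cursor move every frame.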
  5. WM_DESTROY is sent by DestroyWindow before it returns, so you can just provide an empty handler that returns zero, and call PostQuitMessage(0) in your WM_CLOSE handler instead. By default, DefWindowProc calls DestroyWindow when it receives a WM_CLOSE message, and you want something different :)

    I learned a good deal about how Windows messages work by first logging every message retrieved by PeekMessage/GetMessage to a file. A short time later you can read back what happened.

    To recap: use the message pump provided by Neilo in the previous post, then:

        case WM_CLOSE:
            PostQuitMessage(0);
            return 0;

        case WM_DESTROY:
            return 0;

    Handling WM_DESTROY may seem redundant since the pump is no longer running, but remember that internally SendMessage is used to reach the callback...
  6. It is not late to start programming?

    Spend the next two years learning assembly language. By the time you're 18 years old you will know what, why, and how any other language works under the hood. Or spend two years learning an HLL and never know what's under the hood. I'm not saying ASM should be used for coding anything and everything, but it gives you the most solid foundation you could ever get.
  7. OpenGL openGL FPS Demos?

    While learning to warm up to the FPU I wrote this one (you must register/log in to download the source files): http://board.flatassembler.net/topic.php?t=9262 I have also made a hand-rolled SSE2 matrix version.
  8. Just a quick observation:

        glListBase( id-32 );
        glCallLists( strlen( text ), GL_UNSIGNED_BYTE, text );

    You can subtract 32 from id once after creation and add 32 back before deletion. That saves a couple of clocks, but the real killer is strlen (besides all the hidden code, of course). I wrote a couple of font rendering method tests in assembler (you need to register/log in to download the source files): http://board.flatassembler.net/topic.php?t=9885
  9. Outgrowing rand()

    Some good reference... http://www.agner.org/random/
  10. SSE Performance

    Maybe it's worth trying a bit of inline assembly? I'm away from my dev PC, but here are a few snippets for now... http://board.flatassembler.net/topic.php?t=10928
  11. SSE Performance

    Hello. I have written a matrix library in SSE2 assembler. If you post some samples, maybe I can help improve them. Also, optimization for each processor type is quite different. What chipset are you targeting?
  12. OpenGL help with glMatrixMode

    Let's assume nothing, and start from there...

        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluPerspective(...);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();

        // TODO: Render 3D stuff here...

        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        glOrtho(...);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();

        // TODO: Render 2D stuff here...
  13. Depth Sorting Algorithms

    Seems like I will have to implement a few ideas and benchmark them. I work with polygons (not always triangles), so splitting them into tris wastes time. Basically I am creating a span array from n-sided convex polygons. I am reading Abrash's article on span sorting at the moment. Seems like they abandoned the idea for Quake and stuck with BSP sorting. But my engine is mostly for outdoor scenes, so I chose an AACube-tree design. Maybe to have the best of both worlds my engine needs both methods: BSP/portal when indoors and AACube-tree when outdoors. So many choices to play with in such a short lifetime... I take all the advice I can get :) An old wise man once told me: experience is cheap when it is second hand. Now I see what he means :P
  14. Hello. I am writing a 3D software rendering engine for my hobby OS. So far I have the world cut and sorted within an AACube tree, and I use the viewing frustum to determine the visible parts. Visible edges are clipped to the frustum in 3D space for speed. My polygon filler builds and renders a list of scan lines. Now I need to depth sort for proper rendering order. I use high res (1024x768x32bpp), so a z-buffer is not an option. First I thought of span/edge sorting, but it seems very slow. If you have any ideas or refs I would be grateful to know them. Thanks for your time.
  15. Note: There is a difference between (transform/scale) and (scale/transform). I have seen some funny stuff before, and swapping the order fixed it.
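A tiny numeric check shows why the order matters. This sketch assumes that "transform" in the note means a translation and that matrices are column-major and applied to column vectors, as in OpenGL. Mat4, Multiply, Translation, Scaling and TransformX are illustrative names only.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major, element (row, col) at col * 4 + row

inline Mat4 Identity() {
    Mat4 m{};  // zero-initialized
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns a * b. Applied to a point, b acts first and a acts second.
inline Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            r[col * 4 + row] = sum;
        }
    }
    return r;
}

inline Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

inline Mat4 Scaling(float x, float y, float z) {
    Mat4 m = Identity();
    m[0] = x;
    m[5] = y;
    m[10] = z;
    return m;
}

// X coordinate of the point (x, 0, 0, 1) after applying m.
inline float TransformX(const Mat4& m, float x) {
    assert(m[3] == 0.0f && m[15] == 1.0f);  // affine matrices only in this sketch
    return m[0] * x + m[12];
}
```

With a scale of 2 and a translation of 5 along x, the point x = 1 lands at 7 under Multiply(Translation, Scaling), which scales first and then translates, and at 12 under Multiply(Scaling, Translation), which translates first and then scales.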