

Jan Wassenberg

Member Since 16 Sep 2002
Last Active Aug 01 2014 06:38 AM

Topics I've Started

Wonderfully simple OpenGL UI

17 May 2014 - 01:57 AM

Here's a brief writeup of a simple (9 KLOC, 340 KB binary) but capable and portable OpenGL GUI:



Obligatory screenshot: ui.png (attached, 42.07 KB)


Header files are attached: ui_headers.zip (5.62 KB)


I'd love to get a discussion going on any perceived shortcomings of this approach, missing features, alternative libraries that didn't show up in my search, and your thoughts in general.

Faster Sorting - at 88% of memory bandwidth

21 October 2010 - 08:31 AM

I figure GameDev often calls for highly efficient algorithms, and am happy to present Faster Radix Sort via Virtual Memory and Write-Combining (6-page PDF). It works for 8- to 64-bit keys and also supports associated values. The kicker is that the throughput of each pass (which sorts 8 bits) is at least 88% of the system's theoretical peak memory bandwidth!
This means 8-bit keys can be copied into sorted order much faster than some memcpy() implementations.

Since healthy skepticism is warranted for "new sorting algorithms", I will note that this is a variant of counting sort with two improvements. The first avoids the initial counting phase: each value to be sorted is read and written exactly once, which is made possible by aggressive use of huge amounts of virtual address space. The second, and more important, improvement concerns the radix sort that allows sorting keys wider than 8 bits. The old 'reverse sorting' trick from distributed-memory algorithms first applies an MSD pass followed by several LSD passes, which reduces communication and NUMA overhead. Indeed, the algorithm scales perfectly on a dual-socket i7 system.
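The two ideas above can be illustrated with a minimal C++ sketch for 32-bit keys. It is not the author's code (which is not public) and is only one plausible reading of the description: an MSD pass on the top byte scatters keys into 256 buckets without a separate counting phase, and each bucket is then finished independently with stable LSD counting-sort passes on the remaining three bytes. Growable std::vector buckets stand in for the reserved virtual-memory regions, so unlike the described algorithm this version can still copy on reallocation and makes no use of write-combining. The names msd_then_lsd_sort and lsd_pass are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stable LSD counting-sort pass over the 8-bit digit at `shift`.
// `tmp` is scratch space; on return, `keys` holds the reordered data.
void lsd_pass(std::vector<std::uint32_t>& keys, std::vector<std::uint32_t>& tmp,
              unsigned shift) {
    std::array<std::size_t, 257> offsets{};  // offsets[d + 1] counts digit d
    for (std::uint32_t k : keys) ++offsets[((k >> shift) & 0xFFu) + 1];
    // Exclusive prefix sum: offsets[d] = number of keys whose digit is < d.
    for (std::size_t d = 0; d < 256; ++d) offsets[d + 1] += offsets[d];
    tmp.resize(keys.size());
    for (std::uint32_t k : keys) tmp[offsets[(k >> shift) & 0xFFu]++] = k;
    keys.swap(tmp);
}

// Hypothetical MSD-then-LSD radix sort for 32-bit keys.
std::vector<std::uint32_t> msd_then_lsd_sort(const std::vector<std::uint32_t>& input) {
    // MSD pass on the top byte: each key is read once and appended to its
    // bucket, with no preceding counting phase. Growable vectors stand in
    // for large reserved virtual-memory regions (they may still copy on growth).
    std::array<std::vector<std::uint32_t>, 256> buckets;
    for (std::uint32_t k : input) buckets[k >> 24].push_back(k);

    std::vector<std::uint32_t> out;
    out.reserve(input.size());
    std::vector<std::uint32_t> tmp;
    for (auto& bucket : buckets) {
        // Buckets are independent, so this loop could be split across cores or sockets.
        for (unsigned shift = 0; shift < 24; shift += 8) lsd_pass(bucket, tmp, shift);
        out.insert(out.end(), bucket.begin(), bucket.end());
    }
    assert(out.size() == input.size());
    return out;
}
```

Because the MSD pass already separates keys by their top byte, the later LSD passes never need to move data between buckets; in this sketch, that independence is what would let each socket work on its own buckets.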

This makes for a 1.5-fold speedup over Intel's recently published radix sort. The measured throughput also exceeds results reported for Knights Ferry (32-core) and Fermi GPUs (when sorting in main memory).
Unfortunately, the code is not publicly available, but reimplementation is an option, and my employer (Fraunhofer applied research labs) would be happy to discuss licensing.

Comments and discussion are welcome :)

RFC: lossless RGB image compression for speeding up file loads

07 June 2008 - 02:12 AM

I am thinking about image compression for the purpose of speeding up file loads. Presuppose the following:
1) large (100 MB) RGB images must be loaded for certain processing tasks;
2) we have a means of decompressing chunks of the image in parallel with other I/Os;
3) the processing task may take more or less time than I/O and potentially needs random image access;
4) compression must be entirely lossless to avoid changing the statistical properties of the image (important).

Decompression would therefore be effectively free (it is hidden behind I/O) while reducing the amount of data to be loaded. Now the question is: what kinds of compression can we afford while still matching the 100 MB/s throughput of modern drives? LZ77 is quite suitable; zlib sustains such a processing rate and even faster variants exist. However, it only gives us a compression ratio of about 1.4x for natural images (think satellite scans).

What other image compression schemes exist? A few aren't prohibitively slow (LOCO-I, JPEG-LS, PNG), but most appear to be heavily biased towards high compression (lossless JPEG 2000, FELICS, CALIC, SPIHT and particularly GLICBAWLS). Is there really no middle ground for real-time lossless decompression of RGB that outperforms simple dictionary compression while maintaining throughput in the tens of MB per second?

Since I can't really justify implementing or even using custom entropy coders, the intent is to try a simple reversible color transform followed by general-purpose compression. In a quick test, I applied JPEG 2000's RCT and saw BWT compression of a 12.1 MB file go from 6.8 MB to 5.1 MB. Seems promising. Has anyone tried this before, or does anyone have experience with other reversible color space transforms?
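For reference, the JPEG 2000 RCT mentioned above uses only integer additions and shifts and is exactly invertible, which is what keeps the pipeline lossless. A minimal C++ sketch follows; the type and function names are illustrative, not from any particular library.

```cpp
#include <cassert>

// Floor division by 4; plain `/` truncates toward zero, which is wrong for negatives.
int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

struct Rgb { int r, g, b; };    // 8-bit components in [0, 255]
struct Ycc { int y, cb, cr; };  // y in [0, 255]; cb, cr in [-255, 255] (9 bits)

// JPEG 2000 reversible color transform (RCT), forward direction.
Ycc rct_forward(Rgb p) {
    assert(p.r >= 0 && p.r <= 255 && p.g >= 0 && p.g <= 255 && p.b >= 0 && p.b <= 255);
    return {floor_div4(p.r + 2 * p.g + p.b), p.b - p.g, p.r - p.g};
}

// Exact inverse: recovers the original RGB triple bit-for-bit.
Rgb rct_inverse(Ycc q) {
    const int g = q.y - floor_div4(q.cb + q.cr);
    return {q.cr + g, g, q.cb + g};
}
```

Note that the two chroma differences need 9 bits, so before handing the result to a general-purpose compressor one has to pick a storage layout (for example, 16-bit samples or separate planes per channel). That choice is an assumption not covered by the post and may well affect the achieved ratio.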

Timing Pitfalls and Solutions: new developments

10 June 2007 - 02:23 AM

I've previously written an article about timing, but unfortunately the situation has gotten even worse since then. RDTSC is even less reliable than before, and QPC (QueryPerformanceCounter) now uses it instead of the only marginally broken alternatives, which leaves WinXP users on dual-core systems without any decent timer. The newly updated article (PDF) again describes the timing hardware, APIs, their problems, and workarounds. An important new development is a driver for the HPET timer that is a fairly clean solution to this entire mess. The details of publication and distribution are not yet settled, but I'd love to hear your comments on the draft. Hopefully this text will help people avoid all-too-frequent timing glitches :)
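One common mitigation for the non-monotonic readings described above (not necessarily the one the article recommends) is to clamp the timer so it never runs backwards. Below is a minimal C++ sketch, assuming a caller-supplied raw tick function; the class name MonotonicClamp is hypothetical. Clamping only hides backward jumps; it does nothing about differing tick rates between cores, which is why a stable hardware source such as the HPET is the cleaner fix.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper around any raw tick source (e.g. a per-core TSC
// read, which may differ between cores). It never returns a value smaller
// than the previous one and counts how often the source went backwards.
// Not thread-safe; a real implementation would need per-thread or atomic state.
class MonotonicClamp {
public:
    explicit MonotonicClamp(std::function<std::uint64_t()> raw) : raw_(std::move(raw)) {}

    std::uint64_t now() {
        const std::uint64_t t = raw_();
        if (t >= last_) {
            last_ = t;          // normal forward progress
        } else {
            ++backward_jumps_;  // source went backwards: hold the previous value
        }
        return last_;
    }

    std::uint64_t backward_jumps() const { return backward_jumps_; }

private:
    std::function<std::uint64_t()> raw_;
    std::uint64_t last_ = 0;
    std::uint64_t backward_jumps_ = 0;
};
```

Counting the backward jumps, rather than silently hiding them, makes it easy to log how often the underlying source misbehaves on a given machine.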

Publication: Optimizing File Access -- Achieved speedup: 1000%

11 April 2006 - 12:09 PM

Hey folks. My study thesis [PDF; 25 pages] is complete and available here. It covers file I/O, specifically for games, and presents an introduction to the problem, an explanation of my approach, and the resulting performance analysis. To all who resent loading screens and wonder whether loading can't be done faster: give it a look! :) Questions and comments welcome.