Efficient way to find same pixel?

Started by
25 comments, last by Melekor 14 years, 1 month ago
Just take Brother Bob's solution, but use Radix Sort (or some other non-comparison-based method) instead of std::sort. That's O(n) complexity.
Advertisement
map(String key, Image i): for each pixel p in i:   EmitIntermediate(p, "1"); reduce(String key, Iterator values):int result = 0;for each color in values:  result += ParseInt(color);Emit(AsString(result)); 

Supports petabyte-sized inputs.
Quote:Original post by Melekor
Just take Brother Bob's solution, but use Radix Sort (or some other non-comparison-based method) instead of std::sort. That's O(n) complexity.


Radix sort is slower than std::sort (a heavily optimized quicksort).

See http://twistaz.net/sorts/
Quote:Original post by pbryant
Quote:Original post by Melekor
Just take Brother Bob's solution, but use Radix Sort (or some other non-comparison-based method) instead of std::sort. That's O(n) complexity.


Radix sort is slower than std::sort (a heavily optimized quicksort).

See http://twistaz.net/sorts/


This is untrue. For random 32-bit integers, the only reason radix would be slower is with a poor implementation. But don't take my word for it. I have some optimized radix code lying around, let me just whip up a small benchmark to prove my point.
OK, here are the results:

Quote:Total time elapsed to sort 64 MB of random 32-bit integers 5 times:

std::sort: 3.71s
RadixSort: 1.50s

Conclusion: RadixSort is the winner (2.48x as fast as std::sort)


And the source code:

#include <cstdlib>#include <cstring>#include <cstdio>#include <algorithm>typedef unsigned int uint;typedef unsigned int uint32;typedef const char *cpointer;#include <windows.h>// radix sort code based on// http://www.stereopsis.com/radix.html#define _0(x)	(x & 0x7FF)#define _1(x)	(x >> 11 & 0x7FF)#define _2(x)	(x >> 22)static void RadixSort11(uint32 *data, uint32 *sorted, uint32 count){	unsigned int i;	// 3 histograms on the stack:	const uint32 kHist = 2048;	uint32 b0[kHist * 3];	uint32 *b1 = b0 + kHist;	uint32 *b2 = b1 + kHist;	memset(b0, 0, sizeof(b0));	// 1.  parallel histogramming pass	//	for (i = 0; i < count; i++)	{		uint32 x = data;		b0[_0(x)] ++;		b1[_1(x)] ++;		b2[_2(x)] ++;	}	// 2.  Sum the histograms -- each histogram entry records the number of values preceding itself.	{		uint32 sum0 = 0, sum1 = 0, sum2 = 0;		uint32 tsum;		for (i = 0; i < kHist; i++)		{			tsum = b0 + sum0;			b0 = sum0 - 1;			sum0 = tsum;			tsum = b1 + sum1;			b1 = sum1 - 1;			sum1 = tsum;			tsum = b2 + sum2;			b2 = sum2 - 1;			sum2 = tsum;		}	}	// byte 0: read/write histogram, write out flipped	for (i = 0; i < count; i++)	{		uint32 x = data;		uint32 pos = _0(x);		sorted[++b0[pos]] = x;	}	// byte 1: read/write histogram, copy	//   sorted -> data	for (i = 0; i < count; i++)	{		uint32 x = sorted;		uint32 pos = _1(x);		data[++b1[pos]] = x;	}	// byte 2: read/write histogram, copy	//   data -> sorted	for (i = 0; i < count; i++)	{		uint32 x = data;		uint32 pos = _2(x);		sorted[++b2[pos]] = x;	}}void CheckSorted(const char *method, uint32 *data, uint32 count){	for(uint i = 1; i < count; i++)	{		if(data[i-1] > data)		{			printf("%s is incorrect!", method);			exit(0);			return;		}	}}struct Timer{	LARGE_INTEGER t0;	double elapsed;	Timer() : elapsed(0) {}	void start()	{		QueryPerformanceCounter(&t0);	}	void stop()	{		LARGE_INTEGER freq, t1;		QueryPerformanceCounter(&t1);		QueryPerformanceFrequency(&freq);		elapsed += double(t1.QuadPart - t0.QuadPart) / freq.QuadPart;	}};// Simple test of radixint main(int argc, char* argv[]){	// Make sure background processes don't mess with our test results.	SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);	uint32 i;	const uint32 count = 1<<24; // 64 megabytes worth of 32-bit integers.	const uint32 trials = 5;	uint32 *data = new uint32[count];	uint32 *a = new uint32[count];	uint32 *b = new uint32[count];	enum	{		STDSORT,		RADIXSORT	};	Timer timers[2];	for (i = 0; i < trials; i++)	{		// generate a random array		for (i = 0; i < count; i++)		{			data = rand() ^ (rand()<<15) ^ (rand()<<30);		}		// test both methods on the array		for(uint test = 0; test < 2; ++test)		{			memcpy(a, data, count*sizeof(uint32));			if(test == STDSORT)			{				timers[STDSORT].start();				std::sort(a, a+count);				timers[STDSORT].stop();				CheckSorted("std::sort", a, count);			}			else			{				timers[RADIXSORT].start();				RadixSort11(a, b, count);				timers[RADIXSORT].stop();				CheckSorted("RadixSort", b, count);			}		}	}	printf(		"Total time elapsed to sort %d MB of random 32-bit integers %d times:\n\n"		"std::sort:\t%.2lfs\n"		"RadixSort:\t%.2lfs\n\n",		count*sizeof(uint32)/(1024*1024), trials, timers[STDSORT].elapsed, timers[RADIXSORT].elapsed);	if(timers[STDSORT].elapsed < timers[RADIXSORT].elapsed)	{		printf("Conclusion: std::sort is the winner (%.2lfx as fast as RadixSort)\n",			timers[RADIXSORT].elapsed/timers[STDSORT].elapsed);	}	else if(timers[STDSORT].elapsed > timers[RADIXSORT].elapsed)	{		printf("Conclusion: RadixSort is the winner (%.2lfx as fast as std::sort)\n",			timers[STDSORT].elapsed/timers[RADIXSORT].elapsed);	}	else	{		printf("It's a tie! That's quite a coincidence.\n");	}	return 0;}
Quote:Original post by Melekor
Quote:Original post by pbryant
Radix sort is slower than std::sort (a heavily optimized quicksort).
This is untrue. For random 32-bit integers, the only reason radix would be slower is with a poor implementation.
The typical std::sort implementation is quicksort [with an average case of O(N log N)], and fallbacks to heap sort for bad cases, yielding a worst case of O(N log N) as well.

By contrast, radix sort is always O(k N), where k is the number of bits in the key.

Therefore Radix sort performs better than std::sort then when log(N) is greater than k. For 8-bit RGBA pixel data, k=32, so we need log(N) > 32. For a given base, b, than means we need N > b^32, and for pretty much any value of b, that is a darn big number.

Correct me if I am reasoning incorrectly, but that makes std::sort faster for most any reasonable image sizes.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Original post by swiftcoder
The typical std::sort implementation is quicksort [with an average case of O(N log N)], and fallbacks to heap sort for bad cases, yielding a worst case of O(N log N) as well.

By contrast, radix sort is always O(k N), where k is the number of bits in the key.

Therefore Radix sort performs better than std::sort then when log(N) is greater than k. For 8-bit RGBA pixel data, k=32, so we need log(N) > 32. For a given base, b, than means we need N > b^32, and for pretty much any value of b, that is a darn big number.

Correct me if I am reasoning incorrectly, but that makes std::sort faster for most any reasonable image sizes.


Well, strictly speaking radix sort performs better when c * log(N) > k, for some unknown constant c. But let's ignore that for now.

The most important thing is that k in O(k N) isn't the number of bits, it's the number of digits. We can split the keys into any number of digits we think would be most advantageous. For example with 32 bit integers, we can write them in base 2048 (11 bits for each digit) so that k = 3. (This is what is implemented in the above code.)

Now radix sort is starting to look much better isn't it?

This topic is closed to new replies.

Advertisement