Jump to content
  • Advertisement
Sign in to follow this  
Anfaenger

A quick way to determine if 8 small integers are all equal to each other

This topic is 632 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Given eight integers, what is the fastest way to check if they are all the same or not?

Maybe, some clever bit manipulation?

(This is used in a voxel engine - integers are material IDs: if a cell has different material IDs at corners, then it must intersect a surface.)

 

I've implemented this simple solution, but it only works with 8-bit values, and 255 materials (0 == empty air) won't be enough:

/// returns true if all eight bytes are equal to each other
template< typename T >
bool AreAllTheSame8( const T (&_array)[8] )
{
    const T v0 = _array[ 0 ];
    const T v1 = _array[ 1 ];
    const T v2 = _array[ 2 ];
    const T v3 = _array[ 3 ];
    const T v4 = _array[ 4 ];
    const T v5 = _array[ 5 ];
    const T v6 = _array[ 6 ];
    const T v7 = _array[ 7 ];

    static_assert(sizeof(T) == sizeof(uint8_t));
    uint8_t    counts[256];
    mxZERO_OUT(counts);

    counts[ v0 ]++;
    counts[ v1 ]++;
    counts[ v2 ]++;
    counts[ v3 ]++;
    counts[ v4 ]++;
    counts[ v5 ]++;
    counts[ v6 ]++;
    counts[ v7 ]++;
    return counts[ v0 ] == 8;
}

In reality, the zero materials are not 'counted' (i can explain why if you're curious), and the above solution results in this ugly and inefficient code:

/// returns true if all eight materials are equal to each other
bool IsCellHomogeneous( const MaterialID (&_voxels)[8] )
{
    const MaterialID v0 = _voxels[ 0 ];
    const MaterialID v1 = _voxels[ 1 ];
    const MaterialID v2 = _voxels[ 2 ];
    const MaterialID v3 = _voxels[ 3 ];
    const MaterialID v4 = _voxels[ 4 ];
    const MaterialID v5 = _voxels[ 5 ];
    const MaterialID v6 = _voxels[ 6 ];
    const MaterialID v7 = _voxels[ 7 ];

    U8    counts[256];
    mxZERO_OUT(counts);

    UINT numSolidMaterials = 0;
    UINT lastSolidMaterial = 0;

    // zero material IDs correspond to empty space/air
    if( v0 ) {
        counts[ v0 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v0;
    }
    if( v1 ) {
        counts[ v1 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v1;
    }
    if( v2 ) {
        counts[ v2 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v2;
    }
    if( v3 ) {
        counts[ v3 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v3;
    }
    if( v4 ) {
        counts[ v4 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v4;
    }
    if( v5 ) {
        counts[ v5 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v5;
    }
    if( v6 ) {
        counts[ v6 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v6;
    }
    if( v7 ) {
        counts[ v7 ]++;
        numSolidMaterials++;
        lastSolidMaterial = v7;
    }
    return counts[ lastSolidMaterial ] == numSolidMaterials;
}

Share this post


Link to post
Share on other sites
Advertisement

Wrote in N++ without testing so I'm not sure if it works correctly.

bool IsCellHomogeneous( const MaterialID (&_voxels)[8] ) {
	size_t idx = 0;
	
	// skip all 0
	for(; idx < 8 && _voxels[idx] == 0; ++idx); // intentionally empty
	
	if(idx == 8)
		return true; // everything is air
	
	const MaterialID expectedVal = _voxels[idx];
	
	for(; idx < 8 && (_voxels[idx] == 0 || _voxels[idx] == expectedVal); ++idx); // intentionally empty
	
	if(idx == 8)
		return true; // everything is same or 0
	
	return false;
}
Edited by Zaoshi Kaba

Share this post


Link to post
Share on other sites
That's a special case if you ignore the templated type of the function. For 8-bit values and even 16-bit values there are some bitwise hacks, but they don't really buy anything on a modern processor, and can even cause pipeline stalls if you aren't careful. Introducing data dependencies slows things down a tiny bit.

Again, this is something so small it isn't a performance concern. Given the way current processors work it is quite literally two cycles through the decoder -- each CPU can decode up to 4 instructions per cycle -- and whatever cost to get the data from cache.

Share this post


Link to post
Share on other sites

So, I take it you want to avoid branch mispredictions messing up things. As long as you only do an assignment in the condition block, the compiler (assuming a release build) will be able to do it without branches.

I just tried the following with GCC (you would remove the noinline in production code obviously)
 

template< typename T >
bool __attribute__ ((noinline)) AreAllTheSame8( const T (&array)[8] ) {
    bool same = true;
    if (array[0]!=array[1]) same = false;
    if (array[0]!=array[2]) same = false;
    if (array[0]!=array[3]) same = false;
    if (array[0]!=array[4]) same = false;
    if (array[0]!=array[5]) same = false;
    if (array[0]!=array[6]) same = false;
    if (array[0]!=array[7]) same = false;
    return same;
}

And it produced the following function with gcc -O3 -S for an int array
 

__Z14AreAllTheSame8IjEbRA8_KT_:         ## @_Z14AreAllTheSame8IjEbRA8_KT_
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp4:
	.cfi_def_cfa_offset 16
Ltmp5:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp6:
	.cfi_def_cfa_register %rbp
	movl	(%rdi), %eax
	cmpl	4(%rdi), %eax
	sete	%sil
	cmpl	8(%rdi), %eax
	sete	%dl
	cmpl	12(%rdi), %eax
	sete	%cl
	andb	%dl, %cl
	cmpl	16(%rdi), %eax
	sete	%dl
	andb	%cl, %dl
	cmpl	20(%rdi), %eax
	sete	%cl
	andb	%dl, %cl
	cmpl	24(%rdi), %eax
	sete	%dl
	andb	%cl, %dl
	cmpl	28(%rdi), %eax
	sete	%al
	andb	%dl, %al
	andb	%sil, %al
	popq	%rbp
	retq
	.cfi_endproc

For 8bit and 16bit integers it generated an additional movzwl or movzbl per array entry respectively, which you could manually try to optimize by building a wider comparison value (well, why not for 32bit values too if compiling for 64bit platform). Using the idea from Álvaro's post we get this which reduces the function down to only a handful of instructions:
 

template<>
bool __attribute__ ((noinline)) AreAllTheSame8( const uint8_t (&array)[8] ) {
    uint64_t v = *reinterpret_cast<const uint64_t*>(&array);
    return v == ((v & 0xffull) * 0x0101010101010101ull);
}
__Z14AreAllTheSame8IhEbRA8_KT_:         ## @_Z14AreAllTheSame8IhEbRA8_KT_
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp0:
	.cfi_def_cfa_offset 16
Ltmp1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp2:
	.cfi_def_cfa_register %rbp
	movq	(%rdi), %rax
	movzbl	%al, %ecx
	movabsq	$72340172838076673, %rdx ## imm = 0x101010101010101
	imulq	%rcx, %rdx
	cmpq	%rdx, %rax
	sete	%al
	popq	%rbp
	retq
	.cfi_endproc

As you can see, no branches, no need for silly trickery. Just avoid the short-circuiting conditional && and || and the compiler will probably do fine.

 

EDIT: For the 64bit trick, the array better be alignas(uint64_t), and I'm not 100% sure if that reinterpret_cast is actually legit in C++ (it would be undefined behavior in C at least due to strict aliasing requirements). It might be safer to work around it using unions, but the generated code seems fine.

Edited by mzbear

Share this post


Link to post
Share on other sites
As you can see, no branches, no need for silly trickery. Just avoid the short-circuiting conditional && and || and the compiler will probably do fine.

Does the shortcircuiting matter here actually? I imagine the compiler should be able to optimize out cases where it notices that operations can be done in either order and it not affect the outcome.

 

EDIT: did some quick tests myself and... well, GCC does indeed fail to optimize that out. Whoops. Also pretty disappointed overall this time, GCC really didn't try to do anything clever regardless of what I threw at it. I guess there isn't much else to be done about it (assuming the code is already cache coherent). Then again I recall reading that CPUs now try to keep track of where recent branches tend to go? On code that's run on a tight loop that should already help reduce branch mispredictions after a few iterations.

Edited by Sik_the_hedgehog

Share this post


Link to post
Share on other sites

EDIT: For the 64bit trick, the array better be alignas(uint64_t), and I'm not 100% sure if that reinterpret_cast is actually legit in C++ (it would be undefined behavior in C at least due to strict aliasing requirements). It might be safer to work around it using unions, but the generated code seems fine.


Yes, memory alignment is an issue, and getting reinterpret_cast or unions in the mix is almost certainly going to invoke undefined behavior. But if you often need to read 8 8-bit integers in your voxel engine, perhaps you should consider packing the 8 values in a 64-bit integer anyway, so you can best use your hardware.

 

That's a special case if you ignore the templated type of the function. For 8-bit values and even 16-bit values there are some bitwise hacks, but they don't really buy anything on a modern processor, and can even cause pipeline stalls if you aren't careful. Introducing data dependencies slows things down a tiny bit. Again, this is something so small it isn't a performance concern. Given the way current processors work it is quite literally two cycles through the decoder -- each CPU can decode up to 4 instructions per cycle -- and whatever cost to get the data from cache.


I happen to have quite a bit of experience in chess programming, and I can assure you that 64-bit trickery is very fast. Chess is a perfect example of a situation where using 64-bit integers to make good use of your processor is a good idea (see bitboards). This is also the case for checkers, Reversi and Four in a Row. If you are writing an engine for a board game, you can expect this is what your processor is going to be doing all day long, so performance does matter, and being able to use parallelization at the level of the ALU is huge.

The OP explicitly wanted to know if there is "some clever bit manipulation". So yes, there is. Whether it's worth doing in his program or not is impossible for me to determine.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!