Archived

This topic is now archived and is closed to further replies.

Timkin

fast binary-trinary string comparisons

Recommended Posts

Timkin    864
Given a binary string and a trinary string of equal length (length is arbitrary but known a priori ) I''m looking for a fast way to do string comparisons. That is, to compare bits in one string with trits in the other, according to the following truth table,
           bit
       | 0 | 1 |
       ---------
     0 | 1 | 0 |
trit 1 | 0 | 1 |
     2 | 1 | 1 |
 
and to return the number of positions matched (if all positions match, then obviously the strings match given the truth table). Note, I only need to perform binary-trinary tests, not trinary-trinary. So far, the best I have been able to come up with is iterating along the length of the string, comparing individual characters as follows: For the trinary string, I thought to use a dual binary string representation. So, for example, the trinary string of 0200112202 could be represented by the pair of strings: 0100111101 and 0000110000. Denote a character in the trinary string as ti and thus its twin binary characters are t1i and t2i. The comparison can then be tested for an arbitrary binary string - with characters bi - against both t1i and t2i and a match occurs if (bi AND t1i) OR (bi> AND t2i). My question is... can anyone think of a faster way of doing this, either using a better representation of the trinary string, the comparison operation, both, or perhaps just a really fast and funky way to implement this with bitwise operations and bit-packed integers (as opposed to arrays of values). I''d like to be able to do at least one million such string comparisons each second, but obviously I''m limited by algorithmic complexity and hardware. I''m starting with algorithmic complexity... I can worry about hardware later. (Of course, if someone can present a way of achieving more than a million such comparisons per second on current hardware, then huge qudos to you). Thanks, Timkin

Share this post


Link to post
Share on other sites
SiCrane    11839
This is probably un-intuitive, but if you encoded your trinary strings as a string of normal binary numbers, and encoded your binary strings as a string of binary numbers, you can do an bitwise xor with the trinary number and 1 minus the binary number on each pair to get the matching. example in C++ code:

int main(int, char **) {
char trinary_string[] = { 0, 1, 2, 0, 1, 2 };
char binary_string[] = { 0, 0, 0, 1, 1, 1 };
bool match_string[] = { 0, 0, 0, 0, 0, 0 };

for (int i = 0; i < sizeof(binary_string); i++) {
match_string[i] = (trinary_string[i] ^ (1 - binary_string[i]));
}

return 0;
}

You can probably bitpack this in tighter to get more comparisons per clock, but I'm not sure what your input/output interface is.

edit: a rough benchmark on my computer (Athlon 1800+) puts the number of comparisons that can be done at around 11 million elements per second. This was done with a pre-filled trinary, binary and match string buffers of a 10 million elements each, so it takes into account memory access. The actual number of comparisons will probably be bounded by the process of encoding the strings from whatever source you're retriving the data from.

[edited by - SiCrane on March 26, 2004 12:02:29 AM]

Share this post


Link to post
Share on other sites
Palidine    1315
or even faster encode the trinary as follows in binary (i.e. in an int)

trinary:
011
001
___
012

in other words it's additive with the second binary only coming in to play to make the 2.

then you can do the following to get your truth results

011 011
001 001 -> the trinary

000 111 -> the binary

take the 2 binary parts of the trinary and do a bitwise or:

011 011
| 001 001
_________
011 011

then take that and do a XOR with the binary

011 011
^ 000 111
_________
010 100

that is the OPPOSITE of your truth table. so you can either just reverse the meanings of 0 and 1 for your truth table or simply do another XOR with 0xFFFFFFFF

so in c++:


struct Trinary
{
int part1, part2;
};

Trinary tri;
int binary;

//with reversed meanings for 1 and 0 in the truth table

int truths = (tri.part1 | tri.part2) ^ binary;

//with normal meanings

int truths = ((tri.part1 | tri.part2) ^ binary) ^ 0xFFFFFFFF;


that should be plenty fast

-me

[edited by - Palidine on March 27, 2004 1:08:01 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster   
Guest Anonymous Poster
quote:
Original post by Palidine
take the 2 binary parts of the trinary and do a bitwise or:

011 011
| 001 001
_________
011 011

then take that and do a XOR with the binary

011 011
^ 000 111
_________
010 100



I''m not sure why you OR together the two parts of the ternary string at the start. It has no effect. Also, your XOR is wrong. You should''ve gotten 011 100. Plus a function closer Timkin''s original is XAND not XOR. Also, I''m not trying to be mean. :\

I would''ve done it by using the same two-part representation for the ternary string, but using

(t.low XAND b) OR t.high

where the ternary representation is the same as yours.

Share this post


Link to post
Share on other sites
Krumble    151
With the AP on this.


template<typename T>bool compare(T b, T t_1, T t_2) {
return ~((t_1 & b) | t_2) == 0;
}


This will allow you to use any integral type, or a homebrew bit vector if you need something longer.

Share this post


Link to post
Share on other sites
Palidine    1315
quote:
Original post by Anonymous Poster
I''m not sure why you OR together the two parts of the ternary string at the start. It has no effect. Also, your XOR is wrong. You should''ve gotten 011 100. Plus a function closer Timkin''s original is XAND not XOR. Also, I''m not trying to be mean. :\

I would''ve done it by using the same two-part representation for the ternary string, but using

(t.low XAND b) OR t.high

where the ternary representation is the same as yours.


yeah, i just had that realization now. i was running it through my head and it woke me out of bed. alas. i usually make small mistakes like that. no offence taken, and good implementation. pissed that you beat me to it.

-me

Share this post


Link to post
Share on other sites
Timkin    864
Thanks for the replies...

I''m going to have to sit down and go through them properly to make sure I have my head around them (I''ve only had two minutes to read the thread quickly while looking after a crying baby)!

If I understand the AP''s and Krumble''s replies, you''re advocating using a dual binary representation of the trinary string, where the trinary is the sum of the two binaries. Correct?

I''ll code up the different responses either later tonight (when my daughter finally sleeps) or tomorrow night. I''ll certainly report back the results. If SiCrane''s results are indicative of the performance that can be expected, then fantastic! Admittedly, I''ll be doing these comparisons on strings with lengths of the order of 102, so I expect some slow down from the string length.

Thanks again...

Timkin

Share this post


Link to post
Share on other sites
Timkin    864
okay, just to report back on my results, in case anyone was interested...

I implemented my bitstring (bitstr) as a 32bit unsigned long integer and the trit strings as dual 32bit unsigned long integers. The first of the two strings (tritstr.low) encoded the zeros and ones (with random zero or one at the position of the '2' characters). The second of the two strings (tritstr.high) encoded the position of the 2's in the trinary string. The comparison function was

result = ~(bitstr ^ tritstr.low) | tritstr.high

On an Athlon 2400+ (running at 2GHz), I completed 1 billion such comparisons in around 2.1 seconds, suggesting that I could perform around 500 million such comparisons per second on 32 bit integers. That's WAY more than I had ever hoped... so I'm very happy (and there's no need for me to profile my code in any more detail, since I've achieved about 2 orders of magnitude better performance than I had desired)!

Thanks for the help folks,

Timkin

[edited by - Timkin on March 29, 2004 10:43:47 AM]

Share this post


Link to post
Share on other sites