# [Haskell, SOLVED] Why are these sorting criteria buggy?

This topic is 3626 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi, I'm trying to implement Bencoding in Haskell, but I've run into a problem. For some background, Bencoding is the data format of the BitTorrent protocol, and can contain ints, byte strings, dictionaries and lists. The specs state that not only dictionaries, but even lists must be sorted in alphabetical order. Since a list can contain any data type (strings, ints, lists and dictionaries,) the ordering among different types is somewhat non-obvious. I understood it to mean, that for different types, the type's prefix is what we sort by. For example, strings, which have no prefix except their length, go first, the dictionaries (prefix 'd',) then ints (prefix 'i',) and finally lists (prefix 'l'.) Then, we sort the lists and dictionaries based on length (a list or dictionary with three items comes before one with five.) If I've misunderstood this, I'd be happy if someone could point it out, because the specs are really unclear about this. However, the big problem is currently another one. To sort the lists, I use the following sorting function, along with the built-in sortBy:
compBValue :: BType -> BType -> Ordering
compBValue a b                       | a == b              = EQ
compBValue (BString a) (BString b)   | a < b               = LT
| otherwise           = GT
compBValue (BString a) _                                   = LT
compBValue _ (BString a)                                   = GT
compBValue (BDict a) (BDict b)       | size a < size b     = LT
| otherwise           = GT
compBValue (BDict a) _                                     = LT
compBValue _ (BDict a)                                     = GT
compBValue (BInteger a) (BInteger b) | a < b               = LT
| otherwise           = GT
compBValue (BInteger a) _                                  = LT
compBValue _ (BInteger a)                                  = GT
compBValue (BList a) (BList b)       | length a < length b = LT
compBValue (BList a) _                                     = GT


I've tested this code over and over with the QuickCheck unit testing library, and it seems to work fine, except in one strange corner case: about once in 2000 tests, the sorting function mixes up the order of some elements, mostly (only?) zero-length strings and empty lists when the internal representation of a Bencoded data structure is printed, parsed, printed and then parsed again. The output reveals that the individual data elements are indeed parsed correctly, but for some reason placed in the wrong order. If the resulting output is then parsed and printed again, the order is restored. I have no idea why this happens, and even though the bug doesn't seem to affect anything that's likely to actually ever happen, it still bothers me. Can somebody see any problem with those criteria? EDIT: nevermind the code, now I know the problem. Whenever there was two dictionaries or lists of equal length, their sorting order would become irrelevant, as any order between them is fine. Since there can be only one Bencoding for any one chunk of data, my assumptions about ordering were wrong. It seems that lists and dictionaries should be differentiated during sorting not by size, but strictly alphabetically. [Edited by - Valderman on January 11, 2009 2:38:42 PM]

1. 1
2. 2
Rutin
19
3. 3
4. 4
5. 5

• 13
• 26
• 10
• 11
• 9
• ### Forum Statistics

• Total Topics
633736
• Total Posts
3013598
×