Sign in to follow this  

Simple N00b C# Question

This topic is 3729 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I'm in need of a data structure that stores a set of objects of a given type without allowing duplicates, and supports fast insert, delete, and test for a specific item in the collection. In the general case I can't rely on things like the values of each object being unique (e.g. for a GetHashCode implementation or assigning a unique key to each object). Is there some stock way to do this? If not, I could make one easy enough if there was a way to either order raw references (to make a binary search tree; Object.ReferenceEquals is similar, but only tests for equality) or get a hash of a reference (to make a hash table). Do either of those things exist?

Share this post


Link to post
Share on other sites
Just inherit for List<T>, and use it as a base for what you are doing. Add the checking for duplicates on add functions. Should be pretty easy.

You will need to create a compare method to check to see if they are the same, but that is pretty easy.

theTroll

Share this post


Link to post
Share on other sites
Uhh... List? That's an array. In other words O(N) insert, delete, and search (without a way to compare elements the list couldn't be sorted; and if we could compare elements there are faster data structures). Not exactly fast.

Share this post


Link to post
Share on other sites
Last I checked the .NET framework doesn't have any implemented trees save for the one associated with a TreeView. Even then I don't think that you can get at that outside of a TreeView control.

You'll have to implement your own tree data structure.

Share this post


Link to post
Share on other sites
Sorted dictionary is a binary search tree. Though even if it wasn't, as I mentioned in the opening post, such a tree would be trivial to implement if it were possible to compare references for greater/less than (which I'm hoping one of you will know how to do). There's also Dictionary (hash table), but that requires the ability to get reliable hashes for an object based just on its reference (also mentioned, and also not something I know how to do).

Share this post


Link to post
Share on other sites
Quote:
Original post by Catafriggm
Uhh... List? That's an array. In other words O(N) insert, delete, and search (without a way to compare elements the list couldn't be sorted; and if we could compare elements there are faster data structures). Not exactly fast.


Well if you know the answer then why are you asking us mere mortals?

theTroll

Share this post


Link to post
Share on other sites
If you were using C++, I'd have recommended std::set<>. Most implementations of it use a Red-Black tree and it doesn't allow duplicates.

In .NET you could misuse one of the provided associative collections (Hashtable or Dictionary<>) with a dummy as the value part. It's not optimal, but I'd expect it to be reasonably fast.

Another option would be to use one of the classes from the PowerCollections project. The library is free and fills many of the gaps left in the collection classes and other data structures from the .NET framework. I think there's a hash table based 'Set' class and a 'RedBlack' class implementing a Red-Black tree.

-Markus-

Share this post


Link to post
Share on other sites
Quote:
Original post by TheTroll
Well if you know the answer then why are you asking us mere mortals?


What I don't know:
- Is there a way to do <, > on references (rather than the contents of the objects themselves)? The <, > don't need to have any particular meaning other than that if A < B at one point, A < B at some later point, after various members of both have changed.
- Does GetHashCode (or some other hash provider) return something that can reliably be used in a hash table, and is based on the reference itself, rather than the contents of the item? E.g. garbage collection will move things around; will the hash code change on GC? Is the hash code more or less evenly distributed among values for raw references? Will the hash code change if the members in the object referenced change?

What I do know:
- Dictionary is fast (O(1)). But I would need the ability to generate a hash code for a raw reference.
- SortedDictionary is also fast (O(log N)). But I would need a way to do <, > for a raw reference.

Cygon: Ehhh. I suppose if I can't find anything better I could assign each entry in the list a random number to use as the hash key. It would be ugly, but I suppose it would work. Thx for the info.

Share this post


Link to post
Share on other sites
ICompare if what you might be looking for. It give you the ability to make one object equal to, less then or greater then another object. It will give you the ability to sort the objects also.

Now might the comparisons be a bit arbitrary? Yes, but it goes give you are basis to understand where things should be.

Now what I don't understand is why you can't come up with a unique hash. In your first post you said that the object you are storing needs to make sure it is not a duplicate, so if it is not the same then you should be able to make unique hashes.

theTroll

Share this post


Link to post
Share on other sites
Quote:
Original post by TheTroll
Now what I don't understand is why you can't come up with a unique hash. In your first post you said that the object you are storing needs to make sure it is not a duplicate, so if it is not the same then you should be able to make unique hashes.


The values of the object will be changing. A hash code of an object suddenly changing while it's in the table would break any hash table implementation. Besides that, "duplicate", in this case, is referring to multiple references to the same instance of a class. There is no data in the class itself that is unique to one instance compared to any other. These are the same reasons why I can't simply compare them member-wise.

Share this post


Link to post
Share on other sites
Then I think you are going to have to go with a unique identifier method. So impliment an "auto-index" like a database would use. Just create a static member set at zero to start. Have a static method give you an int that is the current value of the member and then increment the value one. This will make sure you will alway have a unique value. If you need more you can alway go with a double. Use the "auto-index" as your dictionary look up value and that should make it pretty simple.

This would protect you from duplicates that are just references, because they would all have the same "auto-index".

theTroll

Share this post


Link to post
Share on other sites
Quote:
Original post by Catafriggm
The values of the object will be changing. A hash code of an object suddenly changing while it's in the table would break any hash table implementation.


Wouldn't a value of an object changing - to match the value of another one in the table - break any implementation anyway?

Quote:
Besides that, "duplicate", in this case, is referring to multiple references to the same instance of a class. There is no data in the class itself that is unique to one instance compared to any other.


... Oh. Well, in that case, it sounds like you want "reference semantics" rather than "value semantics" (Google). In C#, AFAIK, you do this by using 'class' instead of 'struct', and then simply not providing the functions for equality and hash-code creation, instead relying on the base Object implementation. (The default "hash code" is, basically, the "location in memory" of the object - although that concept is heavily abstracted by both the virtual machine and the OS. Anyway, the point is that two objects which are not *the same object* will always be considered not-equal - and further, one will be consistently considered "less than" the other, although the decision is somewhat arbitrary - even if they hold the same *data*.)

Share this post


Link to post
Share on other sites
Quote:
Original post by Catafriggm
Quote:
Original post by TheTroll
Well if you know the answer then why are you asking us mere mortals?


What I don't know:
- Is there a way to do <, > on references (rather than the contents of the objects themselves)? The <, > don't need to have any particular meaning other than that if A < B at one point, A < B at some later point, after various members of both have changed.
- Does GetHashCode (or some other hash provider) return something that can reliably be used in a hash table, and is based on the reference itself, rather than the contents of the item? E.g. garbage collection will move things around; will the hash code change on GC? Is the hash code more or less evenly distributed among values for raw references? Will the hash code change if the members in the object referenced change?

What I do know:
- Dictionary is fast (O(1)). But I would need the ability to generate a hash code for a raw reference.
- SortedDictionary is also fast (O(log N)). But I would need a way to do <, > for a raw reference.

Cygon: Ehhh. I suppose if I can't find anything better I could assign each entry in the list a random number to use as the hash key. It would be ugly, but I suppose it would work. Thx for the info.


I don't get it - if you're using the dictionary, why do you need a hash? It will do unique key checks on its own, deletes, insert, and test.

Edit - I missed the whole structure thing. I'd say as Zahlman suggested, use a class instead.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
(The default "hash code" is, basically, the "location in memory" of the object - although that concept is heavily abstracted by both the virtual machine and the OS. Anyway, the point is that two objects which are not *the same object* will always be considered not-equal - and further, one will be consistently considered "less than" the other, although the decision is somewhat arbitrary - even if they hold the same *data*.)


That's exactly what I needed to know. Thanks!

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
(The default "hash code" is, basically, the "location in memory" of the object - although that concept is heavily abstracted by both the virtual machine and the OS. Anyway, the point is that two objects which are not *the same object* will always be considered not-equal - and further, one will be consistently considered "less than" the other, although the decision is somewhat arbitrary - even if they hold the same *data*.)


You do realize that this completely disagrees with what the MSDN says?

You can not be sure that the default hashes of objects will be unique. It blatantly states that in the MSDN.

So you could end up with the same has for different object instances, thus breaking your system.

The only way I can see if for you to create your own unique identifier for each object.

theTroll

Share this post


Link to post
Share on other sites
Quote:
Original post by Catafriggm
Cygon: Ehhh. I suppose if I can't find anything better I could assign each entry in the list a random number to use as the hash key. It would be ugly, but I suppose it would work. Thx for the info.


Usually, you would override the GetHashCode() and Equals() methods of your class. GetHashCode() doesn't need to be unique, but you'll achieve better performance with Hashtables and likewise containers if you don't just return '0', so if you've got any members that are likely to be the main sources of difference between two objects, just XOR their hashcodes and return that.

In Equals() you will have to do a manual one-by-one comparison of all fields in your object.

Take note, however, that the containers do not expect their keys to change. If you modify one of your keys, you'll have to remove the old key from the container, change it and reinsert it into the container.

-Markus-

Share this post


Link to post
Share on other sites

This topic is 3729 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this