[Java] Override equals method for use in HashSet

This topic is 3459 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

So I want to put some things in a HashSet to prevent duplicates. According to me, 2 things are duplicate if they have the same ID member variable. To achieve this, I override the Object.equals method to simply check if the IDs of the two things in question are the same. This seems straightforward, yet HashSet is not calling my equals method. Here's the code:
        public static class edge {
int id;

public edge(int i){ id = i; }

public boolean equals(Object obj){
System.err.println("called"); // Never happens
return id == ((edge)obj).id;
}
}

public static void main(String[] args) {

HashSet<edge> set = new HashSet<edge>();

edge a = new edge(1);
edge b = new edge(2);
edge c = new edge(1);

System.out.println(set.size()); // Returns 3
}



Share on other sites
In Java, whenever you override .equals(), you should also override .hashCode() (which returns an int). The hash containers use that function instead - hence the name.

The proper semantics are that objects that are equal should have the same hashCode, and objects that are not equal should be as likely as possible to have a different hashCode. In your case, you can simply return id;.

Share on other sites
You shouldn't cast obj to edge unchecked, because testing an edge for equality with for example a string should return false, not throw an exception at runtime. Here's how I would do it:

public boolean equals(Object obj){	return obj instanceof edge && id == ((edge) obj).id;}public int hashCode(){	return id;}

Share on other sites
Well, I would like to point out other problem that I see in your code/codes. First of all, passing null into equals should not throw NullPointerException so you should check this explicitly (excerpt from the contract: For any non-null reference value x, x.equals(null) should return false).

Secondly, using "instanceof" is maybe fine, but will return true, also when you subclass your class and pass it to the .equals() method using the same ID. And this might cause .equals() to violate one of the conditions of Object.equals() method contract - symmetricity. ( parent.equals(child) == true, but child.equals(parent) == false)

The most robust implementation of 'equals()' in my opinion would be:

public boolean equals(Object obj) {    if (obj == null) {        return false;    }    if (!obj.getClass().equals(Edge.class)) {        return false    }    return id == ((Edge) obj).id;}

Notice, that I've used Edge with capital E - I strongly recommend to obey this code convention 'rule'.

EDIT: Added references to Object.equals() contract

[Edited by - mouserSVK on August 2, 2008 6:30:33 AM]

Share on other sites
Quote:
 Original post by mouserSVKSecondly, using "instanceof" is maybe fine, but will return true, also when you subclass your class and pass it to the .equals() method using the same ID.

Ooo, interesting one. If all I've done is inherit from Edge to add extra logging, shouldn't Edge and LoggingEdge with the same id count as equal?

I can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.

Share on other sites
Quote:
 Original post by OrangyTangI can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.

Yeah, I can imagine those arguments, too... but if we want to keep the contract of the Object.equals() method, which says:
Quote:
 The equals method implements an equivalence relation on non-null object references: ...It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true. ...

we have to implement it this way.

Share on other sites
You can still keep the symetric property, just as long as no derived classes redefine equals().

Share on other sites
Quote:
 Original post by OrangyTangYou can still keep the symetric property, just as long as no derived classes redefine equals().

Yes, that's right, but still this case can be quite common, when you have a little more complex hierarchy of objects, where it is needed to override the equals() method in the subclass because of e.g. additional field.

Then you lose the symmetricity. If you really need your subclasses' instances to be equal to their upper classes, then it is fine. But I would not create a subclass just for logging, I would rather use log4j or smth - maybe only if the performance would be really hit seriously by "isDebugEnabled()" kind of methods.

(Even in that case I would rather use AspectJ for logging, but that is something too out of topic ;))

Share on other sites
Quote:
 Original post by OrangyTangOoo, interesting one. If all I've done is inherit from Edge to add extra logging, shouldn't Edge and LoggingEdge with the same id count as equal?I can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.

Equals compares equality of state, not equality of identity.

Comparing derived classes is tricky, and should generally be avoided - it's possible if classes have exactly the same members. When comparing classes which contain different members, equals should always return false.

Equals is defined very very clearly, so is its relation to hashCode. While I'm sure it's possible to break it in many ways and get adequate results, there's always those corner cases which result in problems.

Quote:
 According to me, 2 things are duplicate if they have the same ID member variable.

Then use Comparable or Comparator interface.

Quote:
 This seems straightforward, yet HashSet is not calling my equals method.

It doesn't need to, since hashCode() is faster. Two equal objects will always have identical hashCode. When manipulating containers, that one will be used as primary test.

If two identical hashCode()s are found, then equals will be used to resolve equality. Then - either of the two objects will be stored. This is it's invalid to make arbitrary assumptions about what is equal and what isn't.

Share on other sites
Is it just me, or does Java make custom comparison overly complicated? You have hashCode(), equals(), the Comparable interface, if not more. It always seemed simpler to do this stuff in C++ to me.

Share on other sites
Quote:
 Original post by d000hgIs it just me, or does Java make custom comparison overly complicated? You have hashCode(), equals(), the Comparable interface, if not more. It always seemed simpler to do this stuff in C++ to me.

I don't think it is "overly complicated" at all. I even do think that it is a very nice concept and not more complicated than C++ way to do these things.

a) You have equals() method defined for the class 'Object' which has a clearly defined contract. You override this method when you want to make decisions whether instance A equals to instance B of the same class. Usually you just follow the pattern described above. In C++ you can e.g. define operator, which I think is basically the same (but not so clearly defined, since there is no "Object" class in C++).

b) You have hashCode() method which should be overriden always when equals() is overriden because it is used by all those HashMaps, HashSets, etc. Again, I think very nice concept and very simple - it is clear, that when you put objects into HashMap, hashCode() will be used for hash functions (in C++, you have to define it for a hash map). There is one simple condition this method should fulfill: It has to return different results for different objects.

c) You don't have to use Comparable interface unless you want to e.g. sort objects. Then - nothing can be easier than using Comparable and implementing a method of this interface that can compare two instances A, B - it returns negative number, 0, or positive number for A < B, A = B and A > B respectively. Then you just use standard library methods for sorting of collections or arrays (Collections.sort, or Arrays.sort) and that's it. Or you can use these objects as keys in SortedMap if you want.

- and yes, there is even more, you can e.g. create your own Comparator class which can be responsible for comparing of two objects in general and you can pass this method to Collections.sort, or Arrays.sort methods as additional argument.

Share on other sites
Quote:
 Original post by mouserSVKb) You have hashCode() method [...] It has to return different results for different objects.

Wrong. It has to return the same value for equal objects.

[Edited by - DevFred on August 2, 2008 8:42:41 PM]

Share on other sites
Quote:
 Original post by d000hgIs it just me, or does Java make custom comparison overly complicated? You have hashCode(), equals(), the Comparable interface, if not more. It always seemed simpler to do this stuff in C++ to me.

equals - ==
Comparable - predicate less
Comparator - free binary function
hashCode - hash function as applicable to some containers
toString() - operator<<(std::ostream, T)

The difference is, due to simpler syntactic nature of Java, all those are naturally expressed as methods of every class, rather than external functions, or other constructs.

The difference comes from simple fact that Java was built on top of standard library, whereas in C++ that one was tacked on. For better or worse, this design makes a lot of sense given the context.

PS: Few bother to write these functions. There's tools that auto generate them.

Share on other sites
Quote:
Original post by DevFred
Quote:
 Original post by mouserSVKb) You have hashCode() method [...] It has to return different results for different objects.

Wrong. It has to return the same value for equal objects.

You are absolutely right, it was probably too late at the time I was writing that post [smile]

Share on other sites
Quote:
 Original post by d000hgIs it just me, or does Java make custom comparison overly complicated? You have hashCode(), equals(), the Comparable interface, if not more. It always seemed simpler to do this stuff in C++ to me.

C++ doesn't have a standard equivalent to .hashCode(), but that's only because it doesn't have hash containers in its standard library (yet).

.equals() is Java-speak for what C++ programmers call 'operator=='.

The Comparable interface is Java-speak for what C++ programmers call "that optional template parameter for standard library containers where you supply a comparison Predicate".

It's actually all quite analogous. Neither way is more complicated than the other; the Java way is appropriate for an object-oriented language, and the C++ way is appropriate for a multi-paradigm language where the standard library is based heavily on templates (parametric, compile-time polymorphism).