[Java] Override equals method for use in HashSet

Started by
13 comments, last by Zahlman 15 years, 8 months ago
So I want to put some things in a HashSet to prevent duplicates. According to me, 2 things are duplicate if they have the same ID member variable. To achieve this, I override the Object.equals method to simply check if the IDs of the two things in question are the same. This seems straightforward, yet HashSet is not calling my equals method. Here's the code:

        public static class edge {
		int id;

		public edge(int i){ id = i; }

		public boolean equals(Object obj){
			System.err.println("called"); // Never happens
			return id == ((edge)obj).id;
		}
	}

	public static void main(String[] args) {
		
		HashSet<edge> set = new HashSet<edge>();
		
		edge a = new edge(1);
		edge b = new edge(2);
		edge c = new edge(1);
		
		set.add(a);
		set.add(b);
		set.add(c);
		
		System.out.println(set.size()); // Returns 3
	}


Advertisement
In Java, whenever you override .equals(), you should also override .hashCode() (which returns an int). The hash containers use that function instead - hence the name.

The proper semantics are that objects that are equal should have the same hashCode, and objects that are not equal should be as likely as possible to have a different hashCode. In your case, you can simply return id;.
You shouldn't cast obj to edge unchecked, because testing an edge for equality with for example a string should return false, not throw an exception at runtime. Here's how I would do it:

public boolean equals(Object obj){	return obj instanceof edge && id == ((edge) obj).id;}public int hashCode(){	return id;}
Well, I would like to point out other problem that I see in your code/codes. First of all, passing null into equals should not throw NullPointerException so you should check this explicitly (excerpt from the contract: For any non-null reference value x, x.equals(null) should return false).

Secondly, using "instanceof" is maybe fine, but will return true, also when you subclass your class and pass it to the .equals() method using the same ID. And this might cause .equals() to violate one of the conditions of Object.equals() method contract - symmetricity. ( parent.equals(child) == true, but child.equals(parent) == false)

The most robust implementation of 'equals()' in my opinion would be:

public boolean equals(Object obj) {    if (obj == null) {        return false;    }    if (!obj.getClass().equals(Edge.class)) {        return false    }    return id == ((Edge) obj).id;}

Notice, that I've used Edge with capital E - I strongly recommend to obey this code convention 'rule'.

EDIT: Added references to Object.equals() contract

[Edited by - mouserSVK on August 2, 2008 6:30:33 AM]
Quote:Original post by mouserSVK
Secondly, using "instanceof" is maybe fine, but will return true, also when you subclass your class and pass it to the .equals() method using the same ID.

Ooo, interesting one. If all I've done is inherit from Edge to add extra logging, shouldn't Edge and LoggingEdge with the same id count as equal?

I can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.
Quote:Original post by OrangyTang
I can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.


Yeah, I can imagine those arguments, too... but if we want to keep the contract of the Object.equals() method, which says:
Quote:
The equals method implements an equivalence relation on non-null object references:
...
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
...

we have to implement it this way.
You can still keep the symetric property, just as long as no derived classes redefine equals().
Quote:Original post by OrangyTang
You can still keep the symetric property, just as long as no derived classes redefine equals().


Yes, that's right, but still this case can be quite common, when you have a little more complex hierarchy of objects, where it is needed to override the equals() method in the subclass because of e.g. additional field.

Then you lose the symmetricity. If you really need your subclasses' instances to be equal to their upper classes, then it is fine. But I would not create a subclass just for logging, I would rather use log4j or smth - maybe only if the performance would be really hit seriously by "isDebugEnabled()" kind of methods.

(Even in that case I would rather use AspectJ for logging, but that is something too out of topic ;))


Quote:Original post by OrangyTang

Ooo, interesting one. If all I've done is inherit from Edge to add extra logging, shouldn't Edge and LoggingEdge with the same id count as equal?

I can see arguments for both ways TBH, which is best probably depends on exactly what an Edge and an id mean in this context.


Equals compares equality of state, not equality of identity.

Comparing derived classes is tricky, and should generally be avoided - it's possible if classes have exactly the same members. When comparing classes which contain different members, equals should always return false.

Equals is defined very very clearly, so is its relation to hashCode. While I'm sure it's possible to break it in many ways and get adequate results, there's always those corner cases which result in problems.

Quote:According to me, 2 things are duplicate if they have the same ID member variable.


Then use Comparable or Comparator interface.

Quote:This seems straightforward, yet HashSet is not calling my equals method.


It doesn't need to, since hashCode() is faster. Two equal objects will always have identical hashCode. When manipulating containers, that one will be used as primary test.

If two identical hashCode()s are found, then equals will be used to resolve equality. Then - either of the two objects will be stored. This is it's invalid to make arbitrary assumptions about what is equal and what isn't.
Is it just me, or does Java make custom comparison overly complicated? You have hashCode(), equals(), the Comparable interface, if not more. It always seemed simpler to do this stuff in C++ to me.

This topic is closed to new replies.

Advertisement