Yeah, hashes are one way to go: you can just create an indexed dictionary of key-value pairs, where the key is the hash value and the value is the image, or a quadrant of an image. If you store 1,000,000 memes on disk, you can save disk space by not storing the parts that are common across all 1,000,000 memes, and instead keep just a reference to them via the hash value (the key lookup in the dictionary).
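To make the dictionary idea concrete, here is a minimal sketch, assuming each image has already been split into quadrants and each quadrant is just a bytes object. The names `BlockStore`, `store_image` and `load_image` are made up for illustration, and SHA-256 is only my choice of exact hash here, not something the scheme depends on.

```python
import hashlib


class BlockStore:
    """Hypothetical content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # hash value (key) -> block bytes (value)

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        # Only the first copy of a block is stored; repeats reuse the key.
        self.blocks.setdefault(key, block)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


def store_image(store, quadrants):
    """Store an image as a list of hash references to its quadrants."""
    return [store.put(q) for q in quadrants]


def load_image(store, keys):
    """Rebuild the list of quadrants from the stored references."""
    return [store.get(k) for k in keys]


# Two made-up memes that share three quadrants and differ in one.
shared = [b"template-top-left", b"template-top-right", b"template-bottom-left"]
meme_a = shared + [b"caption: one does not simply"]
meme_b = shared + [b"caption: such wow"]

store = BlockStore()
keys_a = store_image(store, meme_a)
keys_b = store_image(store, meme_b)

# 8 quadrants went in, but only 5 unique blocks are actually stored.
print(len(store.blocks))
```

Each meme is then just a short list of keys, and the shared quadrants live on disk once no matter how many memes point at them.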
The challenge comes with distortions in the image, and with recognizing that a distortion is an anomaly that can be ignored. Traditional hashing techniques such as MD5 are not suited for this, because they amount to exact, bit-for-bit comparisons: change one bit of the input and you get a completely different hash. Using MD5 would just get us back to where we started, which is storing 1,000,000 very similar images on disk, full of common information, because they may differ by a single bit. So we'd want a neural network which looks at an image and is a little bit fuzzier about precision. Changing a single bit between two otherwise identical images would not change its 99.9% confidence level at identification. So, if the neural network can tolerate a lot of distortion and variation between binary values but still have a 99.9% confidence level at correctly identifying the image, we can use the identified image's hash value as the key.
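Here is a small sketch of the difference, with one big caveat: instead of a neural network it uses a very simple "average hash" (one bit per pixel, set if the pixel is brighter than the image mean) as a stand-in for any fuzzy identifier. The 4x4 grayscale "image" and the function names are made up for illustration.

```python
import hashlib


def md5_of(pixels):
    """Exact hash: any change in the raw bytes changes the digest."""
    return hashlib.md5(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels):
    """Fuzzy fingerprint (stand-in for the neural net): 1 bit per pixel,
    set when the pixel is brighter than the mean of the whole image."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)


# A made-up 4x4 grayscale image (values 0-255) in a checkerboard pattern.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]

# Same image with the lowest bit of one pixel flipped (10 becomes 11).
tweaked = [row[:] for row in original]
tweaked[0][0] ^= 1

print(md5_of(original) == md5_of(tweaked))              # exact hashes differ
print(average_hash(original) == average_hash(tweaked))  # fuzzy fingerprints agree
```

The MD5 digests come out different for the two images while the fuzzy fingerprints match, which is the property we'd want from whatever does the identifying.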
The thing I hadn't really considered deeply until now is that the neural network will necessarily have some sort of threshold value for anomaly tolerance. What if the tolerance is too high and we resolve two unique images to the same hash value? This would be a collision at the neural net level rather than at the hashing function level. Even with careful tweaking of the tolerance value, you'd still run a risk of collisions in the neural network. But maybe we just bite the bullet and say collisions don't matter? Can humans look at two distinct pictures and think they're the same? If humans can't tell the difference, then maybe we can excuse computers if they can't either? But this assumes that computer vision is on par with human vision, and I'm not sure we're there yet.
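A toy version of the tolerance problem, assuming images have already been reduced to short bit fingerprints and that "tolerance" means the maximum number of differing bits (Hamming distance) we still treat as the same image. The fingerprints, the `resolve` helper and the `img0`-style keys are all invented for this example, not taken from any real system.

```python
def hamming(a, b):
    """Number of positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def resolve(fingerprint, index, tolerance):
    """Return the key of the first stored fingerprint within `tolerance`
    bits; otherwise register the fingerprint under a new key."""
    for key, stored in index.items():
        if hamming(fingerprint, stored) <= tolerance:
            return key
    key = f"img{len(index)}"
    index[key] = fingerprint
    return key


def assign_all(fingerprints, tolerance):
    """Resolve a batch of fingerprints against a fresh, empty index."""
    index = {}
    return [resolve(fp, index, tolerance) for fp in fingerprints]


# Hypothetical 8-bit fingerprints.
cat = (1, 1, 1, 1, 0, 0, 0, 0)
cat_noisy = (1, 1, 1, 0, 0, 0, 0, 0)  # same picture, 1 bit of distortion
dog = (1, 1, 0, 0, 1, 0, 0, 0)        # a different picture, 3 bits from cat

print(assign_all([cat, cat_noisy, dog], tolerance=1))
# ['img0', 'img0', 'img1']  -> noise is absorbed, dog stays distinct

print(assign_all([cat, cat_noisy, dog], tolerance=3))
# ['img0', 'img0', 'img0']  -> tolerance too high, dog collides with cat
```

There is no single tolerance that is obviously right: too low and every distorted copy gets stored again, too high and distinct images get merged.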
The day after I wrote my blog post, I saw an article claiming that something similar to what I described above could change the way databases catalog data: https://blog.bradfieldcs.com/an-introduction-to-hashing-in-the-era-of-machine-learning-6039394549b0
It's interesting because the approaches to indexing data are similar... and it makes me wonder whether the article reached me through some really good targeted advertising, driven by yet another machine learning system. A bit ironic in a way. Anyway, read the article if you have time. It's a good one.