Local hash

Started by
18 comments, last by Juliean 7 years, 11 months ago

std::unordered_map<> is a hash table
And it is very new in C++ terms (C++11, it seems).

So unless you have been learning programming C++ in the last 5 years, you never encountered it.

Advertisement

std::unordered_map<> is a hash table

And it is very new in C++ terms (C++11, it seems).
So unless you have been learning programming C++ in the last 5 years, you never encountered it.


I learned C++ in 2001, and at the time gcc had hash_map, which served the same purpose. Hash-based maps are now part of the standard, but they are not a new thing.

[EDIT: It's possible that it was Boost that provided hash_map, not gcc. Also, the C++ committee proposed unordered_map in 2005, as part of TR1. The STL also had hash_map, and that has been around since 1994.]
[EDIT: It's possible that it was Boost that provided hash_map, not gcc. Also, the C++ committee proposed unordered_map in 2005, as part of TR1. The STL also had hash_map, and that has been around since 1994.]

No, you're almost certainly right.

I'm pretty sure I've used it, and I abstain from Boost because it's such an awful big dependency -- so it must be in GCC.

If I recall correctly and am not having Korsakoff's syndrome, I even remember it was some modulo-some-prime based implementation. Which made it slightly slower overall (but not much) but rendered it very resilient to bad hash functions and bad input (you could hardly get it to deteriorate even if you tried hard feeding it with deliberately bad hashes).

Fair enough, I just looked at the standard http://en.cppreference.com/w/cpp/container/unordered_map which says C++11.

I learned C++ in the '90s from Stroustrup "The C++ language 3rd edition", and that's what I always used. I never looked around for more, since it covered all my needs, and only using the standard concepts has the nice property that it works everywhere out of the box. (A lot of my software gets build and used by others than me, adding more dependencies is generally bad.) Even today, I haven't yet used the unorderd map in C++ (not needed so far).

I did use hashtables in Java and Python though. For classes that need my own equivalence notion, the "<" ordering of C++ feels simpler to me than the "equals" + "hash". Inventing a hash function is always a bit of a problem, as it is hard to understand how your choice affects performance.

std::unordered_map<> is a hash table. If you don't need to keep the items in your map sorted by key, it's very likely that you can use unordered_map instead of map, and in many cases the resulting code will be faster (e.g., if you have lots of entries (because hash tables have better asymptotic performance), or if key comparisons are expensive).

With "lots" being "several thousand." A std::map<> with a string key uses simple string comparison, so O(1) on the average length of the string keys (almost always very short) and O(log n) searches on the number of keys. Unless you choose a custom string hash and bucket sizes very carefully, or have a huge data set you're mapping, std::map<> with std::string keys will beat std::unordered_map<> every time, in terms of speed. It will also beat it when trying to debug, because the data is ordered making it easier to find missing or incorrect values (I calculate developer time as the most expensive resource I manage).

If your key is an integral value, std::unordered_map<> is sometimes a better bet because of memory allocation characteristics, but metrics are always your friend there. Fortunately, the standard map classes are almost entirely interchangeable at the code level so switching is easy.

For smaller data sets (on the order of hundreds) a std::vector<> almost always beats any other collection in terms of memory and speed. If it's created once and treated read-only it will always beat a std::map<> for lookup speed, but std::map<> usually still has simpler code for the same functionality.

Stephen M. Webb
Professional Free Software Developer


With "lots" being "several thousand." A std::map<> with a string key uses simple string comparison, so O(1) on the average length of the string keys (almost always very short) and O(log n) searches on the number of keys. Unless you choose a custom string hash and bucket sizes very carefully, or have a huge data set you're mapping, std::map<> with std::string keys will beat std::unordered_map<> every time, in terms of speed. It will also beat it when trying to debug, because the data is ordered making it easier to find missing or incorrect values (I calculate developer time as the most expensive resource I manage).

If your key is an integral value, std::unordered_map<> is sometimes a better bet because of memory allocation characteristics, but metrics are always your friend there. Fortunately, the standard map classes are almost entirely interchangeable at the code level so switching is easy.


Thousands of items with integral keys is a pretty common situation, at least for me.

If anybody knows of a way to keep track of used integers, I would like to know. My method isn't as efficient as I thought I think. Maybe there's multiple ways to encode the same values with that way. E.g... 0m0 01m 101 = 000 010 010 011 101 = 010 011 101 000 = 01m 101 000... Mmm

It would be very helpful if you could clearly and completely describe the problem you're trying to solve, and what you've tried so far.

As it stands I have no clue what you are attempting to accomplish.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

A, who cares...

Hey I made some progress. I tried setting all the mask bits to 1 and it always leads to all 1's irreversibly. If only 1 of the bits is 1, then it cycles back and forth between different ends of the range but doesn't cover all of it.

Basically you can solve for variables.



110 -> 000
	010 -> 001
	111 -> 010
	001 -> 011

	truth table boolean
	i2	i1	i0		o2	o1	o0
	0	0	0		1	1	1
	...
	0	0	1		0	1	1
	1	1	0		0	0	0
	0	1	0		0	0	1
	1	1	1		0	1	0

	//hash fun
	o = (i << 1) | (i >> 31)
	o = o ^ ~(i & m)
	i = o
	//repeat 32

	o0[0] = i[2] ^ ~( i[0] & m0[0] )
	o0[1] = i[0] ^ ~( i[1] & m0[1] )
	o0[2] = i[1] ^ ~( i[2] & m0[2] )
	o1[0] = o0[2] ^ ~( o0[0] & m1[0] )
	o1[1] = o0[0] ^ ~( o0[1] & m1[1] )
	o1[2] = o0[1] ^ ~( o0[2] & m1[2] )
	o2[0] = o1[2] ^ ~( o1[0] & m2[0] )
	o2[1] = o1[0] ^ ~( o1[1] & m2[1] )
	o2[2] = o1[1] ^ ~( o1[2] & m2[2] )

	truth table t ^ ~(i & m)
	t	i	m			t2
	0	0	0			1
	0	0	1			1
	0	1	0			1
	0	1	1			0
	1	0	0			0
	1	0	1			0
	1	1	0			0
	1	1	1			1

	train for 110 -> 000
	o2[0] = 0
	0 = o1[2] ^ ~( o1[0] & m2[0] )
	so o1[2]=0 and ~( o1[0] & m2[0] )=0
	o1[2]=0
	so o1[0]=1 and m2[0]=1
	so o0[2]=0 and ~( o0[0] & m1[0] )=1
	so o0[2]=0 and o0[0]=0 and m1[0]=0
	or o0[2]=0 and o0[0]=1 and m1[0]=0
	or o0[2]=0 and o0[0]=0 and m1[0]=1
	or o0[2]=1 and ~( o0[0] & m1[0] )=0
	so o0[2]=1 and o0[2]=1 and o0[0]=1 and m1[0]=1
	o0[2]=1  and o0[0]=1
	so o0[2]=1 and i[2]=1 and ~( i[0] & m0[0] )=0
	so o0[2]=1 and i[0]=1 and m0[0]=1
	or o0[2]=1 and i[2]=0 and ~( i[0] & m0[0] )=1
	so o0[2]=1 and i[0]=0 and m0[0]=0
	or o0[2]=1 and i[0]=1 and m0[0]=0
	or o0[2]=1 and i[0]=0 and m0[0]=1
	or o1[2]=1 and ~( o1[0] & m2[0] )=1
	so o1[2]=1 and o1[0]=0 and m2[0]=0
	o1[2]=1 and o1[0]=0
	so o1[2]=1 and o0[2]=1 && ~( o0[0] & m1[0] )=1
	or o1[2]=1 and o0[2]=0 && ~( o0[0] & m1[0] )=0
	or o1[2]=1 and o1[0]=1 and m2[0]=0
	o1[2]=1 and o1[0]=1
	so 
	or o1[2]=1 and o1[0]=0 and m2[0]=1
	o1[2]=1 and o1[0]=0
	so o1[2]=1 and o0[2]=1 && ~( o0[0] & m1[0] )=1
	or o1[2]=1 and o0[2]=0 && ~( o0[0] & m1[0] )=0
	etc.

	just pick first one given restrictions, then pick for second applying the new restrictions, 
	and if it doesn't work, back up the last step and try different, and so on

	truth table t ^ ~(i & m)
	t	i	m			t2
	0	0	0			1
	0	0	1			1
	0	1	0			1
	0	1	1			0
	1	0	0			0
	1	0	1			0
	1	1	0			0
	1	1	1			1

			o0[0] = i[2] ^ ~( i[0] & m0[0] )
			o0[1] = i[0] ^ ~( i[1] & m0[1] )
			o0[2] = i[1] ^ ~( i[2] & m0[2] )
			o1[0] = o0[2] ^ ~( o0[0] & m1[0] )
			o1[1] = o0[0] ^ ~( o0[1] & m1[1] )
			o1[2] = o0[1] ^ ~( o0[2] & m1[2] )
			o2[0] = o1[2] ^ ~( o1[0] & m2[0] )
			o2[1] = o1[0] ^ ~( o1[1] & m2[1] )
			o2[2] = o1[1] ^ ~( o1[2] & m2[2] )

			therefore
			o2[0] = o1[2] ^ ~( o1[0] & m2[0] )
			o2[1] = o1[0] ^ ~( o1[1] & m2[1] )
			o2[2] = o1[1] ^ ~( o1[2] & m2[2] )

			therefore
			o2[0] = ( o0[1] ^ ~( o0[2] & m1[2] ) ) ^ ~( o1[0] & m2[0] )
			o2[1] = ( o0[1] ^ ~( o0[2] & m1[2] ) ) ^ ~( o1[1] & m2[1] )
			o2[2] = ( o0[0] ^ ~( o0[1] & m1[1] ) ) ^ ~( o1[2] & m2[2] )

			therefore
			o2[0] = ( o0[1] ^ ~( o0[2] & m1[2] ) ) ^ ~( ( o0[2] ^ ~( o0[0] & m1[0] ) ) & m2[0] )
			o2[1] = ( o0[1] ^ ~( o0[2] & m1[2] ) ) ^ ~( ( o0[0] ^ ~( o0[1] & m1[1] ) ) & m2[1] )
			o2[2] = ( o0[0] ^ ~( o0[1] & m1[1] ) ) ^ ~( ( o0[1] ^ ~( o0[2] & m1[2] ) ) & m2[2] )

			therefore
			o2[0] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( o0[2] & m1[2] ) ) ^ ~( ( o0[2] ^ ~( o0[0] & m1[0] ) ) & m2[0] )
			o2[1] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( o0[2] & m1[2] ) ) ^ ~( ( o0[0] ^ ~( o0[1] & m1[1] ) ) & m2[1] )
			o2[2] = ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( o0[1] & m1[1] ) ) ^ ~( ( o0[1] ^ ~( o0[2] & m1[2] ) ) & m2[2] )

			therefore
			o2[0] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( o0[2] ^ ~( o0[0] & m1[0] ) ) & m2[0] )
			o2[1] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( o0[0] ^ ~( o0[1] & m1[1] ) ) & m2[1] )
			o2[2] = ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) ^ ~( ( o0[1] ^ ~( o0[2] & m1[2] ) ) & m2[2] )

			therefore
			o2[0] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[1] ^ ~( i[2] & m0[2] ) ) ^ ~( o0[0] & m1[0] ) ) & m2[0] )
			o2[1] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( o0[1] & m1[1] ) ) & m2[1] )
			o2[2] = ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) ^ ~( ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( o0[2] & m1[2] ) ) & m2[2] )

			therefore
			o2[0] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[1] ^ ~( i[2] & m0[2] ) ) ^ ~( ( i[2] ^ ~( i[0] & m0[0] ) ) & m1[0] ) ) & m2[0] )
			o2[1] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) & m2[1] )
			o2[2] = ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) ^ ~( ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) & m2[2] )

			and
			o[0] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[1] ^ ~( i[2] & m0[2] ) ) ^ ~( ( i[2] ^ ~( i[0] & m0[0] ) ) & m1[0] ) ) & m2[0] )
			o[1] = ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) ^ ~( ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) & m2[1] )
			o[2] = ( ( i[2] ^ ~( i[0] & m0[0] ) ) ^ ~( ( i[0] ^ ~( i[1] & m0[1] ) ) & m1[1] ) ) ^ ~( ( ( i[0] ^ ~( i[1] & m0[1] ) ) ^ ~( ( i[1] ^ ~( i[2] & m0[2] ) ) & m1[2] ) ) & m2[2] )

But there are multiple ways to get a certain output bit at a level, so there are many possibilities to try. This is better than using 8-bit floats for every neuron. Instead you can do 64 neurons at once. And this is simpler on the silicon. I don't have a solution yet though. But maybe I'll solve it.

This topic is closed to new replies.

Advertisement