deinstancing c-strings

Started by
47 comments, last by tanzanite7 10 years, 1 month ago

Seriously, no one comes up with this?

enum Colors
{
  red = 0xffff0000,
  green = 0xff00ff00,
  blue = 0xff0000ff
};

You get the best of enums and you get rid of the if/else-code completely.



I think what he wants to do is to not be required to explicitly declare enums to be able to use them.

Would it be possible to use a macro or such to hash the string at compile time and pass hashes instead of pointers?

eg.
if (hashEnum == makeHashEnum("red")) {}

where hashEnum is an int of some size.

Collisions probably won't be a problem if you use 64 bits or something. Probably.

The basic idea (I have written it here about four times already) was to have "ad-hoc" enums. You understand it right: an enum I could use without defining it first.

foo(x,y, @red);

foo(x,y, @green);

foo(x,y, @quick);

The point is just to use free ad-hoc values in any place and compare them with another set of free ad-hoc values in any other place (this is one of my own concepts).

So something like symbols in Lisp, in other words?

enum Colors
{
red = 0xffff0000,
green = 0xff00ff00,
blue = 0xff0000ff
};

I really like that. I don't think enums are guaranteed to hold 32 bits, but this is more likely to work than merging strings. If the compiler can't produce 32-bit enums for whatever reason, it should at least emit a warning or error to alert you.
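If the code is built as C++11 or later, the width concern goes away by giving the enum a fixed underlying type. A minimal sketch, assuming C++11 and reusing the Colors enum quoted above; the is_opaque helper is a hypothetical name added only for illustration:

```cpp
#include <cstdint>

// Fixed underlying type: every enumerator must fit in 32 unsigned bits,
// and the compiler rejects an enumerator value that does not.
enum Colors : std::uint32_t
{
    red   = 0xffff0000,
    green = 0xff00ff00,
    blue  = 0xff0000ff
};

static_assert(sizeof(Colors) == sizeof(std::uint32_t), "Colors must be exactly 32 bits wide");

// Hypothetical helper: the enumerator value doubles as the ARGB colour,
// so no if/else lookup from name to colour is needed.
inline bool is_opaque(Colors c)
{
    return (static_cast<std::uint32_t>(c) >> 24) == 0xffu;
}
```

In plain C the representation of an enum is still up to the implementation, so this only helps when a C++ compiler is an option.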

Fir, I know everyone's already told you why this is a bad idea. I used to code like this at one point. I completely agree with you that the string literals are just turned into a magic number by the compiler. Each time you compile, it's different, but it's a unique number because that literal needs a unique address in memory. It would work perfectly well if all the constants could be merged together across files. But that's not guaranteed behavior, and even if one compiler does it today, it might not tomorrow.
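One way to keep pointer comparison without relying on the compiler or linker merging literals is to intern the strings at runtime. This is only a sketch under assumptions: it uses C++, the intern function and its pool are hypothetical names, and it is not thread-safe.

```cpp
#include <set>
#include <string>

// Returns one canonical pointer per distinct string content.
// The pointers stay valid because std::set never relocates its elements.
inline const char* intern(const char* text)
{
    static std::set<std::string> pool;                   // hypothetical process-wide pool
    return pool.insert(std::string(text)).first->c_str(); // no-op if the text is already pooled
}
```

After interning, intern("red") == intern(other) is true exactly when the two strings have the same contents, regardless of which translation unit the literals came from. The cost is one lookup wherever a literal enters the system.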

If you want to keep the idea of an ad-hoc enum, where you can just use it without defining anything, then you could use a syntax like:

At the top of each file have:


#include <adhoc_enums.h>

Then in your code use a notation like:


void color_something(thing* thing, adhoc_enum color)
{

   if (color == adhoc(red) )
      ...
   if (color == adhoc(green) )
     ...

}

Your makefile can call a short script that finds all the adhoc references and produces adhoc_enums.h:


#define adhoc(x) adhoc_enum_ ## x

typedef enum {
    adhoc_enum_null = 0,
    adhoc_enum_red = 1,
    adhoc_enum_green,
    adhoc_enum_blue
...

} adhoc_enum;

Advantages:

  • You get real enums
  • Your values are basically ints now. They can be safely =='d against each other
  • You can use switch/case for if/else chains.
  • Type safety against other char*. Also probably a compiler warning against ints or other enum types. adhoc_enums only 'like' each other
  • No dependence on weird compiler or linker options

Disadvantages:

  • Extra build step with an extra utility you need to write (not as gross as you think; ever used qmake?)

You also have the added advantage that you could do this once and keep the generated enum file. When you use new enum values, you can add them to adhoc_enums.h manually. Then maybe at some point you will decide to split your enums into different types (enum_color, enum_shape). You might find yourself going down a standards-compliant path.
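The "short script" could look roughly like the following. This is a sketch, not the poster's tool: generate_adhoc_header is a hypothetical name, it takes source text as strings rather than reading files, and it sorts the names so that the numbering is reproducible between builds.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical generator: scans source text for adhoc(name) uses and returns
// the contents of adhoc_enums.h as a string.
inline std::string generate_adhoc_header(const std::vector<std::string>& sources)
{
    // Matches adhoc(identifier), allowing spaces inside the parentheses.
    static const std::regex use(R"re(\badhoc\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\))re");

    std::set<std::string> names;   // unique and sorted
    for (const std::string& text : sources)
    {
        for (std::sregex_iterator it(text.begin(), text.end(), use), end; it != end; ++it)
            names.insert((*it)[1].str());
    }

    std::string header =
        "#define adhoc(x) adhoc_enum_ ## x\n\n"
        "typedef enum {\n"
        "    adhoc_enum_null = 0";
    for (const std::string& name : names)
        header += ",\n    adhoc_enum_" + name;
    header += "\n} adhoc_enum;\n";
    return header;
}
```

A real build step would read every source file, skip the generated header itself, and write the result only when it changes so that it does not force a full rebuild.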

So something like symbols in Lisp, in other words?

I don't know Lisp; I just need an ad-hoc enum: a free enum that needs no definition, a way to pass enum-style identifiers and check them in another place.

I did find out that in gcc (at least on my version of Linux), if I use at least -O1 and pass all my input files at once (e.g. gcc -O1 bar.c foo.c baz.c), all my identical constants were merged. Is there any reason why you can't build it all 'at once' instead of making a bunch of .o files separately?

fir, you should take a look at this: http://stackoverflow.com/questions/7459939/what-do-single-quotes-do-in-c-when-used-on-multiple-characters

If you restrict yourself to 4 characters, you can do this:


 foo(x,y, 'red' );
 foo(x,y, 'grn' );
 foo(x,y, 'quik' );

// I tried it out on gcc: if the literal is too long, gcc warns and ignores the excess leading characters, keeping the last 4:
foo(x,y, 'red' );
foo(x,y, 'green' );  // really 'reen'
foo(x,y, 'quick' );  // really 'uick'

You get 4 characters packed into an int. It's as close to human-readable, non-predefined values as you'll be able to get in C without resorting to linker tricks.


Well, very interesting. I didn't know about the second thing; it could be used here if I limit myself to 4 letters ('red', 'grn', 'blue'), which can sometimes be handy. What type did you use in the header? Maybe a 64-bit integer would encode eight letters?

The first thing is also interesting. I previously used these 'source builds' and could use them again, but other reasons make me use full binary separation when compiling.

I'm very unhappy that there seems to be no way to make gcc merge these literals (or there is a way and I don't know it). It seems I must either drop binary separation when compiling, or drop this kind of string-literal enum, or use a physical compare. A fourth choice is to find another compiler that merges them (if there is one). Four choices, and each one is bad :C

If I were to use 64 bits (maybe I will soon), and it turns out that a 64-bit integer can encode 8 letters the way you described, I could still use it (especially if it is passed in a register under fastcall); that would be OK (a fifth choice).

Four letters is a bit too little, though it is worth considering too, so thanks for the info.

Theoretically speaking, even if it could encode a string of any length, I think holding a pointer to the data is better. An enum is a kind of type (really a class, not just a type) that relies on the fact that you strictly use defined identifiers, not their contents (as with normal types when you use constants). But incidentally, when you use pointers as the identifiers, you can tie some useful data to each one, even a different kind of data for each, so I think it is very fortunate to use pointers for them. Still, this way of encoding strings as values is also nice.

Would it be possible to use a macro or such to hash the string at compile time and pass hashes instead of pointers?

eg.
if (hashEnum == makeHashEnum("red")) {}

where hashEnum is an int of some size.

Collisions probably won't be a problem if you use 64 bits or something. Probably.

Yep, here's a simple implementation :D


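//Assumes typedefs such as: typedef uint32_t u32; typedef uint8_t u8; typedef unsigned int uint;
//Runtime version (FNV-1a, 32-bit):
inline u32 Fnv32a( const char* text )
{
	const u32 offsetBasis = 2166136261U;//FNV offset basis
	const u32 prime = 16777619U;//FNV prime, 2^24+403
	u32 hash = offsetBasis;
	if( text )
	{
		for( const u8* b=(const u8*)text; *b; ++b )
		{
			hash ^= *b;//xor the byte in first (the "1a" variant)...
			hash *= prime;//...then multiply, wrapping modulo 2^32
		}
	}
	return hash;
}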

//Compile-time version (HASH_n appends n xor/multiply rounds; no trailing
//semicolons, so the result can be used inside any expression):
#define HASH_(t) ^(unsigned char)*(t))*16777619U)
#define HASH_1(text) HASH_(text)
#define HASH_2(text) HASH_(text) HASH_1(text+1)
#define HASH_3(text) HASH_(text) HASH_2(text+1)
#define HASH_4(text) HASH_(text) HASH_3(text+1)
#define HASH_5(text) HASH_(text) HASH_4(text+1)
#define HASH_6(text) HASH_(text) HASH_5(text+1)
#define HASH_7(text) HASH_(text) HASH_6(text+1)
#define HASH_8(text) HASH_(text) HASH_7(text+1)
#define HASH_9(text) HASH_(text) HASH_8(text+1)

#define OPEN_1 ((
#define OPEN_2 (( OPEN_1
#define OPEN_3 (( OPEN_2
#define OPEN_4 (( OPEN_3
#define OPEN_5 (( OPEN_4
#define OPEN_6 (( OPEN_5
#define OPEN_7 (( OPEN_6
#define OPEN_8 (( OPEN_7
#define OPEN_9 (( OPEN_8

#define JOIN(a,b) a##b
#define FNV32A(text, size) JOIN(OPEN_,size) 2166136261U JOIN(HASH_,size)(text)

// Example test:
	uint compileTime = FNV32A("asdf", 4);
	uint runTime = Fnv32a("asdf");
	assert( compileTime == runTime );

The only bad thing about this version is that it requires you to include the length of the string into the macro (e.g. ENUM("red",3), instead of just ENUM("red")).
However, this is just my simple example of compile-time hashing. There are many other implementations to be found on the net if you search for "compile time hashing", etc...

C++11 features such as constexpr make more modern implementations a lot cleaner than the above!
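As an illustration of what the constexpr route can look like (a sketch assuming a C++11 compiler; fnv1a32 and channel_index are hypothetical names, not taken from the links below), the same FNV-1a hash can be written recursively so that the string length no longer has to be passed:

```cpp
#include <cstdint>

// FNV-1a, 32-bit, written recursively so it is a valid C++11 constexpr function.
constexpr std::uint32_t fnv1a32(const char* text, std::uint32_t hash = 2166136261u)
{
    return *text == '\0'
        ? hash
        : fnv1a32(text + 1, (hash ^ static_cast<std::uint8_t>(*text)) * 16777619u);
}

// Checked by the compiler, not at runtime.
static_assert(fnv1a32("") == 2166136261u, "empty string hashes to the offset basis");
static_assert(fnv1a32("a") == 0xe40c292cu, "published FNV-1a 32-bit test vector");

// Hypothetical use: hashed tags as case labels. Two colliding tags in the
// same switch would be rejected as duplicate case values.
inline int channel_index(std::uint32_t tag)
{
    switch (tag)
    {
        case fnv1a32("red"):   return 0;
        case fnv1a32("green"): return 1;
        case fnv1a32("blue"):  return 2;
        default:               return -1;
    }
}
```

Using the hash in a case label or static_assert forces compile-time evaluation; in other contexts the compiler is allowed to compute it at runtime.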

https://gist.github.com/erikcharlebois/1602220
http://sizeofvoid.blogspot.com.au/2012/07/compile-time-hashes.html
http://stackoverflow.com/questions/2111667/compile-time-string-hashing
http://bitsquid.blogspot.com.au/2010/10/static-hash-values.html

It's fairly safe to assume that there will be no hash collisions, but to be safe, in your debug builds I would create a global object that maintains a dictionary of all hashes and their strings (like the set<string> in my first post in this thread, except a map<uint,string>). In the debug build, the ENUM macro (etc) would insert the input string and the hash into this dictionary, and if that key/hash already exists, it would assert that the value/string in the dictionary matches the one that you're trying to insert.

That way if you do have any hash collisions, you'll find out about it during debugging.
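A rough sketch of such a debug-only dictionary, assuming C++ and a 32-bit hash; register_hash is a hypothetical name, and the static map is not thread-safe:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical debug-only registry: remembers which string produced each hash.
// Returns false when a different string already owns the same hash value.
inline bool register_hash(std::uint32_t hash, const std::string& text)
{
    static std::map<std::uint32_t, std::string> seen;
    // insert() leaves an existing entry untouched and returns an iterator to it.
    const auto result = seen.insert(std::make_pair(hash, text));
    return result.first->second == text;   // true for new entries and exact repeats
}
```

In a debug build the ENUM macro could wrap its result in something like assert(register_hash(hash, text)), while the release build compiles the check away.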

If you restrict yourself to 4 characters, you can do this:


 foo(x,y, 'red' );
 foo(x,y, 'grn' );
 foo(x,y, 'quik' );

That's interesting, I didn't know about these "multi-character literals". Even though they're implementation defined, I wonder if there's some agreeance between compilers as to how they should work? Maybe it just depends if you're compiling for big or little endian...

If I were to use 64 bits (maybe I will soon), and it turns out that a 64-bit integer can encode 8 letters the way you described, I could still use it (especially if it is passed in a register under fastcall); that would be OK (a fifth choice).

Four letters is a bit too little, though it is worth considering too, so thanks for the info.

A more compliant version of the above trick might be (although this assumes your string has at least 4 characters):
#define FOURCC(text) ((uint32_t)(text)[0] | ((uint32_t)(text)[1]<<8) | ((uint32_t)(text)[2]<<16) | ((uint32_t)(text)[3]<<24))
Or for eight chars (though again, this assumes your string has at least 8 characters):
#define EIGHTCC(text) ((uint64_t)(text)[0] | ((uint64_t)(text)[1]<<8) | ((uint64_t)(text)[2]<<16) | ((uint64_t)(text)[3]<<24) | ((uint64_t)(text)[4]<<32) | ((uint64_t)(text)[5]<<40) | ((uint64_t)(text)[6]<<48) | ((uint64_t)(text)[7]<<56))
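If C++11 is available, the minimum-length assumption can be dropped by stopping at the terminator. A sketch only; pack_tag is a hypothetical name, and it uses the same byte order as the macros above (first character in the lowest byte):

```cpp
#include <cstdint>

// Packs up to 8 characters into a 64-bit tag, first character in the lowest
// byte, stopping at the terminator so short strings are safe to pass.
constexpr std::uint64_t pack_tag(const char* text, int index = 0)
{
    return (index == 8 || text[index] == '\0')
        ? 0
        : (static_cast<std::uint64_t>(static_cast<unsigned char>(text[index])) << (8 * index))
          | pack_tag(text, index + 1);
}

// 'r' = 0x72, 'e' = 0x65, 'd' = 0x64 in ASCII.
static_assert(pack_tag("red") == (0x72ull | (0x65ull << 8) | (0x64ull << 16)),
              "characters are packed low byte first");
static_assert(pack_tag("quickest!") == pack_tag("quickest"),
              "characters beyond the eighth are ignored");
```

Unlike a hash, this encoding cannot collide for names of up to 8 characters, and the original text can be read back out of the value in a debugger.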

That's interesting, I didn't know about these "multi-character literals". Even though they're implementation defined, I wonder if there's some agreeance between compilers as to how they should work? Maybe it just depends if you're compiling for big or little endian...

I used those a lot a bloody long time ago, but abandoned them when more than one compiler entered my life and one of them pointed out that i am sitting in a minefield. Anyway, IIRC, it was fairly consistent on little endian (with triple-facepalm-worthy, highly objectionable, nonsensical ordering!), if you can stomach the bazillion warning messages it produces - i could not.

It is a dead feature for me (even though 'foo' == 'foo' is always true and even 'foo' != 'bar' was consistently true - even though it does not have to be).

edit: actually, i think Borland used to default to sensible ordering (i.e. it changed the default at some point, which was the source of the warnings i remember - i think), but had a compatibility switch to reverse the order - so that would make it clear that there really is no agreeance. Microsoft has the backwards order and no switch, IIRC.

edit2: http://www.borlandtalk.com/is-there-a-standard-for-multi-char-constants-vt104092.html seems to confirm my vague memory. It changed between Borland's C Builder and Developer Studio. I.e. not only is it inconsistent on little-endian platforms - it is inconsistent between compilers from the same manufacturer and also dependent on compiler settings.

That said, it seems Microsoft's ordering has prevailed and one could say that it is now more consistent than ever before :D, whatever da-f* that is worth :/.

This topic is closed to new replies.
