deinstancing c-strings

Started by
47 comments, last by tanzanite7 10 years, 2 months ago

Just don't try to use pointers as enumeration constants. It's a bad idea: loss of enumeration type safety, hard to serialize, hard (as you have seen) to enforce the behaviour you need with different compilers, hard for other programmers to understand, and so on.

I totally fail to see why "red" is better than Color::red by the way. I can see reasons why Color::red is better than "red" though:

void f(Color::Type type);

void g(const char *type);

You can't accidentally pass the wrong type to f(). You can pass anything to g(). It's just abuse of the compiler and silly.
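
For anyone following along in plain C rather than C++, here is a minimal sketch of the enum side of that comparison. The Color type, its enumerators, and color_name() are hypothetical names made up for illustration. Note that C enums convert implicitly from int, so the type check is weaker than with a C++ Color::Type; what you still get is that a misspelled enumerator fails to compile, and GCC or Clang with -Wall will warn when a switch over the enum misses a case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration; the names are for illustration only. */
typedef enum { COLOR_RED, COLOR_GREEN, COLOR_ORANGE, COLOR_COUNT } Color;

/* Returns a printable name for c, or NULL for a value that is not a color. */
const char *color_name(Color c)
{
    switch (c) {
    case COLOR_RED:    return "red";
    case COLOR_GREEN:  return "green";
    case COLOR_ORANGE: return "orange";
    case COLOR_COUNT:  break; /* sentinel, not a real color */
    }
    return NULL;
}

/* Small usage check; call it from main() or a test runner. */
void color_demo(void)
{
    assert(color_name(COLOR_RED) != NULL);
    assert(color_name(COLOR_COUNT) == NULL);
}
```

Writing COLOR_ORNAGE anywhere is a compile error, whereas "ornage" as a string literal compiles without complaint.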

Yes, if you can guarantee that all of the identical strings are merged and all reside at the same address, this can work. That doesn't make it something you should do.

I was very happy with that: you just pass string literals everywhere, everything works, and it is top efficient and easily expandable. Errors and mistakes are also very easy to catch here:


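void f(const char *str)
{
    /* Compares addresses, not contents: only works if identical literals are merged. */
    if (str == "red") ;
    else if (str == "green") ;
    else if (str == "orange") ;
    else ERROR("string %s unrecognized in  ...", str);
}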

You can also build many other techniques on top of that (you can fall back to consuming the contents of the string if needed, etc.).

(That's why I am emphasizing that this is my invention, though maybe someone used it before and I never heard of it. Well, two inventions really: the first is the concept of an "ad-hoc" enum, and the second is its string-literal implementation. I was so happy with that.)

The practical reason for its great usability is that you are spared jumping around the code (to definitions, which is really tiring), and you also do not need to maintain all these enum definitions.

But apart from this, I had a bit of a breakdown because -fmerge-constants does not work across modules (so I cannot use this technique across modules). I don't know why it does not work or how to make it work. Could someone help? (Thanks kingmir for the hint, it was useful.)

I understand your reasons why you think this is a good idea. But I still maintain it isn't. You are typing all of these properties as const char*, with no way to differentiate them. Your example of the if-else chain is moving a check to runtime that should be done at compile time, so it isn't more efficient at all.

If the types were defined as an enumeration, the compiler could check at compile time whether a valid value is being used (unless of course you mess about with static_casts, but then all bets are off).

There are cases where I can see this being useful. Using strings as identifiers can reduce coupling between modules and is in some circumstances a good approach. But these cases are handled best using interning and maybe hashmapping, not by directly comparing pointers.

With respect, you haven't "invented" this. You have just been exploiting an implementation detail of a specific compiler in a standards-undefined manner.
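
To make the interning suggestion above concrete, here is a minimal sketch in C. Everything in it (intern(), intern_table, INTERN_MAX) is a hypothetical name invented for illustration, and the linear search stands in for the hash map a real implementation would probably use. The idea is that strings are compared by content once, when they are interned, and after that pointer comparison is well defined because equal strings always map to the same canonical pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INTERN_MAX 64  /* arbitrary capacity for this sketch */

static const char *intern_table[INTERN_MAX];
static size_t intern_count;

/* Returns the canonical pointer for s: strings with equal contents always
 * yield the same pointer. Returns NULL if the table is full. The caller must
 * keep the first-seen string alive (string literals satisfy this). */
const char *intern(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < intern_count; ++i) {
        if (strcmp(intern_table[i], s) == 0)
            return intern_table[i];   /* already known: reuse its pointer */
    }
    if (intern_count == INTERN_MAX)
        return NULL;
    intern_table[intern_count] = s;   /* first sighting becomes canonical */
    return intern_table[intern_count++];
}
```

With this, intern(a) == intern(b) holds whenever the contents of a and b match, regardless of which module or compiler produced them.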

It is a very cheap check at runtime, not a problem for me (you could even include it only in debug mode).

I like this kind of old-school simplicity.

We have a difference of opinion here. Well, go your own way if you like.

No problem at all, each to their own. If you find a way to get this to work in a standards-defined manner across every compiler your code may ever possibly be compiled with (including assurances that no future updates to the compiler will suddenly break it), please share.

I too would be interested in maybe using this technique if the above was not an issue.

Hodgman I really like, and I often support you, but that code you just posted is filled with ugliness: It's got std::set, std::string and globals. Can't get uglier than that.

Hodgman is a good man :) His answers are often interesting.

All right, I think the assurance should be made by the C standard because it is so useful, but AFAIK it now sadly depends on the linker.

In reality this is not the primary trouble for me, because I can stick to one given compiler (MinGW) and just write code that relies on compiler-dependent behavior. So if it only worked here I would be happy. [I know many will regard writing compiler-dependent source (using specific extensions, etc.) as a sin, but I'm okay with that.]


Well, the C language has assured this will not work since about 1970. Resolving symbols and other issues regarding separate compilation units has never been a part of the language, and always a part of the system object linker.

If you really want to ensure the starting address of arbitrary constant data sequences in memory will be the same across all compilation units, you're going to need to use language extensions to explicitly put the constants into named sections (__attribute__((section("my_enum")))), then use a linker script to assign those sections to a fixed base address. You will need to use the named constant address everywhere for comparison (as in if(color==g_red)).

Or, you could stick to using strcmp() and move on.

Stephen M. Webb
Professional Free Software Developer
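
For reference, the strcmp() route suggested above might look roughly like this. It is only a sketch: the Color enum, COLOR_UNKNOWN, and color_from_string() are hypothetical names chosen for illustration. The string is compared by content exactly once, at the boundary, and the rest of the program works with an ordinary enum value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; COLOR_UNKNOWN marks an unrecognized name. */
typedef enum { COLOR_UNKNOWN = -1, COLOR_RED, COLOR_GREEN, COLOR_ORANGE } Color;

/* Maps a color name to its enum value by comparing contents, so it does not
 * matter where the string lives in memory. */
Color color_from_string(const char *name)
{
    static const struct { const char *name; Color value; } table[] = {
        { "red",    COLOR_RED    },
        { "green",  COLOR_GREEN  },
        { "orange", COLOR_ORANGE },
    };

    assert(name != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(name, table[i].name) == 0)
            return table[i].value;
    }
    return COLOR_UNKNOWN;
}
```

This also accepts strings built at runtime (read from a file, say), which the pointer-comparison version can never match.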


I do not understand why

"-fmerge-constants

Attempt to merge identical constants (string constants and floating-point constants) across compilation units."

does not work. As I understand it, it should work: it is probably easy for the linker to relocate all the duplicated string literal instances into one place in the data (probably constants) section. So why is this option not working, and what does "attempt" mean?

(I would like to write an improved C dialect compiler and linker myself, but that is a matter of years, so for now I would prefer that it just worked here in MinGW.)

((A customized strcmp is probably the way to consider if the primary way does not work, but that's sad.))

For my guess at why -fmerge-constants may not be working for you: are you sure it's being properly passed as a linker setting in your build system? Post the command to link your program together.

But I agree with others that this is a bad idea. If you need an enum, use an enum. It's faster, safer, and clearer. Here's a few reasons why:

1)You shouldn't do at runtime what you can do at compile time. An enum is a compile time constant. A pointer is a runtime constant.

2)Char * may confuse your compiler because it could alias with anything. That means it can't reorder accesses to the char pointer with other operations involving pointers. Since reordering is crucial to optimisation, this can be a major blow in non obvious ways.

3)Is it "turquose" or "turquoise" or what? If you misspell a word, it's a different value, and there's nothing to catch your mistake. Likewise, is cyan a different color or not?

4)The use of standard patterns, like enums, makes your code easier to read by others and by you in the future. Being clever is a bad thing for this.

5)If the value is not a compile time literal, your method will check it against all constant pointers and always fail to match. That's an overhead for nothing. And there's no compile time check to guarantee the pointer passed is a literal. In contrast, an enum can match with a runtime integer, or optionally not try to match a runtime integer because it's a different type.

6)On a 64 bit system, pointers are 64 bits. But an enum can have a smaller memory and register footprint. This can also mean that more function parameters are passed by register.

7)You're relying on behaviour not guaranteed by the standard. That makes your code not portable and technically not C.

I compiled the modules this way:

c:\mingw\bin\g++ -O3 module1.c -c -fno-rtti -fno-exceptions -fmerge-constants -fmerge-all-constants

to compile the modules (only the module name changes here), and

c:\mingw\bin\g++ -O3 -Wl,--subsystem,windows -w module1.o module2.o module3.o -lgdi32 -s -fno-rtti -fno-exceptions -o program.exe -fmerge-constants -fmerge-all-constants
to link them.

While searching Google about this merging, I found an example by someone on SO (so there was no need to write one myself):

// s.c
#include <stdio.h>

void f(void);

int main(void) {
    printf( "%p\n", (void *)"foo" );
    printf( "%p\n", (void *)"foo" );
    f();
    return 0;
}

// s2.c
#include <stdio.h>

void f(void) {
    printf( "%p\n", (void *)"foo" );
    printf( "%p\n", (void *)"foo" );
}
when compiled as:
gcc s.c s2.c
produces:
00403024
00403024
0040302C
0040302C

But I didn't find out whether it is possible to merge them. I have not tried this example yet; I will check it right now.

edit: tested

sadly
with
c:\mingw\bin\gcc test.c test2.c -fmerge-constants -fmerge-all-constants
still got
00403024
00403024
0040302C
0040302C
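
The two different addresses above are permitted by the C standard, which leaves it unspecified whether identical string literals share storage. A portable way to get one address per name, along the lines of the named-constant suggestion earlier in the thread, is to define each string exactly once and refer to it by name. The sketch below squeezes what would normally be a header and two source files into one listing; the file names in the comments, g_green, and is_red() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* colors.h (hypothetical): declarations shared by every module */
extern const char g_red[];
extern const char g_green[];

/* colors.c (hypothetical): the one and only definition of each object */
const char g_red[]   = "red";
const char g_green[] = "green";

/* Any other module: comparing addresses is well defined here because each
 * name refers to exactly one object in the whole program. */
bool is_red(const char *color)
{
    assert(color != NULL);
    return color == g_red;
}
```

The catch is that callers have to pass g_red itself; a bare "red" literal is a different object and will not compare equal, so this is really an enum with extra steps plus a printable name.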

This topic is closed to new replies.
