deinstancing c-strings

Started by
47 comments, last by tanzanite7 10 years, 2 months ago

But I agree with others that this is a bad idea. If you need an enum, use an enum. It's faster, safer, and clearer. Here are a few reasons why:

1) You shouldn't do at run time what you can do at compile time. An enum is a compile-time constant. A pointer is a run-time constant.

2) A char * may hinder your compiler because it is allowed to alias anything. That means the compiler can't reorder accesses through the char pointer with other operations involving pointers. Since reordering is crucial to optimisation, this can be a major blow in non-obvious ways.

3) Is it "turquose" or "turquoise" or what? If you misspell a word, it's a different value, and there's nothing to catch your mistake. Likewise, is cyan a different color or not?

4) The use of standard patterns, like enums, makes your code easier to read by others and by you in the future. Being clever works against that.

5) If the value is not a compile-time literal, your method will check it against all constant pointers and always fail to match. That's an overhead for nothing. And there's no compile-time check to guarantee the pointer passed is a literal. In contrast, an enum can match with a runtime integer, or optionally not try to match a runtime integer because it's a different type.

6) On a 64-bit system, pointers are 64 bits. But an enum can have a smaller memory and register footprint. This can also mean that more function parameters are passed by register.

7) You're relying on behaviour not guaranteed by the standard. That makes your code not portable and technically not C.
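To make points 3 and 5 concrete, here is a minimal sketch. All names in it (color, RED_STR, is_red_by_pointer and so on) are made up for illustration and do not come from the original code. It only shows that a string built at run time has the right characters but a different address, so the pointer test rejects it while an ordinary content comparison accepts it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/* Pointer-identity scheme: matches only if `s` is this very object. */
static const char *const RED_STR = "red";

int is_red_by_pointer(const char *s) { return s == RED_STR; }

/* Content comparison works for any string, at the cost of a strcmp. */
int is_red_by_content(const char *s) { return strcmp(s, "red") == 0; }

/* Enum comparison is a plain integer compare. */
int is_red_by_enum(enum color c) { return c == COLOR_RED; }

/* Returns 1 when a string built at run time passes the content test
   but fails the pointer test. */
int runtime_string_demo(void) {
    char buf[4];
    strcpy(buf, "red");              /* same characters, different address */
    assert(is_red_by_content(buf));
    return !is_red_by_pointer(buf);
}
```

A misspelled literal such as is_red_by_pointer("rde") compiles without complaint and simply never matches, whereas a misspelled enumerator is rejected by the compiler.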

In general I agree. The worst thing here is that there could be a compiler where I could not switch this behaviour on :c (as is still the case here).

The others are trade-offs I could easily pay for personally (especially as it really should, hopefully, be only very slightly slower than an enum).

As to 'an enum is compile time and a string literal pointer is not compile time', I'm not sure this is true. With an enum you have 117; here you get 00403024. Both are 'runtime static' numbers.

As to the second point, you're right (that's a good point), but I'm not sure about the details either. If the compiler could easily detect that this char* points into a const section, it should (probably) not hinder it.

What you seem to be missing is that the reason you are finding this so difficult is that there is a world of difference between an enum being equal to 117 (a standards-defined behaviour) and the value of a pointer, which is an implementation detail.

Nobody minds if you want to write compiler-dependent, non-standard code that can break in the future; that's entirely up to you. But continuing to try to argue that it "should" be possible and "should" be implemented is just wrong.


You say the value of a pointer is an implementation detail (I do not quite understand this statement), but you use pointers in some way, so you can compare them for equality here too.

(Maybe, if there were a rule that they should be stored in the string literals section in alphabetical order, you could even legally compare them with less-than or greater-than ;/ That's a bit of a joke; it would probably not be a good idea.)

So this is not a good argument IMO, but I am not sure I want to convince anybody and take that kind of conversation on my back ;/

I need some advice on why merging strings is not working in my MinGW.

It is absolutely true that enums get better optimizations than pointers at compile time. Enums allow optimizations that are literally impossible with pointers. For example, take the simple case of a color channel as an enum versus string constants (the final comparison in the enum version is the part that gets optimized away):

enum Channel {RED, GREEN, BLUE};

if (channel == RED) do something

else if (channel == GREEN) do something

else if (channel == BLUE) do something

It is very easy to tell that channel has to be BLUE if it is not GREEN or RED.

if (str == "red") do something

else if (str == "green") do something

else if (str == "blue") do something

It is impossible to know whether str is restricted to only those 3 possible values. There are 4 billion possible values for str on 32-bit, and 18 quintillion possible values on 64-bit. The optimizer is not even going to try to track values for a pointer, given how impossible a task that would be.
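A compilable version of the two chains above might look like the sketch below. The names (channel, channel_shift, STR_RED and so on) and the shift values are invented for the example. It assumes the pointer scheme compares against one shared constant per string. Note that case labels must be integer constant expressions, so only the enum version can use switch at all; the pointer version is stuck with a chain of comparisons.

```c
#include <assert.h>

/* Hypothetical names and values, for illustration only. */
enum channel { RED, GREEN, BLUE };

/* Three small integer cases: the compiler is free to emit a jump table
   or a table lookup for this switch. */
int channel_shift(enum channel c) {
    switch (c) {
    case RED:   return 16;
    case GREEN: return 8;
    case BLUE:  return 0;
    }
    return -1;  /* a value outside the enumerators */
}

static const char *const STR_RED   = "red";
static const char *const STR_GREEN = "green";
static const char *const STR_BLUE  = "blue";

/* Addresses are not integer constant expressions, so they cannot be
   case labels; each candidate has to be compared in turn. */
int channel_shift_ptr(const char *s) {
    if (s == STR_RED)   return 16;
    if (s == STR_GREEN) return 8;
    if (s == STR_BLUE)  return 0;
    return -1;  /* any other address */
}
```

Both functions give the same answers for the three known values; what differs is how much the compiler knows about the input.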

Even if you could do this and there were no performance issues, you're giving up compile-time checking for runtime checking. If you misspell a string somewhere in your code, you won't find out about it until that code is executed (hope you have a way to get 100% code coverage) - and even then the result might just be a subtle bug.

If you're using enums (or string constants), then if you mistype the name somewhere you'll get a compile error.
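For the "string constants" option, one portable sketch is to give each string a name and define it exactly once. The identifiers below (COLOR_RED, COLOR_TURQUOISE, is_turquoise) are hypothetical. In a real project the arrays would presumably be declared extern in a header and defined in a single .c file, so every translation unit refers to the same object and no linker option is needed to get one address per string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constants. Shown in one file here; in a project they
   would be declared `extern const char COLOR_RED[];` in a header and
   defined in exactly one .c file. */
const char COLOR_RED[]       = "red";
const char COLOR_TURQUOISE[] = "turquoise";

/* Pointer comparison is now well defined: each name is one object. */
int is_turquoise(const char *s) { return s == COLOR_TURQUOISE; }

/* is_turquoise(COLOR_TURQUOSE) would fail to compile with an
   "undeclared identifier" error, unlike a misspelled literal. */
```

You still get a readable string for printing or logging, and a typo in the name is caught by the compiler instead of at run time.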

It sounds like you prefer to be a bit lazy writing your code ("simplicity"), even if it results in less maintainable, less robust code. That's never a good trade-off.


I do not understand why

"-fmerge-constants

Attempt to merge identical constants (string constants and floating-point constants) across compilation units."

does not work. As I understand it, it should work: it is probably easy for the linker to relocate all duplicated string literal instances into one place in the data (probably constants) section. So why is this option not working, and what does "attempt" mean?

An address in a symbol table is not a constant. That's why asking the linker to merge constants will not merge the addresses.

As I said, you need to use a compiler extension to put your constant byte sequences in a particular segment. You then need to tell the linker explicitly where to base those byte sequences so their starting address will be the same after final linking. There is no other simple way to force that, because the language is not supposed to work that way, so there is no built-in support. The linker, on the other hand, is designed to do those things. You just have to explicitly tell it what to do.

Stephen M. Webb
Professional Free Software Developer


As far as I understand, this is not imposed by the standard, but it is also not forbidden. I do not know why it is not merged.

These "cat" pointers indeed cannot be merged, not because they are not constants, but because we are not talking about merging pointers; we are talking about merging the const literals they point to. This should probably be done.

I have seen people writing AVR code (where one needs to save memory) searching for this option to merge strings across different modules too.

An enum's value is computed when the enum is defined. A pointer's value cannot be known until the program is run, and will be different on each run of the program. So a compiler can use an immediate value for an enum constant, but for a pointer it must, at best, compute the address from a base held in a register. Likewise it can optimize based on the specific value of an enum, such as by using it in a jump table.
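A small sketch of what "computed when the enum is defined" buys you, using invented names (color, rgb_of, color_to_rgb): enumerators are integer constant expressions, so they can size an array, index a designated initializer, and be checked at compile time. The address of a string literal cannot be used in any of those places.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum color { RED, GREEN, BLUE, COLOR_COUNT };

/* Checked by the compiler, before the program ever runs. */
_Static_assert(BLUE == 2, "enumerator values are fixed by the definition");

/* The enum sizes the table and names its slots. */
static const unsigned rgb_of[COLOR_COUNT] = {
    [RED]   = 0xFF0000u,
    [GREEN] = 0x00FF00u,
    [BLUE]  = 0x0000FFu,
};

/* Lookup is a single indexed load, with no comparisons at all. */
unsigned color_to_rgb(enum color c) {
    assert((unsigned)c < (unsigned)COLOR_COUNT);
    return rgb_of[c];
}
```

This is the same idea as the jump table mentioned above, applied to data instead of code.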

A compiler cannot easily detect that a char * points to a const section when passed as a function argument, because functions are optimized per translation unit, which means that non-static functions don't see all their uses. There is also link time optimisation, but it is not as thorough.

Furthermore, aliasing does not just cause problems for uses of the variable, but also all other variables that might otherwise be reordered across a read through that pointer. So you're making code that has nothing to do with this kind of enum slower. Depending on how pedantic or smart your compiler is set to be, it can mean that any pointer argument, global, or value derived from these can alias. This includes the ubiquitous this pointer.
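The aliasing point can be sketched with two otherwise identical loops. The function names are hypothetical, and what the optimizer actually does depends on the compiler and its flags (for example, whether strict aliasing is enabled). The language rule is that a store through a char pointer may modify any object, while a store through a float pointer may be assumed not to modify an int.

```c
#include <assert.h>

/* Hypothetical functions, for illustration only. */

/* `flags` is a char pointer, so each store through it might change
   *count. The compiler has to allow for that and reload *count on
   every iteration. */
int sum_char_dest(const int *count, char *flags, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        flags[i] = 1;
        total += *count;
    }
    return total;
}

/* `out` is a float pointer. Under the aliasing rules a float store
   cannot change an int object, so *count may be loaded once and kept
   in a register. */
int sum_float_dest(const int *count, float *out, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        out[i] = 1.0f;
        total += *count;
    }
    return total;
}

/* Both loops compute the same result; only the generated code can
   differ. */
int aliasing_demo(void) {
    int count = 3;
    char flags[4];
    float out[4];
    return sum_char_dest(&count, flags, 4) + sum_float_dest(&count, out, 4);
}
```

Comparing the optimized assembly of the two functions is the easiest way to see whether your compiler takes advantage of the difference.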

A pointer's value cannot be known until the program is run,

Is this true? I don't think so. I know some pointers are rebased or something by the Windows loader, but in some way this pointer value is 'produced' at compile+link time, so I am not sure your suggestions are fully true here.* (Some are right: when using an enum you could use an indexed table instead of a branching switch, etc. But none of this is so crucial. It will work roughly as fast anyway IMO, and as I said, I can pay a little for the comfort of using strings. As I said, I have been using it a lot (for over two years) and I was very pleased.)

* Besides, you're probably right about these optimization problems when passing a char* (I don't fully understand it, but I have heard something). Anyway, this is not a problem of 'my way' but a general problem with passing any pointer (or overlapping data, I don't remember). If this general problem were mended (and that is probably possible), it would also stop affecting this special case.

I don't know exactly how the global section is made known to instructions, but the location of global variables changes from run to run, so it must be stored in a register or memory. There is an association with the data, which is finalized during linking, meaning the offset of the pointer is computed then. That means that earlier optimisations, such as most function-level optimisations, run well before the value is known.

As for comfort, doing things the way you're used to when a better way exists means that you don't grow as a programmer. You're stuck with the bad habits of the previous decade, while hardware and compiler technology march forward, making your code more and more inefficient. Raw pointers and their abundant use are to be avoided when possible. (But copying data should be avoided too, so there's usually a trade-off.)

This topic is closed to new replies.
