deinstancing c-strings

Started by
47 comments, last by tanzanite7 10 years, 2 months ago

A pointer's value cannot be known until the program is run,


Is this true? I don't think so. I know some pointers are rebased or something by the Windows loader, but in some way this pointer value is 'produced' at compile+link time, so I am not sure your suggestions are fully true here.

Aaaand you proved your ignorance on architecture (not trying to offend you).

You're seeing pointers as equal to integers on an x86 machine, probably running Windows or maybe even Linux.
Pointers are NOT integers. They're pointers.
An architecture could store, use and load pointers in a special-purpose register that cannot directly talk to integer or general-purpose registers.
Memory addresses could be laid out in a segmented memory model, or some other model different from the flat model.
The C and C++ standards account for that. They even account for architectures where a byte is wider than 8 bits, such as the 9-bit bytes of old 36-bit machines (an arch that hasn't been produced in decades btw.)

Hence, when you're saying "should be possible"... it is possible on the popular x86 arch running with a flat memory model. But it's probably never going to be standard, because it will not work on radically different targets.

A pointer's value cannot be known until the program is run,

Is this true? I don't think so. I know some pointers are rebased or something by the Windows loader, but in some way this pointer value is 'produced' at compile+link time, so I am not sure your suggestions are fully true here. (Some are right: when using an enum you could do a table lookup instead of the branching switch, etc. But this is all not so crucial; it will work roughly fast anyway imo, and as I said I can pay a little for the comfort of using strings. As I said, I have been using it a lot (over two years) and I was very pleased.)

I don't know exactly how the global section is made known to instructions, but the location of global variables changes from run to run, so it must be stored in a register or memory. There is an association with the data, which is finalized during linking, meaning the offset of the pointer is computed then. That means that earlier optimisations, such as most function-level optimisations, run well before the value is known.

As for comfort, doing things the way you're used to when a better way exists means that you don't grow as a programmer. You're stuck with the bad habits of the previous decade, while hardware and compiler technology marches forward, making your code more and more inefficient. Raw pointers and their abundant use are to be avoided when possible. (But copying data should be avoided too, so there's usually a trade-off.)

I find myself to be an old-school-spirit C programmer.

As a hobby I am interested in the theme of C language improvements (I have been working on this for years and have collected a long list of improvements I think should be applied to C, but this is really a long story).

If we are incidentally speaking about this trouble with aliasing: it could probably be mended if C distinguished (had a keyword etc.) between "read pointers" (pointers for reading), "write pointers" and "readwrite pointers". If I could declare my char* here as a "char read*", wouldn't that improve the optimizer enough?

(This is a side question. I am not too familiar with these argument aliasing problems, but I understand that this 'read pointers' approach could mend those troubles.)

A pointer's value cannot be known until the program is run,


Is this true? I don't think so. I know some pointers are rebased or something by the Windows loader, but in some way this pointer value is 'produced' at compile+link time, so I am not sure your suggestions are fully true here.

Aaaand you proved your ignorance on architecture (not trying to offend you).

You're seeing pointers as equal to integers on an x86 machine, probably running Windows or maybe even Linux.
Pointers are NOT integers. They're pointers.
An architecture could store, use and load pointers in a special-purpose register that cannot directly talk to integer or general-purpose registers.
Memory addresses could be laid out in a segmented memory model, or some other model different from the flat model.
The C and C++ standards account for that. They even account for architectures where a byte is wider than 8 bits, such as the 9-bit bytes of old 36-bit machines (an arch that hasn't been produced in decades btw.)

Hence, when you're saying "should be possible"... it is possible on the popular x86 arch running with a flat memory model. But it's probably never going to be standard, because it will not work on radically different targets.

I do not understand. I'm not sure, but I think some operations on pointers are guaranteed by the standard (comparing them for equality is legal, probably; I am not a heavy reader of the standards, I am more focused on the rationale behind them).

The main problem with pointers is the above discussed aliasing issues. Compilers can better optimize code when they are free to reorder it, but aliasing prevents that. Here's a talk on the problems aliasing broadly causes.

But my critique was more broad: comfort in an old technique is not a good reason to keep using it, if there are provably better alternatives.

If we are incidentally speaking about this trouble with aliasing: it could probably be mended if C distinguished (had a keyword etc.) between "read pointers" (pointers for reading), "write pointers" and "readwrite pointers". If I could declare my char* here as a "char read*", wouldn't that improve the optimizer enough?

(This is a side question. I am not too familiar with these argument aliasing problems, but I understand that this 'read pointers' approach could mend those troubles.)

No, that's not enough. Even if you added a stronger "const" to C, a compiler still would not be able to reorder writes around reads, because any write to memory through a pointer or global could affect future reads through pointers or globals. C99 does add a restrict keyword to help with aliasing, but that's not usable in your case, because there you actually expect the pointers passed in to alias with global constants.
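
To make the "read pointer" point concrete, here's a minimal sketch in C++. The names `addTwice` and `aliasingDemo` are made up for illustration and are not from anyone's code in this thread. A pointer-to-const only limits what can be done through that one pointer; it doesn't promise that the object stays unchanged, so the compiler can't assume the second read gives the same value as the first.

```cpp
#include <cassert>

// `src` is read-only through this pointer, but that says nothing about
// whether another pointer (here `dst`) can write to the same object.
inline int addTwice(int* dst, const int* src)
{
    *dst += *src;   // if dst == src, this write also changes *src
    *dst += *src;   // so *src has to be read again here
    return *dst;
}

// Hypothetical usage: same function, two very different outcomes.
inline void aliasingDemo()
{
    int a = 1;
    int b = 10;
    assert(addTwice(&a, &b) == 21);  // distinct objects: 1 + 10 + 10

    int x = 10;
    assert(addTwice(&x, &x) == 40);  // aliased: 10 + 10 = 20, then 20 + 20
}
```

If the compiler had cached `*src` across the first write, the aliased call would have produced 30 instead of 40, which is why a "read" qualifier alone doesn't buy the optimizer much.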

Everyone's explained why this is an evil hack, so it goes without saying that the code I'm posting here is evil, hacky code... But if you actually want it to be valid, such that it will work on other compilers, here's one way you'd do it with your own macro-keywords...
#define IMPLEMENT_ENUM(name) extern const char* const g_enum##name = #name;
#define DECLARE_ENUM(name) extern const char* const g_enum##name;
#define ENUM(name) ((void*)g_enum##name)
void SetColor(void* color)
{
  DECLARE_ENUM(red); // in each file OR function, before using ENUM(red)
  if( color == ENUM(red) )
    Stuff();
  else
    Assert(false);
}

//elsewhere
DECLARE_ENUM(red);
int main() {
  SetColor( ENUM(red) );
}

//in one file in the project - actually allocate the single unique 'red' variable
IMPLEMENT_ENUM(red);
Seeing that the pointed-to data is never actually used, you could implement this in many other ways, such as not even storing the strings:
#define IMPLEMENT_ENUM(name) extern const char g_enum##name = 0; //allocate a byte
#define DECLARE_ENUM(name) extern const char g_enum##name; //declare there is a named byte somewhere in the project
#define ENUM(name) ((void*)&g_enum##name) //use that byte's address as a unique identifier
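
For anyone who wants to try the byte-address variant quickly, here's a single-file sketch of it in C++. It reuses the `g_enum` prefix and the macro names from above, but `colorCode` and `enumDemo` are hypothetical names added for illustration, and `ENUM` yields `const void*` here so that no const gets cast away (that part is my change, not part of the macros above).

```cpp
#include <cassert>

// One byte per tag; only its address matters, the stored value is never read.
#define IMPLEMENT_ENUM(name) extern const char g_enum##name = 0
#define ENUM(name) (static_cast<const void*>(&g_enum##name))

// In a multi-file project these two lines would live in exactly one .cpp file,
// with matching `extern const char g_enum...;` declarations everywhere else.
IMPLEMENT_ENUM(red);
IMPLEMENT_ENUM(green);

// Hypothetical consumer in the spirit of SetColor: maps a tag to a small code.
inline int colorCode(const void* color)
{
    if (color == ENUM(red))   return 1;
    if (color == ENUM(green)) return 2;
    return 0; // unknown tag
}

// Hypothetical usage.
inline void enumDemo()
{
    assert(ENUM(red) != ENUM(green));    // distinct objects, distinct addresses
    assert(colorCode(ENUM(red)) == 1);
    assert(colorCode(ENUM(green)) == 2);
}
```

This works because two distinct objects are guaranteed to have distinct addresses, which is exactly the guarantee that string literals don't give you.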

char* may confuse your compiler because it could alias with anything. That means it can't reorder accesses through the char pointer with other operations involving pointers. Since reordering is crucial to optimisation, this can be a major blow in non-obvious ways.

Aliasing issues don't apply to fir's hack, because he never accesses the pointed-to data; he only uses the pointer's value itself as a unique value, never actually using it as a pointer.

Everyone's explained why this is an evil hack, so it goes without saying that the code I'm posting here is evil, hacky code... But if you actually want it to be valid, such that it will work on other compilers, here's one way you'd do it with your own macro-keywords...

#define IMPLEMENT_ENUM(name) const char* const g_enum##name = #name; void ForceGlobalScope##name(){};
#define DECLARE_ENUM(name) extern const char* const g_enum##name;
#define ENUM(name) ((void*)g_enum##name)
void SetColor(void* color)
{
  DECLARE_ENUM(red); // in each file OR function, before using ENUM(red)
  if( color == ENUM(red) )
    Stuff();
  else
    Assert(false);
}

//in one file in the project
IMPLEMENT_ENUM(red);

Why not:
#define IMPLEMENT_ENUM(name) extern const char* const g_enum##name = #name;
That forces global scope, without declaring another symbol.

You've been asking a lot of questions about what the compilers "should" do in certain circumstances, and what they "must" do. Similarly there have been questions about linkers and other compilation steps. Not just in this thread but in most of the others that were started recently.

Usually when a programmer gets to that point, it is good for them to have a detailed study of the language standard.

That is usually a great thing. It means you want to know how things really work versus how they are supposed to work, and want to understand the inner workings.

A surprisingly high percentage of programmers are content to read books ABOUT the language. They will read books that describe how to use the language, how to leverage it effectively. They will read books about optimizing the code. They will read books about making code portable. But few read the actual language standard.

The final draft of the C11 standard is located here (pdf). The final version has a different first page and removes the word "draft" and some annotations, but is otherwise the same. It is about 700 pages long, and it has all the answers to what they MUST do.

Pay attention to phrases like "shall", "shall not", and "may".

Pay attention to "at least", "at most", and "exactly".

Pay attention to "unspecified", "undefined", and "implementation-defined".

Usually people who think they know a language are in for major discoveries when they read the actual language standards.

I generally know C (though I have some holes in this knowledge too).

As for standards, as I said I am more interested in the rationale behind them than in the standards per se. [My vision of how C (a great language) should look goes much further than what we are discussing here, but that is a different story.]

Second, besides the standards there are also compiler behaviors, which are the next, more concrete step in knowing what I can do in a compiler (and because I personally am not against writing compiler-dependent code, many "undefineds" become "defineds" to me).

Anyway, does someone maybe know whether there is some way to write a message to the people who decide on gcc, where I could ask them to add a switch that would allow merging string literals from different modules into one at the linking stage?

I think what he wants to do is to not be required to explicitly declare enums to be able to use them.

Would it be possible to use a macro or something similar to hash the string at compile time and pass hashes instead of pointers?

eg.
if (hashEnum == makeHashEnum("red")) {}

where hashEnum is an int of some size.

Collisions probably won't be a problem if you use 64 bits or something. Probably.
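
Here's a rough sketch of how that could look in C++14 or later, using a `constexpr` function instead of a macro. The hash is 64-bit FNV-1a, which is just one possible choice; `makeHashEnum` matches the name used above, while `isRed` is a hypothetical name for illustration.

```cpp
#include <cstdint>

// 64-bit FNV-1a over a null-terminated string, usable in constant expressions.
constexpr std::uint64_t makeHashEnum(const char* s)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    while (*s != '\0') {
        h ^= static_cast<unsigned char>(*s++);
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical consumer: compares hashes, never the strings themselves.
constexpr bool isRed(std::uint64_t hashEnum)
{
    return hashEnum == makeHashEnum("red");
}

// These checks are evaluated by the compiler, not at run time.
static_assert(isRed(makeHashEnum("red")), "same spelling, same value");
static_assert(!isRed(makeHashEnum("green")), "these two spellings do not collide");
```

Outside of a constant expression the compiler is allowed to compute the hash at run time instead, so store the result in a `constexpr` variable if it has to be a compile-time constant. A collision between two different spellings would still compare equal without any warning, so the "probably" stands.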

o3o

The base idea (I have written about it here about 4 times already) was to have "ad-hoc" enums. You understand it right: enums I could use without defining them.

foo(x,y, @red);

foo(x,y, @green);

foo(x,y, @quick);

just to use free ad hoc values in any place and compare them with another set of free ad hoc values in any other place (this is one of my own concepts).

(I have one other related idea, the thing I temporarily call an EOC, "enum of congregation": one big, globally standardized enum mapping, something like Unicode but for enums, mapping ints to meanings, so you know that enum code 115 means 'lily' and 114 means 'rose'. Maybe it is not so crucial, and a better way is to use pointers to descriptions rather than such telegraphist code, but it was an old-school (in some way good, imo) concept. This was a side thing anyway; I don't have much time to discuss it here.)

This topic is closed to new replies.
