static symbols and linker

Started by
9 comments, last by frob 10 years, 2 months ago

If I write this in a C source file and compile it to a .o file:

int x = 1;
static int y = 2;

static int f()
{
    return 100;
}

int g()
{
    return 200;
}

The symbols y and f are internal. I know a .o object file has some table of exported symbols with offsets into the binary code inside. Are the symbols y and f simply deleted from it, or only marked as internal? (Maybe someone knows whether the offset into the binary is deleted as well.)

Thanks for any answer.

(Is the .o format used by 32-bit MinGW/GCC hard to learn? Maybe someone knows an easy tutorial on it?)


No, they are not deleted from the object file. The linker needs them to be available if a function from the same compilation unit needs them. They are just marked as local and are thus not available to other compilation units. You can check that with the tool 'nm' (I guess it is also available in MinGW):


$ nm static.o 
0000000000000000 t f
000000000000000b T g
0000000000000000 D x
0000000000000004 d y

This basically means that f and y are local to static.o (lower case t and d as symbol types), while g and x are accessible from other compilation units (upper case symbol types).
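The upper/lower case rule can be written down as a small helper. This is only a sketch: the function name is made up for illustration, and it covers just the section letters t, d, b and r. Other nm letters (U for undefined, w for weak, and so on) follow different rules and are deliberately reported as "not local" here.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true when an nm type letter marks a symbol as local to the
   object file. Only the section letters t/T (code), d/D (initialized
   data), b/B (uninitialized data) and r/R (read-only data) are handled;
   any other letter is outside the scope of this sketch and yields false. */
bool nm_symbol_is_local(char type)
{
    switch (tolower((unsigned char)type)) {
    case 't':
    case 'd':
    case 'b':
    case 'r':
        /* Lower case means local, upper case means global. */
        return islower((unsigned char)type) != 0;
    default:
        return false;
    }
}
```

For the listing above, nm_symbol_is_local('t') and nm_symbol_is_local('d') are true (f and y), while 'T' and 'D' give false (g and x).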

It might all be a bit different on Windows though, not sure what object format mingw uses, this is an example from Linux. Learning the format of object files is probably not a very useful task. First, you barely ever need that and second, there are libraries and tools available that can already do that for you.

So since we're talking about language-lawyer stuff, we go to the standards.

I don't have the C11 standard, but I do have the C99. This is what is actually required:

An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage. There are three kinds of linkage: external, internal, and none.

...

If the declaration of a file scope identifier for an object or a function contains the storage class specifier static, the identifier has internal linkage.

Everywhere in the standard after that, the text only talks about whether something "has linkage" or has "external or internal linkage", and places prohibitions on things that have no linkage. There are no specific requirements about what ends up in object files. A name with internal linkage must be accessible inside its translation unit (which in practice becomes one object file), and different translation units can have their own names with internal linkage.

That means you can make "static void foo(void){}" in as many object files as you want, but the linker won't try to bind them together. If they had external linkage it would complain because you can only have one of them globally.
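As a rough illustration of that rule, here is a toy check of the kind a linker could run over all the definitions it has collected. It is not how any real linker is written; the struct and function names are invented for this sketch, and it assumes the only thing that matters is "same name, both external".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One symbol definition as a toy linker might record it (hypothetical layout). */
struct definition {
    const char *name;    /* symbol name */
    const char *module;  /* object file that defines it */
    bool external;       /* true: external linkage, false: internal linkage */
};

/* Returns the index of the first definition that clashes with an earlier
   external definition of the same name, or -1 if there is no clash.
   Definitions with internal linkage are skipped entirely, because they
   are only visible inside their own module. */
int first_duplicate_external(const struct definition *defs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!defs[i].external)
            continue;
        for (size_t j = 0; j < i; j++) {
            if (defs[j].external && strcmp(defs[i].name, defs[j].name) == 0)
                return (int)i;
        }
    }
    return -1;
}
```

Three modules that each define a static foo produce no clash, while two modules that both define a non-static foo do.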

So now back to your questions.

1. I know a .o object file has some table of exported symbols with offsets into the binary inside. Are the symbols y and f simply deleted, or only marked as internal? (Maybe someone knows whether the offset into the binary is deleted as well.)

2. Is the .o format used by 32-bit MinGW/GCC hard to learn? Maybe someone knows an easy tutorial on it?

1. The compiler can include anything it wants in the .o file.

There are some things it is required to support in a translation unit, but it can include all the extra stuff that it wants.

It is not required to keep track of anything with internal linkage in the object file.

It might include those symbols in the file, or it might not. This might be adjustable based on optimizer settings, debug information, and other compiler options. As the example from rnlf shows above, his compiler did include the information inside the object file even though it was not required. It is likely that he could change his compiler options and get the symbols to vanish from the file. They are not required to be there.

2. MinGW uses the COFF format for object files and the PE format for Windows executables.

The file formats have been used for decades and are well documented. Microsoft documented both PE and COFF together here. The document is updated every few years with new processor IDs, but the data structures and overall format are the same as they were in the 1980s. Google can find many other sources, including books, to learn from. I don't know if you would consider it hard to learn or not.
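To give a feel for how small the pieces are, here is a sketch that decodes one record of the COFF symbol table. The 18-byte layout and the two storage class values (2 for external, 3 for static) follow the published PE/COFF format; the struct and function names are my own, and the sketch only handles short names stored inline (names longer than 8 characters live in a separate string table and are not handled).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum {
    COFF_SYMBOL_SIZE    = 18, /* bytes per symbol table record */
    COFF_CLASS_EXTERNAL = 2,  /* IMAGE_SYM_CLASS_EXTERNAL */
    COFF_CLASS_STATIC   = 3   /* IMAGE_SYM_CLASS_STATIC */
};

/* Decoded view of one record; field names are this sketch's own. */
struct coff_symbol {
    char     name[9];         /* inline short name, NUL-terminated copy */
    uint32_t value;           /* typically the offset inside the section */
    int16_t  section_number;  /* 1-based section index, 0 = undefined */
    uint16_t type;
    uint8_t  storage_class;
    uint8_t  aux_count;       /* number of auxiliary records that follow */
};

static uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes one 18-byte little-endian record into *out. */
void coff_symbol_decode(const unsigned char *rec, struct coff_symbol *out)
{
    memcpy(out->name, rec, 8);
    out->name[8] = '\0';
    out->value          = read_u32le(rec + 8);
    out->section_number = (int16_t)read_u16le(rec + 12);
    out->type           = read_u16le(rec + 14);
    out->storage_class  = rec[16];
    out->aux_count      = rec[17];
}

/* True when the record is marked static, i.e. local to the object file. */
bool coff_symbol_is_local(const struct coff_symbol *sym)
{
    return sym->storage_class == COFF_CLASS_STATIC;
}
```

A record for a static variable would come back with storage class 3, and one for a non-static function with storage class 2, which is the COFF counterpart of the lower/upper case letters nm prints.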

So...

The big thing to know is that the language specifies minimum behavior. Things must be at least a certain size. Things must have at least this visibility. Algorithms must perform at least this fast. Elements must contain at least this data.

Compilers are allowed to go beyond that functionality if they want. Compiler writers add their own bonus features and extensions all the time. They routinely add debugging information, profiling information, and assorted metadata that is far beyond what the standard requires.

If you are trying to hide object names, making them static and marking functions as inline can potentially help. You would still need to go through the object file to ensure there aren't any direct references to the names. Turning on heavy optimizations, disabling debug information, and using symbol-stripping tools might help obliterate named references.

For me, understanding COFF would be useful, as would understanding what exactly linkers do (I heard they are very small programs doing very simple things, but I don't know exactly what they do).

Object files with unresolved references include information about the name of each unresolved reference and the places in the executable code where they need to be fixed up. The object code has dummy values which the linker fills in with the correct address when it knows the address of all the symbols.

It's more complicated if the object code can be relocated to an arbitrary address (basically, the "linked" code still needs the fixup table then) but you get the idea.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Could you maybe say what exactly is fixed up, and what it looks like? Let's say there is an external symbol f and a corresponding function body in the binary.

call unresolved symbol f

in the object code it is stored as:

call ????

where ???? could be anything.

f is in the unresolved symbol table. There is an entry for the location of ???? in the machine code data which needs fixing up.

The linker knows all the function addresses from every module (otherwise it is a link error: unresolved external symbol). It replaces ???? with the actual address of f in the binary when linking.
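In code, the fixing-up step could look roughly like this. It is a toy model, not a real linker: the struct and function names are invented, addresses are 32-bit, and it writes the absolute address into the hole exactly as described above. (On x86 the usual call instruction actually stores a displacement relative to the next instruction, so a real linker would compute that instead.)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A resolved symbol: name plus final address (hypothetical toy model). */
struct symbol {
    const char *name;
    uint32_t address;
};

/* One hole in the machine code that must receive a symbol's address. */
struct fixup {
    size_t offset;     /* where the 4 placeholder bytes (the ????) start */
    const char *name;  /* symbol whose address goes there */
};

static const struct symbol *find_symbol(const struct symbol *syms,
                                        size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    }
    return NULL;
}

/* Writes each symbol's address (little-endian) over its placeholder bytes.
   Returns how many fixups could not be applied, either because the symbol
   is unknown (a real linker would report an unresolved external symbol)
   or because the hole does not fit inside the code buffer. */
size_t apply_fixups(unsigned char *code, size_t code_size,
                    const struct fixup *fixups, size_t fixup_count,
                    const struct symbol *syms, size_t symbol_count)
{
    size_t failed = 0;
    for (size_t i = 0; i < fixup_count; i++) {
        const struct symbol *sym =
            find_symbol(syms, symbol_count, fixups[i].name);
        if (sym == NULL || code_size < 4 || fixups[i].offset > code_size - 4) {
            failed++;
            continue;
        }
        unsigned char *hole = code + fixups[i].offset;
        hole[0] = (unsigned char)(sym->address & 0xFF);
        hole[1] = (unsigned char)((sym->address >> 8) & 0xFF);
        hole[2] = (unsigned char)((sym->address >> 16) & 0xFF);
        hole[3] = (unsigned char)((sym->address >> 24) & 0xFF);
    }
    return failed;
}
```

With a code buffer of one call opcode followed by four zero bytes and a single fixup for f at offset 1, the four zero bytes end up holding the address of f.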

Ah, I see. You mean that when I have some module that uses external symbols (for example printf or WinAPI calls), the linker fills in these calls with the proper addresses, and that's all? Very little work.

Yes, that is their main job. The linker gets a list of functions that are missing, looks up their final address, and fills in the holes. Then it stuffs everything together into one file, and calls the result an EXE.


Linkers can do more things than their main job. They can perform optimizations at a whole-program level. They are responsible for coordinating debug information, either consolidating it for use or discarding it. They can strip out functions that nobody uses. They can apply digital signatures. Those are just bonus features.
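The "stuffs everything together" part of the main job can be sketched as well. This is a deliberately simplified model with invented names: modules are placed back to back with no alignment or separate sections, and a symbol's final address is just its module's base plus its offset inside the module.

```c
#include <stddef.h>
#include <stdint.h>

/* One input module as a toy linker might see it (hypothetical layout). */
struct module {
    const char *name;  /* object file name */
    uint32_t size;     /* size of its code and data in bytes */
    uint32_t base;     /* final start address, filled in by layout_modules */
};

/* Places the modules one after another starting at image_base and
   records each module's base address. Returns the total size used. */
uint32_t layout_modules(struct module *mods, size_t count, uint32_t image_base)
{
    uint32_t cursor = image_base;
    for (size_t i = 0; i < count; i++) {
        mods[i].base = cursor;
        cursor += mods[i].size;
    }
    return cursor - image_base;
}

/* Converts an offset inside a module into its final address. */
uint32_t final_address(const struct module *mod, uint32_t offset_in_module)
{
    return mod->base + offset_in_module;
}
```

Every object file numbers its own contents from 0, so a symbol at offset 0xb in the second module only gets its real address once the linker has decided where that module starts.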

And what about calls and jumps inside a module, and the addresses of static in-module data? Are in-module calls and jumps made with relative jumps or with absolute ('static') addresses? If they use absolute addresses, and both modules start addressing from 0 (in some 0-based address space), then linking two modules gives a conflict of address spaces.

What happens then? I heard vaguely (but I'm not sure) that the linker needs to relocate all such in-module addressing, which would probably be all non-relative jumps/calls and all non-relative data addressing, and there would be many of them. Maybe this set of references to re-fix is the same as the set of internal symbols, because usually a call refers to a function name symbol and a static address refers to a static data symbol too. But I'm not sure how this works (when I was reading about it I did not quite understand it). Could someone elaborate on this a bit?

