relocation unknown

Started by
2 comments, last by fir 9 years, 9 months ago

I'm trying to understand relocation concepts, but they are not very clearly described, so maybe someone would like to clear this up a bit. I'll number the questions for clarity.

When I look at an assembly procedure, it seems to me that it often doesn't need any 'relocation fixups' (if I understand correctly what is meant by this).

(1) Branching is usually relative, so it is relocatable, and access to local variables is also relative, so this kind of procedure doesn't need a relocation fixup table. Am I right?

(2) I suspect (because I'm not sure about this) that code which 1) uses calls or 2) references global data does need fixups, since I suspect calls are not relative and global data access is not relative either. Am I right here?

(3) When looking at disassembly of .o and .obj files and the like, I never see relocation tables listed. Do most such object files have fixup tables built in or not? Is there a way of listing them?

(4) If I understand correctly, if some .obj module references 17 symbols (I think they may be both external and internal; say 17 = 7 external functions, 7 internal functions, 2 internal static data, 1 external static data), it would be easiest to define 17 pointers and, instead of rewriting each move that references data and each call that calls a function, only fill in those pointers at link time and use indirect addressing (through the pointer)?

But loaders prefer to physically rewrite each immediate reference for efficiency reasons?

edit

Specifically, I'm not sure I understand the shape of such a fixup table correctly. If I understand it right, it should look like this:

fixup table:

printf: +45, +235, +567, +597, +1120
min: +672
table: +527, +1230

I mean that at byte .code+45 (and in 4 other places) there are calls to printf, so the linker takes the real address of printf, say 0x20004567, and writes it into the code: code[45]=0x20004567, code[235]=0x20004567, ...
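The patching described here could be sketched roughly like this. It is only a minimal illustration: the offsets are the ones from the table in the post, while the addresses of min and table, the 32-bit little-endian layout, and the helper name apply_fixups are assumptions made up for the example. Note also that an x86 call instruction normally stores a displacement relative to the next instruction, so a real linker would write that displacement rather than the raw absolute address.

```python
import struct

def apply_fixups(code: bytearray, fixups: dict, symbols: dict) -> None:
    """Write each symbol's 32-bit address at every offset listed for it."""
    for name, offsets in fixups.items():
        addr = symbols[name]
        for off in offsets:
            # Little-endian 32-bit store, as on x86 (an assumption here).
            struct.pack_into("<I", code, off, addr)

# Offsets taken from the fixup table in the post.
fixups = {
    "printf": [45, 235, 567, 597, 1120],
    "min": [672],
    "table": [527, 1230],
}
# Hypothetical final addresses; only printf's value appears in the post.
symbols = {"printf": 0x20004567, "min": 0x20001000, "table": 0x20300000}

code = bytearray(1240)  # stand-in for the .code section contents
apply_fixups(code, fixups, symbols)

print(hex(struct.unpack_from("<I", code, 45)[0]))
```

The table itself is just "symbol name, then the list of places that mention it", so the work grows with the number of reference sites, not with the number of symbols.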

Some things are still not clear, though, and I would like to understand them in detail. For example, I don't understand whether this rewriting is done by the static linker or by the loader.

It seems to me that the linker, which can merge many modules into one big one, could resolve all these addresses if it knew the base addresses of .code and .data. Does the linker know this? (If not, it cannot resolve them until load time, when the .code and .data base addresses are provided.)

I'm not sure how things look here. I heard that DLLs do not know their base address until load time, but what about EXEs? Are those addresses fixed? Both of them? (Making both fixed would help, but then where to place .data?)

I hope someone understands what I have in mind and can clarify a bit.

relocation:

Put something in another location. If something makes assumptions about the position where it is placed and is then moved, these position-dependent things must be adjusted somehow.

I think the name says much, if not all, about relocations.

Whenever you have absolute access you need some kind of relocation. As long as you use relative access (and the code and its target move together) you need no relocation. This is valid for code and data alike.
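A tiny sketch of why that is so, with made-up numbers. The base addresses and offsets are hypothetical, and for simplicity the displacement is measured from the branch itself, whereas real CPUs usually measure it from the following instruction.

```python
def branch_target(site: int, displacement: int) -> int:
    """Target of a relative branch: address of the branch plus its displacement."""
    return site + displacement

OLD_BASE, NEW_BASE = 0x1000, 0x5000   # hypothetical load addresses
BRANCH_OFF, LABEL_OFF = 0x10, 0x40    # offsets inside the same block of code

# Relative form: the stored displacement does not depend on the base.
disp = LABEL_OFF - BRANCH_OFF
relative_ok = branch_target(NEW_BASE + BRANCH_OFF, disp) == NEW_BASE + LABEL_OFF

# Absolute form: the operand was computed for OLD_BASE and is stale after the move.
stored_absolute = OLD_BASE + LABEL_OFF
absolute_ok = stored_absolute == NEW_BASE + LABEL_OFF

print(relative_ok, absolute_ok)  # True False
```

The stale absolute operand is exactly the thing a relocation entry exists to patch.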

Read some articles about Position Independent Code (PIC), which the GNU compilers generate with the -fPIC option. It's explained there in detail.

Relocating code at load time means writing to the code segment, which is normally mapped read-only and shared between processes, so patching it is avoided where possible. This is why you rarely find relocations against the code of shared libraries built as PIC. On older OSes like DOS and TOS, relocation is done for every application at load time, because they have no memory management for that.

(1) Branching is usually relative, so it is relocatable, and access to local variables is also relative, so this kind of procedure doesn't need a relocation fixup table.

Right, this is also known as position independent code (PIC). Not all CPUs support PIC (they do not have a register-indexed memory read instruction), but the ones targeted by Microsoft do. Some CPUs support only PIC. It's a wide world and you can compile C code to almost anything in it.

PIC requires the dedicated use of one or more registers. The old Macintosh binary format (pre-OS X) dedicated the 68k's a4 for locals and a5 for globals. The Intel architecture has a critical shortage of registers, so compilers tend to not use PIC for globals and locals use the already-dedicated SP register.

(2) I suspect (because I'm not sure about this) that code which 1) uses calls or 2) references global data does need fixups,
since I suspect calls are not relative and global data access is not relative either.

Yes. Well, there's a bit of confusion here. External references that are not relative are going to need some kind of resolution before runtime. It's possible to have non-external globals that are not relative and can use absolute addresses. It's a little more complicated than that if you're doing partial linking (e.g. separate compilation).

(3) When looking at disassembly of .o and .obj files and the like, I never see relocation tables listed. Do most such object files have fixup tables built in or not? Is there a way of listing them?

Depending on your tool, you may need to request the relocation tables be dumped explicitly.

If you were on Linux using the ELF format, running 'readelf -aW' on a .o, .so, or binary executable would reveal much ('readelf -r' prints only the relocation entries). Pipe it through a pager.
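To give a feel for what such a dump is reading, here is a minimal sketch that decodes relocation entries in the ELF64 "with addend" layout (three 64-bit fields: offset, info, addend, where the upper half of info is the symbol index and the lower half is the relocation type). The entry is synthetic and built by hand, little-endian is assumed, and decode_rela is a name invented for this example.

```python
import struct

# One Elf64_Rela entry: r_offset (u64), r_info (u64), r_addend (i64).
RELA = struct.Struct("<QQq")

def decode_rela(raw: bytes) -> list:
    """Split a raw relocation section into offset / symbol / type / addend."""
    entries = []
    for r_offset, r_info, r_addend in RELA.iter_unpack(raw):
        entries.append({
            "offset": r_offset,
            "sym": r_info >> 32,          # index into the symbol table
            "type": r_info & 0xFFFFFFFF,  # relocation type code
            "addend": r_addend,
        })
    return entries

# Synthetic entry: patch offset 0x2D, symbol index 5 (hypothetical),
# type 4 (R_X86_64_PLT32 on x86-64), addend -4.
raw = RELA.pack(0x2D, (5 << 32) | 4, -4)

for entry in decode_rela(raw):
    print(entry)
```

A real tool would first locate the relocation sections through the section header table; that part is left out here.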

(4) If I understand correctly, if some .obj module references 17 symbols (I think they may be both external and internal; say 17 = 7 external functions, 7 internal functions, 2 internal static data, 1 external static data), it would be easiest to define 17 pointers and, instead of rewriting each move that references data and each call that calls a function, only fill in those pointers at link time and use indirect addressing (through the pointer)?

But loaders prefer to physically rewrite each immediate reference for efficiency reasons?

The smaller the relocation table at load time, the faster loads will be.

Your static linker will do its best to resolve symbols at static link time. If it can't, the symbol goes into the relocation table for resolution at load time. Depending on settings, the symbols get resolved immediately or when first needed (lazy binding).

Different binary standards resolve symbols in different ways. An ELF file (e.g. Linux) has a global lookup table that gets loaded and patched as required. A COFF file (e.g. Windows) has a jump table that gets statically linked and backpatched at load time (the .LIB file for a .DLL). A Mach-O file (e.g. Mac OS X) behaves much like an ELF file, but in an incompatible way.
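The table-based approach with lazy resolution can be modelled very loosely as follows. This is a toy model, not how any real loader is written: the JumpTable class, its resolver callback, and the dictionary standing in for a shared library are all assumptions for the sake of illustration.

```python
class JumpTable:
    """Toy model: call sites go through one slot per symbol, filled on first use."""

    def __init__(self, resolver):
        self._resolver = resolver   # looks a symbol name up in the "library"
        self._slots = {}            # symbol name -> resolved target
        self.resolve_count = 0      # how many lookups were actually performed

    def call(self, name, *args):
        target = self._slots.get(name)
        if target is None:
            # First use: resolve once and patch the slot (lazy binding).
            target = self._resolver(name)
            self._slots[name] = target
            self.resolve_count += 1
        return target(*args)

# Hypothetical shared library exporting two functions.
library = {"min": min, "abs": abs}
table = JumpTable(library.__getitem__)

first = table.call("min", 3, 7)    # resolves "min", then calls it
second = table.call("min", 9, 2)   # reuses the already patched slot
print(first, second, table.resolve_count)
```

However many call sites mention a symbol, only its single slot is ever written, which is what keeps the load-time table small; the price is one extra indirection per call.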

Some required reading for someone trying to understand this is the seminal paper by Ulrich Drepper on shared libraries. It's a little bit Linux-specific but many of the concepts can be generalized, and I think it's just the sort of thing you might be looking for. If not, it's still an interesting read.

Stephen M. Webb
Professional Free Software Developer

Alright, I seem to understand it (mostly) right; I'll do some more reading.

The jump table (as opposed to a fixup table) that COFF uses is interesting. I wonder whether they then need no more fixup data. They probably still need it, because all the references to these jump table pointers would still have to be patched somehow, and this is easily lost in the 'troubles of disassembly'.

The one remaining mystery is the .code and .data start addresses. If these are known statically, then the generated exe would not need fixups at all. (I vaguely remember that the .code address is probably something like 0x004000..; there was some 4 in it, so this can probably be hardcoded absolutely. But I don't know about .data sections. It seems very important to me to find this out; maybe I will find something about it.)
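That intuition can be sketched as follows: if the image ends up at the base it was linked for, nothing has to be patched, otherwise every recorded absolute address is shifted by the difference. The base value 0x00400000 (the usual default for 32-bit Windows executables), the section offsets, and the rebase helper are assumptions for this illustration only.

```python
import struct

PREFERRED_BASE = 0x00400000  # assumed base address the image was linked for

def rebase(image: bytearray, reloc_offsets: list, actual_base: int) -> int:
    """Shift every recorded absolute 32-bit address by the load delta.

    Returns the number of locations patched.
    """
    delta = actual_base - PREFERRED_BASE
    if delta == 0:
        return 0  # loaded where the linker expected: no fixups needed
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return len(reloc_offsets)

# Tiny fake image holding two absolute addresses (hypothetical layout).
image = bytearray(16)
struct.pack_into("<I", image, 4, PREFERRED_BASE + 0x2000)   # points into .data
struct.pack_into("<I", image, 12, PREFERRED_BASE + 0x1010)  # points into .code
relocs = [4, 12]

patched_same = rebase(image, relocs, 0x00400000)    # loaded at preferred base
patched_moved = rebase(image, relocs, 0x10000000)   # loaded somewhere else
print(patched_same, patched_moved, hex(struct.unpack_from("<I", image, 4)[0]))
```

In this model .data needs no base of its own: it sits at a fixed offset from the image base, so a single delta covers both sections.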

I'm trying to write a basic assembler. The first step is some code that takes the raw binary contents of an assembled procedure and saves it as an object file, but I'm easily confused by the tons of complexity here: sections, relocations, endianness, you know.

If you have some knowledge of these formats, do you maybe have a feel for which one would be easiest to generate?

I don't need to generate it in an extensible and generally correct way. I need a quick hack to get this done as fast and as easily as possible, so I can skip some sections etc.

Maybe some more hints that could help me do that? Which format to use, what to skip, etc. (some sources are available on the net, but I get even more confused reading sources than specifications).

(I know that not many people do this kind of stuff and know the details, but if someone has some on-topic, to-the-point hints, they would be welcome.)
