lawnjelly

C++ 32bit to 64bit conversion of file fixup code

I'm converting a load of C++ code from 32 bit to 64 bit and have run into the predictable snag of fixup (relocation) pointers in binary files.

Essentially there are a bunch of pointers in a binary file, but on disk they are stored as offsets relative to the start of the file. On loading, the pointers are 'fixed up' by adding the in-memory address of the start of the file to each offset, giving an absolute pointer that is written back to the same location and used at runtime as a normal pointer.
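
A minimal sketch of that load-time pass, assuming the file carries a table listing the byte positions of its pointer slots (the name `FixupPointers` and the table layout are hypothetical, not the poster's code). Each slot is `sizeof(void*)` bytes wide, which is exactly why the same file image cannot serve both 32-bit and 64-bit builds:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical relocation pass. `fixups` holds the byte positions (within the
// loaded image) of pointer slots. Before the pass each slot holds an offset from
// the start of the file; afterwards it holds an absolute pointer.
// Slots are sizeof(std::uintptr_t) wide, so the file layout depends on the build.
void FixupPointers(unsigned char* base, const std::uint32_t* fixups, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        unsigned char* slot = base + fixups[i];

        std::uintptr_t offset;
        std::memcpy(&offset, slot, sizeof offset);      // read the stored file offset

        std::uintptr_t absolute = reinterpret_cast<std::uintptr_t>(base) + offset;
        std::memcpy(slot, &absolute, sizeof absolute);  // write the pointer back in place
    }
}
```

`memcpy` is used for the slot accesses so the sketch does not depend on the slots being suitably aligned.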

This works well but has so far relied on the offset and the pointer both being 32 bit. The files are unlikely to be anywhere near 4 GB, so the offsets don't *need* to be 64 bit.

My question is: what would be best (or rather, what do most of you guys do) in this situation? One particular quirk is that the code must compile and run correctly as both 32 bit and 64 bit, because it needs to run on both classes of device, and the binary files must be the same for both.

  1. The most obvious solution is to store all the offsets / pointers in the binary file as 64 bit. This would mean re-exporting all the binary files, which is doable (if somewhat of a pain). It would simplify things for the 64-bit version and require only slight modification for the 32-bit one. The downside is larger files and a larger memory footprint, plus any cache implications.
  2. Keep the pointers as 32-bit offsets and do the pointer addition on the fly as each part of the data is accessed. The files stay the same and the only cost is the extra pointer arithmetic at runtime. I vaguely remember a presentation by someone who did this kind of relocation on the fly and found very little runtime cost.
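
A minimal sketch of option 2, assuming a hypothetical `Node` record whose link is a 32-bit offset from the start of the file, with 0 reserved to mean null. The field has the same size in both builds, so the file layout does not change:

```cpp
#include <cstdint>

// Hypothetical record: the link stays a 32-bit file offset on every platform.
struct Node
{
    std::uint32_t value;
    std::uint32_t nextOffset;  // byte offset from the file start; 0 means "no next node"

    // The pointer is formed only when needed, by adding the file's base address.
    const Node* Next(const unsigned char* fileBase) const
    {
        return nextOffset ? reinterpret_cast<const Node*>(fileBase + nextOffset) : nullptr;
    }
};
```

The cost is one addition per dereference, plus having to pass the file base to every access.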

There is also the question of whether, even with 64-bit pointers, the values would ever exceed 32 bits if the program uses a lot less than 4 GB. I'm assuming yes, as the address space may extend past the first 4 GB, with all the virtual memory / paging / randomization that goes on, but I thought I'd check that assumption, as I'm not well versed in the low-level details.
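
The assumption can be checked empirically with a small helper (hypothetical, not from the thread) that tests whether an address fits in 32 bits. On a 64-bit process nothing guarantees that it will: on x86-64 Linux, for example, stack addresses typically sit near the top of the user address space, far above 4 GB, however little memory the program uses.

```cpp
#include <cstdint>

// Hypothetical helper: true if the address is representable in 32 bits.
// On a 64-bit process the result depends on where the OS and allocator place
// memory (including address space layout randomization), so it can be
// observed but not relied on.
bool AddressFitsIn32Bits(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) <= UINT32_MAX;
}
```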

Edited by lawnjelly
Missed out 'best' in sentence :)

IMHO, option 1 would be useful if you only needed to target 64 bit, i.e. you'd be doing the same trick you already did, but in the 64-bit world. In your case, option 1 breaks the 32-bit version just as much as the current version is broken in 64 bit.

When you resize a pointer, all data after it shifts. That means pointers pointing into that data must adapt to the shift too. With two platforms, you are going to have that problem on one of them if you use native pointers in the data.
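
To make the shift concrete, here is a hypothetical record written straight from memory to disk. The exact numbers depend on the ABI, but on common platforms the field after the pointer moves from offset 8 in a 32-bit build to offset 16 in a 64-bit build, where 4 bytes of padding also precede the 8-byte pointer:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record whose layout differs between 32-bit and 64-bit builds.
struct Record
{
    std::int32_t id;     // offset 0 in both builds
    void*        link;   // 4 bytes at offset 4 (32-bit) vs 8 bytes at offset 8 (64-bit)
    std::int32_t count;  // offset 8 (32-bit) vs offset 16 (64-bit)
};
// Typical sizes: 12 bytes on a 32-bit build, 24 bytes on a 64-bit build
// (the latter includes 4 bytes of tail padding to keep 8-byte alignment).
```

Any stored offset that points at `count`, or at anything laid out after a `Record`, is therefore wrong in one of the two builds.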

So option 2 seems much more useful to me. I don't see how you would need an offset in the file that is longer than 32 bits. Surely the pointers only point within the data itself? Pointing outside the loaded block would imply that the relative position of the block and the thing it points to stays the same after loading, which isn't really the case if you allocate data separately.

You can easily test this: load the file and check that all offsets lie between 0 and the size of the data file.
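
A minimal version of that check, assuming the offsets have already been read from the file into an array (the helper name is hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if every stored offset points inside a file of fileSize bytes.
// The offsets are unsigned, so only the upper bound needs checking.
bool OffsetsWithinFile(const std::uint32_t* offsets, std::size_t count, std::size_t fileSize)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        if (offsets[i] >= fileSize)
            return false;
    }
    return true;
}
```

A stricter variant would also check that each offset plus the size of the object it refers to does not exceed the file size.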

Yes, I admit that, given the need to keep 32-bit support, I'm inclined towards option 2. As you say, none of the offsets need more than 32 bits, as they are relative addresses within the file.

Rather than keeping the pointers as offsets in the 32-bit version as well, maybe I can make the fixup routine a no-op in the 64-bit version only, and access the pointers through an accessor function that simply returns the pointer in the 32-bit version and does the start + offset calculation in the 64-bit version.
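
A minimal sketch of that hybrid, assuming a hypothetical `FilePtr<T>` wrapper around a 32-bit slot and using `UINTPTR_MAX` from `<cstdint>` to detect the build width. The file layout is identical in both builds; only the meaning of the slot after loading differs:

```cpp
#include <cstdint>

// Hypothetical wrapper: the on-disk slot is always 32 bits.
template <class T>
struct FilePtr
{
    std::uint32_t slot;  // file offset, or (32-bit builds only) the fixed-up pointer

#if UINTPTR_MAX == 0xFFFFFFFFu
    // 32-bit build: a pointer fits in the slot, so the load-time fixup stores it there.
    void Fixup(unsigned char* base)
    {
        slot = static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(base + slot));
    }
    T* Get(unsigned char*) const
    {
        return reinterpret_cast<T*>(static_cast<std::uintptr_t>(slot));
    }
#else
    // 64-bit build: the fixup is a no-op and every access adds the file base.
    void Fixup(unsigned char*) {}
    T* Get(unsigned char* base) const
    {
        return reinterpret_cast<T*>(base + slot);
    }
#endif
};
```

The 32-bit build pays nothing at access time, while the 64-bit build pays one addition per access and needs the base address at every call site.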

I use option #1 and #2 in different situations :)

If re-exporting is a pain, fix your build pipeline :D

I have multiple template types for #2. Storing offsets from the start of the file is actually harder to deal with at runtime, because you can't just use the variable like a pointer, as "dereferencing" requires you to know the address of the beginning of the file.

What I usually do is to store offsets from the current file-cursor (offsets from the "fake pointer" field itself), which is simpler to use at runtime.

For example, for offsets from the cursor I use this template, whose overloaded operators make it look just like a regular pointer at runtime:

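```cpp
// u8 and s32 are assumed to be the engine's uint8_t / int32_t typedefs,
// and uint is unsigned int.
// The stored value is the distance in bytes from this field to the target,
// so an Offset is only valid at its original address: copying it elsewhere
// would make the copy point somewhere else.
template<class T, class Y=s32> struct Offset
{
    // A zero offset is reserved to mean null.
    const T* NullablePtr() const { return offset ? Ptr() : 0; }
          T* NullablePtr()       { return offset ? Ptr() : 0; }
    // Target address = address of this field + stored offset.
    const T* Ptr() const { return (const T*)(((const u8*)&offset) + offset); }
          T* Ptr()       { return (T*)(((u8*)&offset) + offset); }
    const T* operator->() const { return Ptr(); }
          T* operator->()       { return Ptr(); }
    const T& operator *() const { return *Ptr(); }
          T& operator *()       { return *Ptr(); }
    static uint DataSize() { return sizeof(T); }
    bool operator!() const { return !offset; }
    // Stores the target as a distance from this field (null becomes 0).
    Offset& operator=( void* ptr ) { offset = ptr ? (Y)((u8*)ptr - (u8*)&offset) : 0; return *this; }

    Y offset;
};
```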

Or this one for offsets-from-beginning-of-file, which isn't compatible with operator overloading:

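```cpp
// Offset from the start of the file: dereferencing needs the file's base address.
// (u8, u32 and uint are assumed engine typedefs for uint8_t, uint32_t and unsigned int.)
template<class T, class Y=u32> struct Address
{
    const T* Ptr(const void* base) const { return (const T*)(((const u8*)base) + address); }
          T* Ptr(const void* base)       { return (T*)(((u8*)base) + address); }
    uint DataSize() const { return sizeof(T); }

    Y address;
};
```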

And this is probably over-complicated, but it works for option #1:


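```cpp
// u64 is assumed to be the engine's uint64_t typedef; the engine's
// eiSTATIC_ASSERT macro is replaced here by C++11 static_assert.
template<class T> struct Pad32to64
{
    // Both members start at the union's first byte, so on a 32-bit build the
    // 4-byte value overlays the low half of a little-endian 64-bit slot.
    union {
        T   data;
        u64 pad;
    };
    operator       T&()       { return data; }
    operator const T&() const { return data; }
    T operator -> () { return data; }
    const T operator -> () const { return data; }
    bool operator!() const { return !data; }
    operator bool() const { return !!data; }
    // Clear the whole slot first so a 32-bit writer emits zeros in the upper half.
    Pad32to64<T>& operator=( const T& o ) { pad = 0; data = o; return *this; }
};
// Chooses the padded wrapper for 4-byte types and the type itself for 8-byte types.
template<int size, class T> struct Select64Wrapper {};
template<class T> struct Select64Wrapper<4,T> { typedef Pad32to64<T> Type; static_assert(sizeof(Type)==8, "must occupy 8 bytes"); };
template<class T> struct Select64Wrapper<8,T> { typedef           T  Type; static_assert(sizeof(Type)==8, "must occupy 8 bytes"); };
template<class T> struct PadTo64
{
    typedef typename Select64Wrapper<sizeof(T), T>::Type Type;
};

// Example: the pointer field occupies 8 bytes in both 32-bit and 64-bit builds.
struct Test {
  PadTo64<int*>::Type myPointer;
};
```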

What's important, from a design point of view, is making the whole system (file format, writer, reader) oblivious to pointer size. You shouldn't have the 32 bit version and the 64 bit version, but a single implementation that compiles correctly and behaves identically on all platforms.
Offsets are measured in bytes and represented as 32 bits in the file because the file is limited to 2 or 4 GB and file content is unaligned (two assumptions that in some cases could be changed to allow smaller offsets), not because pointers are 32 bits.
Pointer arithmetic in code needs to convert a 32-bit integer (i.e. int32_t) to a ptrdiff_t and vice versa, at worst using standard library traits (std::numeric_limits from <limits>) to check for overflow.
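
A minimal sketch of both conversions, assuming signed 32-bit offsets as described (the helper names are hypothetical). Narrowing on the writer side is the only place overflow can occur; widening on the reader side is safe on 32-bit and 64-bit targets, where ptrdiff_t is at least as wide as int32_t:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Writer side: narrow a pointer difference to the 32-bit on-disk offset,
// refusing values that do not fit instead of silently truncating them.
std::int32_t ToFileOffset(std::ptrdiff_t diff)
{
    if (diff < std::numeric_limits<std::int32_t>::min() ||
        diff > std::numeric_limits<std::int32_t>::max())
        throw std::out_of_range("offset does not fit in 32 bits");
    return static_cast<std::int32_t>(diff);
}

// Reader side: widen the stored offset to ptrdiff_t before the pointer arithmetic.
template <class T>
T* ApplyFileOffset(unsigned char* base, std::int32_t offset)
{
    return reinterpret_cast<T*>(base + static_cast<std::ptrdiff_t>(offset));
}
```

The same source compiles unchanged for both builds, which is the point of keeping the format, writer and reader oblivious to pointer size.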

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.