Reading from file to structs

Started by
21 comments, last by Hodgman 12 years, 6 months ago

Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object? That in itself is undefined...
[/quote]
The fake-array-at-end-of-struct is a C idiom; such objects would be individually allocated with malloc(sizeof(someStructure) + length_of(string)).

I haven't seen anyone try to line up successive structures in contiguous memory like that before. As you rightly point out, this would cause alignment problems.
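
For readers who have not met the idiom, here is a minimal sketch of the individually allocated form described above. The names NamedRecord, nameOf and makeNamedRecord are made up for illustration, and the code relies on the whole block coming from a single malloc call; it is not taken from any poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical record: a fixed header followed by a variable-length string.
struct NamedRecord {
    int  length;   // characters in the name, not counting the terminator
    char name[1];  // placeholder; the real text continues past the struct
};

// The text lives in the same malloc block, starting at the 'name' field.
char* nameOf(NamedRecord* rec) {
    return reinterpret_cast<char*>(rec) + offsetof(NamedRecord, name);
}

// One allocation holds the header and the whole string.
NamedRecord* makeNamedRecord(const char* text) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text);
    void* block = std::malloc(sizeof(NamedRecord) + len);
    if (block == nullptr) return nullptr;
    NamedRecord* rec = static_cast<NamedRecord*>(block);
    rec->length = static_cast<int>(len);
    std::memcpy(nameOf(rec), text, len + 1);  // copies the terminator too
    return rec;
}
```

The caller releases the record with std::free, since it was obtained from std::malloc.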

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?

Actually, it's a perfectly normal C idiom that was formalized in C99 with flexible array members. Even the Windows headers use it. Ex: the SYMBOL_INFO structure in dbghelp.h. I'm not personally a big fan of this technique, but it's not uncommon.
[/quote]
new, class, access specifiers and std::vector eliminate C as an excuse, though.
I'm not saying it's great C++ code, but the struct hack is something you can reasonably expect your C++ compiler to handle without incident seeing that it is a C idiom that occurs in headers that C++ compilers are regularly expected to digest in APIs commonly used from C++ code. On non-x86 platforms alignment could be a deal breaker for this particular code, but the DirectX structures pretty much lock it in as it is. I wouldn't use it myself, but I would expect it to work.

new, class, access specifiers and std::vector eliminate C as an excuse, though.


Except when it is.

If you are serialising and deserialising data, then this method is considerably faster than any standard "C++ Way" of doing things, as it allows for fast block loading of data with variable-length members. Compressing XML to a binary format is one use, as is the serialisation of assets.

Certainly with loading, the ability to simply dump something into memory and then fix up pointers/counts internally is going to be faster than loading a bit, reserving some memory 'somewhere else' (string/vector), loading some more into that, returning to your last bit, loading some more and so on. Lower fragmentation, better cache behaviour and centralised data that is easier to inspect in a memory dump are all things which can be useful.

Would I reach for this as my first solution? Probably not, but I would certainly consider it if the access pattern I expected meant that this was the optimal solution to the problem.

Hello.

I tried different ways to load file contents into structs but have some trouble, simply because the structs contain strings, which makes them variable in size. The file is supposed to hold several instances, each of which needs to be loaded into a struct.

I am thinking that the file header contains the number of structs in the file. The struct header can contain the length of the string.


Would you be so kind as to show me how I can read a chunk of data from a file and cast that into a struct that has a string in it? (btw, are std::strings bad for this?)


struct LOADSPRITEOBJECT
{
    int NameLength;
    std::string name;
    D3DXVECTOR3 OffsetPosition;
    D3DCOLOR Color;
    bool visible;
};



I would order that differently if I were you, as you are very likely to waste space with the string in front of an aligned data type. So start with your aligned data types, or at least with the fixed-length data types that you know will put you on a 16-byte boundary. This will save you both file size and runtime memory.


The D3DXVECTOR3 is the aligned data type, btw.
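
To make the padding argument concrete, here is a small sketch with made-up structs (Padded and Reordered are illustrative only and stand in for the DirectX types). On common 64-bit compilers the first typically comes to 24 bytes and the second to 16, though the exact sizes are implementation-defined.

```cpp
#include <cstddef>

// Hypothetical members, chosen only to make the padding visible.
struct Padded {
    char   tag;    // 1 byte, then padding so 'value' is aligned
    double value;
    char   flag;   // 1 byte, then tail padding
};

struct Reordered {
    double value;  // most strictly aligned member first
    char   tag;
    char   flag;   // the two chars sit together before the tail padding
};

// Compile-time check that the reordering did not cost any space.
static_assert(sizeof(Reordered) <= sizeof(Padded),
              "putting the aligned member first should not cost space");
```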


I got this working:


//Load file and copy contents to struct
LOADSPRITEOBJECT LoadSpriteStruct;
std::ifstream InFile("spritedatanew.txt", std::ios::in | std::ios::binary);
if(!InFile) {
    MessageBox(NULL, L"Unable to open spritedatanew.txt", L"Error", MB_OK);
    PostQuitMessage(0);
    return;
}

InFile.read((char *)&LoadSpriteStruct.structtype, sizeof(int));
InFile.read((char *)&LoadSpriteStruct.length, sizeof(int));
char* buffer = new char[LoadSpriteStruct.length+1];
InFile.read( buffer, LoadSpriteStruct.length+1);
buffer[LoadSpriteStruct.length] = '\0';
LoadSpriteStruct.name = buffer;
delete [] buffer;
InFile.read((char *)&LoadSpriteStruct.Color, sizeof(D3DCOLOR) );
InFile.read((char *)&LoadSpriteStruct.OffsetPosition, sizeof(D3DXVECTOR3) );
InFile.read((char *)&LoadSpriteStruct.visible, sizeof(bool) );
InFile.close();

MakeSprite(LoadSpriteStruct);



struct LOADSPRITEOBJECT
{
    int structtype;
    int length;
    std::string name;
    D3DCOLOR Color;
    D3DXVECTOR3 OffsetPosition;
    bool visible;
};
One detail: std::string::operator=() can throw an exception. If it does, you'll leak the buffer you allocated.
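
One way to sidestep the leak is to read directly into the std::string, so there is no raw buffer to clean up. This is only a sketch: the writeName/readName helpers and the layout (a 32-bit length followed by exactly that many characters, with no terminator stored) are assumptions for illustration, not the file format used in the post above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>  // std::stringstream is handy for trying this without a file
#include <string>

// Assumed layout: a 32-bit length followed by that many characters.
void writeName(std::ostream& out, const std::string& name) {
    assert(name.size() <=
           static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    const std::int32_t length = static_cast<std::int32_t>(name.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(name.data(), length);
}

// Reads straight into a std::string; if anything throws or fails,
// the temporary string cleans itself up and 'name' is left unchanged.
bool readName(std::istream& in, std::string& name) {
    std::int32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof(length))) return false;
    if (length < 0) return false;  // corrupt or foreign file
    std::string text(static_cast<std::size_t>(length), '\0');
    if (length > 0 && !in.read(&text[0], length)) return false;
    name.swap(text);
    return true;
}
```

The same two functions work on a std::ifstream / std::ofstream opened with std::ios::binary.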
The usual way of using in-place memory offsets:
1) int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
2) object->name = ((char*)&object->name) + int(object->name); //did you mean offset?
1) Offset is the distance from the start of the 'name' field to the end of the structure. The string data itself is written to the file after the structure, so the offset tells you how far forward in the file to jump in order to find the string.
2) Upon deserialisation, 'name' actually contains the above offset value, not a pointer. The offset is relative to the address of the 'name' field, so the address of 'name' is added to the integer value of 'name', resulting in a pointer to the string data.
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?
FWIW, I would also recommend that category of suggestions -- my above example is a similar technique. In my opinion, these in-place memory techniques are far superior to "C++ style" serialisation techniques. For example, in our game engine, deserialising a file that contains hundreds of data structures is a nop; once the file is read from disk into memory by the OS, it's already usable without any parsing or decoding of its contents (instead of the above pointer patching on-load, I'd do it on-use by using offset templates instead of pointers in my structures):
template<class T> struct Offset {
    T* Ptr() { return (T*)( ((char*)this) + offset ); }
    T* operator->() { return Ptr(); }
    T& operator*() { return *Ptr(); }
private: u32 offset;
};
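
A rough, self-contained sketch of how such an offset template might be used follows. Several things here are assumptions made for illustration: u32 is taken to be std::uint32_t, the offset member is left public so the example can fill it in, and SpriteRecord, buildImage and viewRecord are hypothetical names. Viewing a raw byte buffer as a struct also leans on the same compiler behaviour the rest of this thread argues about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Self-relative offset: the distance in bytes from this field to its target.
template <class T>
struct Offset {
    T* Ptr() { return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset); }
    T* operator->() { return Ptr(); }
    T& operator*() { return *Ptr(); }
    std::uint32_t offset;  // public here only so the sketch can set it
};

// Hypothetical file record: the name text is stored right after the header.
struct SpriteRecord {
    std::uint32_t color;
    Offset<char>  name;
};

// Build an in-memory image of the file: header first, then the string bytes.
std::vector<char> buildImage(std::uint32_t color, const char* name) {
    const std::size_t len = std::strlen(name) + 1;
    SpriteRecord header{};
    header.color = color;
    // Distance from the 'name' field to the end of the header.
    header.name.offset = static_cast<std::uint32_t>(
        sizeof(SpriteRecord) - offsetof(SpriteRecord, name));
    std::vector<char> image(sizeof(SpriteRecord) + len);
    std::memcpy(image.data(), &header, sizeof(header));
    std::memcpy(image.data() + sizeof(header), name, len);
    return image;
}

// No parsing step: the loaded bytes are viewed as the record directly.
SpriteRecord* viewRecord(std::vector<char>& image) {
    assert(image.size() >= sizeof(SpriteRecord));
    return reinterpret_cast<SpriteRecord*>(image.data());
}
```

Because the offset is relative to the field's own address, the buffer can be loaded anywhere in memory and nothing needs patching.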
I think you need to ask yourself how you are going to use this.

If you need random access to structs within the file, use a fixed char array. Yes, it will waste a bit of space, but you can access any struct by its position, which is much faster than having to loop through possibly all the structs to find the one you are looking for (O(n)).

If you are always going to read/write all structs at once (no random access), you can have a null-terminated char array or a string of the exact size of the string.
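
As a sketch of the random-access option: with a fixed-size name field every record has the same size, so record i starts at byte i * sizeof(record). FixedRecord, writeRecord and readRecordAt are hypothetical names, and the 32-character capacity is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream is handy for trying this without a file

// Hypothetical fixed-size record: the name is padded to a fixed capacity,
// so every record occupies the same number of bytes in the file.
struct FixedRecord {
    char name[32];
    int  value;
};

// Appends one record; names that do not fit are rejected by the assert.
void writeRecord(std::ostream& out, const char* name, int value) {
    assert(std::strlen(name) < sizeof(FixedRecord::name));
    FixedRecord rec{};  // zero-filled, so the name is always terminated
    std::strncpy(rec.name, name, sizeof(rec.name) - 1);
    rec.value = value;
    out.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
}

// Jumps straight to record 'index' without reading the ones before it.
bool readRecordAt(std::istream& in, std::size_t index, FixedRecord& rec) {
    in.seekg(static_cast<std::streamoff>(index * sizeof(FixedRecord)), std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&rec), sizeof(rec)));
}
```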
I could write a more sophisticated header that says where in the file the different structs are, like a table of contents. But for now it will be a file which describes the whole GUI, so everything is loaded. For a different GUI, another file is loaded.

I will probably go with this:

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);
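
Note that the snippet above stores an int in a pointer-sized field, which only lines up where pointers are 32 bits wide. Below is a hedged sketch of the same patch-on-load idea that stores a pointer-sized offset instead. The Foo layout, serialize and patchOnLoad are illustrative assumptions, and a file written this way is only readable by a build with the same pointer size and struct layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout. On disk the 'name' slot holds a self-relative byte
// offset; after patching it holds a real pointer into the loaded buffer.
struct Foo {
    std::int32_t bar;
    std::int32_t baz;
    const char*  name;
};

static_assert(sizeof(std::intptr_t) == sizeof(const char*),
              "the offset must fit exactly in the pointer slot");

// Writes the header followed by the string, with the offset in the name slot.
std::vector<char> serialize(std::int32_t bar, std::int32_t baz, const char* name) {
    const std::size_t len = std::strlen(name) + 1;
    std::vector<char> file(sizeof(Foo) + len, 0);
    std::memcpy(file.data() + offsetof(Foo, bar), &bar, sizeof(bar));
    std::memcpy(file.data() + offsetof(Foo, baz), &baz, sizeof(baz));
    // Distance from the 'name' field to the end of the struct.
    const std::intptr_t offset =
        static_cast<std::intptr_t>(sizeof(Foo) - offsetof(Foo, name));
    std::memcpy(file.data() + offsetof(Foo, name), &offset, sizeof(offset));
    std::memcpy(file.data() + sizeof(Foo), name, len);
    return file;
}

// Turns the stored offset into a pointer, once, right after loading.
Foo* patchOnLoad(std::vector<char>& file) {
    assert(file.size() > sizeof(Foo));
    char* field = file.data() + offsetof(Foo, name);
    std::intptr_t offset = 0;
    std::memcpy(&offset, field, sizeof(offset));
    const char* target = field + offset;
    std::memcpy(field, &target, sizeof(target));
    return reinterpret_cast<Foo*>(file.data());
}
```

The patched pointer is only valid while the buffer stays where it is, so the buffer must not be resized or copied after patching.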
