Tispe

Reading from file to structs

Hello.

I have tried different ways to load file contents into structs but am having trouble, simply because the structs contain strings, which makes them variable in size. The file is supposed to hold several instances, each of which needs to be loaded into a struct.

I am thinking that the file header contains the number of structs in the file. The struct header can contain the length of the string.


Would you be so kind as to show me how I can read a chunk of data from a file and cast it into a struct that has a string in it? (By the way, are std::strings bad for this?)

[code]
LOADSPRITEOBJECT LoadSpriteStruct;
std::ifstream inbal("spritedata.txt", std::ios::in | std::ios::binary);
if(!inbal) {
MessageBox(NULL, L"Unable to open", L"Error", MB_OK);
PostQuitMessage(0);
return;
}

inbal.read((char *) &LoadSpriteStruct, sizeof(LoadSpriteStruct));
//inbal.close();

MakeSprite(LoadSpriteStruct); //init sprite and push in to a vector
[/code]

[code]
void USERINTERFACEBOX::MakeSprite(LOADSPRITEOBJECT &instance)
{
SPRITEOBJECT Sprite;

//Set read data (taken from the parameter, not from the caller's local variable)
Sprite.Color = instance.Color;
Sprite.name = instance.name;
Sprite.OffsetPosition = instance.OffsetPosition;
Sprite.visible = instance.visible;

//Calculate position relative to parent box
Sprite.position = BoxPosition + Sprite.OffsetPosition;

//Init Sprite and put in array
SpriteHandler.LoadSprite(Sprite);
Sprites.push_back(Sprite);
}
[/code]

[code]
struct LOADSPRITEOBJECT
{
int NameLength;
std::string name;
D3DXVECTOR3 OffsetPosition;
D3DCOLOR Color;
bool visible;
};
[/code]

If you want to read and write the whole struct in one line you can use a char array instead of std::string. Another way is to handle each member in the struct separately. You can write a std::string to file by first writing the length of the string and then the string data.
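
A minimal sketch of the first option, assuming a fixed 32-byte name buffer. The names `SpriteRecord`, `setName`, `writeRecord` and `readRecord` are hypothetical, and plain floats and an unsigned int stand in for the DirectX types. Because the struct holds no std::string, the whole record can be written and read with a single call:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical record for illustration; the fields are stand-ins, not the poster's types.
struct SpriteRecord {
    char         name[32];   // fixed capacity, always NUL-terminated
    float        offset[3];
    unsigned int color;
    bool         visible;
};

// Block I/O is only safe for trivially copyable types (no std::string members).
static_assert(std::is_trivially_copyable<SpriteRecord>::value,
              "SpriteRecord must be trivially copyable for block I/O");

// Copy a name into the fixed buffer, truncating it if it does not fit.
inline void setName(SpriteRecord& r, const std::string& s) {
    std::memset(r.name, 0, sizeof r.name);
    const std::size_t n = std::min(s.size(), sizeof r.name - 1);
    std::memcpy(r.name, s.data(), n);
}

// Write the whole record in one call.
inline bool writeRecord(std::ostream& out, const SpriteRecord& r) {
    assert(r.name[sizeof r.name - 1] == '\0');  // name must be NUL-terminated
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
    return static_cast<bool>(out);
}

// Read the whole record in one call.
inline bool readRecord(std::istream& in, SpriteRecord& r) {
    in.read(reinterpret_cast<char*>(&r), sizeof r);
    return static_cast<bool>(in);
}
```

The trade-off is the one raised in the next post: every record takes the full 32 bytes for the name, and longer names are truncated.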

Char arrays are fixed size if I read them in one line, right? I think that's too limiting and may waste space.

Got any code on how I can LOAD a std::string from file by first reading the length of the string and then the string data?

[code]
struct LOADSPRITEOBJECT
{
int NameLength;
std::string name;
D3DXVECTOR3 OffsetPosition;
D3DCOLOR Color;
bool visible;
};
[/code]

The usual way of using variable-length strings:
[code]struct Foo {
int bar;
std::string name;
int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int length = (int)object.name.size();
write( &length, sizeof(int) );
write( object.name.c_str(), length+1 ); // the +1 also writes the NUL terminator
write( &object.baz, sizeof(int) );

-- deserialization:
Foo object;
read(&object.bar, sizeof(int));
int length;
read(&length, sizeof(int));
char* buffer = new char[length+1]; // room for the characters plus the NUL
read( buffer, length+1 ); // also consume the NUL that was written
buffer[length] = '\0';
object.name = buffer;
delete [] buffer;
read(&object.baz, sizeof(int));[/code]
The usual way of using in-place memory offsets:
[code]struct Foo {
int bar;
char* name;
int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);[/code]

[quote name='Hodgman' timestamp='1318228575' post='4870984']
[code]
-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int length = object.name.size();
write( &length, sizeof(int) );
write( object.name.c_str(), length+1 );
write( &object.baz, sizeof(int) );
[/code]
[/quote]

In this case the layout in the file would be:
[code]
int bar
int length
char str[length+1]
int baz
[/code]

right? "bar" could be used to identify what type of struct we are reading?

[quote]
In this case the layout in the file would be:
[/quote]
That is the current layout of the data in the file. str[length] is a NUL character.

[quote]
"bar" could be used to identify what type of struct we are reading?
[/quote]
Yes. Sometimes you can omit such identifiers, as the format of the file only allows certain structures in certain positions.

Note that if you are explicitly writing the length, the NUL terminator can be omitted. Care must be taken in this case to correctly convert the not-NUL-terminated character array into a std::string. There are constructor overloads that take a character pointer and a length, or the assign() member function could be used.
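
A minimal sketch of the no-terminator variant, assuming a 32-bit length prefix written in the machine's native byte order. `writeString` and `readString` are hypothetical helper names. The read side uses `assign()` with an explicit length, so the buffer never needs a NUL terminator:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write a 32-bit length followed by the raw characters (no NUL terminator).
inline void writeString(std::ostream& out, const std::string& s) {
    assert(s.size() <= std::numeric_limits<std::uint32_t>::max());  // must fit the prefix
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof length);
    out.write(s.data(), length);
}

// Read the length, then exactly that many characters.
// Returns false if the stream ends early.
inline bool readString(std::istream& in, std::string& s) {
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof length)) return false;
    if (length == 0) { s.clear(); return true; }
    std::vector<char> buffer(length);
    if (!in.read(buffer.data(), length)) return false;
    s.assign(buffer.data(), length);  // pointer + length: no terminator required
    return true;
}
```

A real loader would probably also want to sanity-check `length` against the remaining file size before allocating, since a corrupt file could otherwise request a very large buffer.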

The usual way of using in-place memory offsets:
[code]struct Foo {
int bar;
char* name;
int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);//did you mean offset? [/code]

[quote name='Tispe' timestamp='1318227892' post='4870982']
Char arrays are fixed size if I read them in one line right? I think its too limiting and may waste space.
[/quote]

Char arrays are fixed size. That doesn't mean they have to waste space though....

[code]
struct LOADSPRITEOBJECT
{
int NameLength;
D3DXVECTOR3 OffsetPosition;
D3DCOLOR Color;
bool visible;
char name[1]; // first byte of the NUL-terminated name; the rest follows in the loaded data

// The next record starts right after this record's NUL terminator.
LOADSPRITEOBJECT* next()
{
char* ptr = name + strlen(name) + 1;
return (LOADSPRITEOBJECT*)((void*)ptr);
}

private:
LOADSPRITEOBJECT(); // never instantiated directly, only viewed through pointers into the file data
};


struct LOADSPRITEOBJECT_FILEHEADER
{
int num; // number of records in the file
LOADSPRITEOBJECT items[1];

private:
LOADSPRITEOBJECT_FILEHEADER();
};

class LoadOfSpriteObjectsFromFile
{
public:

void load(const char* filename)
{
FILE* fp = fopen(filename,"rb");
if(fp)
{
// Read the whole file into one block.
fseek(fp, 0, SEEK_END);
size_t sz = ftell(fp);
rewind(fp);
data = new unsigned char[ sz ];
fread(data, 1, sz, fp);
fclose(fp);

objects.resize( header->num );

// Walk the variable-length records and remember where each one starts.
LOADSPRITEOBJECT* obj = header->items;
for(int i=0; i<header->num; ++i)
{
objects[i] = obj;
obj = obj->next();
}
}
}

private:

// Two views of the same block: raw bytes and the file header.
union
{
unsigned char* data;
LOADSPRITEOBJECT_FILEHEADER* header;
};
std::vector<LOADSPRITEOBJECT*> objects;
};
[/code]

[quote name='RobTheBloke' timestamp='1318247409' post='4871044']
[quote name='Tispe' timestamp='1318227892' post='4870982']
Char arrays are fixed size if I read them in one line right? I think its too limiting and may waste space.
[/quote]

Char arrays are fixed size. That doesn't mean they have to waste space though....

[/quote]
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object? That in itself is undefined, and then your code doesn't even consider the fact that you're not aligning consecutive structures properly to ensure that their members are aligned.

You cannot use your objects by themselves; they are only good for storing as pointers in an array, given your code to load them. Your code will blow up as soon as you try to treat an object as a value. What you propose is nothing more than a pointer and a dynamically sized string, but instead of having a safe implementation of the pointer, you're way into the realm of undefined behavior.

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?[/quote]
Actually, it's a perfectly normal C idiom that was formalized in C99 with flexible array members. Even the Windows headers use it. Ex: the SYMBOL_INFO structure in dbghelp.h. I'm not personally a big fan of this technique, but it's not uncommon.

[quote]
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object? That in itself is undefined...
[/quote]
[url="http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx"]The fake-array-at-end-of-struct is a C idiom[/url], such objects would be individually allocated with malloc(sizeof(someStructure) + length_of(string)).

I haven't seen anyone try to line successive structures in contiguous memory like that before. As you rightly point out - this would cause alignment problems.

[quote name='SiCrane' timestamp='1318254184' post='4871059']
[quote name='Brother Bob' timestamp='1318252870' post='4871057']
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?[/quote]
Actually, it's a perfectly normal C idiom that was formalized in C99 with flexible array members. Even the Windows headers use it. Ex: the SYMBOL_INFO structure in dbghelp.h. I'm not personally a big fan of this technique, but it's not uncommon.
[/quote]
[i]new[/i], [i]class[/i], access specifiers and [i]std::vector[/i] eliminate C as an excuse though.

I'm not saying it's great C++ code, but the struct hack is something you can reasonably expect your C++ compiler to handle without incident seeing that it is a C idiom that occurs in headers that C++ compilers are regularly expected to digest in APIs commonly used from C++ code. On non-x86 platforms alignment could be a deal breaker for this particular code, but the DirectX structures pretty much lock it in as it is. I wouldn't use it myself, but I would expect it to work.

[quote name='Brother Bob' timestamp='1318258142' post='4871077']
[i]new[/i], [i]class[/i], access specifiers and [i]std::vector[/i] eliminate C as an excuse though.
[/quote]

Except when it is.

If you are serialising and deserialising data then this method is considerably faster than any standard "C++ Way" of doing things, as it allows for fast block loading of data with variable-length members. Compressing XML to a binary format is one use, as is the serialisation of assets.

Certainly with loading the ability to simply dump something into memory and then fix up pointers/counts internally is going to be faster than loading a bit, reserving some memory 'somewhere else' (string/vector), loading some more into that, returning to your last bit, loading some more and so on. Lower fragmentation, better on the cache and centralised data which is easier to inspect in a memory dump are all things which can be useful.

Would I reach for this as my first solution? Probably not, but I would certainly consider it if the access pattern I was expecting meant that this was the optimal solution to the problem.

[quote name='Tispe' timestamp='1318221395' post='4870963']
Hello.

I tried different ways to load file contents in to structs but have some trouble. Simply because the structs contain strings, which makes them variable in size. The file is supposed to hold several instanses which each needs to be loaded in to a struct.

I am thinking that the file header contains the number of structs in the file. The struct header can contain the length of the string.


Would you be so kind to show me how I can read a chunk of data from a file and cast that in to a struct that has a string in it? (btw are std::strings bad for this?)

[code]
struct LOADSPRITEOBJECT
{
int NameLength;
std::string name;
D3DXVECTOR3 OffsetPosition;
D3DCOLOR Color;
bool visible;
};
[/code]
[/quote]

I would order that differently if I were you, as you are very likely to waste space with the string in front of an aligned data type. So start with your aligned data types, or at least with the fixed-length data types which you know will put you on a 16-byte boundary. This will save you both file size and runtime memory.


The D3DXVECTOR3 is the aligned data type, by the way.

I got this working:

[code]
//Load file and copy contents to struct
LOADSPRITEOBJECT LoadSpriteStruct;
std::ifstream InFile("spritedatanew.txt", std::ios::in | std::ios::binary);
if(!InFile) {
MessageBox(NULL, L"Unable to open spritedatanew.txt", L"Error", MB_OK);
PostQuitMessage(0);
return;
}

InFile.read((char *)&LoadSpriteStruct.structtype, sizeof(int));
InFile.read((char *)&LoadSpriteStruct.length, sizeof(int));
char* buffer = new char[LoadSpriteStruct.length+1];
InFile.read( buffer, LoadSpriteStruct.length+1);
buffer[LoadSpriteStruct.length] = '\0';
LoadSpriteStruct.name = buffer;
delete [] buffer;
InFile.read((char *)&LoadSpriteStruct.Color, sizeof(D3DCOLOR) );
InFile.read((char *)&LoadSpriteStruct.OffsetPosition, sizeof(D3DXVECTOR3) );
InFile.read((char *)&LoadSpriteStruct.visible, sizeof(bool) );
InFile.close();

MakeSprite(LoadSpriteStruct);
[/code]

[code]
struct LOADSPRITEOBJECT
{
int structtype;
int length;
std::string name;
D3DCOLOR Color;
D3DXVECTOR3 OffsetPosition;
bool visible;
};
[/code]

[quote name='Tispe' timestamp='1318242643' post='4871030']The usual way of using in-place memory offsets:
[font="Courier New"]1) int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
2) object->name = ((char*)&object->name) + int(object->name);//did you mean offset?[/font][/quote]1)Offset is the distance from the start of the 'name' field to the end of the structure. The string data itself is written to the file after the structure, so the offset tells you how far forward in the file to jump in order to find the string.
2) Upon deserialisation, 'name' actually contains the above offset value, not a pointer. The offset is relative to the address of the 'name' field, so the address of 'name' is added to the integer value of 'name', resulting in a pointer to the string data.
[quote name='Brother Bob' timestamp='1318252870' post='4871057']Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?[/quote]
FWIW, I would also recommend that category of suggestions -- my above example is a similar technique. In my opinion, these in-place memory techniques are far superior to "C++ style" serialisation techniques. For example, in our game engine, deserialising a file that contains hundreds of data structures is a [font="Courier New"]nop[/font]; once the file is read from disk into memory by the OS, it's already usable without any parsing or decoding of its contents ([i]instead of the above pointer patching on-load, I'd do it on-use by using offset templates instead of pointers in my structures[/i]):
[code]template<class T> struct Offset {
// Resolve the stored offset relative to this object's own address.
T* Ptr() { return (T*)( ((char*)this) + offset ); }
T* operator->() { return Ptr(); }
T& operator*() { return *Ptr(); }
private: u32 offset; // u32 is assumed to be a typedef for a 32-bit unsigned integer
};[/code]
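
A self-contained sketch of the same idea, with assumptions labelled: `FooBlob` and `buildBlob` are hypothetical names, `std::uint32_t` replaces `u32`, and the `offset` member is left public so the example can fill it in. The record uses only fixed-width fields, so its layout does not depend on pointer size:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Self-relative offset: the distance in bytes from this field to its target.
template <class T>
struct Offset {
    std::uint32_t offset;
    const T* ptr() const {
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + offset);
    }
};

// Hypothetical on-disk record; fixed-width fields only.
struct FooBlob {
    std::int32_t bar;
    Offset<char> name;
    std::int32_t baz;
};

// Build the blob: the record, immediately followed by the NUL-terminated name.
inline std::vector<char> buildBlob(std::int32_t bar, const char* name,
                                   std::int32_t baz) {
    assert(name != nullptr);
    FooBlob header;
    header.bar = bar;
    header.baz = baz;
    // Distance from the 'name' field to the first byte after the record.
    header.name.offset = static_cast<std::uint32_t>(
        sizeof(FooBlob) - offsetof(FooBlob, name));

    const std::size_t nameBytes = std::strlen(name) + 1;
    std::vector<char> blob(sizeof(FooBlob) + nameBytes);
    std::memcpy(blob.data(), &header, sizeof header);
    std::memcpy(blob.data() + sizeof header, name, nameBytes);
    return blob;
}
```

After loading, the block can be viewed as `reinterpret_cast<const FooBlob*>(blob.data())` and the string reached through `name.ptr()` with no patching step. Like the other in-place techniques in this thread, that relies on reinterpreting raw bytes as a struct, so it is a sketch of the approach rather than strictly portable C++.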

I think you need to ask yourself how you are going to use this.

If you need random access to structs within the file, use a fixed char array. Yes, it will waste a bit of space, but you can access any struct directly by its position, which is much faster than having to loop through possibly all of the structs to find the one you are looking for (O(n)).

If you are always going to read/write all structs at once (no random access), you can have a null-terminated char array or a string of the exact size of the string.
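
A minimal sketch of the random-access case, assuming the file is nothing but fixed-size records back to back. `FixedRecord` and `readRecordAt` are hypothetical names. Record `index` lives at byte `index * sizeof(FixedRecord)`, so it can be reached with one seek:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical fixed-size record; every record occupies sizeof(FixedRecord) bytes.
struct FixedRecord {
    char         name[16];
    std::int32_t id;
};

// Jump straight to record 'index' without reading the records before it.
// Returns false if the index is past the end of the data.
inline bool readRecordAt(std::istream& in, std::size_t index, FixedRecord& out) {
    // The byte offset must not overflow.
    assert(index <= static_cast<std::size_t>(-1) / sizeof(FixedRecord));
    in.clear();  // drop any eof/fail state left by an earlier read
    in.seekg(static_cast<std::streamoff>(index * sizeof(FixedRecord)),
             std::ios::beg);
    in.read(reinterpret_cast<char*>(&out), sizeof out);
    return static_cast<bool>(in);
}
```

If the file starts with a header (for example the record count), its size would need to be added to the seek position.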

In that case I could write a more sophisticated header which says where in the file the different structs are, like a table of contents. But for now it will be a file which describes the whole GUI, so everything is loaded. For a different GUI another file is loaded.

I will probably go with this:
[code]
-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);
[/code]

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?[/quote]
Yes. I am seriously suggesting that. I have a feeling you've seriously misunderstood what the code is doing.


[quote name='Brother Bob' timestamp='1318252870' post='4871057']
That in itself is undefined, [/quote]
No, it's very well defined. (Hint: Read the ISO standard on C strings)

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
and then your code doesn't even consider the fact that you're not aligning consecutive structures properly to ensure that their members are aligned.[/quote]


Correct. But adding code to handle that is trivial.


[quote name='Brother Bob' timestamp='1318252870' post='4871057']
You cannot use your objects by themselves; they are only good for storing them as pointers in an array given your code to load them.[/quote]
Correct. And that is bad because?

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
Your code will blow up as soon as you try do treat an object as a value.[/quote]
That sir, is impossible. The compiler would inform you it can't be instanced long before that could possibly happen. I'm not *that* dumb ;)

[quote name='Brother Bob' timestamp='1318252870' post='4871057']
What you propose is nothing more than a pointer and a dynamic sized string, but instead of having a safe implementation of the pointer, you're way into the realm of undefined behavior.
[/quote]
It is a safe implementation, and it is fully defined as per the ISO standard. You might not like it, but that's not relevant to its validity.

As for "What you propose is nothing more than a pointer and a dynamic sized string", well, look again. You may notice that there is no char pointer - which is the entire point of doing it in the first place! It is the most compact (and efficient) way of loading string data from a file stream.

I'd also gently point out that it's used all over the place, here are a couple of samples from the Win32 SDK:

[code]

typedef struct tagMETARECORD
{
DWORD rdSize;
WORD rdFunction;
WORD rdParm[1]; ///<<<<<<<<<<<
} METARECORD;


typedef struct _RGNDATA {
RGNDATAHEADER rdh;
char Buffer[1]; ///<<<<<<<<<<<
} RGNDATA, *PRGNDATA, NEAR *NPRGNDATA, FAR *LPRGNDATA;[/code]

And now you know about the technique, you'll probably notice it in most middleware libs too ;)

It's not the only technique that exists (there are other, equally nasty looking, but valid methods), but for the scenario above, you're going to struggle to find anything that can match it for performance, and the memory usage of the loaded asset.

[quote name='Hodgman' timestamp='1318289279' post='4871248']
[quote name='Tispe' timestamp='1318242643' post='4871030']The usual way of using in-place memory offsets:
[font="Courier New"]1) int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
2) object->name = ((char*)&object->name) + int(object->name);//did you mean offset?[/font][/quote]1)Offset is the distance from the start of the 'name' field to the end of the structure. The string data itself is written to the file after the structure, so the offset tells you how far forward in the file to jump in order to find the string.
2) Upon deserialisation, 'name' actually contains the above offset value, not a pointer. The offset is relative to the address of the 'name' field, so the address of 'name' is added to the integer value of 'name', resulting in a pointer to the string data.
[/quote]

small problem:
[code]
struct LOADSPRITEOBJECT
{
char* name;
D3DCOLOR Color;
D3DXVECTOR3 OffsetPosition;
bool visible;
};
[/code]
[code]
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
[/code]

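What happens is that (char*)(&object+1) - (char*)(&object.name) lands you exactly 3 bytes too far. As far as I can tell, this is because &object+1 points at the end of the struct in memory, which includes 3 bytes of padding after the bool (assuming a 32-bit build, where sizeof(LOADSPRITEOBJECT) is 24), while writing the members one by one only puts 21 bytes in the file before the string. You must subtract 3 bytes from offset.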
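
A small sketch of how the mismatch could be computed instead of hard-coded. `Record` is a hypothetical stand-in for the struct above that uses fixed-width members in place of the pointer and the DirectX types. The idea is to derive the offset from the bytes actually written rather than from `sizeof` the struct:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the struct above; no DirectX headers needed.
struct Record {
    std::uint32_t nameOffset;   // written in place of the char* name
    std::uint32_t color;        // stands in for D3DCOLOR (32 bits)
    float         position[3];  // stands in for D3DXVECTOR3 (three floats)
    bool          visible;
};

// Bytes that reach the file when each member is written separately.
constexpr std::size_t kBytesWritten =
    sizeof(std::uint32_t) + sizeof(std::uint32_t) + sizeof(float[3]) + sizeof(bool);

// Trailing padding the compiler adds so that arrays of Record stay aligned.
// sizeof(Record) counts it, but it never reaches the file.
constexpr std::size_t kPadding = sizeof(Record) - kBytesWritten;

// Offset from the nameOffset field to the string that follows the written members.
constexpr std::uint32_t nameOffsetValue() {
    return static_cast<std::uint32_t>(kBytesWritten - offsetof(Record, nameOffset));
}
```

On common compilers `kPadding` comes out as 3 for this layout, which matches the 3 bytes observed above, but the exact value is implementation-defined.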

[quote name='Hodgman' timestamp='1318228575' post='4870984']
The usual way of using in-place memory offsets:
[code]struct Foo {
int bar;
char* name;
int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);[/code]
[/quote]
This is going to blow up right into your face the day you decide to port your code to 64bit. And your data file layout will change with the memory architecture. Ugh.

[quote name='Yann L' timestamp='1318460093' post='4872023']This is going to blow up right into your face the day you decide to port your code to 64bit. And your data file layout will change with the memory architecture. Ugh.[/quote]
See the caveat in a later post regarding [font="Courier New"]struct Offset[/font], which I find preferable to patching pointers into loaded blobs. If we do need to store pointers in blobs, I'd use the largest pointer type for the targets that the data file is being built for.

[quote name='Tispe' timestamp='1318455059' post='4871992']You must subtract 3 bytes from offset.[/quote]
Good catch. That's what I get for writing code from memory.
