
Reading from file to structs


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

22 replies to this topic

#1 Tispe   Members   -  Reputation: 978


Posted 09 October 2011 - 10:36 PM

Hello.

I have tried different ways to load file contents into structs but am having some trouble, simply because the structs contain strings, which makes them variable in size. The file is supposed to hold several instances, each of which needs to be loaded into a struct.

I am thinking that the file header contains the number of structs in the file. The struct header can contain the length of the string.


Would you be so kind as to show me how I can read a chunk of data from a file and cast that into a struct that has a string in it? (By the way, are std::strings bad for this?)

	LOADSPRITEOBJECT LoadSpriteStruct;
	std::ifstream inbal("spritedata.txt", std::ios::in | std::ios::binary);
	if(!inbal) {
	MessageBox(NULL, L"Unable to open", L"Error", MB_OK);
	PostQuitMessage(0);
	return;
	}

	inbal.read((char *) &LoadSpriteStruct, sizeof(LoadSpriteStruct));
	//inbal.close();

MakeSprite(LoadSpriteStruct); //init sprite and push in to a vector

void USERINTERFACEBOX::MakeSprite(LOADSPRITEOBJECT &instance)
{
	SPRITEOBJECT Sprite;

	//Set read data
	Sprite.Color = instance.Color;
	Sprite.name = instance.name;
	Sprite.OffsetPosition = instance.OffsetPosition;
	Sprite.visible = instance.visible;

	//Calculate position relative to the parent box
	Sprite.position = BoxPosition + Sprite.OffsetPosition;

	//Init Sprite and put in array
	SpriteHandler.LoadSprite(Sprite);
	Sprites.push_back(Sprite);
}

struct LOADSPRITEOBJECT
{
	int NameLength;
	std::string name;
	D3DXVECTOR3 OffsetPosition;
	D3DCOLOR Color;
	bool visible;
};



#2 Wooh   Members   -  Reputation: 574


Posted 09 October 2011 - 11:10 PM

If you want to read and write the whole struct in one line you can use a char array instead of std::string. Another way is to handle each member in the struct separately. You can write a std::string to file by first writing the length of the string and then the string data.
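A minimal sketch of the length-then-data approach with iostreams. The helper names `writeString` and `readString` are hypothetical, the length is assumed to be stored as a 32-bit unsigned integer in the machine's native byte order (so such files are not portable between big- and little-endian machines), and no NUL terminator is written because the length already marks the end.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes the length as a fixed-width 32-bit integer, then the raw characters.
void writeString(std::ostream& out, const std::string& s)
{
    std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(s.data(), static_cast<std::streamsize>(length));
}

// Reads a string written by writeString. Returns false if the stream runs out.
bool readString(std::istream& in, std::string& s)
{
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof(length)))
        return false;
    s.resize(length);                 // allocate exactly the bytes we need
    if (length > 0)
        in.read(&s[0], static_cast<std::streamsize>(length));
    return static_cast<bool>(in);
}
```

Each struct is then written or read member by member, calling these helpers for the std::string members and plain `read`/`write` calls for the fixed-size ones.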

#3 Tispe   Members   -  Reputation: 978

Like
Posted 10 October 2011 - 12:24 AM

Char arrays are fixed size if I read them in one line, right? I think that's too limiting and may waste space.

Do you have any code showing how I can LOAD a std::string from a file by first reading the length of the string and then the string data?

struct LOADSPRITEOBJECT
{
        int NameLength;
        std::string name;
        D3DXVECTOR3 OffsetPosition;
        D3DCOLOR Color;
        bool visible;
};


#4 Hodgman   Moderators   -  Reputation: 27883


Posted 10 October 2011 - 12:36 AM

The usual way of using variable-length strings:
struct Foo {
	int bar;
	std::string name;
	int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int length = object.name.size();
write( &length, sizeof(int) );
write( object.name.c_str(), length+1 );
write( &object.baz, sizeof(int) );

-- deserialization:
Foo object;
read(&object.bar, sizeof(int));
int length;
read(&length, sizeof(int));
char* buffer = new char[length+1];   // room for the terminator
read( buffer, length+1 );            // also consume the NUL written above, so baz is read from the right place
buffer[length] = '\0';
object.name = buffer;
delete [] buffer;
read(&object.baz, sizeof(int));

The usual way of using in-place memory offsets:
struct Foo {
	int bar;
	char* name;
	int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
// distance from the 'name' field to the end of the struct; writing it with
// sizeof(int) assumes sizeof(char*) == sizeof(int), i.e. a 32-bit build
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
// 'name' holds the offset written above; turn it into a real pointer
object->name = ((char*)&object->name) + int(object->name);
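For comparison, here is a portable variation on the in-place offset idea, not the code above: the offset lives in a fixed 32-bit field instead of being stored over a pointer, so the layout does not depend on pointer size, and the header is copied out with `std::memcpy` rather than by casting the buffer. `FooHeader`, `serializeFoo` and `fooName` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical on-disk layout: three 32-bit fields followed by the string.
// 'nameOffset' says how far the string starts after that field's own position,
// mirroring the relative offset in the pseudocode above.
struct FooHeader {
    std::int32_t  bar;
    std::uint32_t nameOffset;
    std::int32_t  baz;
};

std::vector<char> serializeFoo(std::int32_t bar, const std::string& name, std::int32_t baz)
{
    FooHeader h;
    h.bar = bar;
    h.nameOffset = static_cast<std::uint32_t>(sizeof(FooHeader) - offsetof(FooHeader, nameOffset));
    h.baz = baz;

    std::vector<char> out(sizeof(FooHeader) + name.size() + 1);
    std::memcpy(out.data(), &h, sizeof h);
    std::memcpy(out.data() + sizeof h, name.c_str(), name.size() + 1); // include the NUL
    return out;
}

// Resolves the name in place: no copy, the returned pointer aims into 'buffer'.
const char* fooName(const std::vector<char>& buffer)
{
    FooHeader h;
    std::memcpy(&h, buffer.data(), sizeof h); // memcpy sidesteps alignment concerns
    const char* field = buffer.data() + offsetof(FooHeader, nameOffset);
    return field + h.nameOffset;
}
```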


#5 Tispe   Members   -  Reputation: 978


Posted 10 October 2011 - 01:16 AM

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int length = object.name.size();
write( &length, sizeof(int) );
write( object.name.c_str(), length+1 );
write( &object.baz, sizeof(int) );


In this case the layout in the file would be:
int bar
int length
char str[length+1]
int baz

right? "bar" could be used to identify what type of struct we are reading?

#6 rip-off   Moderators   -  Reputation: 7701


Posted 10 October 2011 - 02:40 AM

In this case the layout in the file would be:

That is the current layout of the data in the file. str[length] is a NUL character.

"bar" could be used to identify what type of struct we are reading?

Yes. Sometimes you can omit such identifiers, as the format of the file only allows certain structures in certain positions.
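A sketch of how a leading type tag can drive the reader. The tag values, payload layouts and function names are hypothetical; the point is that the tag tells the loader which fields follow, and an unknown tag has to stop the load because the reader cannot tell how many bytes to skip.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

const std::int32_t kSpriteTag = 1; // hypothetical tag values
const std::int32_t kBoxTag = 2;

template <class T>
bool readPod(std::istream& in, T& value)
{
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof(T)));
}

template <class T>
void writePod(std::ostream& out, const T& value)
{
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Reads tagged records until the stream ends, counting each kind it recognises.
bool countRecords(std::istream& in, int& sprites, int& boxes)
{
    sprites = boxes = 0;
    std::int32_t tag = 0;
    while (readPod(in, tag)) {
        if (tag == kSpriteTag) {          // payload: float x, float y
            float x, y;
            if (!readPod(in, x) || !readPod(in, y)) return false;
            ++sprites;
        } else if (tag == kBoxTag) {      // payload: int32 width, int32 height
            std::int32_t w, h;
            if (!readPod(in, w) || !readPod(in, h)) return false;
            ++boxes;
        } else {
            return false;                 // unknown tag: size of payload unknown
        }
    }
    return true;
}
```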

Note that if you are explicitly writing the length, the NUL terminator can be omitted. Care must be taken in this case to correctly convert the not-NUL-terminated character array into a std::string. There are constructor overloads that take a character pointer and a length, or the assign() member function could be used.
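Both conversions mentioned above, sketched for a name stored without a terminator. `makeName` and `setName` are hypothetical wrappers around the standard `std::string(const char*, size_type)` constructor and `std::string::assign(const char*, size_type)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 'data' points at 'length' characters that are NOT NUL-terminated, for
// example the bytes that follow a length prefix in a file buffer.
std::string makeName(const char* data, std::size_t length)
{
    return std::string(data, length); // (pointer, count) constructor
}

void setName(std::string& name, const char* data, std::size_t length)
{
    name.assign(data, length);        // same, reusing an existing string
}
```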

#7 Tispe   Members   -  Reputation: 978

Like
Posted 10 October 2011 - 04:30 AM

The usual way of using in-place memory offsets:
struct Foo {
	int bar;
    char* name;
	int baz;
};

-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);//did you mean offset?


#8 RobTheBloke   Crossbones+   -  Reputation: 2298


Posted 10 October 2011 - 05:50 AM

Char arrays are fixed size if I read them in one line right? I think its too limiting and may waste space.


Char arrays are fixed size. That doesn't mean they have to waste space though....

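#include <cstdio>
#include <cstring>
#include <vector>

struct LOADSPRITEOBJECT
{
        int NameLength;
        D3DXVECTOR3 OffsetPosition;
        D3DCOLOR Color;
        bool visible;
        char name[1];   // first character of the name; the rest follows in the buffer

        // The next record starts right after this name's NUL terminator
        // (records are packed, with no alignment padding between them).
        LOADSPRITEOBJECT* next()
        {
            char* ptr = name + strlen(name) + 1;
            return (LOADSPRITEOBJECT*)((void*)ptr);
        }

private:
        LOADSPRITEOBJECT();   // never constructed; only viewed inside the loaded buffer
};


struct LOADSPRITEOBJECT_FILEHEADER
{
  int num;
  LOADSPRITEOBJECT items[1];

private:
  LOADSPRITEOBJECT_FILEHEADER();
};

class LoadOfSpriteObjectsFromFile
{
public:
  LoadOfSpriteObjectsFromFile() : data(0) {}
  ~LoadOfSpriteObjectsFromFile() { delete [] data; }

  void load(const char* filename)
  {
    FILE* fp = fopen(filename,"rb");
    if(fp)
    {
      fseek(fp, 0, SEEK_END);
      size_t sz = ftell(fp);
      rewind(fp);
      delete [] data;               // release any previously loaded file
      data = new unsigned char[ sz ];
      fread(data, 1, sz, fp);
      fclose(fp);

      objects.resize( header->num );

      LOADSPRITEOBJECT* obj = header->items;
      for(int i=0; i<header->num; ++i)
      {
        objects[i] = obj;
        obj = obj->next();
      }
    }
  }

private:
  // non-copyable: a copy would delete the same buffer twice
  LoadOfSpriteObjectsFromFile(const LoadOfSpriteObjectsFromFile&);
  LoadOfSpriteObjectsFromFile& operator=(const LoadOfSpriteObjectsFromFile&);

  union
  {
    unsigned char* data;
    LOADSPRITEOBJECT_FILEHEADER* header;
  };
  std::vector<LOADSPRITEOBJECT*> objects;
};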
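The code above casts the raw file buffer to `LOADSPRITEOBJECT` and walks it with `next()`. The sketch below is a swapped-in variation of the same walk, not RobTheBloke's code: it assumes a hypothetical layout of a 32-bit record count followed by packed records of two floats and a NUL-terminated name, copies the fixed fields out with `std::memcpy` so that misaligned records are harmless, and keeps each name as a pointer into the loaded buffer. `SpriteView` and `parseSprites` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct SpriteView {
    float x, y;
    const char* name; // points into the loaded buffer, no copy
};

std::vector<SpriteView> parseSprites(const std::vector<char>& data)
{
    std::vector<SpriteView> views;
    std::int32_t count = 0;
    if (data.size() < sizeof count) return views;
    std::memcpy(&count, data.data(), sizeof count);

    const char* p = data.data() + sizeof count;
    const char* end = data.data() + data.size();
    for (std::int32_t i = 0; i < count; ++i) {
        if (end - p < static_cast<std::ptrdiff_t>(2 * sizeof(float))) break; // truncated
        SpriteView v;
        std::memcpy(&v.x, p, sizeof(float));
        std::memcpy(&v.y, p + sizeof(float), sizeof(float));
        v.name = p + 2 * sizeof(float);
        const void* nul = std::memchr(v.name, '\0', static_cast<std::size_t>(end - v.name));
        if (!nul) break;                        // unterminated name
        views.push_back(v);
        p = static_cast<const char*>(nul) + 1;  // next record starts after the NUL
    }
    return views;
}
```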


#9 Brother Bob   Moderators   -  Reputation: 7786


Posted 10 October 2011 - 07:21 AM



Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object? That in itself is undefined, and then your code doesn't even consider the fact that you're not aligning consecutive structures properly to ensure that their members are aligned.

You cannot use your objects by themselves; they are only good for storing them as pointers in an array given your code to load them. Your code will blow up as soon as you try to treat an object as a value. What you propose is nothing more than a pointer and a dynamic sized string, but instead of having a safe implementation of the pointer, you're way into the realm of undefined behavior.

#10 SiCrane   Moderators   -  Reputation: 9413


Posted 10 October 2011 - 07:43 AM

Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?

Actually, it's a perfectly normal C idiom that was formalized in C99 with flexible array members. Even the Windows headers use it. Ex: the SYMBOL_INFO structure in dbghelp.h. I'm not personally a big fan of this technique, but it's not uncommon.

#11 Brother Bob   Moderators   -  Reputation: 7786


Posted 10 October 2011 - 08:49 AM


Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?

Actually, it's a perfectly normal C idiom that was formalized in C99 with flexible array members. Even the Windows headers use it. Ex: the SYMBOL_INFO structure in dbghelp.h. I'm not personally a big fan of this technique, but it's not uncommon.

new, class, access specifiers and std::vector eliminate C as an excuse though.

#12 SiCrane   Moderators   -  Reputation: 9413

Like
Posted 10 October 2011 - 09:39 AM

I'm not saying it's great C++ code, but the struct hack is something you can reasonably expect your C++ compiler to handle without incident seeing that it is a C idiom that occurs in headers that C++ compilers are regularly expected to digest in APIs commonly used from C++ code. On non-x86 platforms alignment could be a deal breaker for this particular code, but the DirectX structures pretty much lock it in as it is. I wouldn't use it myself, but I would expect it to work.

#13 phantom   Moderators   -  Reputation: 6802


Posted 10 October 2011 - 10:23 AM

new, class, access specifiers and std::vector eliminates C as an excuse though.


Except when it is.

If you are serialising and deserialising data then this method is considerably faster than any standard "C++ Way" of doing things as it allows for fast block loading of data with variable length members. Compressing XML to a binary format is one use, as is the serialisation of assets.

Certainly with loading, the ability to simply dump something into memory and then fix up pointers/counts internally is going to be faster than loading a bit, reserving some memory 'somewhere else' (string/vector), loading some more into that, returning to your last bit, loading some more and so on. Lower fragmentation, better cache behaviour and centralised data which is easier to inspect in a memory dump are all things which can be useful.

Would I reach for this as my first solution? Probably not, but I would certainly consider it if the access pattern I expected meant that this was the optimal solution to the problem.

#14 NightCreature83   Crossbones+   -  Reputation: 2674


Posted 10 October 2011 - 10:34 AM


struct LOADSPRITEOBJECT
{
	int NameLength;
	std::string name;
	D3DXVECTOR3 OffsetPosition;
	D3DCOLOR Color;
	bool visible;
};


I would order that differently if I were you, as you are very likely to waste space with the string in front of an aligned data type. So start with your aligned data types, or at least with non-dynamic-length data types which you know will put you on a 16-byte boundary. This will save you both file size and runtime memory.


the D3DXVECTOR is the aligned data type btw.
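A small illustration of the ordering effect, using plain types instead of the D3DX ones (an assumption made to keep it self-contained). Exact sizes are implementation-defined; the comment gives typical x64 values.

```cpp
#include <cassert>
#include <cstddef>

// Same members, different order. On typical x64 compilers sizeof(Unordered)
// is 24 and sizeof(Ordered) is 16: each char placed in front of the double
// forces padding up to the double's 8-byte alignment.
struct Unordered { char a; double d; char b; };
struct Ordered   { double d; char a; char b; };
```

Putting the most strictly aligned members first leaves the small members to share the tail padding instead of each getting their own.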


Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, Mad Max

#15 Tispe   Members   -  Reputation: 978


Posted 10 October 2011 - 01:45 PM

I got this working:

	//Load file and copy contents to struct
	LOADSPRITEOBJECT LoadSpriteStruct;
	std::ifstream InFile("spritedatanew.txt", std::ios::in | std::ios::binary);
	if(!InFile) {
	MessageBox(NULL, L"Unable to open spritedatanew.txt", L"Error", MB_OK);
	PostQuitMessage(0);
	return;
	}

	InFile.read((char *)&LoadSpriteStruct.structtype, sizeof(int));
	InFile.read((char *)&LoadSpriteStruct.length, sizeof(int));
	char* buffer = new char[LoadSpriteStruct.length+1];
	InFile.read( buffer, LoadSpriteStruct.length+1);
	buffer[LoadSpriteStruct.length] = '\0';
	LoadSpriteStruct.name = buffer;
	delete [] buffer;
	InFile.read((char *)&LoadSpriteStruct.Color, sizeof(D3DCOLOR) );
	InFile.read((char *)&LoadSpriteStruct.OffsetPosition, sizeof(D3DXVECTOR3) );
	InFile.read((char *)&LoadSpriteStruct.visible, sizeof(bool) );
	InFile.close();

	MakeSprite(LoadSpriteStruct);

struct LOADSPRITEOBJECT
{
	int structtype;
	int length;
	std::string name;
	D3DCOLOR Color;
	D3DXVECTOR3 OffsetPosition;
	bool visible;
};


#16 SiCrane   Moderators   -  Reputation: 9413


Posted 10 October 2011 - 02:52 PM

One detail: std::string::operator=() can throw an exception. If it does, you'll leak the buffer you allocated.
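One way to avoid that leak, sketched for the same file format as the code above (the length, then the characters plus a trailing NUL): let a `std::vector<char>` own the buffer so it is freed even if the string assignment throws. `readName` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads 'length' characters plus the NUL the writer stored after them.
bool readName(std::istream& in, std::int32_t length, std::string& name)
{
    if (length < 0) return false;
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1); // freed automatically
    if (!in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())))
        return false;
    name.assign(buffer.data(), static_cast<std::size_t>(length));   // drop the NUL
    return true;
}
```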

#17 Hodgman   Moderators   -  Reputation: 27883


Posted 10 October 2011 - 05:27 PM

The usual way of using in-place memory offsets:
1) int offset = int((char*)(&object+1) - (char*)(&object.name)); //offset address is now between bar and baz?
2) object->name = ((char*)&object->name) + int(object->name);//did you mean offset?

1) Offset is the distance from the start of the 'name' field to the end of the structure. The string data itself is written to the file after the structure, so the offset tells you how far forward in the file to jump in order to find the string.
2) Upon deserialisation, 'name' actually contains the above offset value, not a pointer. The offset is relative to the address of the 'name' field, so the address of 'name' is added to the integer value of 'name', resulting in a pointer to the string data.

Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?

FWIW, I would also recommend that category of suggestions; my above example is a similar technique. In my opinion, these in-place memory techniques are far superior to "C++ style" serialisation techniques. For example, in our game engine, deserialising a file that contains hundreds of data structures is a nop: once the file is read from disk into memory by the OS, it's already usable without any parsing or decoding of its contents (instead of the above pointer patching on-load, I'd do it on-use by using offset templates instead of pointers in my structures).
template<class T> struct Offset {
	T* Ptr() { return (T*)( ((char*)this) + offset ); }
	T* operator->() { return Ptr(); }
	T& operator*() { return *Ptr(); }
private: u32 offset;
};
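A self-contained sketch of the `Offset` template in use. Two assumptions: `u32` is taken to be a 32-bit unsigned typedef, and `offset` is made public so the example can fill it in, whereas in the engine described above a tool would write it into the file. `SpriteRecord` and `linkName` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint32_t u32; // assumed meaning of 'u32'

template<class T> struct Offset {
    T* Ptr() { return (T*)(((char*)this) + offset); }
    T* operator->() { return Ptr(); }
    T& operator*() { return *Ptr(); }
    u32 offset; // bytes from this Offset object to the target (public for the demo)
};

// Hypothetical record whose string lives after its fixed-size fields.
struct SpriteRecord {
    Offset<char> name;
    u32 id;
    char nameBytes[8];
};

// Points 'name' at 'nameBytes' with a relative offset, not an absolute address.
void linkName(SpriteRecord& r)
{
    r.name.offset = static_cast<u32>(offsetof(SpriteRecord, nameBytes) - offsetof(SpriteRecord, name));
}
```

Because the offset is relative to the field itself, a byte-wise copy of a `SpriteRecord` resolves to its own `nameBytes`, which is what lets a whole file image be used wherever it lands in memory.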


#18 flodihn   Members   -  Reputation: 214


Posted 11 October 2011 - 08:14 AM

I think you need to ask yourself how you are going to use this.

If you need random access to structs within the file, use a fixed-size char array. Yes, it will waste a bit of space, but you can jump to any struct directly by its position (O(1)), which is much faster than having to loop through possibly all structs to find the one you are looking for (O(n)).

If you are always going to read/write all structs at once (no random access), you can use a null-terminated char array or a string of exactly the size of the text.
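A sketch of the random-access case with fixed-size records: record `i` starts at `i * sizeof(FixedRecord)`, so the reader seeks straight to it. The 32-byte name field and the `FixedRecord` and `readRecordAt` names are hypothetical, and the struct is assumed to be written in native byte order and padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

struct FixedRecord {
    char name[32];        // padded name: every record has the same size
    std::int32_t value;
};

// Seeks directly to record 'index' instead of scanning the earlier records.
bool readRecordAt(std::istream& in, std::size_t index, FixedRecord& out)
{
    in.clear();                       // reset state left by a previous failed read
    in.seekg(static_cast<std::streamoff>(index * sizeof(FixedRecord)), std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&out), sizeof out));
}
```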
www.next-gen.cc NextGen MMO Architecture
www.abydosonline.com Abydos Online

#19 Tispe   Members   -  Reputation: 978


Posted 11 October 2011 - 02:17 PM

In that case I could write a sophisticated header that says where in the file the different structs are, like a table of contents. But for now it will be a file that describes the whole GUI, so everything is loaded. For a different GUI, another file is loaded.
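A rough sketch of such a table of contents: a count, then one 32-bit offset per entry, then the entries themselves, so entry `i` can be found without scanning the ones before it. The layout and the names `buildWithToc` and `entryAt` are hypothetical, and the entries are plain NUL-terminated strings to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Layout: [u32 count][u32 offset x count][entries...], offsets from buffer start.
std::vector<char> buildWithToc(const std::vector<std::string>& names)
{
    std::uint32_t count = static_cast<std::uint32_t>(names.size());
    std::vector<char> out(sizeof count + count * sizeof(std::uint32_t));
    std::memcpy(out.data(), &count, sizeof count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t offset = static_cast<std::uint32_t>(out.size());
        std::memcpy(out.data() + sizeof count + i * sizeof offset, &offset, sizeof offset);
        out.insert(out.end(), names[i].begin(), names[i].end());
        out.push_back('\0');
    }
    return out;
}

// Jumps straight to entry i via the table.
const char* entryAt(const std::vector<char>& data, std::uint32_t i)
{
    std::uint32_t count, offset;
    std::memcpy(&count, data.data(), sizeof count);
    assert(i < count); // the sketch does no other validation
    std::memcpy(&offset, data.data() + sizeof count + i * sizeof offset, sizeof offset);
    return data.data() + offset;
}
```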

I will probably go with this:
-- serialization:
Foo object;
write( &object.bar, sizeof(int) );
int offset = int((char*)(&object+1) - (char*)(&object.name));
write( &offset, sizeof(int) );
write( &object.baz, sizeof(int) );
write( object.name, strlen(object.name)+1 );

-- deserialization:
void* buffer = readWholeFile();
Foo* object = (Foo*)buffer;
object->name = ((char*)&object->name) + int(object->name);


#20 RobTheBloke   Crossbones+   -  Reputation: 2298


Posted 12 October 2011 - 12:22 PM

Are you seriously suggesting a solution by abusing memory like that where you tell the compiler and user you have a one-character array for the string, and then storing the actual string content way outside the array and the object?

Yes. I am seriously suggesting that. I have a feeling you've seriously misunderstood what the code is doing.


That in itself is undefined,

No, it's very well defined. (Hint: Read the ISO standard on C strings)

and then your code doesn't even consider the fact that you're not aligning consecutive structures properly to ensure that their members are aligned.



Correct. But adding code to handle that is trivial.
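The padding in question usually comes down to rounding each record's start up to the alignment of its most strictly aligned member. A sketch of that rounding, assuming the alignment is a power of two (for example `alignof(LOADSPRITEOBJECT)` in C++11); `alignUp` is a hypothetical helper, and the writer would have to insert the same padding bytes.

```cpp
#include <cassert>
#include <cstddef>

// Rounds 'offset' up to the next multiple of 'alignment' (a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

In `next()`, the byte offset just past the NUL would be passed through such a function before the cast, so that every record starts on a properly aligned address.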


You cannot use your objects by themselves; they are only good for storing them as pointers in an array given your code to load them.

Correct. And that is bad because?

Your code will blow up as soon as you try to treat an object as a value.

That, sir, is impossible. The compiler would inform you it can't be instanced long before that could possibly happen. I'm not *that* dumb ;)

What you propose is nothing more than a pointer and a dynamic sized string, but instead of having a safe implementation of the pointer, you're way into the realm of undefined behavior.

It is a safe implementation, and it is fully defined as per the ISO standard. You might not like it, but that's not relevant to its validity.

As for "What you propose is nothing more than a pointer and a dynamic sized string", well, look again. You may notice that there is no char pointer - which is the entire point of doing it in the first place! It is the most compact (and efficient) way of loading string data from a file stream.

I'd also gently point out that it's used all over the place, here are a couple of samples from the Win32 SDK:


typedef struct tagMETARECORD
  {
    DWORD   	rdSize;
    WORD        rdFunction;
    WORD        rdParm[1];   ///<<<<<<<<<<< 
  } METARECORD;


typedef struct _RGNDATA {
    RGNDATAHEADER   rdh;
    char            Buffer[1];   ///<<<<<<<<<<< 
} RGNDATA, *PRGNDATA, NEAR *NPRGNDATA, FAR *LPRGNDATA;

And now you know about the technique, you'll probably notice it in most middleware libs too ;)

It's not the only technique that exists (there are other, equally nasty looking, but valid methods), but for the scenario above, you're going to struggle to find anything that can match it for performance, and the memory usage of the loaded asset.



