Writing a Tree to a File (C++)

Started by
10 comments, last by Jkom329 15 years, 10 months ago
For sake of simplicity, suppose I have a tree with nodes defined as struct NODE { int value; char* string; NODE* left; NODE* right; } I want to write the whole tree to a file. I know that if use write((char*)&node, sizeof node), I'll write the complete node to the file, but I'm guessing that the pointers aren't going to be magically modified in any meaningful way. If the tree were to be restored, they would be useless. My question is, will I have to handle the pointers entirely on my own, or is there an easier way? I also have a variable length string associated with each node that has to be handled. I don't think I need an explanation of how to do this straight up since I've already thought it out. However, if there's an easier way than setting each pointer to the appropriate byte in the output file myself, please feel free to enlighten me.
Advertisement
think about XML. It is essentially a hierarchical n-tree, with data attributes associated with the nodes. This would probably be the optimal solution if you are going for a human-readable format. Just start with your root node and write each successive child node to the file recursively.

You could use a library such as TinyXML to do the XML writing and reading to save you the trouble of writing your own parser. The benefit of using XML is that it is expandable for later use. Let's say you wanted to add a few more data members to a node. XML would allow you to easily do this without having to totally rewrite your serialization code, with support for document type definitions and versioning.
Quote:Original post by Jkom329
For sake of simplicity, suppose I have a tree with nodes defined as

struct NODE
{
int value;
char* string;
NODE* left;
NODE* right;
}

I want to write the whole tree to a file. I know that if use write((char*)&node, sizeof node), I'll write the complete node to the file, but I'm guessing that the pointers aren't going to be magically modified in any meaningful way. If the tree were to be restored, they would be useless.

My question is, will I have to handle the pointers entirely on my own, or is there an easier way? I also have a variable length string associated with each node that has to be handled.

I don't think I need an explanation of how to do this straight up since I've already thought it out. However, if there's an easier way than setting each pointer to the appropriate byte in the output file myself, please feel free to enlighten me.


You could:

- Adjust the pointers as you write them to the file, as you're doing. This is probably fairly complex.

- Preface each structure's data in the file with its *current* address; when loading, set up a temporary associative array mapping the "pointer value" of a structure to the address of its "reconstituted" equivalent. This is fairly simple, and general purpose (it works on any kind of graph, as long as you have a way to make sure you write each node to the file exactly once, and know which members of the node structures need to be "adjusted" using the associative array), but takes extra data in the file and extra memory while loading (after loading, you can throw away the associative arrays).

- Write the values of the nodes in depth-first order, prefacing each with a binary flag to indicate if there is actually a node there. Writing looks like:

to serialize_tree(node_pointer, file):  if node_pointer:    emit_boolean(True, file) # How you represent a boolean value in the file is up to you    # but it will probably be easiest to write a byte that is either '1' or '0'    emit_value(node_pointer->value, file)    serialize_tree(node_pointer->left)    serialize_tree(node_pointer->right)  else:    emit_boolean(False, file)to deserialize_tree(file):  result = null  if read_boolean(file):    result = create_node()    result->value = read_value(file)    result->left = deserialize_tree(file)    result->right = deserialize_tree(file)    return result


This is not general-purpose, but it is simple and will leave a rather smaller file.

BTW, you really really don't want to use char* for strings. You want to use std::string. However, no matter how you represent your string, you will need to be careful about how you serialize the string itself. You can't just write a pointer value, even with "fixing", because the pointed-at data will be nowhere to be found. The usual approach is to write a number giving the length of the string, followed by the string contents. Then when you deserialize, you can read the number, use it to figure out how much to read for the string, and construct a string object for it.
Quote:Original post by Zahlman
[...]- Adjust the pointers as you write them to the file, as you're doing. This is probably fairly complex.[...]
It's not complex at all - you just iterate over the tree and insert into a map the address as the key and a unique identifier (such as a counter that is incremented for each node) as the value. Then, when writing, use the map to convert pointers to indicies. On load, you'd do the opposite - load all the nodes into memory, storing the pointer to each on in a map and then iterating over them all to convert the IDs to pointers again.
Quote:- Preface each structure's data in the file with its *current* address;[...]
If you write thee nodes in a specific order, (such as <self, left, right>) you only need the address of the root node. You know the address of the left node because you know the value of root->left and you know that node immediately follows the root node in the file. You know that you're not recursing down the left subtree anymore whenever the left pointer of a node is null - then you get it's right node next, instead. If both are null, you just reached a leaf and are returning up to higher levels of the tree.

Alternately, you could just use boost::serialization and let it worry about the details of dealing with the pointers.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:Original post by Extrarius
It's not complex at all - you just iterate over the tree and insert into a map the address as the key and a unique identifier (such as a counter that is incremented for each node) as the value. Then, when writing, use the map to convert pointers to indicies. On load, you'd do the opposite - load all the nodes into memory, storing the pointer to each on in a map and then iterating over them all to convert the IDs to pointers again.


Maybe I'm misunderstanding this, but aren't you effectively suggesting to replace one unique number (pointer) with another unique number (ID) for no apparent reason (unless your ID-type is smaller than sizeof(void*) )?

Quote:Alternately, you could just use boost::serialization and let it worry about the details of dealing with the pointers.


Sounds so much more comfortable, especially because you won't spend hours debugging your code ,-)
f@dzhttp://festini.device-zero.de
Quote:Original post by Trienco
Quote:Original post by Extrarius
It's not complex at all - you just iterate over the tree and insert into a map the address as the key and a unique identifier (such as a counter that is incremented for each node) as the value. Then, when writing, use the map to convert pointers to indicies. On load, you'd do the opposite - load all the nodes into memory, storing the pointer to each on in a map and then iterating over them all to convert the IDs to pointers again.


Maybe I'm misunderstanding this, but aren't you effectively suggesting to replace one unique number (pointer) with another unique number (ID) for no apparent reason (unless your ID-type is smaller than sizeof(void*) )?


He seems to be describing my second approach in response to my description of the first :) I assumed what the OP was doing was converting pointers into file offsets somehow (and never mind that, in hindsight, I have no idea why :) ).
I did it the way I originally thought, and it wasn't a big deal at all. I use a recursive function to iterate through the entire tree, of course, and the function returns a streamoff for the node that it wrote. The root node is written first, and everything thereafter goes wherever the recursive function takes it. I put the string at the end of each node chunk and terminate it with a null character, so I don't even bother storing the char*. Its location is implicit.

Aressera,
XML would probably not be good for this project based on what I know of it. I admit that I don't know much about it, but this tree is going to contain tens of millions of nodes, and is going to have to eventually be compressed. Each node or group of nodes in the file is also going to have to be decompressed on demand. And if XML has this functionality somehow, I'd probably be too lazy to figure it all out. It'd probably be less frustrating if I just pound it out myself in C.

Zahlman,
If you read my reply to Aressera, you'll see why I do not plan to fully reconstitute this tree, let alone use more memory than I really have to. I do like your suggestion on writing depth first, but if I'm reading data directly from the file, it would take too much iteration to get where I need to be. That may actually be preferable though given the space I'd save. I will definitely think on it. Thanks! ... And yes, I am converting pointers into file offsets.

Extrarius,
I'm going to avoid the maps and extra memory for reasons stated above, but your second approach is similar to the depth first by Zahlman's. Although his uses less memory with the bools. I'll have to look into that boost::serialization...

Thanks for the replies! Now I need to work on compression...
Maybe I'm missing something here but can't you just store the address of each node and the address of the nodes each node points to? I don't see how this would store it in any way that wasn't reconstructable.
Everyone hates #1.That's why a lot of idiots complain about WoW, the current president, and why they all loved google so much when it was new.Forget the fact that WoW is the greatest game ever created, our president rocks and the brainless buffons of America care more about how articulate you are than your decision making skills, and that google supports adware, spyware, and communism.
This page seems to have a lot of interesting info.
You could give expat a try for xml parsing, which is stream-oriented and very fast

This topic is closed to new replies.

Advertisement