Sign in to follow this  
thedevdan

'Cheating' with recursion.

Recommended Posts

I succesfully made a parser for an XML-like language:
Units
   Name
      Light Artillery
   Team
      Capitalists
   Damage
      70
   Range
      Maximum
         8
      Minimum
         3
   Speed
      2
   Special Attributes
      Explosive
      Fail-safe Cooldown
Well, I made it with a while loop, and I decided to try using recursion. However, not only does it seem harder to follow, but it seems impossible without sharing data between calls (the number current number of tabs). I have decided not to use recursion here, but would something like this be possible without doing sharing information between calls (for example, having the function in a class, and using members to hold the shared data)? If the question seems garbled, just tell me.

Share this post


Link to post
Share on other sites
Don't use recursion!

You said the code was less readable and you couldn't think of a nice way to do it ... so don't! Recursion is great for some things, but if its just making stuff complicated then don't use it. No point writing code that you'll never be able to update or maintain just because it does it a cool way.

if your set on useing recursion then you'll probably want to store your data in a tree (where "unit" is the root node and the "name", "team" and "range" etc are its children and so on). Recursion works great for trees. However if every unit has exactly the same structure its probably way simple to just use a struct or class and have the data read straight in - that way when you add a new attribute it'll take you 5 seconds rather than having to figure out what the hell your recursion was doing and trying not to break it!

Share this post


Link to post
Share on other sites
My data is stored in a tree. And passing the data as paramters won't work, because data will change in a more recent call, and when that function returns, the calling function has the old data. But, as I said, I am not going to use recursion here. [wink]

I was just wondering if it were possible.

Share this post


Link to post
Share on other sites
you want to pass only the parent pointer. So say you have something like:


struct attribute
{
string name;
int val;

attribute *child;
attribute *nxt;
}

//then you do this:

void readData(attribute *atrib, char *xml)
{
// extrac the data from the current xml section
atrib->name = getNextToken()
atrib->val = toInt( getNextToken())

// see what kind of token is next. It will either be a child,
// a sibbling or the end of this branch.
tokenType tmp = peekNextTokenType()

if (tmp = CHILD)
{
attribute child = new attribute
atrib->child = child;

readData(child, xml)
}
else
atrib->child = NULL;

// here we peek again because if the node has children
// (as with range in your example!) and it also has
// siblings if we don't peek twice we'll miss the siblings
// and cut "speed" and "special atribs" off the tree.
tmp = peekNextTokenType()

if (tmp = SIBBLING)
{
attribute sibling = new attribute
atrib->nxt= sibling;

readData(sibling, xml)
}
else
atrib->nxt = NULL;
}


and there you have it! This is just pseduo code of the top of my head but it should work. You'll have to figure out how to write the token stuff (especially peek) but it should be pretty self explanitory! Although i still say if it works with a while loop then I suggest you stick with that as when you have to recode it later (as you always do) you'll know wtf its doing :P

Share this post


Link to post
Share on other sites
Quote:
Original post by kaysik
you want to pass only the parent pointer. So say you have something like


struct attribute
{
string name;
int val;

attribute *child;
attribute *nxt;
}

then you do this:

void readData(attribute atrib, char *xml)
{
// extrac the data from the current xml section
string tmp = getNextToken()
atrib.name = tmp;
atrib.val = toInt( getNextToken())

// see what kind of token is next
tmp = peekNextTokenType()

if (tmp = CHILD)
{
attribute child = new attribute
atrib.child = child;

readData(child, xml)
}

// see what kind of token is next
tmp = peekNextTokenType()

if (tmp = SIBBLING)
{
attribute sibling = new attribute
atrib.nxt= sibling;

readData(sibling, xml)
}
}


and there you have it! But if it works with a while loop then I suggest you stick with that as when you have to recode it later (as you always do) you'll know wtf its doing :P


Thanks for the reply.

I would do exactly as you described, except I don't know how to undo a getline call! In other words, once I have getlined, I can't go back. So, I can't have a peek method. If I did, I could substitute the shared data with simply keeping the data I need in the file to be retreived. But since getline doesn't let you go back, I can't do that.

If you can't understand that, don't worry. As I write this, I am sleeping.

Share this post


Link to post
Share on other sites
its pretty easy to write your own peek! All you have todo is read the WHOLE file into one huge buffer at the start. Then instead of actually reading it off the disk as you go through you use a pointer to loop through the data. The "getNextToken()" function just passes you the current point and moves its own pointer to right after the next space (or new line). Your peek function just reads from where the current point is to check the next bit of the file but doesn't move the pointer at all. If you make a buffer class its pretty simple.

Don't forget to keep a pointer to the start of the memory buffer though because you'll need that to free it proerly.



// the different types you can have for peek
enum type{TYPE_EOF, TYPE_CHILD, TYPE_SIBLING, TYPE_BRNACH_END};

class buffer
{
public:
buffer();

bool readFile(string fname); // reads in the whole file
void reset(); // frees all data and resets pointers

// returns the next token and updates the "current" pointer
string getNextToken();

// reads from the "current" pointer and checks whats next
// but doesn't change the location of current. You can
// then peek as many times as you like.
type peekNextTokenType();

private:
char *data; // pointer to the start of the data
char *current; // pointer to the current location
}

Share this post


Link to post
Share on other sites
recursion can be used in this case, and can be used efficiently ... but it can be misused even more easily by people that try the naive answers for some "theoretical" reason that doesn't follow the actual consequences.

For example, anyone who would tell you to pass the current indention level as a parameter to the function (and hence ... on the stack) is making a common mistake - thinking that anything which can be viewed as formal abstraction is "good" abstraction. Doing this would cost multiple copies of the indention level value to be made, the creation of silly constant variables (because each pushed "variable" would be constant for it's whole lifetime - instead of just incrementing and decramenting a single shared value), would destroy performance due to being completly impossible for the compiler to generate optimization for (if a single shared variable is used, it can be retained in a register if desirable), and would waste memory (and therefore cause and increase in cache misses).

A "good" abstraction and system reflect the actual underlying algorithm correctly and efficiently. In fact, you can almost always tell your solution is better if it requires less complicated code, less memory, or less time to execute - without significantly penalizing either of the other 2 factors.

"shared" values are not bad, they are ESSENTIAL. The key is just at what level, and to whom, they are shared.

For example, a class is 2 things, a means to hide variables from unwanted access, and a means to efficiently share variables between related function calls - by only passing 1 single pointer, instead of pushing each variable in turn onto the stack).

A Parser class would logically have a "indentionLevel" member variable ... along with all other state which needs to be maintained while parsing. Or depending on your model, perhaps the parser represents the type of parsing, and so has control variables only ... and a ParserState class has the immediate state values needed to be shared during parsing ... For instance, if you wrote a compiler, you would "share" a symbol table ... because that is the heart of the compiler.

Do not confuse "share" with "global" and "bad" ... hell, don't even confuse "global" with "bad" ... the real answer is actually pretty simple:

Everything should be "global" unless it meets one of the following criteria:

1. Some function, somewhere in the code, might want their own copy of the item, that has a different value than the value wanted by some other piece of code. If this is the case, the items should be instantiatable, not pre allocated in a fixed place a fixed number of times (aka global)

Anti 1. Sometimes, even if multiple people need the ability to have multiple copies, there is still a use for having one or more "default" or centrally accessable copies. I do this with my "SystemLogger" which is a globablly accessable instance of my Logger class.

2. Some function, somewhere in the code, would like certain constraints to remain true, that are not inheirantly guaranteed by the underlying enviroment / langauge. In these cases, you will want to write things like classes / access functions.

Anti 2. Sometimes (rarely) the runtime costs of applying the constaints formally (through functions calls and run-time value tests) is too high to be justified, in which case the best route is often to allow limited circumvention of the actual constaints ... much like "friend" functions and "protected" members allow in C++, or also like have an item be accessable directly (aka public), but also through functions .. and simply applying the rule that no where is "allowed" to access it directly (by human mandate) except the few specified places.

Note, each rules applies seperately, and leads to decisions that interact. IE, rule 2 guides the wrapping of data and classes, into classes and higher abstraction classes. While rule 1 guides the usage of member and local variables as opposed to static and global variables.

Don't ask me exactly how I wondered off topic ...

Share this post


Link to post
Share on other sites
You can still call recursively while having things like "indentation level" available "globally" during the recursion. Make them class variables, and use a recursive member function. Something like:


class Parser {
private:
int indentation_level;
public:
Parser(int il = 0): indentation_level(il) {}
TreeNode& parse(InputIterator &i, TreeNode parent = new TreeNode()) {
attach_some_nodes_to_parent(); // meanwhile advancing iterator
indentation_level++;
parse(i, someChildTreeNodeOrOther);
indentation_level--;
attach_more_nodes_to_parent(); // until we reached end of this indentation level
return parent;
}
}



You might also be interested in YAML, it looks similar to the markup you're trying to use.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this