String Parsing

Started by
12 comments, last by CoiN 22 years, 2 months ago
Lo, I''m coding a console spreadsheet program, just one with pretty basic functions (SUM(A1:D2), etc). I''ve now got it so I can fill the spread sheet up and display it too the screen, etc. But I''ve hit a problem. The problem is that I want the user to be able to enter something along the lines of A1 + 4 * C2 or something. This data would be stored in a string and I need to get a total somehow. Any ideas? Thanks in Advance.... CoiN
Advertisement
This might be overkill, but the "right way" is to use a parser generating tool like lex and yacc. Lex and yacc are UNIX tools that generate C code to handle parsing. I am not sure what ports or what other parser generators are available on Windows. Also, lex and yacc have a somewhat steep learning curve, but they are useful to know if you do a lot of parsing. They are often used in compilers ( like gcc ).
As already said above, you have to use parsing algorithms to solve that problem correctly. Search the net for syntax analyzing and parsing. You will find plenty of resources. You can either implement syntax and parse trees by yourself or you get into lex and yacc. They are available for every platform.

If you don''t want to use sophisticated parsing algorithms you could also try to use something like fixed length expressions and use sscanf to parse the string. Or you use brute force by simply parsing the string with get() and peek() and lots of "ifs"
I''ll jump on the bandwagon, too. YACC is a great tool; in my mind, one of the most useful software tools ever written. Once you get the hang of it, you can produce a parser for any LALR(1) grammer you come up with. Trying to produce a parser by hand for languages that allow expressions with multiple operators/precedence, parenthesis, recursion, etc is a daunting task for even experienced programmers.

LEX, in my opinion, is less useful. Its a bigger hammer than you''ll need, and sometimes confuses the boundaries of parser and lexer, IMHO. But, it''s easy to get a simple lexer up and running, just don''t go haywire with it.

Once you have the parser and lexer, it''s simply a matter of producing the runtime machine :O It is a lot of work, but I believe you''d learn a lot and gain a very useful skill. You can also amaze and astound your friends...
I have a feeling you want to write your own code rather than using a prebuilt program to generate it for you.

If so, then its not as hard as it seems.

Trace through the string from right to left, looking for operators. Look in the opposite order of the order of operations. When you find an operation, recurse with the other two sides of the string. If no operator is found, then assume its a number, translate that string into a number, and return it.

This can be diagrammed as an expression tree. What I''m going to put below isn''t a strict expression tree but I think it helps explain how the function works better.

    Expression: 2+2*7+4/2    Operator:     (+)                 /   \                /     \            2+2*7     4/2             (+)      (/)             / \      / \            2  2*7   4   2               (*)               / \              2   7 
Phew! Thank you, TerranFury! I was about to think no one here was familiar with simple expression trees and prefix/infix/postfix evaluation.

You extract from left to right, with higher precedence operators being children of lower ones. That way you can evaluate the expression right out of the tree. Going with TerranFury''s example, you''d evaluate the lowest level node first (2 * 7), then its parent (2 + (2 * 7)), and so on.

Also be careful with parentheses. I have a binary expression tree (with possibly one bug in it on the parentheses) somewhere. Mail me if you want me to look for it and send it to you.

[ GDNet Start Here | GDNet Search Tool | GDNet FAQ | MS RTFM [MSDN] | SGI STL Docs | Google! ]
Thanks to Kylotan for the idea!
And if you really want to get into language/compiler design/construction, try "Principles of Compiler Design" by Aho/Ullman (Addison-Wesley 1977). It is a good introductory book on the subject.

Geez. I just saw the 1977 publish date and realized it was one of my college texts...
Yeah, but I don''t think he needs to get into the "Dragon Book" just for this... When you''re ready to write a compiler, Principles and Compiler Design: Principles and Practice (aka "Son of the Dragon Book") are excellent reads.

[ GDNet Start Here | GDNet Search Tool | GDNet FAQ | MS RTFM [MSDN] | SGI STL Docs | Google! ]
Thanks to Kylotan for the idea!
quote:Originally posted by Oluseyi:
I was about to think no one here was familiar with simple expression trees and prefix/infix/postfix evaluation.


That's exactly what I had been thinking.

Its funny though, because I really am not particularly familiar with expression trees. I was just introduced to them a week or two ago. Aparrently, though, I've had more experience with them then I had realized.

A little while back, I wanted to think of something to program for the sake of programming. I wanted to think of something that seemed difficult, but I didn't want to do anything really large scale - I had gotten over the "Let's make Quake XIV!" mentality most new programmers have back when I learned QB.

I looked at my TI-83+, and I thought to myself, "Let's see if I can do that ."

I had never heard of an expression tree, and I had just discovered recursion. I had no idea what the established algorithms were for doing what I was attempting.

I sat down, and thought for a little, and realized that in order to deal with nested parenthases I would need to use recursion. So I started to sketch out trees on paper - not because I had heard of expression trees, but because trees seemed to be the easiest way to diagram recursion, and I came up with the idea of recursively dividing the string into a left part and a right part around operators.

Luckily, I had made the decision early-on to define an operator as a function pointer. I had never even used a function pointer before, but I had heard of them, and they sounded like the right tool for the job. They were. Now I might have created an operator object instead - but it's really the same thing.

Since then, I've read a little on mathematical expression trees, and I realized that I rediscovered the same techniques described in academic texts. I'm doubt my implimentation is particuarly efficient. I know it's not too elegant. But it works.

I'll clean up the code a little bit and post it back here in a day or two if you still need help. But experiment with any ideas you have; don't be afraid to reinvent the wheel. Because its really very rewarding!



Edited by - TerranFury on February 1, 2002 10:31:43 PM
I didn''t know there was a ''Son of Dragon''. I''ll have to pick it up; I thought the first one was excellent. I actually caught myself reading it for fun (what a nerd, right? )

This topic is closed to new replies.

Advertisement