Jump to content
  • Advertisement
Sign in to follow this  
supagu

regex - how to parse function prototype

This topic is 3857 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have a little script language im working on and i want to parse my function prototype using regex, but having troubles coming up with something that works in all cases {} Test {class t, int32 r} OR {}Test{class t, int32 r} should result in: {} Test {class t, int32 r} Test {{r, b} t, int32 r} OR Test{{r, b} t, int32 r} should result in: Test {{r, b} t, int32 r} {} Test Should result in: {} Test hrmm ideas? [Edited by - supagu on March 31, 2008 6:12:00 PM]

Share this post


Link to post
Share on other sites
Advertisement
I'm not quite sure what you're asking. Are you sure that your language is regular? Why don't you use more powerful grammars?
Traditionally, you write a parser in two parts: a scanner and the parser proper. The scanner reads the input character stream, matching it with regular expressions and produces a stream of tokens that represent logical elements of the language. The parser then works on the tokens.

One thing that a scanner usually does is elimination of spaces, which I gather is at least a part of your problem.

Share this post


Link to post
Share on other sites
This would probably be best handled as a grammar. You can tokenize your input quite trivially as:

 '{' '}' ID '{' ID ID ',' ID ID '}'
ID '{' '{' ID ',' ID '}' ID ',' ID ID '}'
'{' '}' ID


Then, your grammar would be:
line : '{' '}' ID ( '{' args '}' )?
| ID ('{' parm '}' )

args : ID ID ( ',' args )?

parm : ID ID ( ',' parm )?
| '{' ID ',' ID '}' ID ( ',' parm )?


Which is fairly short and easy to parse.

Share this post


Link to post
Share on other sites
Normal regular expressions are not powerful enough to handle the task of "balancing" brackets.

Share this post


Link to post
Share on other sites
Computational Models 101:

Finite State Automaton = Regular Expressions
Deterministic Push Down Automaton = Deterministic context-free grammar
Push Down Automata = Context Free Grammar
Linear-bounded Automata = Context Sensitive Grammar
Turing Machine = Unrestricted Grammar

Most computer languages are DCFG -- they can be parsed with a first pass of RegExp tokenization, followed by a finite-lookahead based parser of various kinds.

Trying to balance brackets using a RE will screw you over. Balanced brackets are one of the classic problems that REs cannot solve. The reason why is pretty clear when you realize that all REs are finite state automatons -- each bracket is something you have to remember in the future, and a FSA can only remember a fixed and finite amount of stuff.

In fact, computer languages try to put themselves into a subset of DCFGs, such as LR(1) or LL(1) or LALR or the like. (Note that C++ is annoying in that it requires harder parsing than most other languages)

Share this post


Link to post
Share on other sites
What language do you intend to program this in?

The normal approach is to tokenize - possibly using regular expressions, then parse. If the language is simple enough, the steps can be trivially combined.

Apart from the bracket balancing, your language is very simple - Perl regexes and .Net regexes can balance brackets, so if you're using one of them regexes may still be an option.

[Edited by - Nitage on March 31, 2008 5:03:42 PM]

Share this post


Link to post
Share on other sites
im programming it in c++ using boost.regex which i believe is perl syntax. If regex can't parse this line i should be able to make a custom algorithm to do it.

is 'grammar' something like regular expressions (ie. you give it some strings and it will attempt to parse a line of text)? If so i'll do some googling on the topic :)

Share this post


Link to post
Share on other sites
Have fun, you are delving into the world of Computer Science. Not the Computer Science most people think of, the real mathematical true version of Computer Science.

Share this post


Link to post
Share on other sites
Quote:
Original post by supagu
im programming it in c++ using boost.regex which i believe is perl syntax. If regex can't parse this line i should be able to make a custom algorithm to do it.


Use boost.spirit, or perhaps flex++/bison++.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!