• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0
Ectara

Reduce/Reduce Conflict Resolution

8 posts in this topic

I have a LALR(1) parser that I'm writing, and I'm having trouble with a frequent occurence in a lot of grammars. There are non-terminals with identical definitions, and the parser is supposed to pick between them to go with the non-terminal it is to reduce them into. Additionally, there is also the expectation that not all tokens that comprise either of these non-terminals be reduced into them; some other non-terminals require it, as is seen in the example below.

As for the parser, it reads in grammar, and then generates a state machine in memory, which then can be used to parse a language described by the grammar. It works a lot like more conventional parser generators, except that rather than outputting code to be compiled into a state machine, it creates a state machine in memory that the parser then follows to process the input. It doesn't have features like combining similar state sequences in the machine into one chain, as that would break the requirement of needing the non-terminal names to be distinct, and to be able to tell which non-terminal it is.

As for the example, take this C grammar excerpt:

[code]
enumeration_const : IDENTIFIER
;

...

type_spec : VOID | CHAR | SHORT | INT | LONG | FLOAT
| DOUBLE | SIGNED | UNSIGNED
| struct_or_union_spec
| enum_spec
| typedef_name
;

...

typedef_name : IDENTIFIER
;

...

primary_exp : IDENTIFIER
| const
| STRING
| '(' exp ')'
;

...

const : INT_CONST
| CHAR_CONST
| FLOAT_CONST
| enumeration_const
;
[/code]

Here, you can see that both enumeration_const and typedef_name clearly expect to be defined as a single identifier, and multiple other non-terminals expect either one or the other. In addition, primary_exp requires an IDENTIFIER that is not reduced to either of them. To resolve the reduce-reduce conflict, the parser currently reduces all identifiers to enumeration_const's, which is not good. My first guess is to somehow use the look-ahead to determine whether or not to reduce, and to what I should reduce based on against which non-terminal I'm checking the look-ahead. It'd have to work for any number of tokens, and not depend on the non-terminal's definition, so that has me stuck. I also can't use a symbol table from the parser to tell whether it is an enumeration or a type name, as it isn't supposed to be language dependant; not all languages have enumurations or named types, and some languages have more symbol types than this. I'd like to resolve it without any magic.

Lex and Yacc are good references, but they cannot be used as substitutes, and will not be. The parser must be completed; if this really truly is impossible for a LALR algorithm, then I'd be interested in learning how to express this concept in a manner that can be parsed by this parser. This parser serves an important purpose, and while each rule and non-terminal can have a callback to execute a function when it encounters one of that type, if I could do it without having to execute any code outside of the parser, it'd be preferable. The parser is written in C, and the language I'm currently testing is C. I have about all relevant standard library functions, and the parser cannot backtrack or guess incorrectly. The parser uses advanced file streams, and can be reading the input from memory, a physical file, a pipe, a network socket, a device node, a cat, etc, so the stream cannot be rewound past a character or two of ungetc(). While I have 32bit integers, I cannot guarantee the presence of a 64bit type.

Any advice would be appreciated, and let me know if more information is needed. Edited by Ectara
0

Share this post


Link to post
Share on other sites
[quote]
My first guess is to somehow use the look-ahead to determine whether or not to reduce, and to what I should reduce based on against which non-terminal I'm checking the look-ahead. It'd have to work for any number of tokens, and not depend on the non-terminal's definition, so that has me stuck.
[/quote]

That would be my guess. Looking around, it looks like there's a transform step for these conflicts that changes the (ambiguous non-terminal) rules from their specified grammar into more complex rules with the 1 step look-ahead.

Unfortunately, my experience was with top down parsing so it's hard to map that experience with how you're doing things. Edited by Telastyn
0

Share this post


Link to post
Share on other sites
Yeah, I see what you're saying. Thank you for taking the time to look it over. It looks like I'll need to build a new state machine for the parser; traversing a table that represents the grammar exactly as specified will only get me so far. I might be able to keep most of the code that traverses it, but I need to make a way to introduce subtle modifications to the grammar.

However, I can't quite tackle removing the ambiguity. Leaving the tokens as-is is the easier part, it would seem, if I could find a way to identify if a non-terminal becomes ambiguous when paired with another. This is all, of course, paired with the assumption that I can even parse this grammar with an LALR(k) parser.

So, I guess my tasks are:

1. Determine when a non-terminal makes another ambiguous.

2. Make a more complex state machine to handle this condition.

3. Find a way to traverse this new machine, after making it.

For 1, I might have to scan every state sequence and compare it with the one that is proposed to be added, and see if they are identical. If they are even one token off, the parser knows how to choose between them already. From there, I might mark it as being ambiguous.

For 2, I might add a special identification node indicating an ambiguity is about to occur and under what name it should be called in this instance, and remove the ambiguous definitions state sequences altogether, leaving a single state sequence named something unique.

For 3, if I follow 2 as described, I could traverse, and see if a non-terminal is matched containing the unique state sequence described earlier, and if so, fix the reference to it to indicate that it is of the type indicated by the identification node as I'm reducing. When the new non-terminal node is pushed on to the stack, it is now correctly resolved.

Does this sound logical?
0

Share this post


Link to post
Share on other sites
[quote name='Ectara' timestamp='1346102252' post='4973912']
Does this sound logical?
[/quote]

It sounds naively as a lot of work for what amounts to a special case, but I might just be misunderstanding things.

It shouldn't be a [i]new [/i]state machine, it should be (assuming you're using a state machine already) refining some of the duplicate branches of the state transitions and replacing them with "them +1 lookahead" to disambiguate them. The state machine itself would behave similarly except for that transform.
0

Share this post


Link to post
Share on other sites
I'm not sure that the problem can be reconciled so easily; considering two non-terminals like so:
[code]
n1 : IDENTIFIER ;
n2 : IDENTIFIER ;

n3 : n1 rule_1 rule_2 rule_3 ... rule_n '*'
n4 : n2 rule_1 rule_2 rule_3 ... rule_n '/'
[/code]

It would seem that I would need to look ahead n + 1 tokens to find the last token that would tell which that identifier should be, if either of them. Seeing as n could be any whole number, the number of tokens I could encounter before knowing what to do would be indeterminable at the time that I need to parse the identifier because I have no room for guesswork.

The method I suggested is the first that comes to mind that would solve the problem of not knowing what it should be by waiting until the non-terminal containing it is reduced to then reduce it to the correct non-terminal.

The first problem I can think of with my method is telling the difference between an ordinary sequence of tokens and the same sequence of tokens that fits the definition of the ambiguous non-terminals. My plan isn't worth it if it doesn't solve all problems presented...
0

Share this post


Link to post
Share on other sites
If I read the internet correctly, if you cannot disambiguate after the first lookahead then the grammar itself is ambiguous as far as LALR(1) is concerned.
0

Share this post


Link to post
Share on other sites
enumeration_consts are identical to identifiers in both lexical and syntactic meaning, but not semantic meaning. Even a GLR parser (which are extended LR parsers that handle conflicts in the parse table by branching the parse stack) won't know what to do (it'll produce an output forest with every identifier-or-enumeration_const permuted).

The solutions are either:[list=1]
[*]Remove enumeration_const from your grammar and do semantic analysis later to identify them.
[*]Treat enumeration_const as a terminal instead, which means the lexer needs to be able to recognize them before feeding them to the parser. This is harder since you require information from your work-in-progress parse tree. This will make your grammar non-ambiguous, though.
[/list]
I generally use #1 (if anything is lexically identical, I treat them all as one terminal and do semantic analysis after the parse to determine what it actually is).
0

Share this post


Link to post
Share on other sites
[quote name='Telastyn' timestamp='1346116536' post='4973955']
If I read the internet correctly, if you cannot disambiguate after the first lookahead then the grammar itself is ambiguous as far as LALR(1) is concerned.
[/quote]
When you put it that way, that makes a lot of sense. I get too caught up with trying to create contingency plans for everything. I'll have to see if I can resolve the conflict with one token of look-ahead; maybe the inverse of what I've been doing: Instead of always checking to see if the look-ahead could create a possible match to know when not to reduce, perhaps I should also check to see if I _should_ reduce. I'd need some way to know that the tokens on the stack could be reduced to the ambiguous non-terminals; maybe if I see that it could match either of them, and they have an ambiguous flag set, then I check to see if the look-ahead indicates that I should reduce to one or the other. If not, then I don't reduce, doing the opposite of what I usually do.


[quote name='Nypyren' timestamp='1346127946' post='4973981']
enumeration_consts are identical to identifiers in both lexical and syntactic meaning, but not semantic meaning. Even a GLR parser (which are extended LR parsers that handle conflicts in the parse table by branching the parse stack) won't know what to do (it'll produce an output forest with every identifier-or-enumeration_const permuted).

The solutions are either:[list=1]
[*]Remove enumeration_const from your grammar and do semantic analysis later to identify them.
[*]Treat enumeration_const as a terminal instead, which means the lexer needs to be able to recognize them before feeding them to the parser. This is harder since you require information from your work-in-progress parse tree. This will make your grammar non-ambiguous, though.
[/list]
I generally use #1 (if anything is lexically identical, I treat them all as one terminal and do semantic analysis after the parse to determine what it actually is).
[/quote]
Unfortunately, this isn't an option; many grammars make use of such facilities. I can't allow the feature not to be used; if a well known version of the C grammar from 1985 ([url="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html"]http://www.lysator.l...-grammar-y.html[/url]) expects the ambiguity to be resolved, then I should probably support it. As for the second point, if I require knowing the purpose of the tree's data while parsing, that restricts the parser to only grammar that gives that amibiguous non-terminal a comparable value. To be more clear, if the language has ambiguous non-terminals, but doesn't store the results of either in any sort of container, then reading the tree won't help determine what it is. Additionally, if the language does store it, this restricts me by having to add new capabilites for each language, if they support different types of names for different things.

EDIT: I've decided to change the state machine a bit; the machine now does a table look-up with the current state and the look-ahead token to find where to go next. While it consumes more memory than my current method, it'll be much faster, and a solution might be more feasible with a new method. I'll report back with any new options that have opened up. Edited by Ectara
0

Share this post


Link to post
Share on other sites
I've been trying for quite a while, but I can't quite get my head around it. How should I design this machine? Is it a web of states that are connected by pointers? Is it supposed to be several tables containing something, with something acting as an index? I'm not quite sure how to implement it; any guess I think of would result in an LL(k) parser.
0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0