Fast hand-written lexical analyzer

0 comments, last by Antheus 13 years, 7 months ago
I wrote a hand-written lexical analyzer built around a switch-case structure.
I have a couple of ideas for making it faster:

1. Why not use a ternary search tree to look up symbols?
2. What about using OpenMP to scan all lines in parallel, storing the symbols in an array of linked lists?
I'm not sure about this one, because of the overhead of finding the newlines and of OpenMP initialization.
3. Any ideas on how to benefit from inline assembly (and SIMD)?
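
To make idea 1 concrete, here is a minimal sketch of a ternary search tree used for keyword lookup. It is in Python only for brevity. The names `Node`, `insert`, `lookup` and `build`, and the token strings, are made up for illustration and are not taken from the lexer discussed here. Whether this beats a switch or a hash table would have to be measured.

```python
class Node:
    """One character of the tree, with lower/equal/higher children."""
    __slots__ = ("ch", "lo", "eq", "hi", "token")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.token = None  # set only where a keyword ends


def insert(root, word, token):
    """Insert word into the tree and return the (possibly new) root."""
    if not word:
        raise ValueError("empty keyword")
    if root is None:
        root = Node(word[0])
    node, i = root, 0
    while True:
        ch = word[i]
        if ch < node.ch:
            if node.lo is None:
                node.lo = Node(ch)
            node = node.lo
        elif ch > node.ch:
            if node.hi is None:
                node.hi = Node(ch)
            node = node.hi
        else:
            i += 1
            if i == len(word):
                node.token = token
                return root
            if node.eq is None:
                node.eq = Node(word[i])
            node = node.eq


def lookup(root, word):
    """Return the token stored for word, or None if it is not a keyword."""
    node, i = root, 0
    while node is not None and i < len(word):
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        else:
            i += 1
            if i == len(word):
                return node.token
            node = node.eq
    return None


def build(keywords):
    """Build a tree from a {keyword: token} mapping (hypothetical helper)."""
    root = None
    for word, token in keywords.items():
        root = insert(root, word, token)
    return root
```

Each lookup does one three-way comparison per visited node and never hashes or copies the identifier, which is the usual argument for trying this structure in a lexer.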

*Fix

[Edited by - Eliad Moshe on October 3, 2010 3:20:31 PM]
Quote: Original post by Eliad Moshe

1. Why not use a ternary search tree to find symbols and store them in an array of linked lists?
I don't know. Try it.

Quote: 2. What about using OpenMP to find all lines in parallel?
I'm not sure about this one, because of the overhead of finding the newlines and of OpenMP initialization.
3. Any ideas on how to benefit from inline assembly and/or SIMD?


Lexical analyzers are all but impossible to parallelize. Because of the way parsing works, everything is sequential: to determine which blocks are independent, one must first parse the content.

Imagine this simple code:
{
  // 20,000 lines
}
This block must be parsed entirely sequentially until the closing brace is matched. But the code could be malformed:
{
  { // incorrectly opened block
  // 20,000 lines
} // error: previous block is still open
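
The malformed case above can be sketched as a depth-tracking scan. This is a rough illustration in Python, not a real parser: the function name `unmatched_braces` is invented here, and braces inside strings or comments are deliberately ignored.

```python
def unmatched_braces(source):
    """Scan left to right; return (unclosed '{' positions, stray '}' positions)."""
    open_stack = []  # positions of '{' not yet matched
    stray = []       # positions of '}' seen with nothing open
    for pos, ch in enumerate(source):
        if ch == "{":
            open_stack.append(pos)
        elif ch == "}":
            if open_stack:
                open_stack.pop()
            else:
                stray.append(pos)
    # Whatever is still on the stack was never closed. This is only known
    # once the whole input has been consumed.
    return open_stack, stray
```

For the input `"{ { body }"` the scan reports position 0 as unclosed, but only after reaching the end of the input. No chunk in the middle can tell on its own which brace it belongs to.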


It's simply a pathologically serial problem. Even finding individual lines requires a sequential scan. Parsers are also FSMs, which are inherently sequential, at least without a priori knowledge of which parts are independent (most commonly individual files).
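
A toy illustration of the FSM point, assuming a two-state machine that only knows about `/* ... */` comments. The names `lex_chunk`, `CODE` and `COMMENT` are hypothetical, and Python is used only to keep the sketch short.

```python
CODE, COMMENT = "code", "comment"


def lex_chunk(text, state=CODE):
    """Strip block comments from text; return (remaining_code, final_state)."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if state == CODE and pair == "/*":
            state = COMMENT   # comment opens, skip the delimiter
            i += 2
        elif state == COMMENT and pair == "*/":
            state = CODE      # comment closes, skip the delimiter
            i += 2
        else:
            if state == CODE:
                out.append(text[i])  # keep only characters outside comments
            i += 1
    return "".join(out), state
```

Take the two lines `"a /* b\n"` and `"c */ d\n"`. Lexing the second line from the `CODE` state keeps the whole line, while lexing it from the `COMMENT` state the first line ends in keeps only `" d\n"`. A thread handed the second line cannot know which answer is right until the first line has been scanned.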

Besides, parsing today is typically limited by either memory latency or branching.

