Eliad Moshe

Fast hand-written lexical analyzer


I wrote a hand-written lexical analyzer built around a switch-case structure.
I have a few ideas for making it faster:

1. Why not use a ternary search tree to look up symbols?
2. What about using OpenMP to scan all lines in parallel and store the symbols in an array of linked lists?
I'm not sure about this one, because of the overhead of finding the newlines and of initializing OpenMP.
3. Any ideas on how to benefit from inline assembly and SIMD?

*Fix

[Edited by - Eliad Moshe on October 3, 2010 3:20:31 PM]

Quote:
Original post by Eliad Moshe

1. Why not use a ternary search tree to find symbols and store them in an array of linked lists?

I don't know. Try it.
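
If you do want to try it, here is a minimal sketch of a ternary search tree for keyword lookup. The node layout and the names `TstNode`, `tst_insert` and `tst_find` are made up for illustration and are not from the original post. Keys must be non-empty, and token id 0 is reserved to mean "not found". Whether this beats a hash table or a plain switch is something only a measurement on your own input can tell you.

```c
#include <stdlib.h>

/* Hypothetical node layout: one character per node, three children. */
typedef struct TstNode {
    char ch;                      /* character stored at this node */
    int token;                    /* token id if a key ends here, else 0 */
    struct TstNode *lo, *eq, *hi; /* smaller / next character / larger */
} TstNode;

/* Insert a non-empty key with a nonzero token id.
   Returns the (possibly newly allocated) subtree root. */
TstNode *tst_insert(TstNode *n, const char *key, int token)
{
    if (n == NULL) {
        n = calloc(1, sizeof *n);
        if (n == NULL) abort();
        n->ch = *key;
    }
    if (*key < n->ch)        n->lo = tst_insert(n->lo, key, token);
    else if (*key > n->ch)   n->hi = tst_insert(n->hi, key, token);
    else if (key[1] != '\0') n->eq = tst_insert(n->eq, key + 1, token);
    else                     n->token = token;
    return n;
}

/* Look up the first len characters of s (no terminator needed, so the
   lexer can pass a slice of its input buffer). Returns 0 if absent. */
int tst_find(const TstNode *n, const char *s, size_t len)
{
    size_t i = 0;
    if (len == 0) return 0;
    while (n != NULL) {
        if (s[i] < n->ch)      n = n->lo;
        else if (s[i] > n->ch) n = n->hi;
        else if (++i == len)   return n->token;
        else                   n = n->eq;
    }
    return 0;
}
```

Usage would be something like `root = tst_insert(root, "while", 3);` once at startup, then `tst_find(root, start, length)` for each identifier the lexer scans.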

Quote:
2. What about using OpenMP to find all lines in parallel?
I'm not sure about this one, because of the overhead of finding the newlines and of initializing OpenMP.
3. Any ideas on how to benefit from inline assembly and/or SIMD?


Lexical analyzers are extremely hard to parallelize. Because of the way parsing works, everything is sequential: to determine which blocks are independent, you first have to parse the content.

Imagine simple code:
{
// 20,000 lines
}
This block must first be scanned entirely sequentially until the matching closing brace is found. But the code could also be malformed:
{
{ // incorrectly opened block
// 20,000 lines
} // error: this closes the inner block, so the outer block is never closed
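
To make the dependency concrete, here is a rough sketch of a brace matcher. The function name `brace_depth` is made up for this example, and it deliberately ignores braces inside strings and comments, which a real lexer would have to handle. The point is that the depth at any position is only known once every character before it has been seen.

```c
#include <stddef.h>

/* Returns the nesting depth left open at the end of the input,
   or -1 as soon as a '}' appears with no matching '{'.
   Simplification: braces in strings and comments are not skipped. */
long brace_depth(const char *src, size_t len)
{
    long depth = 0;
    for (size_t i = 0; i < len; ++i) {
        if (src[i] == '{') {
            ++depth;
        } else if (src[i] == '}') {
            if (depth == 0) return -1; /* unmatched closing brace */
            --depth;
        }
    }
    return depth; /* 0 means balanced, > 0 means unclosed blocks */
}
```

For the malformed example above this returns 1: the extra opening brace is only detected when the end of the input is reached.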


It's simply a pathologically serial problem. Even finding individual lines requires a sequential scan. Parsers are also finite-state machines (FSMs), and those are inherently sequential, at least without a priori knowledge of which parts are independent (most commonly, individual files).
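
As an illustration of the FSM point, here is a toy three-state scanner for double-quoted string literals with backslash escapes. The states and the name `scan_chunk` are assumptions made up for this sketch, not anything from the original lexer. The same chunk of bytes ends in a different state depending on the state it starts in, so a worker handed the middle of a file cannot know how to interpret it without the result of everything before it.

```c
#include <stddef.h>

typedef enum { S_CODE, S_STRING, S_ESCAPE } LexState;

/* Runs the toy state machine over one chunk and returns the state
   it ends in. The caller has to supply the state the chunk starts in. */
LexState scan_chunk(const char *src, size_t len, LexState start)
{
    LexState s = start;
    for (size_t i = 0; i < len; ++i) {
        char c = src[i];
        switch (s) {
        case S_CODE:
            if (c == '"') s = S_STRING;     /* string literal opens */
            break;
        case S_STRING:
            if (c == '\\') s = S_ESCAPE;    /* next character is escaped */
            else if (c == '"') s = S_CODE;  /* string literal closes */
            break;
        case S_ESCAPE:
            s = S_STRING;                   /* escaped character consumed */
            break;
        }
    }
    return s;
}
```

Feeding it the chunk ` c" d` with a start state of `S_STRING` leaves it in `S_CODE`; with a start state of `S_CODE` the same bytes leave it inside a string.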

Besides, parsing performance today is mostly limited by either memory latency or branching.
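
One common way to cut down on branching in a hand-written lexer is a 256-entry character class table instead of a chain of comparisons. The sketch below assumes ASCII input, and the names `char_class`, `init_char_class` and `ident_len` are made up for illustration. Whether it actually helps depends on the compiler and the input, so measure before committing to it.

```c
#include <stddef.h>

enum { C_OTHER, C_ALPHA, C_DIGIT, C_SPACE };

/* One byte per possible input byte; 256 bytes stays cache-resident. */
static unsigned char char_class[256];

/* Fill the table once at startup. Assumes an ASCII-compatible charset. */
void init_char_class(void)
{
    for (int c = 0; c < 256; ++c)    char_class[c] = C_OTHER;
    for (int c = 'a'; c <= 'z'; ++c) char_class[c] = C_ALPHA;
    for (int c = 'A'; c <= 'Z'; ++c) char_class[c] = C_ALPHA;
    for (int c = '0'; c <= '9'; ++c) char_class[c] = C_DIGIT;
    char_class['_'] = C_ALPHA;
    char_class[' '] = char_class['\t'] = C_SPACE;
    char_class['\n'] = char_class['\r'] = C_SPACE;
}

/* Length of the identifier at the start of s, or 0 if s does not
   start with a letter or underscore. One table load per character. */
size_t ident_len(const char *s, size_t len)
{
    size_t i;
    if (len == 0 || char_class[(unsigned char)s[0]] != C_ALPHA) return 0;
    for (i = 1; i < len; ++i) {
        unsigned char k = char_class[(unsigned char)s[i]];
        if (k != C_ALPHA && k != C_DIGIT) break;
    }
    return i;
}
```

Call `init_char_class()` once before lexing; after that, `ident_len(p, remaining)` tells the lexer how many bytes the identifier at `p` spans.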
