Optimizing is hard

Posted by ApochPiQ, 19 July 2011 · 167 views

As I hinted in a comment on a previous entry, I've finished deploying boost::spirit::lex in the Epoch compiler. My hope was that, by reducing byte-level backtracking, I would realize a substantial speed gain in the parser. It seemed like a logical enough assumption, and so I gave it a shot.

The initial working implementation was actually an order of magnitude slower. This turned out to be because I was making a dumb excess copy of a string for every token in the input program, then destroying it immediately; by simply operating on a subset of the original input string, I eliminated this wastage and dropped things back to sane levels.
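A minimal sketch of the difference, not the actual Epoch compiler code: the whitespace tokenizer, the function names, and the use of C++17 `std::string_view` are all assumptions made for illustration (the real lexer is boost::spirit::lex, which hands out iterator ranges). The first function allocates and copies a string per token; the second only records where each token sits inside the original input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: true for the whitespace that separates tokens.
inline bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Copying version: every token owns a freshly allocated std::string,
// which is created and later destroyed once per token.
std::vector<std::string> tokenize_copying(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (is_space(input[i])) { ++i; continue; }
        std::size_t start = i;
        while (i < input.size() && !is_space(input[i])) ++i;
        tokens.push_back(input.substr(start, i - start));  // allocates and copies
    }
    return tokens;
}

// View version: each token is a (pointer, length) pair into the original
// buffer. No per-token allocation, but the input must outlive the tokens.
std::vector<std::string_view> tokenize_views(std::string_view input) {
    std::vector<std::string_view> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (is_space(input[i])) { ++i; continue; }
        std::size_t start = i;
        while (i < input.size() && !is_space(input[i])) ++i;
        assert(start < i);  // every token is non-empty
        tokens.push_back(input.substr(start, i - start));  // no copy, just a view
    }
    return tokens;
}
```

The trade-off is lifetime: the views are only valid while the original input string is alive and unmodified, which is usually fine for a parser that keeps the whole source in memory anyway.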

Unfortunately, even after that fix, the lexer still slowed the parser down by a few milliseconds on the 20KB test program. Eventually I tracked that down as well: I was doing some redundant lookahead, trying to be clever, when the lexer had already obviated the need for byte-level lookahead entirely. Removing the pointless lookahead improved parse times accordingly, and the test case edged down to around 17.5ms, which is about 1ms faster than without lex.

Of course I'm actually doing this on a 2MB input file and not the 20KB original test case, because that's the only way to get enough data to make profiling runs worthwhile. So in reality parses are in the 1.7 second range for a 2MB input.

The upside is that the backtracking done on the byte level was masking a lot of inefficiencies at the higher grammar level, mainly to do with optional chunks of code. For example, an "if" statement may have one or more "elseif" statements and an optional trailing "else." Expressing this naively in the grammar is really slow, because it involves a lot of backtracking: "ok, I have an if... now what's next? Uh oh, what's next isn't an else! So fail that, and try again to just match the if by itself..." and so on.
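To make the cost concrete, here is a hand-rolled recursive-descent sketch. It is not Spirit code and not Epoch's real grammar: the `Parser` struct, the placeholder tokens `"cond"` and `"block"`, and the `attempts` counter are all hypothetical. `naive()` lists every combination as a separate alternative and restarts from the `if` each time one fails; `factored()` matches the `if` once and then treats the rest as optional, which roughly corresponds to writing `if_part >> *elseif_part >> -else_part` in Qi.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

struct Parser {
    const Tokens& toks;
    std::size_t pos = 0;       // index of the next unread token
    std::size_t attempts = 0;  // number of token comparisons performed

    explicit Parser(const Tokens& t) : toks(t) {}

    bool match(const std::string& expected) {
        assert(pos <= toks.size());
        ++attempts;
        if (pos < toks.size() && toks[pos] == expected) { ++pos; return true; }
        return false;
    }

    bool if_part()     { return match("if") && match("cond") && match("block"); }
    bool elseif_part() { return match("elseif") && match("cond") && match("block"); }
    bool else_part()   { return match("else") && match("block"); }

    // One or more "elseif" clauses; rewinds past a partially matched clause.
    bool elseif_list() {
        if (!elseif_part()) return false;
        for (;;) {
            std::size_t save = pos;
            if (!elseif_part()) { pos = save; return true; }
        }
    }

    // Naive: each alternative re-parses the "if" part from the start.
    bool naive() {
        std::size_t start = pos;
        if (if_part() && elseif_list() && else_part()) return true;
        pos = start;  // backtrack
        if (if_part() && elseif_list()) return true;
        pos = start;  // backtrack
        if (if_part() && else_part()) return true;
        pos = start;  // backtrack
        return if_part();
    }

    // Factored: parse the "if" part once, then the optional pieces.
    bool factored() {
        if (!if_part()) return false;
        for (;;) {
            std::size_t save = pos;
            if (!elseif_part()) { pos = save; break; }
        }
        std::size_t save = pos;
        if (!else_part()) pos = save;
        return true;
    }
};
```

On a bare `if cond block`, the naive version performs 15 token comparisons in this sketch and the factored one performs 5; both accept the same input. The gap grows when the repeated prefix is a full condition and block rather than two placeholder tokens.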

I'm culling these dumb inefficiencies one at a time, and so far things are looking good. As of this writing, I'm down to 1.65 seconds on the 2MB file. (Note that due to the constant overhead of spinning up the parsing system, the actual runtime on a real 20KB input is closer to 18ms than 16.5ms, but the gains on larger inputs are definitely worth it.)


And now, more profiling! Yayy!




What kind of parser are you using? Sounds like switching to a more conventional LALR system might be a big win for you.
boost::spirit::qi generates LL(inf) parsers. Obviously this is not as algorithmically efficient as some alternatives, but it's proving fast enough. The main issue, though, is that Epoch's grammar is runtime-mutable by the program itself, i.e. the program is allowed to specify extensions to its own syntax. I don't know of any LALR parser generators that allow for this.
