The preprocessor is horrendously underspecified in both the C and C++ standards. I have come across a pseudo-code algorithm written by Dave Prosser1 which was purportedly used by the C standardisation committee as the basis for the C standards wording on preprocessor macro replacement. A google search reveals multiple sources claiming that Prosser's algorithm is "correct". Such a search also reveals a post to a mailing list about cpplib, which I understand to be the current preprocessor used in gcc, stating that although cpplib does not implement the exact algorithm, it is functionally identical2.
As this is well and good, but when I attempt to manually apply Prosser's algorithm to a particular token set I get different results than if I apply the Visual C++, gcc 3.3.1 or Borland 5.82 preprocessors.
The token set in question is:
I have written out my macro expansion out for your inspection. Braces delimit a sequence of tokens and each token is followed by its hide-set, delimited by angle brackets.
from the algorithm are represented as a brace delimited sequence of brace delimited sequences of tokens. Indentation represents nested expansions/substitutions. Some trivial steps have been elided:
expand({j<>, (<>, kk<>, (<>, k<>, )<>, ,<>, kk<>, (<>, k<>, )<>, )<>, (<>, l<>, )<>})
expand(subst({jj<>, (<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {}) . {(<>, l<>, )<>})
subst({jj<>, (<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {})
subst({(<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>})
subst({x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>})
subst({,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>} . expand({kk<>, (<>, k<>, )<>}))
expand({kk<>, (<>, k<>, )<>})
expand(subst({x}, {x}, {{k<>}}, {kk}, {}) . {})
subst({x}, {x}, {{k<>}}, {kk}, {})
subst({}, {x}, {{k<>}}, {kk}, {} . expand({k<>}))
expand({k<>})
{k<>}
subst({}, {x}, {{k<>}}, {kk}, {k<>})
{k<kk>}
expand({k<kk>})
{k<kk>}
subst({,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>})
subst({y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>})
subst({)<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>} . expand({kk<>, (<>, k<>, )<>}))
expand({kk<>, (<>, k<>, )<>})
expand(subst({x}, {x}, {{k<>}}, {kk}, {}) . {})
subst({x}, {x}, {{k<>}}, {kk}, {})
subst({}, {x}, {{k<>}}, {kk}, {} . expand({k<>}))
expand({k<>})
{k<>}
subst({}, {x}, {{k<>}}, {kk}, {k<>})
{k<kk>}
expand({k<kk>})
{k<kk>}
subst({)<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>, k<kk>})
subst({}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>, k<kk>, )<>})
{jj<j>, (<j>, k<j, kk>, ,<j>, k<j, kk>, )<j>}
expand({jj<j>, (<j>, k<j, kk>, ,<j>, k<j, kk>, )<j>, (<>, l<>, )<>})
expand(subst({x<>, ##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {}) . {(<>, l<>, )<>})
subst({x<>, ##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {})
subst({##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {k<j, kk>})
subst({##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {k<j, kk>})
subst({}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, glue({K<j, kk>}, {k<j, kk>}))
glue({K<j, kk>}, {k<j, kk>})
{kk<j, kk>}
subst({}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {kk<j, kk>}))
{kk<j, jj, kk>}
expand({kk<j, jj, kk>, (<>, l<>, )<>})
{kk<j, jj, kk>} . expand({(<>, l<>, )<>})
{kk<j, jj, kk>, (<>} . expand({l<>, )<>})
{kk<j, jj, kk>, (<>, l<>} . expand({)<>})
{kk<j, jj, kk>, (<>, l<>, )<>} . expand({})
{kk<j, jj, kk>, (<>, l<>, )<>}
kk(l)
Where am I going wrong? Is Prosser's algorithm incorrect? Are Visual C++, gcc and Borland all incorrect? Is my expansion incorrect? Do you know of any better references showing how the C/C++ preprocessor is supposed to function?
Thanks,
Σnigma
1
. Note that the annotated version contains an error in the return expression of the expansion of a function-like macro within expand. Refer to the original version for the correct version.
2