C/C++ preprocessor macro replacement

Started by
9 comments, last by Tom Sloper 7 years, 1 month ago
The preprocessor is horrendously underspecified in both the C and C++ standards. I have come across a pseudo-code algorithm written by Dave Prosser1 which was purportedly used by the C standardisation committee as the basis for the C standards wording on preprocessor macro replacement. A google search reveals multiple sources claiming that Prosser's algorithm is "correct". Such a search also reveals a post to a mailing list about cpplib, which I understand to be the current preprocessor used in gcc, stating that although cpplib does not implement the exact algorithm, it is functionally identical2. As this is well and good, but when I attempt to manually apply Prosser's algorithm to a particular token set I get different results than if I apply the Visual C++, gcc 3.3.1 or Borland 5.82 preprocessors. The token set in question is:
#define j(x, y) jj(x, y)
#define jj(x, y) x ## y
#define kk(x) x

j(kk(k), kk(k))(l)
Applying the preprocessors of the aforementioned compilers gives me the output:
l
Whereas manually applying Prosser's algorithm gives me:
kk(l)
I have written out my macro expansion out for your inspection. Braces delimit a sequence of tokens and each token is followed by its hide-set, delimited by angle brackets. actuals from the algorithm are represented as a brace delimited sequence of brace delimited sequences of tokens. Indentation represents nested expansions/substitutions. Some trivial steps have been elided:
expand({j<>, (<>, kk<>, (<>, k<>, )<>, ,<>, kk<>, (<>, k<>, )<>, )<>, (<>, l<>, )<>})
expand(subst({jj<>, (<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {}) . {(<>, l<>, )<>})
	subst({jj<>, (<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {})
	subst({(<>, x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>})
	subst({x<>, ,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>})
	subst({,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>} . expand({kk<>, (<>, k<>, )<>}))
		expand({kk<>, (<>, k<>, )<>})
		expand(subst({x}, {x}, {{k<>}}, {kk}, {}) . {})
			subst({x}, {x}, {{k<>}}, {kk}, {})
			subst({}, {x}, {{k<>}}, {kk}, {} . expand({k<>}))
				expand({k<>})
				{k<>}
			subst({}, {x}, {{k<>}}, {kk}, {k<>})
			{k<kk>}
		expand({k<kk>})
		{k<kk>}
	subst({,<>, y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>})
	subst({y<>, )<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>})
	subst({)<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>} . expand({kk<>, (<>, k<>, )<>}))
		expand({kk<>, (<>, k<>, )<>})
		expand(subst({x}, {x}, {{k<>}}, {kk}, {}) . {})
			subst({x}, {x}, {{k<>}}, {kk}, {})
			subst({}, {x}, {{k<>}}, {kk}, {} . expand({k<>}))
				expand({k<>})
				{k<>}
			subst({}, {x}, {{k<>}}, {kk}, {k<>})
			{k<kk>}
		expand({k<kk>})
		{k<kk>}
	subst({)<>}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>, k<kk>})
	subst({}, {x, y}, {{kk<>, (<>, k<>, )<>}, {kk<>, (<>, k<>, )<>}}, {j}, {jj<>, (<>, k<kk>, ,<>, k<kk>, )<>})
	{jj<j>, (<j>, k<j, kk>, ,<j>, k<j, kk>, )<j>}
expand({jj<j>, (<j>, k<j, kk>, ,<j>, k<j, kk>, )<j>, (<>, l<>, )<>})
expand(subst({x<>, ##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {}) . {(<>, l<>, )<>})
	subst({x<>, ##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {})
	subst({##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {k<j, kk>})
	subst({##<>, y<>}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {k<j, kk>})
	subst({}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, glue({K<j, kk>}, {k<j, kk>}))
		glue({K<j, kk>}, {k<j, kk>})
		{kk<j, kk>}
	subst({}, {x, y}, {{k<j, kk>}, {k<j, kk>}}, {j, jj}, {kk<j, kk>}))
	{kk<j, jj, kk>}
expand({kk<j, jj, kk>, (<>, l<>, )<>})
{kk<j, jj, kk>} . expand({(<>, l<>, )<>})
{kk<j, jj, kk>, (<>} . expand({l<>, )<>})
{kk<j, jj, kk>, (<>, l<>} . expand({)<>})
{kk<j, jj, kk>, (<>, l<>, )<>} . expand({})
{kk<j, jj, kk>, (<>, l<>, )<>}
kk(l)
Where am I going wrong? Is Prosser's algorithm incorrect? Are Visual C++, gcc and Borland all incorrect? Is my expansion incorrect? Do you know of any better references showing how the C/C++ preprocessor is supposed to function? Thanks, Σnigma 1http://www.spinellis.gr/blog/20060626/index.html. Note that the annotated version contains an error in the return expression of the expansion of a function-like macro within expand. Refer to the original version for the correct version. 2http://sourceware.org/ml/gdb-patches/2002-03/msg00315.html
Advertisement
Did you try feeding it to boost::wave and seeing what it produced?
Im in no way qualified to answer this but it seems like your preprocessor is only making one pass and the others making at least two.

Since the "kk" in your result is generated "on the fly" by "k ## k" the preprocessor wont recognize it as a macro on its first pass and hence not evaluate it. Since "kk(1)" evaluates to "1" the only thing you have to do to get the same result as the other preprocessors is evaluating it again (ei making another pass)
Quote:Original post by SiCrane
Did you try feeding it to boost::wave and seeing what it produced?
wave gives the same result as the compilers I tested. I'm certainly leaning heavily towards this being the "correct" interpretation, de facto if not de jure. The remaining question though is why - what is the correct algorithm for macro replacement?
Quote:Original post by Thunder Sky
Im in no way qualified to answer this but it seems like your preprocessor is only making one pass and the others making at least two.

My expansion using Prosser's algorithm does make a second pass (the last six lines of the expansion), but the kk macro is in the hide-set for the first token and thus it is not expanded.

Σnigma
Ok, this is what I think is happening. 6.10.3.1 of the C99 draft says "After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded."

So the preprocessor expansion of j(kk(k), kk(k))(l) would perform j() expansion, and then kk() as an argument expansion. And then it performs jj() expansion on its replacement list, but when the jj() expansion is performed the kk() expansion isn't considered to have been performed yet, since the kk() expansion that was perfomed was the kk() expansion on the arguments of the j() expansion, and hasn't been added to the list of expansions on the replacement list.
Assuming I'm understanding you correctly, that doesn't explain why
#define e(x) ee(x)#define ee(x) x(y)#define f(x) fe(f(x))
results (from all five methods) in f(y), not f. Substituting the above into your argument:

So the preprocessor expansion of e(f(x)) would perform e() expansion, and then f() as an argument expansion. And then it performs ee() expansion on its replacement list, but when the ee() expansion is performed the f() expansion isn't considered to have been performed yet, since the f() expansion that was perfomed was the f() expansion on the arguments of the e() expansion, and hasn't been added to the list of expansions on the replacement list.

Σnigma
You're going to have to settle for a "damned if I know" from me, since, well....
You're probably better off asking this somewhere more focused on C or C++ like comp.lang.c++.moderated

I think you need to reconsider the way you create a new token.

When you create a new token via paste-operator I suspect you attach it a blue paint equal to the intersection of the blue paints of each of 2 operands.

GCC implements indeed the same prosser's algo, but gcc creates a new token with empty blue paint.

Assuming I'm understanding you correctly, that doesn't explain why ...

You answered that question yourself right at the beginning:

The preprocessor is horrendously underspecified in both the C and C++ standards.

As I'm sure you know since you mention the standards, an enormous amount of functionality is left up to the implementation.

For an frightening amount of source code out there the compiler does whatever the compiler actually does, with bugs in the source, bugs in the compiler, bugs in the linker, bugs in the operating system, and even bugs in the processors. The entire digital world is run every day using code that violates the language standards with results that are completely undefined. That's the reality programmers get to deal with, and it mostly works out okay.

One compiler gives l and the other compiler gives kk(l). Compilers give different results. Wrap those in preprocessor conditionals for the compilers you care about and solve the next problem.

This topic is closed to new replies.

Advertisement