matt77hias

Token representation


If you want to parse a file format containing certain tokens, what is the best way of representing these tokens in your C++ code?

  • constexpr variables (optionally inline as of C++17)
  • pre-processor macros
  • ...

Most likely only a reader and a writer will use the tokens (two translation units).

Edited by matt77hias


Strings.

 

 

In a more serious vein, there is no "best." What makes sense at one scale is totally messy at another scale. What works well for one type of serialization/parser setup is going to be a disaster for others.

Personally I prefer to just stash a bunch of static const char * variables in a header or something and let the linker deduplicate them.
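A minimal sketch of that approach, assuming an OBJ-like format. The header name, the `tokens` namespace, the constant names, and the `IsToken` helper are all made up for illustration, and the pointers are additionally declared `const` here.

```cpp
#include <cstring>

// tokens.hpp (hypothetical header shared by the reader and the writer).
// Every TU that includes it gets its own internal-linkage pointer.
// Identical string literals are usually merged by the toolchain,
// but that merging is not guaranteed by the language.
namespace tokens {
static const char* const kVertex = "v";
static const char* const kNormal = "vn";
static const char* const kFace   = "f";
}

// Reader side: compare a scanned, zero-terminated word against a token.
bool IsToken(const char* word, const char* token) {
    return std::strcmp(word, token) == 0;
}
```

The reader would call something like `IsToken(word, tokens::kNormal)` for each scanned word, and the writer would print the same constants, so the spelling of each token lives in exactly one place.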

1 hour ago, ApochPiQ said:

Personally I prefer to just stash a bunch of static const char * variables in a header or something and let the linker deduplicate them.

Do you still prefer this in the presence of C++17's inline variables?
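For comparison, here is a sketch of what the C++17 variant could look like. The token names and the `IsToken` helper are again hypothetical, and using `std::string_view` instead of `const char*` is a choice made for this example, not something stated in the thread.

```cpp
#include <string_view>

// tokens.hpp (hypothetical). Requires C++17.
// `inline` gives each constant one definition shared by all TUs,
// and std::string_view carries the length, so no strlen is needed.
namespace tokens {
inline constexpr std::string_view kVertex = "v";
inline constexpr std::string_view kNormal = "vn";
inline constexpr std::string_view kFace   = "f";
}

// The constants are usable in constant expressions.
static_assert(tokens::kNormal.size() == 2);

// Reader side: `word` is a non-owning view into the scanned input.
constexpr bool IsToken(std::string_view word, std::string_view token) {
    return word == token;
}
```

With only two translation units using the tokens, the practical difference from the `static` version is probably small. The main gain is that the length is known at compile time.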


What are you looking for? Universal agreement on "the best" of $random subject?

In that case, you won't get it. The mere fact that there are several solutions already contradicts the existence of "the best". (If one were really better than all the others, everybody would use it, and the others would not be on the table as viable alternatives.)

Perhaps you should stop worrying about "best" and just pick "any that seems ok" instead. It will work fine.

 

FWIW: I don't put tokens in C++ code, I put them in a lex file, and generate C/C++ code, together with a generated Yacc parser.

Edited by Alberth

9 minutes ago, Alberth said:

FWIW: I don't put tokens in C++ code, I put them in a lex file, and generate C/C++ code, together with a generated Yacc parser.

A full lexer+parser is way overkill for most file formats. Don't get me wrong, a lexer+parser is great for programming languages, but for common data files (models, etc.) you'll probably end up with a faster and smaller alternative by writing your own scanner.


You didn't specify any class of file format in your question, nor did you exclude any.

If anything, you have just shown there is no universal agreement on "best".


I wrote a simple .h file parser to generate runtime information for C structs.

I load the entire file into memory as a char*, and my token struct is defined as follows:

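// u32 is assumed to be a 32-bit unsigned typedef, e.g. of std::uint32_t.
struct token
{
	u32 Len;      // number of characters in the token
	char *String; // points into the loaded file buffer; not zero-terminated
};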

The char* only points into the loaded file's memory; it doesn't create a new string, and Len tells you where the token stops. Note that this doesn't play well with functions that expect a zero-terminated string. Whenever I need to pass a token to one of those, I call a TokenToString() that creates a new zero-terminated char*.
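A sketch of how such a non-owning token might be used. The `u32` alias is an assumed definition, `TokenToString` here returns a `std::string` rather than a raw `char*` as in the post, and `TokenEquals` is a hypothetical helper added to show a comparison that needs no copy.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

using u32 = std::uint32_t;  // assumed definition of the poster's u32

struct token {
    u32 Len;
    char* String;  // points into the file buffer, not zero-terminated
};

// Copies the token into an owning string. std::string::c_str() then
// yields the zero-terminated form that C-style functions expect.
std::string TokenToString(const token& t) {
    return std::string(t.String, t.Len);
}

// Compares a token against a zero-terminated keyword without copying.
bool TokenEquals(const token& t, const char* keyword) {
    return std::strlen(keyword) == t.Len &&
           std::memcmp(t.String, keyword, t.Len) == 0;
}
```

Most checks inside the parser can go through something like `TokenEquals`, so the allocation in `TokenToString` is only paid when a token has to leave the parser.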


Your lexer should be a DFA built out of switch-on-character statements and state variable(s), so your tokens should be "stored" spread out across case statement character constants in certain states. Each token you recognize should have two parts: the token type ID (an integer) and, where relevant, the substring that produced the token, which you then use as a value (string, enum, integer, etc.). Your lexer should then pass those tokens to your parsing engine (SLR/LR/whatever), or the parsing engine can have a loop that calls "GetNextToken". The parsing engine should primarily deal with the token type IDs to guide its state(s), but if you have something like a math expression evaluator, you can evaluate subexpressions immediately in the middle of the parse instead of generating an AST if you want to.
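A small sketch of such a lexer for a toy expression grammar. The token set, the `Lex` function, and the state names are assumptions for illustration, and character classes use `std::isdigit`/`std::isalpha` rather than one `case` per character to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class TokenType { Number, Identifier, Plus, Star, LParen, RParen, Error };

struct Token {
    TokenType type;         // the token type ID the parser switches on
    std::string_view text;  // the substring of the input that produced it
};

std::vector<Token> Lex(std::string_view in) {
    enum class State { Start, InNumber, InIdentifier };
    std::vector<Token> out;
    State state = State::Start;
    std::size_t begin = 0;
    std::size_t i = 0;
    // Run one step past the end with '\0' as a sentinel so that a
    // number or identifier at the very end of the input is flushed.
    while (i <= in.size()) {
        const char c = i < in.size() ? in[i] : '\0';
        const auto uc = static_cast<unsigned char>(c);
        switch (state) {
        case State::Start:
            begin = i;
            switch (c) {
            case '\0': case ' ': case '\t': case '\n': break;  // skip
            case '+': out.push_back({TokenType::Plus, in.substr(i, 1)}); break;
            case '*': out.push_back({TokenType::Star, in.substr(i, 1)}); break;
            case '(': out.push_back({TokenType::LParen, in.substr(i, 1)}); break;
            case ')': out.push_back({TokenType::RParen, in.substr(i, 1)}); break;
            default:
                if (std::isdigit(uc)) state = State::InNumber;
                else if (std::isalpha(uc) || c == '_') state = State::InIdentifier;
                else out.push_back({TokenType::Error, in.substr(i, 1)});
                break;
            }
            ++i;  // every Start transition consumes the character
            break;
        case State::InNumber:
            if (std::isdigit(uc)) { ++i; break; }
            // Emit without consuming; Start re-examines this character.
            out.push_back({TokenType::Number, in.substr(begin, i - begin)});
            state = State::Start;
            break;
        case State::InIdentifier:
            if (std::isalnum(uc) || c == '_') { ++i; break; }
            out.push_back({TokenType::Identifier, in.substr(begin, i - begin)});
            state = State::Start;
            break;
        }
    }
    return out;
}
```

The returned tokens only view the input, so the input buffer has to outlive them. A parser could consume the vector directly, or the loop body could be reshaped into a "GetNextToken" function that returns one token per call.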

But if you're not writing a high-performance parser, it really doesn't matter what you do. The fact that you're asking this question at all implies you do care, though.

Edited by Nypyren

