# Token representation

If you want to parse a file format containing certain tokens, what is the best way of representing these tokens in your C++ code?

• constexpr (thus also inline) variables
• pre-processor macros
• ...

It is likely that only a reader and a writer will use the tokens (i.e., two translation units).

Edited by matt77hias

##### Share on other sites

Strings.

In a more serious vein, there is no "best." What makes sense at one scale is totally messy at another scale. What works well for one type of serialization/parser setup is going to be a disaster for others.

Personally I prefer to just stash a bunch of static const char * variables in a header or something and let the linker deduplicate them.

##### Share on other sites
1 hour ago, ApochPiQ said:

Personally I prefer to just stash a bunch of static const char * variables in a header or something and let the linker deduplicate them.

Do you still prefer this in the presence of C++17's inline variables?
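For comparison, the C++17 alternative the question refers to could look roughly like this. As before, the namespace and token spellings are made-up placeholders, and using `std::string_view` rather than `const char*` is an assumption of this sketch:

```cpp
// tokens.hpp: hypothetical header, requires C++17.
#include <string_view>

namespace tokens {

// "inline" makes each variable one entity shared by every translation unit
// that includes this header; "constexpr" makes it usable in constant
// expressions. The token spellings are placeholders.
inline constexpr std::string_view Vertex = "v";
inline constexpr std::string_view Normal = "vn";
inline constexpr std::string_view Face   = "f";

// The length is part of the string_view, so it is known at compile time.
static_assert(Normal.size() == 2);

} // namespace tokens
```

With two translation units the practical difference is small; the main gain is a single definition with a known length rather than one internal-linkage pointer per translation unit.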

##### Share on other sites

What are you looking for? Universal agreement on "the best" of $random subject?

In that case, you won't get it. The mere fact that there are several solutions already contradicts the existence of "the best". (If one was really better than all others, everybody would use that, and all the others would not be on the table as viable alternatives.)

Perhaps you should stop worrying about "best" and just pick "any that seems ok" instead. It will work fine.

FWIW: I don't put tokens in C++ code, I put them in a lex file, and generate C/C++ code, together with a generated Yacc parser.

Edited by Alberth

##### Share on other sites
9 minutes ago, Alberth said:

FWIW: I don't put tokens in C++ code, I put them in a lex file, and generate C/C++ code, together with a generated Yacc parser.

A full lexer+parser is way overkill for most file formats. Don't get me wrong, a lexer+parser is great for programming languages, but for common data files (models, etc.), you'll probably end up with a faster and smaller alternative by writing your own scanner.

##### Share on other sites

You didn't specify any class of file format in your question, nor did you exclude any.

If anything, you have just shown there is no universal agreement on "best".

##### Share on other sites

I wrote a simple .h file parser to generate runtime information for C structs.

I load the entire file into memory as a char*, and my token struct is defined as follows:

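    #include <cstdint>

    typedef std::uint32_t u32; // assumed definition of the u32 alias

    struct token
    {
        u32 Len;      // number of characters in the token
        char *String; // points into the file buffer; not zero-terminated
    };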

The char* only points into the file's memory; no new string is allocated, and Len tells you where the token stops. Note that this doesn't play well with functions that expect a zero-terminated string. Whenever I need to pass a token to one of those, I call a TokenToString() helper that creates a new zero-terminated char*.
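A rough sketch of how such a non-owning token might be used. The post does not show `TokenToString()`, so the version here is a guess that returns a `std::string` (whose `c_str()` is zero-terminated) instead of a raw `char*`; `TokenEquals()` and the `u32` typedef are likewise hypothetical additions:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

typedef std::uint32_t u32; // assumed definition of the post's u32

struct token
{
    u32 Len;      // number of characters in the token
    char *String; // points into the file buffer; not zero-terminated
};

// Hypothetical helper: makes an owning copy of the token's characters.
// std::string is used here instead of the raw char* the post mentions.
std::string TokenToString(const token &Tok)
{
    return std::string(Tok.String, Tok.Len);
}

// Hypothetical helper: compares a token with a zero-terminated keyword
// without copying the token's characters.
bool TokenEquals(const token &Tok, const char *Keyword)
{
    const std::size_t KeywordLen = std::strlen(Keyword);
    return KeywordLen == Tok.Len &&
           std::memcmp(Tok.String, Keyword, KeywordLen) == 0;
}
```

Most comparisons can go through something like `TokenEquals()`, so the copy in `TokenToString()` is only paid when a zero-terminated string is really required. In C++17, `std::string_view` covers the same pointer-plus-length idea.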

##### Share on other sites

Your lexer should be a DFA built out of switch-on-character statements and one or more state variables, so your tokens end up "stored" implicitly, spread across the character constants of the case labels in particular states. Each token you recognize should have two parts: the token type ID (an integer) and, where applicable, the substring that produced the token, which you then use as its value (string, enum, integer, etc.). Your lexer should then pass those tokens to your parsing engine (SLR/LR/whatever), or the parsing engine can have a loop that calls "GetNextToken". The parsing engine should primarily deal with the token type IDs to drive its state(s), but if you have something like a math expression evaluator, you can evaluate subexpressions immediately in the middle of the parse instead of generating an AST if you want to.

But if you're not writing a high-performance parser, it really doesn't matter what you do. The fact that you're asking this question at all implies you do care, though.
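A small sketch of the state-machine lexer described above, under several assumptions: the token set (identifiers, numbers, single-character symbols) is invented for illustration, the class and enum names are hypothetical, and character classes are tested with helper predicates rather than one case label per character to keep the example short:

```cpp
#include <cstddef>
#include <string_view>

enum class TokenType { Identifier, Number, Symbol, End };

struct Token {
    TokenType type;        // what the parser switches on
    std::string_view text; // substring of the input that produced the token
};

class Lexer {
public:
    explicit Lexer(std::string_view input) : input_(input) {}

    // Runs the DFA from the current position until one token is complete.
    Token GetNextToken() {
        enum class State { Start, InIdentifier, InNumber };
        State state = State::Start;
        std::size_t begin = pos_;

        for (;; ++pos_) {
            // Treat the end of input as '\0' so every state sees a terminator.
            const char c = pos_ < input_.size() ? input_[pos_] : '\0';
            switch (state) {
            case State::Start:
                if (c == '\0') return {TokenType::End, {}};
                if (IsSpace(c)) { begin = pos_ + 1; break; } // skip whitespace
                if (IsAlpha(c)) { state = State::InIdentifier; break; }
                if (IsDigit(c)) { state = State::InNumber; break; }
                ++pos_; // consume the single-character symbol
                return {TokenType::Symbol, input_.substr(begin, 1)};
            case State::InIdentifier:
                if (IsAlpha(c) || IsDigit(c)) break; // stay in this state
                return {TokenType::Identifier, input_.substr(begin, pos_ - begin)};
            case State::InNumber:
                if (IsDigit(c)) break; // stay in this state
                return {TokenType::Number, input_.substr(begin, pos_ - begin)};
            }
        }
    }

private:
    static bool IsSpace(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }
    static bool IsDigit(char c) { return c >= '0' && c <= '9'; }
    static bool IsAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    std::string_view input_;
    std::size_t pos_ = 0;
};
```

A parser would call `GetNextToken()` in a loop until it sees `TokenType::End`, branching on `type` and reading `text` only when it needs the value.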

Edited by Nypyren
