DesCr

What is a token?


In general, I think "token" means "a simple and easy-to-handle thing that represents a more complicated thing".

When you tokenize a string, you break it up into words so that it's easier to work with. For example, if you tokenize the string "fluffy, bunny" or the string " fluffy       bunny ", then in both cases you end up with the tokens "fluffy" and "bunny".

Tokens aren't always strings, but I think that's the most common usage.
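
To make the "fluffy bunny" example concrete, here is a minimal C++ sketch of that kind of tokenizing. The function name tokenize and the default delimiter set (space, comma, tab, newline) are assumptions chosen for illustration, not something from a particular library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `text` into tokens, treating every character in `delimiters`
// as a separator and skipping the empty pieces between separators.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& delimiters = " ,\t\n") {
    std::vector<std::string> tokens;
    // Find the first character that is not a delimiter.
    std::size_t start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        // The token runs up to the next delimiter (or the end of the string).
        std::size_t end = text.find_first_of(delimiters, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip any run of delimiters before the next token.
        start = text.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

With this sketch, tokenize("fluffy, bunny") and tokenize(" fluffy       bunny ") both give the two tokens "fluffy" and "bunny".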

There are five kinds of tokens: identifiers (names), keywords (template, int, etc.), literals (constants), operators (*, etc.) and other separators.

I'd say that a token is the largest chunk of whatever is being processed that can be processed in one go by whatever is doing the processing. How's that for a general purpose definition? :)

Chris's definition is what I'd use if I were talking about a programming language (or trying to figure out what a compiler was talking about). I think pinacolada said the same thing as I did, just differently. :)

Well, tokenizing usually just means splitting one very large thing into a container of smaller things. In Java and Python, at least, there is a split command that returns a container holding the pieces that result from the split. So a token is just a string, and you can do anything you want with it, as long as it is legal for a string in your language.

I use C++, and I guess the definition I'm looking for is "tokenizing" in terms of writing a scripting language.

I'm ALSO wondering what the C++ "token" I keep seeing is. It comes up now and then, but my books don't say much about it. I believe that when I see one, it's just a name with an underscore in front of it.

Example:

_name;

That confuses me, because it doesn't seem like a full statement, since there's no specifier. Thanks guys.

In parsing theory, a token is the smallest string of characters that has meaning on its own, without relying on other tokens.

An example would be a number, roughly "[0-9]+[.0-9]*".
This is also called a lexeme (or lexical element).

The lexer divides the input stream into these tokens, which are fed to a parser that decides how to interpret them.
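
As a rough illustration of the lexer half of that pipeline, here is a small C++ sketch. The names TokenKind, Token and lex, and the three token categories, are assumptions made up for this example. The number rule it uses (digits, optionally followed by a '.' and more digits) is slightly stricter than the rough pattern quoted above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch.
enum class TokenKind { Number, Identifier, Symbol };

struct Token {
    TokenKind kind;
    std::string text;  // the characters that were matched
};

// std::isdigit and friends need an unsigned char value.
static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool is_name_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}
static bool is_name_char(char c) { return is_name_start(c) || is_digit(c); }

// Divide `input` into tokens, skipping the whitespace between them.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (is_space(input[i])) { ++i; continue; }
        const std::size_t start = i;
        if (is_digit(input[i])) {
            // Number: digits, optionally followed by '.' and more digits.
            while (i < input.size() && is_digit(input[i])) ++i;
            if (i < input.size() && input[i] == '.') {
                ++i;
                while (i < input.size() && is_digit(input[i])) ++i;
            }
            tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
        } else if (is_name_start(input[i])) {
            // Identifier: a letter or underscore, then letters, digits, underscores.
            while (i < input.size() && is_name_char(input[i])) ++i;
            tokens.push_back({TokenKind::Identifier, input.substr(start, i - start)});
        } else {
            // Anything else becomes a one-character symbol token.
            ++i;
            tokens.push_back({TokenKind::Symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

A parser would then walk the returned vector and decide what each sequence of tokens means. The lexer itself only decides where one token ends and the next begins.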



So let's say I have a script file that I want to read in. Let's say it contains these lines:

NEWINT gold (200)
NEWINT silver (100)

If I write a program that does this:

While(!EOF) {
-Read until space or newline (to get one word)
-If word == NEWINT
--Read next word
--Allocate new integer / create pointer
--Index the integer by its name (next word)
--Initialize
}

Very simple pseudocode above, so I hope it conveys my way of thinking.

Does it mean that I'm tokenizing the script by reading each word and determining what it is?
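
For reference, the loop in the pseudocode could be sketched in C++ roughly as follows. The function name load_script and the use of a std::map as the name-to-value index are assumptions for illustration. The sketch also assumes the value is written as "(digits)" with no spaces inside the parentheses and that the number fits in an int.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read a script word by word and build a name -> value table.
// Returns false on the first malformed NEWINT line.
bool load_script(std::istream& in, std::map<std::string, int>& variables) {
    std::string word;
    while (in >> word) {                 // operator>> stops at spaces and newlines
        if (word != "NEWINT") continue;  // skip words this sketch does not know

        std::string name, value;
        if (!(in >> name >> value)) return false;  // ran out of input

        // Expect the value written as "(digits)".
        if (value.size() < 3 || value.front() != '(' || value.back() != ')') {
            return false;
        }
        const std::string digits = value.substr(1, value.size() - 2);
        if (digits.find_first_not_of("0123456789") != std::string::npos) {
            return false;
        }

        // Index the integer by its name and initialize it.
        variables[name] = std::stoi(digits);
    }
    return true;
}
```

Feeding it the two NEWINT lines above through a std::istringstream would leave gold mapped to 200 and silver mapped to 100.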

I have to ask: are you doing this as a learning experience or to get things done (with a game, I would guess)?

If you are trying to get things done, do yourself a favor and get a scripting library. I suggest Lua (my favorite), AngelScript or Small. They have been tested, are free, and will be faster than a solution you could roll yourself.

If you are just learning, buy a good book on the subject; compilers and computer languages are both very interesting topics.

Parsers written by hand are usually what we call recursive descent parsers. They recursively call subroutines looking for particular language features, usually pushing the results of what they find onto a stack of some kind. Look at this:

Here is your line:
NEWINT <varname> <left-parentheses> <number> <right-parentheses>

and here is the pseudo-code

if parse_newint_command() then ....store integer, varname, etc

function parse_newint_command()
    if token!='NEWINT' then return false;
    next_token(); // move from NEWINT to the next token
    if not parse_var() then return false;
    next_token();
    if token!='(' then return false;
    ..
end


The trick here is to save your position in the token list before you call parse_newint_command() and, if it fails, restore the position to that point.
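
Here is one way the save-and-restore idea might look in C++. This is only a sketch: the Parser struct and the helpers accept and parse_number are assumptions invented for the example, and it saves and restores the position inside parse_newint_command() itself rather than in the caller, which has the same effect.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parser state: a token list plus the current position in it.
struct Parser {
    std::vector<std::string> tokens;
    std::size_t pos = 0;

    // If the current token equals `text`, consume it and return true.
    bool accept(const std::string& text) {
        if (pos < tokens.size() && tokens[pos] == text) { ++pos; return true; }
        return false;
    }

    // <varname>: a letter or underscore, then letters, digits or underscores.
    bool parse_var(std::string& name) {
        if (pos >= tokens.size() || tokens[pos].empty()) return false;
        const std::string& t = tokens[pos];
        if (std::isdigit(static_cast<unsigned char>(t[0]))) return false;
        for (char c : t) {
            if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
        }
        name = t;
        ++pos;
        return true;
    }

    // <number>: one or more decimal digits (assumed to fit in an int).
    bool parse_number(int& value) {
        if (pos >= tokens.size() || tokens[pos].empty()) return false;
        const std::string& t = tokens[pos];
        if (t.find_first_not_of("0123456789") != std::string::npos) return false;
        value = std::stoi(t);
        ++pos;
        return true;
    }

    // NEWINT <varname> ( <number> )
    // `name` and `value` are only meaningful when this returns true.
    bool parse_newint_command(std::string& name, int& value) {
        const std::size_t saved = pos;  // remember where we started
        if (accept("NEWINT") && parse_var(name) && accept("(") &&
            parse_number(value) && accept(")")) {
            return true;
        }
        pos = saved;  // failed part-way: rewind so another rule can try
        return false;
    }
};
```

Because the position is rewound on failure, the caller can try a different command parser on the same tokens without having to know how far the failed attempt got.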

Sorry if this is unclear.
