What is a token?
In general, I think "token" means "a simple and easy-to-handle thing that represents a more complicated thing".
When you tokenize a string, it means you break the string up into words, so that it's easier to work with. For example, if you tokenize the string "fluffy, bunny" or the string " fluffy bunny ", then in both cases you end up with the tokens "fluffy" and "bunny".
Tokens aren't always strings, but I think that's the most common usage.
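To make the "fluffy bunny" example concrete, here is a minimal sketch in C++. The function name tokenize_words is made up for illustration, and it assumes a "word" is any run of letters or digits, with everything else (spaces, commas) treated as a separator and thrown away.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split text into "word" tokens: maximal runs of letters or digits.
// Assumption: every other character is a separator and is dropped.
std::vector<std::string> tokenize_words(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(ch);  // still inside a word
        } else if (!current.empty()) {
            tokens.push_back(current);         // a separator ends the word
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);             // word that runs to the end
    }
    return tokens;
}
```

Calling tokenize_words("fluffy, bunny") and tokenize_words(" fluffy bunny ") both give back the same two-element vector holding "fluffy" and "bunny".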
In C++ there are five kinds of tokens: identifiers (names), keywords (template, int, etc.), literals (constants), operators (*, etc.) and other separators.
I'd say that a token is the largest chunk of whatever is being processed that can be processed in one go by whatever is doing the processing. How's that for a general purpose definition? :)
Chris's definition is what I'd use if I were talking about a programming language (or trying to figure out what a compiler was talking about). I think pinacolada said the same thing as I did, but differently. :)
Well, tokenizing usually just means splitting one large thing into a container of smaller things. Java and Python, at least, have a split method that returns a container holding the pieces that result from the split. So a token is just a string, and you can do anything you want with it, as long as it is legal for your language's definition of a string.
I use C++, and I guess the definition I'm looking for is "tokenizing" in terms of writing a scripting language.
I'm ALSO wondering what the C++ "token" I keep seeing is. It comes up now and then, but there isn't much about it in my books. I believe that when I see them, they are just a name with an underscore in front of it.
Example:
_name;
That confuses me, because it doesn't seem like a full statement, since there's no specifier. Thanks guys.
In parsing theory a token is the smallest string of characters with meaning which doesn't rely on other tokens.
An example would be a number, roughly "[0-9]+(\.[0-9]+)?"
The text that was matched is also called a lexeme (or lexical element).
The lexer divides the input stream into these tokens which are fed to a parser which decides how to interpret them.
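As a rough illustration of the lexer stage, here is a small hand-written sketch in C++. The names TokenKind, Token and lex are hypothetical, and the three token kinds are an assumption chosen to keep it short: a number is digits with an optional fractional part, a word is letters, digits and underscores, and any other non-space character is a one-character symbol.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Number, Word, Symbol };

struct Token {
    TokenKind kind;
    std::string text;  // the lexeme: the exact characters that were matched
};

// Divide the input into tokens, skipping whitespace between them.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> tokens;
    const std::size_t n = input.size();
    auto is_digit_at = [&](std::size_t k) {
        return k < n && std::isdigit(static_cast<unsigned char>(input[k])) != 0;
    };
    auto is_word_char_at = [&](std::size_t k) {
        return k < n && (std::isalnum(static_cast<unsigned char>(input[k])) != 0 ||
                         input[k] == '_');
    };

    std::size_t i = 0;
    while (i < n) {
        const unsigned char ch = static_cast<unsigned char>(input[i]);
        if (std::isspace(ch)) { ++i; continue; }

        const std::size_t start = i;
        if (is_digit_at(i)) {
            while (is_digit_at(i)) ++i;
            // Optional fraction: only taken when a digit follows the '.'
            if (i < n && input[i] == '.' && is_digit_at(i + 1)) {
                ++i;
                while (is_digit_at(i)) ++i;
            }
            tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
        } else if (is_word_char_at(i)) {
            while (is_word_char_at(i)) ++i;
            tokens.push_back({TokenKind::Word, input.substr(start, i - start)});
        } else {
            ++i;  // any other character is a token on its own
            tokens.push_back({TokenKind::Symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

A parser would then walk the returned vector and decide what each sequence of tokens means; the lexer itself makes no such decisions.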
So let's say I have a script file that I want to read in. Let's say it contains these lines:
NEWINT gold (200)
NEWINT silver (100)
If I write a program that does this:
While(!EOF) {
-Read until space or newline (to get one word)
-If word == NEWINT
--Read next word
--Allocate new integer / create pointer
--Index the integer by its name (next word)
--Initialize
}
Very simple pseudocode above, so I hope it conveys my way of thinking.
Does it mean that I'm tokenizing the script by reading each word and determining what it is?
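For what it's worth, the pseudocode above could look something like this in C++. This is only a sketch under a few assumptions that are not in the original post: the function names run_script and run_script_text are made up, the variables live in a std::map keyed by name instead of being allocated by hand, and the value has to be written without spaces inside the parentheses, as in "(200)". Words other than NEWINT are silently skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the script one whitespace-separated word at a time and handle
// each "NEWINT <name> (<value>)" command that is found.
std::map<std::string, int> run_script(std::istream& in) {
    std::map<std::string, int> vars;
    std::string word;
    while (in >> word) {                      // read until space or newline
        if (word != "NEWINT") continue;       // assumption: ignore anything else
        std::string name, value_word;
        if (!(in >> name >> value_word)) break;   // script ended mid-command
        // Assumption: the value is a single word of the form "(digits)"
        if (value_word.size() < 3 || value_word.front() != '(' ||
            value_word.back() != ')') {
            continue;
        }
        try {
            const int value =
                std::stoi(value_word.substr(1, value_word.size() - 2));
            vars[name] = value;               // index the integer by its name
        } catch (const std::logic_error&) {
            // std::stoi rejected the text (not a number, or out of range)
        }
    }
    return vars;
}

// Convenience wrapper for scripts that are already in memory.
std::map<std::string, int> run_script_text(const std::string& text) {
    std::istringstream in(text);
    return run_script(in);
}
```

Run on the two example lines, this leaves "gold" mapped to 200 and "silver" mapped to 100. Reading word by word like this is a very simple form of tokenizing; deciding what to do when the word is NEWINT is already a little bit of parsing.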
I have to ask: are you doing this as a learning experience, or to get things done (with a game, I would guess)?
If you are trying to get things done, do yourself a favor and get a scripting library. I suggest Lua (my favorite), AngelScript or Small. They have been tested, are free, and will be faster than a solution you could roll yourself.
If you are just learning, buy a good book on the subject; compilers and computer languages are both very interesting topics.
Typical hand-written parsers are usually what we call recursive descent parsers. They recursively call subroutines looking for particular language features, usually pushing the results of what they find onto a stack of some kind. Look at this:
Here is your line:
NEWINT <varname> <left-parentheses> <number> <right-parentheses>
and here is the pseudo-code
if parse_newint_command() then
    ....store integer, varname, etc

function parse_newint_command()
    if token!='NEWINT' then return false;
    next_token(); // move from NEWINT to the next token
    if not parse_var() then return false;
    next_token();
    if token!='(' then return false;
    ..
end
The trick here is to save the place you are in the token list before you call parse_newint_command() and if it fails, restore to that point.
Sorry if this is unclear.
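In case it helps, here is one way the save-and-restore idea might look in C++. It is a sketch, not the only way to do it: the names Parser, NewIntCommand, expect and position are invented for this example, the tokens are assumed to be already split into strings, and the position is saved inside parse_newint_command() itself instead of by the caller, which has the same effect.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct NewIntCommand {
    std::string name;
    int value = 0;
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    // Tries to match: NEWINT <varname> ( <number> )
    // On failure the token position is restored, so the caller can try
    // a different rule starting from the same place.
    bool parse_newint_command(NewIntCommand& out) {
        const std::size_t saved = pos_;
        NewIntCommand cmd;
        if (expect("NEWINT") && parse_var(cmd.name) && expect("(") &&
            parse_number(cmd.value) && expect(")")) {
            out = cmd;
            return true;
        }
        pos_ = saved;  // backtrack
        return false;
    }

    std::size_t position() const { return pos_; }

private:
    bool at_end() const { return pos_ >= tokens_.size(); }

    // Consume the current token only if it is exactly `text`.
    bool expect(const std::string& text) {
        if (at_end() || tokens_[pos_] != text) return false;
        ++pos_;
        return true;
    }

    // Assumption: a variable name starts with a letter or underscore.
    bool parse_var(std::string& name) {
        if (at_end() || tokens_[pos_].empty()) return false;
        const unsigned char first =
            static_cast<unsigned char>(tokens_[pos_][0]);
        if (!std::isalpha(first) && first != '_') return false;
        name = tokens_[pos_++];
        return true;
    }

    // Assumption: numbers are 1 to 9 decimal digits, so they fit in an int.
    bool parse_number(int& value) {
        if (at_end()) return false;
        const std::string& t = tokens_[pos_];
        if (t.empty() || t.size() > 9) return false;
        for (unsigned char ch : t) {
            if (!std::isdigit(ch)) return false;
        }
        value = std::stoi(t);
        ++pos_;
        return true;
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

Given the tokens NEWINT, gold, (, 200 and ), parse_newint_command() succeeds and fills in the name and value. If any token is wrong, it returns false and position() is back where it started, which is what lets a larger parser try the next command type.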