Sign in to follow this  

Token insertion (?)

This topic is 4105 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi, I have a script language in which I am now able to tokenize. Once I have tokenized a line of "raw code", I place a token into a "std::vector" for further processing at a later stage. My questions is once I have built a list of tokens, I have to search though the list to find the appropriate token. If the list has (theorically) hundreds of tokens I am now faced with the dreaded O(n) problem, which is very slow. Is there a way to make this faster? One possable solution to the problem is to use a "std::map" along with ether a 'hash' function or 'handler' to perform the needed insertion and look-up. I'd like to get some feed back on the problem and perhaps some hints, suggestion or comments please. Thank you, Sabrina.

Share this post


Link to post
Share on other sites
It sounds like you're implementing jumps or function calls. In that case, the usual approach is to keep a map with the appropriate identifiers as you suggested that holds pointers/indices to the desired destination.

Otherwise, please be more specific :)

Share this post


Link to post
Share on other sites

Hi guys,

First, I was somewhat vague about what I am trying to ask, so let me back up and explain.

Here is a very primitive script file that has a variable and corresponding value. (The 'script' file is just a fancy ".ini" file.)
[SOURCE]
{
Size = 1;
}
[/SOURCE]


When I scan the file to create Tokens, I get the following:
[SOURCE]
TokenType: LEFTBRACE
TokenName: {
TokenValue: NONE

TokenType: LITERAL (Variable)
TokenName: Size
TokenValue: NONE

TokenType: ASSIGNMENT
TokenName: =
TokenValue: NONE

TokenType: INTEGER
TokenName: 1
TokenValue: 1

TokenType: SEMICOLON
TokenName: ;
TokenValue: NONE

TokenType: RIGHTBRACE
TokenName: }
TokenValue: NONE
[/SOURCE]


Once I have generated a token, I place said token into a list (std::vector). When the EOF marker ('\0' since I'm reading from a buffer and not a file.) has been reached, I pass the token list to the Parser. Once the Parser has the list, it itor's though the list validating each token. (evaluate the "grammar rules" of my script.)

Second, let me back up and ask this question (I think Kylotan hinted on this).
Is this the proper methodology of parsing scripts? That is, instead of creating a list of tokens and passing that list off to the parser, should I be instead be creating a token THEN pass that token to the parser? Is this what you meat Kylotan by .."processed linearly", if not could you clearify please?

Any thoughts, comments, suggestions?

Thanks,


Sabrina

Share this post


Link to post
Share on other sites
There is no 'proper' methodology; only different methodologies that suit different tasks. What you do with those tokens depends on several things, such as how you're going to execute the language, how large/long the token stream is going to be, whether tokenisation is context dependent and requires knowledge of previous token order, etc.

Share this post


Link to post
Share on other sites
The tokenisers and compiler programs I've worked with process the tokens one at a time (with a bit of read ahead). Is there any particular reason for reading in the whole thing at once then parsing? It's possible to design a tokeniser to only tokenise when needed.

Parsers I've written call scanner.scan() then access scanner.token then repeat (scan also returns the token for simple operations).

Share this post


Link to post
Share on other sites
a simple tokenizer i made uses one big string and has a 'current token' char ptr.

The tokens in the string were delimited by a null character and two for the last. then on 'advance to next' i just do while (*curtkn) ++curtkn; return ++curtkn;

this could be optimized to not have to inc to next token but hey if most of ur tokens are only 3-5 chars its not going to make much differences.

Share this post


Link to post
Share on other sites

This topic is 4105 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this