petemurray

Help me plz :(

This topic is 4602 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi, I'm very worried because I do not know how to tokenize in Scheme, and searching on Google doesn't help. It is very confusing. I am trying to tokenize this: "Hey you !" --> "Hey" " " "you" " " "!" Is tokenization the only way in Scheme, or is there another way using recursion? I want to add these separate words to a list, but I do not know how, as I am only used to adding one letter at a time. Is there anyone out there who can help me? :(

Guest Anonymous Poster
I'd start by transforming your strings into lists of characters, using the standard string->list function:


> (string->list "Hey you !")
(#\H #\e #\y #\space #\y #\o #\u #\space #\!)
>


Now you can solve the problem using list manipulations, which is what Scheme is good at.

For example, now you can check if the first character in your sentence is a space (which is the token delimiter):


> (eqv? #\space (car (string->list "Hey you !")))
#f
> (eqv? #\space (car (string->list " Hey you !")))
#t
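
One thing to watch for: car signals an error on an empty list, so a check like the one above needs a guard once the recursion reaches the end of the sentence. A minimal sketch, assuming a helper named starts-with-space? (a hypothetical name, not from the thread):

```scheme
;; Hypothetical helper: true only if the list is non-empty
;; and its first character is a space.
(define (starts-with-space? char-list)
  (and (pair? char-list)
       (eqv? #\space (car char-list))))

(starts-with-space? (string->list " Hey you !"))  ; #t
(starts-with-space? (string->list "Hey you !"))   ; #f
(starts-with-space? '())                          ; #f, no error
```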


Can you take it from there?

I understand what you are saying: go through the list and check, one by one, whether each character is a space or not.

But I do not know how to separate "Hey you" into "Hey" " " "you".

I am not sure how to separate them. I thought about adding each letter to a list, but I don't know how to separate each word individually.

Please help :(

Guest Anonymous Poster
Well, this is homework, right? So I can only give you hints.

Take smaller steps. How about getting just the first word in the sentence out? For example, this stores characters in an accumulator until it encounters a space or the list is empty, and then it stops, returning the first word. If the list starts with a space, it returns the empty string:



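;; Collect characters in accum (in reverse order) until a space or the
;; end of the list is reached, then return them as a string.
(define (get-first-word accum char-list)
  (cond ((null? char-list) (list->string (reverse accum)))
        ((eqv? #\space (car char-list)) (list->string (reverse accum)))
        (else (get-first-word (cons (car char-list) accum)
                              (cdr char-list)))))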

> (get-first-word '() (string->list "hey you"))
"hey"
> (get-first-word '() (string->list "baaa baa"))
"baaa"
> (get-first-word '() (string->list " hey you"))
""


Now, this is a possible approach, but it still doesn't do what you want it to: you need a way of keeping track of how much of the string you've tokenized, and you need a way to tokenize spaces. That's up to you.
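
One possible way to make a space count as a token is sketched below. It assumes a wrapper named get-first-token (a hypothetical name, not from the thread) around get-first-word as given above. It only illustrates the space case and still does not report how much of the list was consumed:

```scheme
;; get-first-word as given above.
(define (get-first-word accum char-list)
  (cond ((null? char-list) (list->string (reverse accum)))
        ((eqv? #\space (car char-list)) (list->string (reverse accum)))
        (else (get-first-word (cons (car char-list) accum)
                              (cdr char-list)))))

;; Hypothetical wrapper: a leading space is returned as the token " ",
;; anything else is handed to get-first-word.
(define (get-first-token char-list)
  (cond ((null? char-list) "")
        ((eqv? #\space (car char-list)) " ")
        (else (get-first-word '() char-list))))

(get-first-token (string->list " hey you"))  ; " "
(get-first-token (string->list "hey you"))   ; "hey"
```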

Ok, I'm fairly lost when it comes to programming, but I have an idea.

What if I use get-first-word inside another recursive operation?

E.g.:
If the first char of the string is #\space, then add (get-first-word of first char) to "list1" and apply the same to the rest of the list (recursively).

If the first char is a character, then add (get-first-word of first char) to "list1" and repeat.

Does this sound right?

Guest Anonymous Poster
I intentionally wrote get-first-word as an example that should not be very useful, so you'd do the assignment yourself. It lacks important functionality.

Think about the algorithm before you think about its implementation in a particular language.

Remember that in recursion, you've got to get closer to a base case at every call so that the recursion terminates. Let's see how WE would tokenize a string, say "Hello cruel world!", in a recursive manner. This is the important bit: once we've done that, we just have to translate it into whatever programming language we want, which should be simple if we know the syntax.

We have two variables involved in our recursion: a list of tokens and the REMAINING string. The recursion ends when the remaining string is empty, and the list of tokens is returned.

Starting case:

tokens = ()
remaining-string = "Hello cruel world!"

if string-not-empty? then get-first-token

tokens = ("Hello")
remaining-string = " cruel world!"

if string-not-empty? then get-first-token

tokens = ("Hello" " ")
remaining-string = "cruel world!"

if string-not-empty? then get-first-token

tokens = ("Hello" " " "cruel")
remaining-string = " world!"

if string-not-empty? then get-first-token

tokens = ("Hello" " " "cruel" " ")
remaining-string = "world!"

if string-not-empty? then get-first-token

tokens = ("Hello" " " "cruel" " " "world!")
remaining-string = ""

base case of the recursion: the string is empty.
RETURN tokens.
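
The trace above can be sketched directly on strings. This is one possible translation, not the only one: the names tokenize and first-token-length are hypothetical, and it assumes that each space is a one-character token and that every other token runs up to the next space.

```scheme
;; Length of the first token of a NON-EMPTY string s:
;; 1 if s starts with a space, otherwise the number of
;; characters before the next space (or the end of s).
(define (first-token-length s)
  (if (char=? (string-ref s 0) #\space)
      1
      (let loop ((i 0))
        (if (or (= i (string-length s))
                (char=? (string-ref s i) #\space))
            i
            (loop (+ i 1))))))

;; tokens is built in reverse; remaining shrinks on every call,
;; so the recursion reaches the empty-string base case.
(define (tokenize s)
  (let loop ((tokens '()) (remaining s))
    (if (= (string-length remaining) 0)
        (reverse tokens)
        (let ((n (first-token-length remaining)))
          (loop (cons (substring remaining 0 n) tokens)
                (substring remaining n (string-length remaining)))))))

(tokenize "Hello cruel world!")
;; ("Hello" " " "cruel" " " "world!")
```

Copying the remaining string with substring at every step keeps the code close to the trace. Working on a list of characters instead avoids the copying.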

Ok, so I understand what you are saying.
I am visualising it all in my head as I type.
However, I have one more question.
The tokens and the remaining string confuse me. If I am to get the first word from "hey you" --> "hey",
I understand how to add that to a list.
But how do you "cut" the words out of the remaining string? (I am sorry, my Scheme knowledge is very basic.)
Usually, if you are just extracting one letter at a time, you just say (remaining string - 1). But this one is far more complex.

Guest Anonymous Poster
The 'cutting' of the token out of the string is what I intentionally left out of the example get-first-word function.

If you look at get-first-word, you'll notice that it 'cuts' the list of chars as it goes along; so when accum is returned, char-list is exactly the original list minus the token stored in accum: it is what you were looking for. You just need a way for the function to return it in addition to the token!

Instead of returning just (list->string (reverse accum)), you could return a cons pair of the formatted accum and char-list, i.e.

(cons (list->string (reverse accum)) char-list)

and extract them later with car and cdr.

Remember that get-first-word doesn't take spaces into account as tokens, so you'll have to write your own version that behaves correctly. With the change above, it becomes a function that takes a list of characters and returns a (token . remaining-list) cons pair. In this way you keep track of how much of the string you've already parsed.
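
As a rough illustration of the pair-returning change only, here is get-first-word with both return points swapped for the cons expression. The name get-first-word+rest is hypothetical, and this sketch still does not treat spaces as tokens:

```scheme
;; Same traversal as get-first-word, but returns
;; (token . remaining-char-list) instead of just the token.
(define (get-first-word+rest accum char-list)
  (cond ((null? char-list)
         (cons (list->string (reverse accum)) char-list))
        ((eqv? #\space (car char-list))
         (cons (list->string (reverse accum)) char-list))
        (else
         (get-first-word+rest (cons (car char-list) accum)
                              (cdr char-list)))))

(define result (get-first-word+rest '() (string->list "hey you")))
(car result)                 ; "hey", the token
(list->string (cdr result))  ; " you", what is left to tokenize
```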
