xeloj

How to strip away punctuation from a string variable?


I have this small problem. Say I want to read words from a file, but anytime I read in a word like "Hello!", I want to strip away the "!" and then enter the word into my data structure. What would be the best way to go about that? I've been racking my mind trying to think of a solution with what I know. I know string variables can be treated like arrays. I was thinking of searching inside the string until I got to an index value that was equal to "!" or "." or "?", and then resetting that value to NULL. Would something like that work or is there a better, more elegant, solution?

This is kind of what I have tested so far. It works but it just seems so heavy handed.

word[0] = tolower(word[0]); // Convert first letter of all words to lower case
for (std::size_t i = 0; i < word.size(); i++) // stop at the word's real length, not a fixed 20
{
    if (word[i] == '.')
    {
        word[i] = ' ';
    }
}


If you were doing it that way, rather than testing each punctuation mark separately, do it with the or operator "||":



word[0] = tolower(word[0]); // Convert first letter of all words to lower case
for (std::size_t i = 0; i < word.size(); i++) // stop at the word's real length, not a fixed 20
{
    if (word[i] == '.' || word[i] == '!')
    {
        word[i] = ' ';
    }
}



To be honest I can't think of a better way either, but I don't do much data manipulation with strings, sorry.




Quote:
Original post by Wardyahh
If you were doing it that way, rather than testing each punctuation mark separately, do it with the or operator "||":

*** Source Snippet Removed ***

To be honest I can't think of a better way either, but I don't do much data manipulation with strings, sorry.



Actually that's how I did it; I just put one punctuation mark up there as an example. Thanks for the suggestion anyway.

Here's the real problem though. Later on I want to only keep unique words in my data structure, right?

So say I get "Hello!", and then I strip away the "!" and replace it with NULL. So the array would look like this:
[H][e][l][l][o][null], right?

But what happens if I get "Hello" at the beginning of the sentence? Would the last index of that be NULL, or would it even exist? What is the last index of a string variable? Is it NULL or something else?
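
On the last-index question: a std::string keeps track of its own length, so its valid characters sit at indices 0 through size() - 1 and no NULL is stored as part of the contents. Writing '\0' over the "!" therefore leaves a six-character string that does not compare equal to "Hello". A minimal sketch, assuming C++11 and plain ASCII words; both helper names are made up for illustration:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): strips punctuation from the end
// of a word by shrinking the string rather than overwriting characters.
std::string strip_trailing_punct(std::string word) {
    while (!word.empty() &&
           std::ispunct(static_cast<unsigned char>(word.back()))) {
        word.pop_back();  // C++11: shortens the string by one character
    }
    return word;
}

// Hypothetical helper showing the pitfall: overwriting the last character
// with '\0' keeps the length unchanged, so "Hello!" becomes six characters
// ("Hello" plus an embedded NUL), not "Hello".
std::string overwrite_with_nul(std::string word) {
    if (!word.empty()) {
        word[word.size() - 1] = '\0';
    }
    return word;
}
```

With this approach "Hello!" and "Hello" both end up as the same five-character string, which is what a later uniqueness check needs.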

You didn't say which programming language you are using. If it's C++ you can do:


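#include <algorithm>   // std::remove_if
#include <functional>  // std::bind2nd, std::equal_to
#include <string>

std::string my_string = "This. Is. a Test. . .";
// Erase-remove idiom. std::bind2nd was removed in C++17; newer code would use a lambda.
my_string.erase(std::remove_if(my_string.begin(), my_string.end(),
                               std::bind2nd(std::equal_to<char>(), '.')), my_string.end());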

a.) Letter case doesn't affect punctuation. The call to tolower is unnecessary.
b.) You're using an array of chars as a string, which is half of your problem:


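#include <algorithm>
#include <locale>
#include <string>

...

// a lot of the constructs below are confusing; look them up. also look up the header
// files i included above.
//
// std::ispunct(c, loc) from <locale> is a function template, so wrap it in a lambda (C++11).
const std::locale loc("");
std::replace_if(word.begin(), word.end(),
                [&loc](char c) { return std::ispunct(c, loc); }, ' ');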



c.) You're using C or C++, which is the rest of your problem. In more dynamic languages, you'd just use a substitution regular expression and be done. [smile]

Happy hacking!
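
The substitution regular expression mentioned in c.) is also available in C++ through the <regex> header since C++11. A rough sketch, assuming a C++11 compiler; the name strip_punct_regex is hypothetical and not something from this thread:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: deletes every punctuation character by substituting
// the empty string for each match. [[:punct:]] is the POSIX punctuation class.
std::string strip_punct_regex(const std::string& text) {
    static const std::regex punct("[[:punct:]]");
    return std::regex_replace(text, punct, "");
}
```

For single words read from a file this is probably heavier than the erase/remove_if approach, but it keeps the set of characters to drop in one readable pattern.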

Quote:
Original post by Oluseyi
a.) Letter case doesn't affect punctuation. The call to tolower is unnecessary.
b.) You're using an array of chars as a string, which is half of your problem:

*** Source Snippet Removed ***

c.) You're using C or C++, which is the rest of your problem. In more dynamic languages, you'd just use a substitution regular expression and be done. [smile]

Happy hacking!


a) The reason I am making each first letter lower case is because I'm not sure if "hello" and "Hello" will be treated the same when compared with each other (which I need to do later on). So I'm just converting every word to start with a lower-case letter. Are they considered the same or not?

b) Sorry for not making this more clear. I am using a string variable but I'm just treating it like a char array, i.e. string myword = "hello", then accessing myword[0].

c) Can you explain this a bit more for a beginner? =)
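
On point a): comparing two std::string values with == is case-sensitive, so "Hello" and "hello" count as different words. Lowercasing only the first letter would still leave "HELLO" distinct. A minimal sketch, assuming ASCII text and a C++11 compiler, that lowercases the whole word before it is stored (to_lower_copy is a made-up name for illustration):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of the word with every character
// lowercased, so "Hello", "HELLO" and "hello" all normalize to "hello".
std::string to_lower_copy(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) {
                       // std::tolower expects a value representable as unsigned char
                       return static_cast<char>(std::tolower(c));
                   });
    return word;
}
```

Normalizing each word this way before inserting it means the data structure only ever sees one spelling per word.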

Thanks for the code snippet btw, I'll analyze it.

Is there a way to give people positive points for helping you out here?

Quote:
Original post by Omid Ghavami
You didn't say which programming language you are using. If it's C++ you can do:

*** Source Snippet Removed ***


I am using C++. I've never seen that before.

In the following code snippet, can this be modified to check for multiple characters or do I have to do a new block of code for each type of punctuation?

std::equal_to<char>(), '.'))
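
One way to handle several characters in a single pass is to swap the equal_to predicate for one that tests membership in a set of unwanted characters. A sketch under the assumption of a C++11 compiler (for the lambda); remove_chars and its parameter names are hypothetical, while erase and remove_if are the same calls used in Omid Ghavami's snippet:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: erases every character of `text` that also appears
// in `unwanted`, using the erase/remove_if idiom with a lambda predicate.
std::string remove_chars(std::string text, const std::string& unwanted) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [&unwanted](char c) {
                                  // true when c is one of the unwanted characters
                                  return unwanted.find(c) != std::string::npos;
                              }),
               text.end());
    return text;
}
```

Calling remove_chars(word, ".!?") would cover the three marks from the original question; a predicate built on std::ispunct is an alternative if every punctuation character should go.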
