
# How to strip away punctuation from a string variable?


## Recommended Posts

I have this small problem. Say I want to read words from a file, but anytime I read in a word like "Hello!", I want to strip away the "!" and then enter the word into my data structure. What would be the best way to go about that? I've been racking my mind trying to think of a solution with what I know. I know string variables can be treated like arrays. I was thinking of searching inside the string until I got to an index value that was equal to "!" or "." or "?", and then resetting that value to NULL. Would something like that work or is there a better, more elegant, solution?

##### Share on other sites
This is roughly what I have tested so far. It works, but it just seems so heavy-handed.

    word[0] = tolower(word[0]); // Convert the first letter of every word to lower case
    for (std::size_t i = 0; i < word.size(); i++)
    {
        if (word[i] == '.')
        {
            word[i] = ' ';
        }
    }

##### Share on other sites
If you were doing it that way, rather than testing each punctuation mark separately, combine the tests with the or operator "||":

    word[0] = tolower(word[0]); // Convert the first letter of every word to lower case
    for (std::size_t i = 0; i < word.size(); i++)
    {
        if (word[i] == '.' || word[i] == '!')
        {
            word[i] = ' ';
        }
    }

To be honest, I can't think of a better way either, but I don't do much data manipulation with strings, sorry.

##### Share on other sites
Quote:
 Original post by Wardyahh

 If you were doing it that way, rather than testing each punctuation mark separately, combine the tests with the or operator "||"

 *** Source Snippet Removed ***

 To be honest, I can't think of a better way either, but I don't do much data manipulation with strings, sorry.

Actually, that's how I did it; I just showed one punctuation mark up there as an example. Thanks for the suggestion anyway.

Here's the real problem though. Later on I want to only keep unique words in my data structure, right?

So say I get "Hello!", and then I strip away the "!" and replace it with NULL. So the array would look like this:
[H][e][l][l][o][null], right?

But what happens if I get "Hello" at the beginning of the sentence? Would the last index of that be NULL, or would it even exist? What is the last index of a string variable? Is it NULL or something else?
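On the last-index question: a `std::string` keeps track of its own length, so "Hello" holds exactly five characters at indices 0 through 4, and the terminator is not counted by `size()`. The sketch below, assuming C++11 or later and using hypothetical helper names, contrasts overwriting the "!" with a null character against actually removing it. Overwriting leaves a six-character string that no longer compares equal to "Hello", which matters for a later uniqueness check.

```cpp
#include <string>

// Overwrite a trailing '!' with '\0', as proposed in the question.
// The string keeps its length: "Hello!" becomes 'H','e','l','l','o','\0'.
std::string overwrite_with_null(std::string word) {
    if (!word.empty() && word.back() == '!') {
        word.back() = '\0';
    }
    return word;
}

// Remove a trailing '!' so the string really gets one character shorter.
std::string drop_trailing_bang(std::string word) {
    if (!word.empty() && word.back() == '!') {
        word.pop_back();
    }
    return word;
}
```

With these helpers, `drop_trailing_bang("Hello!") == "Hello"` holds, while `overwrite_with_null("Hello!")` still has a `size()` of 6 and is a different string from "Hello".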

##### Share on other sites
You didn't say which programming language you are using. If it's C++ you can do:

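    #include <algorithm>
    #include <functional>
    #include <string>
    std::string my_string = "This. Is. a Test. . .";
    // Note: std::bind2nd was removed in C++17; newer code would pass a lambda instead.
    my_string.erase(std::remove_if(my_string.begin(), my_string.end(),
                                   std::bind2nd(std::equal_to<char>(), '.')),
                    my_string.end());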

##### Share on other sites
a.) Letter case doesn't affect punctuation. The call to tolower is unnecessary.
b.) You're using an array of chars as a string, which is half of your problem:

    #include <algorithm>
    #include <locale>
    #include <string>
    ...
    // A lot of the constructs below are confusing; look them up. Also look up the
    // header files I included above.
    //
    const std::locale loc("");
    std::replace_if(word.begin(), word.end(),
                    [&loc](char c) { return std::ispunct(c, loc); }, ' ');

c.) You're using C or C++, which is the rest of your problem. In more dynamic languages, you'd just use a substitution regular expression and be done. [smile]

Happy hacking!
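For what point (c) might look like in practice: a "substitution regular expression" is a pattern-based search and replace. C++ gained a `<regex>` header in C++11, which postdates this thread, so the following is only a sketch of the same idea under that assumption. The function name `strip_punct_regex` is made up for illustration.

```cpp
#include <regex>
#include <string>

// Replace every punctuation character with nothing.
// [[:punct:]] is the POSIX character class for punctuation.
std::string strip_punct_regex(const std::string& text) {
    static const std::regex punct("[[:punct:]]");
    return std::regex_replace(text, punct, "");
}
```

Calling `strip_punct_regex("Hello, world!")` yields "Hello world". Constructing a regex is comparatively expensive, which is why the pattern is built once as a `static` here.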

##### Share on other sites
Quote:
 Original post by Oluseyi

 a.) Letter case doesn't affect punctuation. The call to tolower is unnecessary.

 b.) You're using an array of chars as a string, which is half of your problem:

 *** Source Snippet Removed ***

 c.) You're using C or C++, which is the rest of your problem. In more dynamic languages, you'd just use a substitution regular expression and be done. [smile]

 Happy hacking!

a) The reason I am making each first letter lower case is that I'm not sure whether "hello" and "Hello" will be treated as the same when compared with each other (which I need to do later on). So I'm just converting every word to start with a lower-case letter. Are they considered the same or not?

b) Sorry for not making this clearer. I am using a string variable, but I'm just treating it like a char array, i.e. string myword = "hello", then accessing myword[0].

c) Can you explain this a bit more for a beginner? =)

Thanks for the code snippet btw, I'll analyze it.

Is there a way to give people positive points for helping you out here?
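On question (a): `std::string` comparison with `==` is character by character, so "Hello" and "hello" are different strings. Lowercasing only the first letter would still leave "HELLO" and "hello" distinct, so a common approach is to normalize the whole word before storing or comparing it. A minimal sketch, assuming plain ASCII text; the helper name `to_lower_copy` is hypothetical:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return a copy of the word with every character converted to lower case.
// The cast to unsigned char avoids undefined behavior in std::tolower
// for characters with negative values.
std::string to_lower_copy(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return word;
}
```

After this normalization, `to_lower_copy("Hello") == to_lower_copy("hello")` holds, so both spellings map to the same entry in the data structure.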

##### Share on other sites
Quote:
 Original post by Omid Ghavami

 You didn't say which programming language you are using. If it's C++ you can do:

 *** Source Snippet Removed ***

I am using C++. I've never seen that before.

In the following code snippet, can this be modified to check for multiple characters or do I have to do a new block of code for each type of punctuation?

std::equal_to<char>(), '.'))
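One way to handle several characters at once is to swap the `equal_to` comparison for a predicate that answers "should this character go?". The sketch below assumes C++11 lambdas, and both function names are hypothetical. The first version removes everything `std::ispunct` classifies as punctuation; the second removes only the characters listed in a caller-supplied string.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Remove every character that std::ispunct classifies as punctuation.
std::string strip_punctuation(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               text.end());
    return text;
}

// Remove only the characters that appear in `unwanted`, e.g. ".!?".
std::string strip_chars(std::string text, const std::string& unwanted) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [&unwanted](char c) {
                                  return unwanted.find(c) != std::string::npos;
                              }),
               text.end());
    return text;
}
```

For example, `strip_chars("Wow?!", ".!?")` gives "Wow", and `strip_punctuation("Hello!")` gives "Hello". Either way, no separate block of code is needed per punctuation mark.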
