Jump to content
  • Advertisement
Sign in to follow this  
raydey

C++ String question

This topic is 3623 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Ok, so I'm sitting here at 2am trying to get this c++ coursework sorted. Haven't run into many big hiccups but I just wanted to ask this. Basically, to cut it short I have to read in words from an input file. However, the words in the input file may have uppercase characters, punctuation, numbers etc and when I read them in I only want the words in lowercase format. I want all punctuation and numbers omitted from the strings (except for hyphens and apostrophes). I currently have this code:
void LowerCase(string &s)
{
	for(int i = 0; i <= s.size(); i++)
	{
		// if the character is an uppercase letter
		if(s >= 65 && s <= 90)
		{
			s += 32;		// convert it to a lowercase
		}

		// if the character is not a lowercase letter
		if(!(s >= 97 && s <= 122))
		{
			// as long as the character is not a hyphen or an apostrophe, remove it
			if(!((s == 39) || (s == 45)))
			{
				s.erase(i, 1);
			}
		}
	}
}

However, I've tested it out with a short story. It seems to be working fine until I hit certain punctuation marks such as quotation marks, exclamation marks and they only ever seem to be at the end of the word. Any help on this would be appreciated. Cheers guys. Ray

Share this post


Link to post
Share on other sites
Advertisement
Here's a hint: when you call erase(), the size of the string changes, what does that do to your loop iteration condition?

Share this post


Link to post
Share on other sites
Rule of thumb: use pure functions (functions that do not modify their arguments) if you can. Try writing a fuunction declared like this.

string LowerCase(const string& s);

Share this post


Link to post
Share on other sites
SiCrane you beautiful beautiful genius! All I needed was to put i -= 1 after the erase statement. Also needed to change the condition in the for loop to i < s.size() and not i <= s.size()

Cheers

Bregma I understand what you're saying, but I definitely want to change the argument, which is one of the sole reasons I passed it by reference.

Share this post


Link to post
Share on other sites
Function names should match what they do. Your function converts to lowercase - and that's fine. But it also strips certain characters - this should be part of the name too. In a few months time, you'll call LowerCase() on a string and forget about how it strips those characters.

Instead of writing LowerCaseAndStrip(), you could write two functions: LowerCase() and StripNonAlphaExcept(). StripNonAlphaExcept() could take two parameters, the string to be stripped and another string containing the characters you don't want removed. Should the need arise, you can write another function composed of the two (this is if you had a pure function, you could write a similar version for the version where you modify the argument):

std::string LowerCaseAndStrip(const std::string &s)
{
return LowerCase(StripNonAlphaExcept(s,"-\'"));
}



Other than that, you know you should use symbolic constants instead of magic numbers? Better still, you can use the character values themselves as the constants:

if(s >= 'A' && s <= 'Z')
{
s += 'a' - 'A';
}



Of course there is already a way to do that, using std::tolower() from <cctype>:

#include <cctype>
#include <string>
#include <algorithm>

char tolower_helper(char c) { return static_cast<char>(std::tolower(c)); }

void LowerCase(std::string &s)
{
std::transform(s.begin(),s.end(),s.begin(),&tolower_helper);
}



Similarly, StripNonAlphaExcept() could be implemented in terms of std::remove_if and a suitably creative predicate functor.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Here's a hint: when you call erase(), the size of the string changes, what does that do to your loop iteration condition?


An additional hint: when erasing members poses a problem like this, you may want to iterate from the end to the beginning instead. :-)

Alex

Share this post


Link to post
Share on other sites
Quote:
Original post by Bregma
Try writing a fuunction declared like this.

string LowerCase(const string& s);

Unfortunately, that type of function signature is much less efficient when you want to modify the contents of the argument. Ex:

t = LowerCase(t);

In this case you'll generate, at a minimum, an extra temporary construction, destruction and assignment operator call. Since this is an assignment and not an initialization, RVO can't be applied directly to t. It gets worse if RVO isn't applied to the temporary that is used to assign to t.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Quote:
Original post by Bregma
Try writing a fuunction declared like this.

string LowerCase(const string& s);

Unfortunately, that type of function signature is much less efficient when you want to modify the contents of the argument. Ex:

t = LowerCase(t);

In this case you'll generate, at a minimum, an extra temporary construction, destruction and assignment operator call. Since this is an assignment and not an initialization, RVO can't be applied directly to t. It gets worse if RVO isn't applied to the temporary that is used to assign to t.


Agreed. But basically it's more a design question:

- Do I want LowerCase() simply to modify an existing string?
- Or do I want to derive a modified string from an existing one (e.g. if the original string must be preserved)?

Alex

Share this post


Link to post
Share on other sites
The function name is misleading either way.

The function is nothing but character replacement. Even if it did not modify strings directly, one should understand that modifying ASCII characters in this way to convert between lower/upper case applies only and only to US (and related) alphabets.

Implementing a function that actually does upper/lower case conversion is not even remotely a trivial task, and requires either per-alphabet tables, or uses Unicode.

Unfortunately, this type of coursework is based on dated concepts. While ASCII is still used, and still useful, one should keep in mind that majority of computer users do not use English alphabet, and large portion of them doesn't even use an equivalent to what ASCII considers an alphabet.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!