Sign in to follow this  

C++ String question

This topic is 3314 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Ok, so I'm sitting here at 2am trying to get this c++ coursework sorted. Haven't run into many big hiccups but I just wanted to ask this. Basically, to cut it short I have to read in words from an input file. However, the words in the input file may have uppercase characters, punctuation, numbers etc and when I read them in I only want the words in lowercase format. I want all punctuation and numbers omitted from the strings (except for hyphens and apostrophes). I currently have this code:
void LowerCase(string &s)
{
	for(int i = 0; i <= s.size(); i++)
	{
		// if the character is an uppercase letter
		if(s[i] >= 65 && s[i] <= 90)
		{
			s[i] += 32;		// convert it to a lowercase
		}

		// if the character is not a lowercase letter
		if(!(s[i] >= 97 && s[i] <= 122))
		{
			// as long as the character is not a hyphen or an apostrophe, remove it
			if(!((s[i] == 39) || (s[i] == 45)))
			{
				s.erase(i, 1);
			}
		}
	}
}

However, I've tested it out with a short story. It seems to be working fine until I hit certain punctuation marks such as quotation marks, exclamation marks and they only ever seem to be at the end of the word. Any help on this would be appreciated. Cheers guys. Ray

Share this post


Link to post
Share on other sites
SiCrane you beautiful beautiful genius! All I needed was to put i -= 1 after the erase statement. Also needed to change the condition in the for loop to i < s.size() and not i <= s.size()

Cheers

Bregma I understand what you're saying, but I definitely want to change the argument, which is one of the sole reasons I passed it by reference.

Share this post


Link to post
Share on other sites
Function names should match what they do. Your function converts to lowercase - and that's fine. But it also strips certain characters - this should be part of the name too. In a few months time, you'll call LowerCase() on a string and forget about how it strips those characters.

Instead of writing LowerCaseAndStrip(), you could write two functions: LowerCase() and StripNonAlphaExcept(). StripNonAlphaExcept() could take two parameters, the string to be stripped and another string containing the characters you don't want removed. Should the need arise, you can write another function composed of the two (this is if you had a pure function, you could write a similar version for the version where you modify the argument):

std::string LowerCaseAndStrip(const std::string &s)
{
return LowerCase(StripNonAlphaExcept(s,"-\'"));
}



Other than that, you know you should use symbolic constants instead of magic numbers? Better still, you can use the character values themselves as the constants:

if(s[i] >= 'A' && s[i] <= 'Z')
{
s[i] += 'a' - 'A';
}



Of course there is already a way to do that, using std::tolower() from <cctype>:

#include <cctype>
#include <string>
#include <algorithm>

char tolower_helper(char c) { return static_cast<char>(std::tolower(c)); }

void LowerCase(std::string &s)
{
std::transform(s.begin(),s.end(),s.begin(),&tolower_helper);
}



Similarly, StripNonAlphaExcept() could be implemented in terms of std::remove_if and a suitably creative predicate functor.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Here's a hint: when you call erase(), the size of the string changes, what does that do to your loop iteration condition?


An additional hint: when erasing members poses a problem like this, you may want to iterate from the end to the beginning instead. :-)

Alex

Share this post


Link to post
Share on other sites
Quote:
Original post by Bregma
Try writing a fuunction declared like this.

string LowerCase(const string& s);

Unfortunately, that type of function signature is much less efficient when you want to modify the contents of the argument. Ex:

t = LowerCase(t);

In this case you'll generate, at a minimum, an extra temporary construction, destruction and assignment operator call. Since this is an assignment and not an initialization, RVO can't be applied directly to t. It gets worse if RVO isn't applied to the temporary that is used to assign to t.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Quote:
Original post by Bregma
Try writing a fuunction declared like this.

string LowerCase(const string& s);

Unfortunately, that type of function signature is much less efficient when you want to modify the contents of the argument. Ex:

t = LowerCase(t);

In this case you'll generate, at a minimum, an extra temporary construction, destruction and assignment operator call. Since this is an assignment and not an initialization, RVO can't be applied directly to t. It gets worse if RVO isn't applied to the temporary that is used to assign to t.


Agreed. But basically it's more a design question:

- Do I want LowerCase() simply to modify an existing string?
- Or do I want to derive a modified string from an existing one (e.g. if the original string must be preserved)?

Alex

Share this post


Link to post
Share on other sites
The function name is misleading either way.

The function is nothing but character replacement. Even if it did not modify strings directly, one should understand that modifying ASCII characters in this way to convert between lower/upper case applies only and only to US (and related) alphabets.

Implementing a function that actually does upper/lower case conversion is not even remotely a trivial task, and requires either per-alphabet tables, or uses Unicode.

Unfortunately, this type of coursework is based on dated concepts. While ASCII is still used, and still useful, one should keep in mind that majority of computer users do not use English alphabet, and large portion of them doesn't even use an equivalent to what ASCII considers an alphabet.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
The function name is misleading either way.



You're right but that point has already been raised.

Quote:

The function is nothing but character replacement. Even if it did not modify strings directly, one should understand that modifying ASCII characters in this way to convert between lower/upper case applies only and only to US (and related) alphabets.

Implementing a function that actually does upper/lower case conversion is not even remotely a trivial task, and requires either per-alphabet tables, or uses Unicode.


I doubt that his *coursework* requires him to implement a method that works beyond the English alphabet.

I might be wrong though. :-)

Quote:

Unfortunately, this type of coursework is based on dated concepts. While ASCII is still used, and still useful, one should keep in mind that majority of computer users do not use English alphabet, and large portion of them doesn't even use an equivalent to what ASCII considers an alphabet.


As a *basic exercise* for working with strings, it sounds appropriate though.

Alex

Share this post


Link to post
Share on other sites

This topic is 3314 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this