Replace part of string in C++

Started by
46 comments, last by Servant of the Lord 12 years, 2 months ago
What Servant of the lord proposed is a huge improvement in readability, but substituting strings is expensive. I think it would be even cleaner and much more efficient to fill out a map (or unordered_map) from tag to value and then use a function that does all the substitutions in one pass.

std::map<std::string, std::string> tags;
tags["NAME"] = playerName;
tags["AGE"] = playerAge;
//...

std::string template = "Hello /NAME/, you're looking mighty spry for a /CLASS/ of /AGE/ years. [...]";

std::string text = apply_template(template, tags);
Advertisement

What Servant of the lord proposed is a huge improvement in readability, but substituting strings is expensive.

Premature optimization (on a text-based game) leads the OP down roads that are wastes of time... not program execution time, but the OP's development time. You're trading off program simplicity so the OP can understand it (Important!) for unnecessary and unneeded program execution speed (Not important). In the current situation, it's not a trade-off that's worth doing.

Besides, it's just as easy to alter the string inplace with a minor modification, if desired.
void ReplaceAll(std::string &textToModify, const std::string &toReplace, const std::string &replaceWith)
{
size_t pos = textToModify.find(toReplace);
while(pos != std::string::npos)
{
textToModify.replace(pos, toReplace.size(), replaceWith);
//Just to make sure we don't go into an infinite loop, if 'replaceWith' contains a match of 'toReplace'.
pos += replaceWith.size();
pos = textToModify.find(toReplace, pos);
}
}


It actually makes the code cleaner as well, but with the removal of the ability of using the function directly in a parameter of another function.

[quote name='alvaro' timestamp='1330545199' post='4917922']
What Servant of the lord proposed is a huge improvement in readability, but substituting strings is expensive.

Premature optimization (on a text-based game) leads the OP down roads that are wastes of time... not program execution time, but the OP's development time. You're trading off program simplicity so the OP can understand it (Important!) for unnecessary and unneeded program execution speed (Not important). In the current situation, it's not a trade-off that's worth doing.
[/quote]

The OP should develop a sense for what work is involved in what he is doing, and this is a good opportunity to realize that your implementation takes time O(n*k) to run, where n is the number of characters in the input and k is the number of tags (assuming tags and values are reasonably short), while the implementation I propose takes time O(n*log(k)), or even O(n) if you use unordered_map.

However, I proposed the alternative primarily because having alternatives is a good thing, and because I think the resulting calling code is actually more readable (it better expresses the intent of the code, leaving the details of whether you substitute the words one by one or all at the same time to the implementation).

Also, try not to get so defensive: I started by praising your proposal. smile.png

... it's easily remedied by using snprintf() instead.
[/quote]
This prevents buffer overruns, but doesn't correct the behaviour.


The OP should develop a sense for what work is involved in what he is doing, and this is a good opportunity to realize that your implementation takes time O(n*k) to run, where n is the number of characters in the input and k is the number of tags (assuming tags and values are reasonably short), while the implementation I propose takes time O(n*log(k)), or even O(n) if you use unordered_map.
[/quote]
Pity the poor user who is faced with a description long enough to warrant Big O analysis!

Also, try not to get so defensive: I started by praising your proposal. smile.png

Good advice. I do seem to be rather defensive lately. mellow.png
Ok, I am really tired to do much coding now but thanks A LOT for the replies and tommorow morning I will surely improve my code. For the moment, and just for the particular use I want to use the code for this seems to work flawlessly indeed.

void replace (string &amp; line, string &amp; name){

size_t pos;

pos = line.find("/NAME/");

while(pos != string::npos){

if (pos != string::npos) line.replace( (line.begin() + pos), (line.begin() + pos + 6), name);

pos += name.size()+1;
pos = line.find("/NAME/", pos);
} //while

}// replace FUNCTION

//
//
//


int main (){
...
replace(line, name);
...
}


This is a very rough solution I came up in like 10 minutes. Tommorow I will probably try all solutions suggested, though the map-tags solution seems really good.

Also some questions if you can answer please:

namespace {
static const std::string NAME_PLACEHOLDER = "/NAME/";
}


Why namespace and what is the name of the namespace?



I will look it up tommorow but just a small briefing, what are maps?


Also, I used iterators instead of the name's length as the function replaces the rest of the text if the name is larger than 6 characters.


Again, thanks a lot for you replies.

For the moment, and just for the particular use I want to use the code for this seems to work flawlessly indeed.
[/quote]
Note that you have a duplicate condition - you are checking if pos is npos twice each iteration - once in the loop condition and once in the body.

You should generally avoid having the same constant multiple times if you can. Here you have the /NAME/ constant three times, twice as string constants and once as the integer 6.The only thing that can happen is that you will change it in one place and forget to change one or both of the others. This is particularly insidious in this example because you might not notice this error if (as is probably the general case) the template string contains 0 or 1 instances of /NAME/, rather than more than 1.

We can reduce one occurrance by folding the find call into the loop condition. Some people dislike this style - they prefer loop conditions that are simple predicates. For example:


void replace (string &line, const string &name)
{
static const string NAME = "/NAME/";
size_t pos = 0;
while( (pos = line.find(NAME, pos)) != string::npos )
{

line.replace( (line.begin() + pos), (line.begin() + pos + NAME.length()), name);
pos += name.size() + 1;
}
}



Why namespace and what is the name of the namespace?
[/quote]
In this usage, it is an "anonymous" namespace. That is, declarations within it are only visible inside the translation unit that introduced this namespace. It allows more than one source file to have classes, types, functions, constants, etc with the same name.

IIf the constant would appear in a header file, then it is public and you shouldn't use an anonymous namespace.


I will look it up tommorow but just a small briefing, what are maps?
[/quote]
Map -> a "mapping". So map<string, string> is a mapping of strings to strings. Like how in a dictionary you can lookup definitions using a word - you could model this as map<Word, Definition>. In fact, in some languages maps are called dictionaries.


Also, I used iterators instead of the name's length as the function replaces the rest of the text if the name is larger than 6 characters.
[/quote]
I cannot parse this. I'm guessing it relates to your use of line.begin(): I'm not sure it is buying you much. The overloaded functions do the same thing. I'd use the non-iterator overload because you don't have iterators to start with. Converting them to iterators is extra work, and distracts from the intent of the algorithm.
Ok, so I tried almost everything mentioned although finally I didn't realize what's the use of templates as the input of the name must always be a string. Anyway, of course you can use templates if you want but I don't see the use for this particular program. It just gets more complicated for the programmer. Do we gain more speed as the template executes each time it's called? Also can you elaborate a little what OP is?

Anyway, what I finally came up with is the following code which as long as I tested it works for every case. That is if the user inputs a string with a space in between characters, if there are multiple arguments of the same type in a line and if there isn't any argument in a line.

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <fstream>
#include <sstream>
#include <map>

using namespace std;


class ReadGameFile {

public:

string line;
fstream file;
int count;
string args[7];


void enterDetails (){

for (count = 0; count <= 6; count++){

if (count == 0){ cout << "Please enter name: "; getline(cin, args[count]); }

else if (count == 1){ cout << "Please enter age: "; getline(cin, args[count]); }

else if (count == 2){ cout << "Please enter weight: "; getline(cin, args[count]); }

else if (count == 3){ cout << "Please enter class: "; getline(cin, args[count]); }

else if (count == 4){ cout << "Please enter race: "; getline(cin, args[count]); }

else if (count == 5){ cout << "Please enter hair color: "; getline(cin, args[count]); }

else if (count == 6){ cout << "Please enter pant size: "; getline(cin, args[count]); }

}

readFile();

}
void readFile (){

file.open("Rebels_after_start.txt");

while(file.good()){

getline(file,line);

repTags(line, args);

cout << line << '\n';
}

file.close();

}
void repTags(string &line, string args[]){

static map<int,string> tags;
size_t pos = 0;

tags[0] = "/NAME/";
tags[1] = "/AGE/";
tags[2] = "/WEIGHT/";
tags[3] = "/CLASS/";
tags[4] = "/RACE/";
tags[5] = "/HAIR-COLOR/";
tags[6] = "/PANT-SIZE/";

for (count = 0; count <= 6; count++){
while ( (pos = line.find(tags[count])) != string::npos){

line.replace(pos, tags[count].size(), args[count], 0, args[count].size());
pos += args[count].size() + 1;

}// while
}//for


} // mapTEXT FUNCTION
}; // ReadGameFile CLASS

int main (){

ReadGameFile game;

game.enterDetails();

return 0;

}



So the output I get is something like this:


Hello Commander Shepard, you're looking mighty spry for a infiltrator of 26 years.
I see you are still dying your hair (I don't have any hair... :/), which is mightily becoming of a human like yourself.
You're putting on a few pounds, however. You look like you weigh about 70 lbs, and your belly is bulging out quite alot. Is that a size 32 pantaloons you are wearing?

You need to get into shape before your journey, Commander Shepard, if you don't mind me pointing out the obvious.[/quote]


Now, I want to ask is there any other way to optimize the code and make it faster/better/more readable/more efficient?

If, so can you propose some ideas?

Again thanks a lot.

Ok, so I tried almost everything mentioned although finally I didn't realize what's the use of templates as the input of the name must always be a string. Anyway, of course you can use templates if you want but I don't see the use for this particular program. It just gets more complicated for the programmer. Do we gain more speed as the template executes each time it's called?
[/quote]
I believe alvaro was not talking about the C++ language feature when they said "template", but rather that the string featuring replacement sequences acts as a kind of "template string".


Also can you elaborate a little what OP is?
[/quote]
OP generally stands for "Original Poster".


Now, I want to ask is there any other way to optimize the code and make it faster/better/more readable/more efficient?
[/quote]
My comments on your current source are:

  • The string constants ("name", "/NAME/" (etc) and the number of constants 6 is hard coded in a number of places.
  • You have code duplication in the loop body of enterDetails
  • Your replacement loop doesn't leverage the algorithmic gains that alvaro hinted were possible.
  • Don't abuse static here.

In particular, for number 3 your solution is probably less efficient - you spend time building and iterating the map, but the algorithm is unchanged. The result is more work being done.

Here is my attempt at fixing some of these flaws:


#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <fstream>
#include <sstream>
#include <map>
#include <stdexcept>
#include <iterator>

using namespace std;

const char Placeholder = '/';

const string PlayerDetails[] = {
"name", "age", "weight", "class", "race", "hair color", "pant size"
};

const unsigned NumDetails = sizeof(PlayerDetails) / sizeof(PlayerDetails[0]);

typedef map<string, string> PlayerInfo;

string makePlaceholder(const string &attribute) {
string result;
result.push_back(Placeholder);
for(string::size_type i = 0 ; i < attribute.size() ; ++i) {
char c = attribute;
if(c == ' ') {
c = '-';
} else {
c = toupper(c);
}
result.push_back(c);
}
result.push_back(Placeholder);
return result;
}

PlayerInfo enterDetails(istream &input, ostream &output) {
PlayerInfo result;
for(unsigned i = 0 ; i < NumDetails ; ++i) {
const string &attribute = PlayerDetails;
output << "Please enter " << attribute << ": ";
string line;
getline(input, line);
string placeholder = makePlaceholder(attribute);
result.insert(make_pair(placeholder, line));
}
return result;
}

string readFile(const string &filename) {
fstream stream(filename.c_str());
if(!stream) {
throw runtime_error("Failed to open: " + filename);
}
return string((istreambuf_iterator<char>(stream)), istreambuf_iterator<char>());
}

void replaceTags(string &tagged, const PlayerInfo &player) {
string::size_type begin = 0;
while( (begin = tagged.find(Placeholder, begin)) != string::npos) {
string::size_type end = tagged.find(Placeholder, begin + 1);
if(end == string::npos) {
stringstream error;
error << "No matching " << Placeholder << " at position " << begin;
throw runtime_error(error.str());
}
int length = (end - begin) + 1;
string tag = tagged.substr(begin, length);

PlayerInfo::const_iterator it = player.find(tag);
if(it == player.end()) {
stringstream error;
error << "Unknown tag: '" << tag << "' between " << begin << " and " << end;
throw runtime_error(error.str());
}
tagged.replace(begin, length, it->second);
}
}

int main() {
PlayerInfo player = enterDetails(cin, cout);

// Show player information (debug)
for(PlayerInfo::iterator i = player.begin() ; i != player.end() ; ++i) {
cout << i->first << " -> " << i->second << '\n';
}
cout << "\n---------\n";

string tagged = readFile("Rebels_after_start.txt");

cout << tagged;
std::cout << "\n---------\n";

replaceTags(tagged, player);

cout << tagged;
std::cout << "\n---------\n";
}

Note that I have one definitive list of string constants. All other parts of the program infer information from this. So the user prompts use it, the /PLACEHOLDER/ values are built from these strings (uppercased with spaces replaced with hyphens). Even the number of constants is inferred from the size of the array.

Essentially you can now add or remove player attributes by making a single change to that array.

You can ignore the implementation of readFile() if you like. Essentially it gets "iterators" for the file, and constructs a string from these iterators. The result is the contents of the file in a string.

Perhaps the most interesting part is the implementation of replaceTags. It is relatively efficient - it only makes a single pass over the input string. It looks for patterns of two forward slashes, and attempts to determine the replacement using the player map. Unknown tags or cases where only a single forward slash is found are handled via exceptions.

One important gain is that the player attribute map is built once. If we were displaying reams of text to the user, we can keep re-using the same mapping over and over.

However, the result is that the code is becoming less clear and harder to read. Some of Servant Of The Lords earlier examples were better from this point of view, it is immediately clear what this code wants to do:


text = ReplaceAll(text, "/NAME/", playerName);
text = ReplaceAll(text, "/AGE/", playerAge);
text = ReplaceAll(text, "/WEIGHT/", playerAge);
text = ReplaceAll(text, "/CLASS/", playerClass);
text = ReplaceAll(text, "/RACE/", playerRace);
text = ReplaceAll(text, "/HAIR-COLOR/", playerHairColor);
text = ReplaceAll(text, "/PANT-SIZE/", playerPantSize);

In the above code, it is less clear that this is going to happen.
Ok thank you very much. In general is it preferable to do a more efficient code or a more readable one?

In enterDetails wouldn't be better if we initialized the line variable before the for loop?

I will optimize my code and come back.

Again thanks.

This topic is closed to new replies.

Advertisement