Archived

This topic is now archived and is closed to further replies.

emileej

Quick find and replace on strings

Recommended Posts

How is this done the fastest? Is there a standard function I missed which will do the job fast? What I am lookin for it a function like this:
int replaces=FindAndReplace(char *str,char *search,char *replace);
"On a long enough timeline the survival rate of everyone drops to zero" - Fight Club [edited by - emileej on November 15, 2003 8:01:25 AM]

Share this post


Link to post
Share on other sites
strstr() finds an occurance of a substring in a string, returning the offset of the substring in the string if it finds one.

Replacing it would be another story because you might need to resize the string. I wonder if std::string has a library for this?

Share this post


Link to post
Share on other sites
Hey I could use some of you optimization freaks in here... The functions needs to be run quite a number of times doin a lot of replacing.
More specified function:

int FindAndReplace(char *str,const char *search,const char *replace);
//...
char *str="Hey you fuckhead I am so fucking tired of your fucking my fucking girlfriend.";
printf("Replaces: %d Result: %s",FindAndReplace(str,"fuck","OOPS"),str);
/*
Output should be:
Replaces: 4 Result: Hey you OOPShead I am so OOPSing tired of your OOPSing my OOPSing girlfriend.
*/

Share this post


Link to post
Share on other sites
Heres an attempt that failed (Got failed assert when running the code):

int strreplace(char *str,const char *search,const char *replace){
int found=0;
for(int i=0,j;i<strlen(str);i++){
for(j=0;j<strlen(search);j++)
if(str[i+j]!=search[j])
break;
if(j==strlen(search)){
found++;
int startlen=i,endlen=strlen(str)-i-strlen(search),newlen=startlen+strlen(replace)+endlen;
char *first=(char*)malloc(sizeof(char)*startlen);
memcpy(first,str,sizeof(char)*startlen);
char *second=(char*)malloc(sizeof(char)*endlen);
memcpy(second,str+i+strlen(search),sizeof(char)*endlen);
realloc(str,sizeof(char)*newlen+1);
memcpy(str,first,sizeof(char)*startlen);
free(first);
memcpy(str+startlen,replace,sizeof(char)*strlen(replace));
memcpy(str+startlen+strlen(replace),second,endlen);
free(second);
}
}
return found;
}

Share this post


Link to post
Share on other sites
It will not sacrifice speed?

*sigh* I won''t even comment on that.

There is a build in function?

std::string::find and std::string::replace.



[ Start Here ! | How To Ask Smart Questions | Recommended C++ Books | C++ FAQ Lite | Function Ptrs | CppTips Archive ]
[ Header Files | File Format Docs | LNK2001 | C++ STL Doc | STLPort | Free C++ IDE | Boost C++ Lib | MSVC6 Lib Fixes ]

Share this post


Link to post
Share on other sites
quote:
Original post by emileej
Anyone here who wanna?


The greatest performance gain you can ever get when programming is when you turn a non-working application into a working one.

Time for you to learn how to program. Use the C++ library.
(yes, I''m tired of debunking the same myths over and over again.)



[ Start Here ! | How To Ask Smart Questions | Recommended C++ Books | C++ FAQ Lite | Function Ptrs | CppTips Archive ]
[ Header Files | File Format Docs | LNK2001 | C++ STL Doc | STLPort | Free C++ IDE | Boost C++ Lib | MSVC6 Lib Fixes ]

Share this post


Link to post
Share on other sites
To emileej:

Using std make your life simpler...if your code is standard you can share it with others.
You can derive your own classes form std by adding new functions.
It will sacrifice speed? Why? The std::string for example is simply a template vector of char and its routines are similar to the routines you may implement to do the same thing...but they are already debugged and tested!!! Why reinvent the wheel?

Share this post


Link to post
Share on other sites
quote:
Original post by emileej
Look - if this is the kinda stuff you intend to post then please leave this thread.


Look - you're worrying about performance when you don't even have a working function. You really think that's smart ?

Look - do you think you're going to be a good programmer if you keep believing in rumours about performance that may have been true 15 years ago ?

Look - don't you think that library implementers strive to produce the fastest and best code they can ? Or are professional library writers just idiots and you know better ?

Heck. The source code for the standard string class is usually available. Why don't you just go and check it out ? That will answer all your questions about performance.

And by the way, just in case you haven't noticed, I did answer your question in my first post. But of course, like hundred of programmers before you, you will have to RTFM for it to be of any use. Not that I expect you to.

So...

You use std::string::find to find the offset of your substring if it is present. Then you use std::string::replace to replace it with the new string.


std::string foo = "abcdefghijkl";
std::string old_str = "efgh";
std::string new_str = "********";

foo.replace( foo.find(old_str), old_str.size(), new_str );


There. Happy ? Now you can cut and paste it and move on, until your next problems sees you rushing back for somebody to hold your hand because you're unwilling to learn and run away when sombody tells you something you don't want to hear.

And before you ask, yes, I'm old, bitter and crotchety.


[ Start Here ! | How To Ask Smart Questions | Recommended C++ Books | C++ FAQ Lite | Function Ptrs | CppTips Archive ]
[ Header Files | File Format Docs | LNK2001 | C++ STL Doc | STLPort | Free C++ IDE | Boost C++ Lib | MSVC6 Lib Fixes ]


[edited by - Fruny on November 15, 2003 8:46:53 AM]

Share this post


Link to post
Share on other sites
Now take off your "MR. Ass" hat and try to think clearly...
quote:
Look - you''re worrying about performance when you don''t even have a working function. You really think that''s smart ?
No I dont, but I dont think that creating one function just to throw it away is very smart either - do you?
quote:
Look - do you think you''re going to be a good programmer if you keep believing in rumours about performance that may have been true 15 years ago ?
If you were thinkin clearly you might have asked "why oh why do you need it to be fast" or "are you sure you need it to be fast".
quote:
Look - don''t you think that library implementers strive to produce the fastest and best code they can ? Or are professional library writers just idiots and you know better ?
Dammn its hard to take any of your crap seriously... I have read some threads in here where people needed some task done quick and where the stl was not quick enough.
quote:
Heck. The source code for the standard string class is usually available. Why don''t you just go and check it out ? That will answer all your questions about performance.
Because it it my experience that somewhere in here there are ppl who might just catch my thread and make a simple quick reply without flaming me wid comments like "learn to program" - man youre like the lowest scum I have ever dug up in here.
quote:
There. Happy ? Now you can cut and paste it and move on, until your next problems sees you rushing back for somebody to hold your hand because you''re unwilling to learn and run away when sombody tells you something you don''t want to hear.
Youre right I wont listen to comments from an ass like you knowing that I can get better feedback. No I dont copy/paste I look up the functions/methods and perhaps ask some questions regarding them before I use them.

Youre old you say? Well this would have been so much easier if you have acted at least 15yrs.

Share this post


Link to post
Share on other sites
1) Build a working prototype version of your system/module using something based on std::string or the stdlib.h/string.h CRT functions.


2) See if the above prototype is fast enough for the real world volumes of data your app will handle (i.e. get some test data and test..). If it is, then you don''t need to waste your time going any deeper *at this stage* (you might want to when the whole program is written).


3) Analyse what part of the prototype is taking the most overall time - there are a few candidates: search algorithm, replace algorithm (and related data structures), file I/O (if the whole search string won''t fit into memory), memory handling speed (usage patterns of data structures, allocator overhead) - etc...


4) Look closely at whichever of those contributes the largest overhead.

If for example you find that the string searching is taking the largest time, investigate alternative "string searching algorithms" - there is published research into string searching algorithms going right back to the birth of computer science (over 50 years worth!) - you don''t need to invent your own just yet.

The following site found from a quick google for "string searching algorithms" is particularly good (Java demos, C source code, pros & cons etc):
http://www-igm.univ-mlv.fr/~lecroq/string/

There are also many many books dedicated to string searching algorithms, some from the general point of view, many from a pretty domain specific point of view. Every algorithms book has at least a chapter dedicated to search algorithms in general - many of the concepts are applicable to string searches.



--
Simon O''Connor
3D Game Programmer &
Microsoft DirectX MVP

Share this post


Link to post
Share on other sites
quote:
No I dont, but I dont think that creating one function just to throw it away is very smart either - do you?


Actually it is. Building prototypes is a VERY smart thing to do!

In most industries its usual practice -:
e.g. an architect building a working scale model of an earthquake resistant building so they can make sure it works without risking lives and millions of dollars (/yen/euro etc).

e.g. a car designer hand making rough (but working) parts for a new car and then testing them to see if they work, and if there''ll be any problems before working out how to make the part cheaper and more efficient (i.e. optimisation).


Personally whenever I have simply sat down and written code, even if I''ve been through some sort of design process, the resulting code almost always changes in some ways. Once it''s actually running there are unforseen things to add that the design didn''t think of etc. The resulting code becomes messy and usually ends up having a "code tidy" and is even re-engineered from time to time.

If you take the time to see how much of your code gets re-written/tidied over time, you''ll be surprised!

The key thing about a prototype is you know you''ll be throwing it away - its only there to "learn from your mistakes", it''s only there to give you hindsight. The code in your prototype doesn''t need to be particularly clean or of release quality, it doesn''t need to be particularly optimal (though implementations of algorithms should be "true"), it doesn''t even need to be in the language you''ll be writing the final system in (for your string manipulation VB would be fine for example).

As long as the prototype (of the complete system or module) is finished and working, the hindsight it gives you is great.


</soapbox>

Share this post


Link to post
Share on other sites
Well, one reason your code fails, is because you''re trying to allocate to a pointer that won''t save the address on exit, but WILL delete the original pointer it points to, so when you exit this string, and try to free the pointer again, it fails.

Please pass a reference to a pointer, or a pointer to the pointer.


int SearchAndReplace(char **str, char *search, char *replace)
{
char *ptrStr = *str;
char *newStr;
int Offset=0;
int StrSize;
int SearchSize, ReplaceSize, DiffSize;
int Found=0;
StrSize = strlen(ptrStr);
SearchSize = strlen(search);
ReplaceSize = strlen(replace);
DiffSize = ReplaceSize-SearchSize; //Store the difference!
while (ptrStr[Offset]) //Until end of string
{
//strcmpi does a non-case sensitive compare!
if (!strcmpi(&ptrStr[Offset],search)) //Strings equal?
{
++Found; //Found another!
if (DiffSize>0) //New string is longer?
{
//Allocate the new string!
newStr = malloc(StrSize+Diff+1);
memcpy(newStr,ptrStr,Offset);//Copy up to this point!
memcpy(newStr+Offset,replace,ReplaceSize);
memcpy(newStr+Offset+ReplaceSize,ptrStr+Offset+SearchSize,StrSize-Offset);
newStr[StrSize+Diff] = 0; //Mull terminate it!
free(ptrStr);
ptrStr = newStr;
*str = newStr; //Update our passed in pointer!
}
else //No need to update size!
{
memcpy(newStr,ptrStr,Offset);//Copy up to this point!
memcpy(newStr+Offset,replace,ReplaceSize);
memcpy(newStr+Offset+ReplaceSize,ptrStr+Offset+SearchSize,StrSize-Offset);
newStr[StrSize+Diff] = 0; //Mull terminate it!
}
}
++Offset;
}
return Found;
}


Keep in mind, this hasn''t been tested so no pomises that it''s 100% correct, I typed it all in this window and did not do any sort of testing with it, so use at your own risk .

Share this post


Link to post
Share on other sites
Backing up Fruny here so you get an idea that you are off-base.

This is yet another pointless micro-optimization thread where you reject the easier and safer solution because it is behind a class. Worse, you cry when people tell you that you are wasting your time.

I am tiring of the same questions and the same rejected answers being dispensed on this board. It seems as if everyone who might possibly be writing a game is so obsessed and infatuated with the fact that yes, 100% of their code must be the ABSOLUTE fastest. Despite the 80/20 or 90/10 rule. Despite the fact that they have never used a profiler in their life. And despite the fact that people get shot down multiple times before.

Guess what? std::string is implemented in terms of the mem* functions on VC++. So its using pretty much the same functions you are.

Using char* is not automatically faster. I know this is sad and one would love to think that they are somehow lower-level because they are carefuly and meticulously matching their new[]''s with delete[]''s in scopes and reallocating then copying all by hand, perhaps with memcpy for that C style feeling. STL is made to be inlined and works fine in performance-sensitive apps. To argue otherwise shows your ignorance of the STL in general.

quote:
Dammn its hard to take any of your crap seriously... I have read some threads in here where people needed some task done quick and where the stl was not quick enough.

BS. I have been here quite awhile and haven''t seen that.

Share this post


Link to post
Share on other sites
quote:
Original post by antareus
Backing up Fruny here so you get an idea that you are off-base.

This is yet another pointless micro-optimization thread where you reject the easier and safer solution because it is behind a class. Worse, you cry when people tell you that you are wasting your time.

I am tiring of the same questions and the same rejected answers being dispensed on this board. It seems as if everyone who might possibly be writing a game is so obsessed and infatuated with the fact that yes, 100% of their code must be the ABSOLUTE fastest. Despite the 80/20 or 90/10 rule. Despite the fact that they have never used a profiler in their life. And despite the fact that people get shot down multiple times before.

Guess what? std::string is implemented in terms of the mem* functions on VC++. So its using pretty much the same functions you are.

Using char* is not automatically faster. I know this is sad and one would love to think that they are somehow lower-level because they are carefuly and meticulously matching their new[]'s with delete[]'s in scopes and reallocating then copying all by hand, perhaps with memcpy for that C style feeling. STL is made to be inlined and works fine in performance-sensitive apps. To argue otherwise shows your ignorance of the STL in general.

quote:
Dammn its hard to take any of your crap seriously... I have read some threads in here where people needed some task done quick and where the stl was not quick enough.

BS. I have been here quite awhile and haven't seen that.




Ahh, but I wrote my own string class that I use in my game engine. The buffer allocates extra space for when adding stuff into the string, thereby not needing to re-allocate the buffer each time it's added to . So, i am uber elite (Although, not necessarily lower-level as you put it). Yes, people do tend to over-optimize, and I am even guilty of this from time to time, but I normally know what will need to be optimized (10 years programming experience helps), and rather than just getting it to work, then optimizing it, I optimize it first.

After using strings in my database engine, I found it was much faster when using pre-buffered strings after implementing it. Now, I re-use my string class in most things, even if it's not in a crucial section, it doesn't hurt (like helping keep my loading times down when processing file names, reading in strings from file, etc). Now, I can say that if his engine is a chat type program that will be filtering messages, this is a function that WILL be crucial to the overall speed of the server, so optimizing it now rather than later is probably a good idea, instead of writing it twice.


--- Edit ---
You're as sick of people optimizing as I am of people telling me not to optimize. There is a happy medium to optmizing, and writing the program with optimizations, in sections you KNOW have to be optimized, is not evil. Now, I do agree, some people don't know when and where to optimize, and there are those that do. The ones saying not to optimize prematurely can't differentiate the two, and just tell everyone not to optimize. Like the discussion on optimizing an inner loop of a software rasterizing engine... you KNOW it's going to be the most commonly called procedure, so why would you want to just get it working first, then optimize later? That's doing twice as much work. Same thing here, it's going to be one of, if not the, most commonly called function (and string searching is rather slow), so why not optimize it the first time and be done with it? Why write it twice and waste your time? Yes, for this simply filling it in with a trivial std::string::find/replace would have been extremely simplistic, but later if it wasn't fast enough, he'd have to go through his program and change over all the std::strings with char arrays, or a custom string class, change all function prototypes, replace any function that he used strings in, etc. It'd be twice as much work replacing all that crap as it would to just write it with speed in mind in the first place.

[edited by - Ready4Dis on November 15, 2003 11:58:15 AM]

Share this post


Link to post
Share on other sites
quote:
Original post by Ready4Dis


Ahh, but I wrote my own string class that I use in my game engine. The buffer allocates extra space for when adding stuff into the string, thereby not needing to re-allocate the buffer each time it's added to . So, i am uber elite




I have to agree with antareus and Fruny here.

The above quote just proves their point even further, std::string already gives you the ability to reserve memory.


std::string example;
example.reserve(1000); //reserve memory for 1000 characters


You could also just write another allocator for std::basic_string to allocate more memory than needed (although most implementations of the standard allocator already do this anyway).

Its just another example of over optimization gone crazy, writing your own string class to do something you could do very simply with std::basic_string, which is written and tested by the most experienced C++ programmers in the world.

quote:

Yes, for this simply filling it in with a trivial std::string::find/replace would have been extremely simplistic, but later if it wasn't fast enough, he'd have to go through his program and change over all the std::strings with char arrays, or a custom string class, change all function prototypes, replace any function that he used strings in, etc. It'd be twice as much work replacing all that crap as it would to just write it with speed in mind in the first place.



Or a far more likely situation would be that you end up writing your own string class (ready4dis::string) and using that instead, only to find that it is full of bugs and causing serious problems in your software. You then have to go back and replace every ready4dis::string with std::basic_string.



[edited by - Jingo on November 15, 2003 12:44:31 PM]

Share this post


Link to post
Share on other sites
One of the popular string search algorithms is the Boyer-Moore algorithm. It is an older algorithm and as a consequence, there are a few variations that exists. It is fast and easy to implement. I didn''t check out the link provided earlier, so you may have already come across this method.

Mike

Share this post


Link to post
Share on other sites
quote:
The ones saying not to optimize prematurely can''t differentiate the two, and just tell everyone not to optimize.

I''m sure Knuth would love to hear that.

Optimization implies something exists. Many many threads are about optimizing something in a product that isn''t even functional. I really don''t see whats so wrong with doing the simplest thing that works (string class over char*) and then going back with a profiler and hitting the areas where it really hurts.

Programmers usually have a decent idea what area of the code gets used the most. Nothing wrong with tweaking, but it seems like people overlook the fact that the greatest gains are made by making algorithmic, rather than silly changes like bit-shifting or using char* everywhere because you''re scared of std::string. This isn''t an excuse to write sloppy, slow code, its more of people seem to enjoy introducing additional complexity into their programs based on ancient optimization principles.

Share this post


Link to post
Share on other sites
That''s because optimization is "cool." Nevermind the fact that a working program is cooler.

Don''t use std::string, char* is faster!
Don''t use std::vector, new int[] is faster!
Don''t use std::sort, qsort is faster!
Don''t write a game, it''s more fun to bicker over whether
if(a && b) 

or
if(a) if(b) 

is faster!
Don''t listen to respected posters, the guy that agrees with you is right!

/sarcasm

Share this post


Link to post
Share on other sites