#### Archived

This topic is now archived and is closed to further replies.

# Find & Replace on large strings?

This topic is 5712 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hey, I was wondering if there was an easy way to do find & replace substrings of a string that has more elements than size_type can hold. I have a string that could exceed several megabytes long, and I was wanting to replace all the ~''s with ~~(and vice-versa in another function) The only way I could thing about doing it was(in pseudo code) while ok = false loop through the string 1 at a time with an iterator if there''s a tilde, add a tilde to the next position ---break the loop because the iterator is invalidated if we acually made it to the end, set ok=true Is there a prettier way to do it? IT seems like i''d be spending a lot of time checking the same areas over and over again

##### Share on other sites
You should be able to do it in one or two passes.

Since the destination string can be longer than the source string, you will probably have to create space to hold the destination string.

A FASTER METHOD:

Create space for the new string:
Either (A) find the number of occurences of search_string in the source_string. The new string length will be:
source_string_length + occurences * (replace_string_length - source_string_length)
- Or (B) just overestimate the size of the new string. (If replace_string_length > search_string_length, max possible size = ((source_string_length / search_string_length) + 1) * source_string_length.

Then, just search through the string for all occurences of search_string. Each time you find one, append all the text _between_ this occurence and the previous occurence (or start of string), then append replace_string. Also, make sure all text after the last occurence is also appended to the text file.

EG
search_string = "ll"
replace_string = "zzz"
source_string = "hello all peoples"
dest_string = "hezzzo azzz peoples"

"hello all peoples" contains 2 "ll"s.
so new string length = 17 + 2 * (3-2) = 19

BEGIN OPERATION:
# first occurence here
hello all peoples

append "he" - he (text between start of string and first occurence of "ll")
append "zzz" - hezzz

# second occurence here
hello all peoples

append "o a" - (text between first and second occurence of "ll") - hezzo a
append "zzz" - hezzzo azzz

No more occurances
append " peoples" - (text between last occurence of "ll" and end of string) - hezzzo azzz peoples

You now have a source_string and a dest_string to do what you want with.

Be careful about making sure dest_string is terminated with '\0' if it is a C text string.

(If replace_string_length <= search_string_length, you can perform the algorithm using source_string as the dest_string)

If you want a fast string search algorithm, look up the KMP algorithm.

[edited by - Krylloan on May 30, 2002 5:00:13 AM]

##### Share on other sites
I think the fast & easiest (not necessarily efficient ) way is to have another destination string twice as big as the source (considering your source string is entirely tilde).

And then:
k = j = 0;while (not reach source's end)  if source[k] = '~'    destination[j++] = '~';  destination[j++] = source[k++];

Just a rough idea.

[edited by - DerekSaw on May 30, 2002 5:12:09 AM]

##### Share on other sites
quote:
Original post by bucky0
Hey, I was wondering if there was an easy way to do find & replace substrings of a string that has more elements than size_type can hold. I have a string that could exceed several megabytes long

Unless I''m mistaken, on most platforms size_type is going to be an unsigned long, making it have a capacity of over 4 billion, eg. 4GB.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost | Asking Questions | Organising code files ]

##### Share on other sites
quote:
Original post by Kylotan
Unless I''m mistaken, on most platforms size_type is going to be an unsigned long, making it have a capacity of over 4 billion, eg. 4GB.

size_type might be just signed long, since they use the value ''-1'' as some special indicator... so we (might) are down to 2GB.