
Best way to remove a substring from a C string?

5 replies to this topic

#1 Blednik   Members   -  Reputation: 117


Posted 22 June 2014 - 05:20 AM

I'm poking around with the C string functions. Don't ask; I'd personally just use C++ for the task, but it has to be pure C in this case. Anyway, here's the problem: I have a null-terminated string, and I have to remove a certain substring from it that starts at index a and ends at index b. I'm doing this on Windows with MinGW. Consider this sample:

char * mystr = "sometestingtennisbabesjunk";
const int a = 4;
const int b = 11;
removesubstring(mystr, a, b);
The result would be "sometennisbabesjunk": the removed characters are "testing", which starts at index 4 and ends just before index 11.

What is the best way in C to do this? Is there a dedicated C string function for this task? I couldn't find one.


#2 Olof Hedman   Crossbones+   -  Reputation: 2647


Posted 22 June 2014 - 05:37 AM

memmove is what you need.

You need to copy everything from position b to the end of the string to position a.

So something like this:

 

memmove(mystr + a, mystr + b, strlen(mystr) - b + 1);

 

The +1 is for also copying the null terminator.

 

That will leave some unused space at the end of the string.

If this is a problem, you could copy the result to a newly allocated string with the right number of bytes, for example using strdup.

 

 

edit: Fixed it :P As vstrakh points out below, memmove is the one to use when the regions overlap.


Edited by Olof Hedman, 22 June 2014 - 12:33 PM.


#3 Bacterius   Crossbones+   -  Reputation: 8157


Posted 22 June 2014 - 05:43 AM

Also, if you implement it manually, make sure you validate your bounds, that is:

if ((b < a) || (b > strlen(str))) {
    /* abort! */
}

Otherwise you're just setting yourself up for scribbling all over your stack or heap.


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#4 vstrakh   Members   -  Reputation: 283


Posted 22 June 2014 - 05:44 AM

memcpy's behaviour is undefined when the source and destination regions overlap, so you actually need memmove.

And be careful not to modify string literals. The mystr variable only holds a pointer to string literal data, and modifying that data is undefined behaviour.



#5 Blednik   Members   -  Reputation: 117


Posted 22 June 2014 - 06:22 AM

Bacterius wrote:
if ((b < a) || (b > strlen(str)))
    /* abort! */

Shouldn't the sanity checks be like this?
len = strlen(str);
if ( (!len) || (b <= a) || (a >= len) ) { /* abort */ }
if (b > len) b = len;

#6 Bacterius   Crossbones+   -  Reputation: 8157


Posted 22 June 2014 - 06:33 AM

 

Blednik wrote:
Shouldn't the sanity checks be like this?
len = strlen(str);
if ( (!len) || (b <= a) || (a >= len) ) { /* abort */ }
if (b > len) b = len;

 

 

Depends how you want to use your function. I personally prefer not to allow nonsensical input at all, so if b is beyond the end of the string, I would reject it. If you prefer to clamp it to the string's length instead, as some string functions do, that's fine too, as long as you document that behaviour. If you don't want to allow a == b, that's fine as well. And of course you should reject null char* pointers; I forgot that one (though it is obvious). Notice that !((b < a) || (b > len)), i.e. a <= b <= len, implies len > 0 (or a == b == 0), since a and b are unsigned.

