Archived

This topic is now archived and is closed to further replies.

ZealousElixir

substr

Recommended Posts

Hey guys. I was hacking around this afternoon and I came up with a pretty nifty implementation of a substr()-style function. I was wondering if there are any optimizations that could be easily made without the introduction of additional function calls. Anyway, here's the code. Let the flaming commence.
      
char *substr(char *base, int start, int length)
{
    int i = 0, j = 0, baselength = 0;
    
    while(base[i++] != '\0')
        baselength++;
    if(length > baselength)
        return "=NULL=";
    char* substring = new char[length + 1];
    for (i = start; i < length + start; i++, j++)
        substring[j] = base[i];
    substring[j] = '\0';
    return substring;
}
  
Thanks. Later, ZE. [EDIT: code formatting (still doesn't look right)] //email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links
[if you have a link proposal, email me.] Edited by - zealouselixir on February 6, 2002 1:25:55 PM

Share this post


Link to post
Share on other sites
Because I am the man.


Seriously, good idea. I'll fix that.

Later,
ZE.

P.S. Here's the fix:

    
if(length > baselength)
return "=NULL=";

// the above turns into...


if(length > baselength)
length = baselength;


In addition, if you want it to be as fool-proof as possible, add the following lines at the top:
  
if(!base)
return "\0";



//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]



Edited by - zealouselixir on February 6, 2002 1:37:59 PM

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
your code won''t work. It may also cause core-dump. If you need help with your code, just say so.

Share this post


Link to post
Share on other sites
As far as I can tell, it won''t cause core dump, as long as you delete the substring when you''re finished with it. If you''re not concerned with memory leakage, you don''t even have to do that . Seriously though, if there are some issues I''m overlooking, tell me.

Later,
ZE.

//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

Share this post


Link to post
Share on other sites
The most obvious problem is that you don''t verify that start < baselength. The second problem is that it''s impossible to tell whether it''s ok to delete the returned string or not without examining the string (sometimes you return a string allocated with new, sometimes you return a constant string). A third (related) problem is that you can''t distinguish a success from a failure - consider if base actually contained the string "=NULL=".

-Mike

Share this post


Link to post
Share on other sites
Aiiight, here's version 3.

      
char *substr(char *base, int start, int length)
{
char *substring;
if(!base)
{
substring = new char[1];
substring[0] = '\0';
return substring;
}

int i = 0, j = 0, baselength = 0;

while(base[i++] != '\0')
baselength++;

if(length > baselength)
length = baselength;

if(start > baselength)
{
substring = new char[1];
substring[0] = '\0';
return substring;
}

substring = new char[length + 1];

for (i = start; i < length + start; i++, j++)
substring[j] = base[i];

substring[j] = '\0';

return substring;
}


Now you always have to delete the returned string, you can't go out of any boundaries, and (this wasn't a concern after version 2, but Mike didn't read all the follow-up posts) an empty string is simply returned when an error can't be recovered from. Now we just need optimizations

Later,
ZE.

//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

[EDIT: formatting and spelling]


Edited by - zealouselixir on February 6, 2002 9:25:53 PM

Share this post


Link to post
Share on other sites
1. Just return a null pointer instead of constructing a null string on the free store.
2. Prefer pre-increment to post increment, unless post-increment is necessary.
3. Reuse the standard library copy instead of writing your own.
4. Use the preprocessor to specify that the checks only be performed in debug mode (optional!).

  
#include <algorithm>
 
char* substr (char* base, int start, int length)
{
using std::copy;
 
if (!base) return 0;
 
#ifndef NDEBUG
{
int i = 0, baselength = 0;
 
while (base[i++] != ''\0'') ++baselength;
 
if (length > baselength) length = baselength;
if (start > baselength) return 0;
}
#endif // NDEBUG

 
char* substring = new char [length + 1];
 
char* begin = base + start;
char* end = being + length;
 
copy (begin, end, substring);
 
*end = ''\0'';
 
return substring;
}


I suppose that you could use std::memcpy to copy words at a time instead of bytes. std::memcpy should be highly optimized already. Outside of that, there isn''t much you can do to make your code execute faster.

Share this post


Link to post
Share on other sites
I think that''s a bad use of a debug-only check, as it will result in code that executes fine without any warnings in debug mode but crashes in release mode. Better to use assertions in debug mode if you want to insist that the function is given valid inputs. Otherwise choose between the range-checking that you have now and throwing exceptions when stuff is out of range.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost ]

Share this post


Link to post
Share on other sites

I think the best way to do it will be something like it
I havent tested it

  
char* substr (char* base, int start, int length)
{
int i;
char *ret=NULL;
char cont=1;

if(base=!=NULL)
{
if((length>=0)&&(start>=0)) //length & start must be > 0

{
for(i=0; i<start; i++) //To see if the base string had finished before ''start''

{
if(base[i]==''\0'')
{
cont = 0;
}
}
if(cont)
{
ret = new char[length];
if(ret!=NULL)
{
for(i=start; i<length+start; i++) //It doesnt matter if during the for the base strings finishes, because of the ''\0'' character will also be copied

{
ret[i]=base[i];
}
ret[length-1]=''\0'';
}
}
}
}

return ret;
}


Share this post


Link to post
Share on other sites
  
char *substr(const char *src, size_t start, size_t length)
{
char *ret = NULL;

if ((start >= 0) && (length >= 0)) {
if ((start + length) <= strlen(src)) {
if ((ret = new(char[(length + 1)])) != NULL) {
strncpy(ret, (src + start), length);
ret[length] = 0;
}
}
}
return (ret);
}


furthermore, since "size_t" is usually "unsigned int" you can remove the line "if ((start >=0) && (length >= 0)) {". or just declare as "unsigned int" instead of "size_t".

and if you absolutely need to validate if "src" is a valid string pointer use "IsBadStringPtr" under Win32.

To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.

Share this post


Link to post
Share on other sites
quote:
Original post by null_pointer
1. Just return a null pointer instead of constructing a null string on the free store.
2. Prefer pre-increment to post increment, unless post-increment is necessary.
3. Reuse the standard library copy instead of writing your own.
4. Use the preprocessor to specify that the checks only be performed in debug mode (optional!).



1. passing a NULL pointer to cout causes a crash.
2. there is absolutely no reasoning behind that
3. (also to jenova) there was a prerequisite that no additional function calls be made
4. why? they don''t slow it down much, and, like Kylotan said, it would cause mysterious crashes in release mode

Later,
ZE.



//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

Share this post


Link to post
Share on other sites
1. Does it? Odd.
2. Pre-increment is easier for the compiler to optimise as post-increment involves creating a temporary (which must then be optimised away).
3. std::copy is likely to be inserted inline anyway, in most implementations.
4. Some debug-only checks can still be useful, providing you use them to check for things that should never happen in release mode. That means that they should stop execution or at least log that an error has occurred. This is to highlight errors in the code that called that function. You''d also be surprised how much ''if'' checks can slow down a function, due to the way that the processor reads ahead.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost ]

Share this post


Link to post
Share on other sites
1. isn''t it though?
2. perhaps. technicality.
3. it''s also another file to include and link
4. right, but you won''t even *need* a debug check if you make sure that at any given time, all the parameters are valid. I doubt I''d be all that surprised at the execution time, even using the if''s in release mode

Later,
ZE.

//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

Share this post


Link to post
Share on other sites
for the handicap''d.



  
char *substr(const char *src, size_t start, size_t length)
{
char *ret = NULL, *str = (char *)src;
size_t cnt, len = 0;

while (*(str++) != 0) len++;
if ((start >= 0) && (length >= 0)) {
if ((start + length) <= len) {
if ((ret = new(char[(length + 1)])) != NULL) {
for (cnt = 0; cnt < length; cnt++) ret[cnt] = src[(start + cnt)];
ret[length] = 0;
}
}
}
return (ret);
}


To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.

Share this post


Link to post
Share on other sites
why so complex? i was thinking i might as well have a crack at i and this is what i came up with, seems pretty save to me, there''s probably a safer way, but i think this a simple and efficient way, i''d like to hear your opinions

  
/* WARNING: hasn''t been tested */
char* substr(const char* const base, uint offset, uint length)
{
if(!base)
return 0;

// maybe not the best idea to create on heap, but safe

char* substring = new char[strlen(base)];

return static_cast<char*>(memcpy(substring, &base[offset], length));
}

Share this post


Link to post
Share on other sites
also if you declared to the user that you make no gurantees on the state of the base string after the call, you could do:

    
/* WARNING: hasn't been tested */
char* substr(char* base, uint offset, uint length){

if(base[offset + length + 1])
// for safer version if(strlen(base) >= (offset + length + 1))

{
memcpy(base, &base[offset], length + 1);
base[offset + length + 1] = '\0';
}
return base;
}


that's faster, but again there's no *proper* bounds checking and base is not guranteed to be in the same state as it was previously, but it does the job


/* just some stylistic editing */

Edited by - abdulla on February 9, 2002 5:40:58 AM

Share this post


Link to post
Share on other sites
just outta left field, a question, can''t you you calculate strlen by saying "sizeof(string)/sizeof(char)" for an ascii locale and wchar for some other extended locale, haven''t tried, but i''m assuming strlen is there for a reason :\

Share this post


Link to post
Share on other sites
quote:
Original post by ZealousElixir

quote:
--------------------------------------------------------------------------------
Original post by null_pointer
1. Just return a null pointer instead of constructing a null string on the free store.
2. Prefer pre-increment to post increment, unless post-increment is necessary.
3. Reuse the standard library copy instead of writing your own.
4. Use the preprocessor to specify that the checks only be performed in debug mode (optional!).
--------------------------------------------------------------------------------

1. passing a NULL pointer to cout causes a crash.
2. there is absolutely no reasoning behind that
3. (also to jenova) there was a prerequisite that no additional function calls be made
4. why? they don''t slow it down much, and, like Kylotan said, it would cause mysterious crashes in release mode


1. That is the point of returning a null pointer. You don''t want code to fail silently in debug mode. I would prefer exceptions, though.

2. Wrong. It forces the compiler to do more work and besides, getting into the habit of using post-increment unnecessarily is error-prone.

3. Most of the standard algorithms are simple shorthand for frequently written code; they don''t do memory allocation or throw exceptions or other wild stuff, because that would seriously cripple their usefulness. If you want to write your own for loop, then do it; but, reusing the standard algorithms is a Good Thing.

4. Wrong. (a) In case you didn''t notice, there are several conditional statements and a for loop in there. (b) Those checks are not necessary in release mode. (c) Kylotan was discussing the method of reporting the error, not the fact that the check was only made in a debug build. If you disagree, ask him.

Share this post


Link to post
Share on other sites
quote:
Original post by ZealousElixir
1. isn''t it though?
2. perhaps. technicality.
3. it''s also another file to include and link
4. right, but you won''t even *need* a debug check if you make sure that at any given time, all the parameters are valid. I doubt I''d be all that surprised at the execution time, even using the if''s in release mode

2 - technicalities are what optimization is all about.
3 - no, there''s no linking involved with most implementations of std::copy. And are you trying to optimize run-time or development time?
4 - the best way you can prove all your parameters are valid is to put checks inside the function. Putting checks in the calling code is going to be harder to guarantee to be correct.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost ]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
  char *substr(const char *string, unsigned int offset, unsigned int length)
{
int strLength = 0;
char *newString = NULL;
char *source = NULL;
char *dest = NULL;

// strlen

source = (char*)string;
while(*source++) strLength++;

// sanity checks

// that there is a string

// that we''ve been given a length

// that the substring they wan''t isn''t out of range

if(!string || !length || offset+length > strLength)
return NULL;

// alloc

newString = new char[length+1];
if(!newString)
return NULL;

// memcpy

source = (char*)(string + offset);
dest = newString;
for(int i = 0; i < length; i++) *dest = *source;

// make sure it''s terminated

newString[length] = 0;

// return it

return newString;
}

Share this post


Link to post
Share on other sites