optimizing my code (memory usage, postfix and prefix notation, "continue")


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

12 replies to this topic

#1 Nick C.   Members   -  Reputation: 176

Like
0Likes
Like

Posted 27 March 2013 - 08:47 AM

Hello all

I don't usually write posts here, as I can find most information in the books I have, but I really have an interesting question here.
I read some info on sequence points and order of evaluation in C++, but I'm still not sure about this issue.

I have the following code (a tiny part of a bigger function, of course).


line is a string, ll is an int, j is an int and splitchar is a char.
The purpose of this piece of code is to increase j by 1 as long as the character of line at index j equals splitchar.
Before the inner while loop executes, the character of line at index j always equals splitchar, guaranteed. So j will be increased by at least one.

//some code
int ll = line.length();
int  j(0);
while(j < ll)
{
    //some code that uses and changes j, but nothing very important. It doesn't influence the following code or the subject of this topic.
    while (j < ll - 1 && line[j] == splitchar)
        ++j;
    //some code
}

I changed the inner while loop to

while (j++ < ll - 1 && line[j] == splitchar)
    continue;

or

while (j < ll - 1 && line[++j] == splitchar)
    continue;

 

which are pretty much the same. Trust me, I've tested everything in detail ;)
Note: postfix or prefix notation is very important here!

 

Now, my questions are:

1. Does it matter if I write "continue" or empty braces ("{}") in the last two examples, and how does this affect performance?
2. Which of these three would have the best performance?
3. Is there an even better possibility?

Performance is not very important in this program, but I really want to know the little details, because it WILL matter in future projects.
Thanks in advance :)

Nick


Edited by Nick C., 27 March 2013 - 08:53 AM.



#2 BitMaster   Crossbones+   -  Reputation: 3882

Like
-1Likes
Like

Posted 27 March 2013 - 08:52 AM

Edit: Ignore this post, I didn't think it through properly and SiCrane already corrected me.



while (j++ < ll - 1 && line[j] == splitchar)
continue;
or


while (j < ll - 1 && line[++j] == splitchar)
continue;


Both versions are not well-defined C++. The value of j is undefined (see this Wikipedia page). A colleague of mine recently ran into problems because they were working on code that relied on that but then had to change compiler version.

Edited by BitMaster, 28 March 2013 - 07:54 AM.


#3 SiCrane   Moderators   -  Reputation: 9540

Like
6Likes
Like

Posted 27 March 2013 - 09:01 AM

The built-in && operator introduces a sequence point after its left operand in C++. j is only accessed once on each side of the && in both versions, so this looks well defined. Nonetheless, I would avoid loop conditions with side effects.

Also, I would avoid variable names that consist only of the lowercase letter l. It's too easy to mistake l for 1 in many fonts used in programming.

#4 BitMaster   Crossbones+   -  Reputation: 3882

Like
0Likes
Like

Posted 27 March 2013 - 09:07 AM

Really? Well, you keep learning. Still, I would agree with SiCrane and avoid the loops even if they are well defined.

#5 swiftcoder   Senior Moderators   -  Reputation: 9846

Like
5Likes
Like

Posted 27 March 2013 - 09:38 AM

Nick C. said:

Performance is not very important in this program, but I really want to know the little details, because it WILL matter in future projects.

It really won't.

 

Write all 3 versions of the loop, compile it with all optimisations turned on, and look at the resulting assembly. Even money says they all generate the same assembly.
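
One way to set up that experiment is to give each variant its own function and look at what the compiler emits, for example with `g++ -O2 -S`. The sketch below is only an assumption about how to isolate the loops: the function names are made up for illustration, and each function returns the final j so the optimizer cannot simply throw the loop away. Whether the three outputs actually match is something to check rather than assume.

```cpp
#include <string>

// Hypothetical wrappers around the three loops from the first post.
// Names are illustrative only; they do not appear in the original code.

// Variant 1: increment in the loop body.
int skip_original(const std::string& line, int j, char splitchar)
{
    const int ll = static_cast<int>(line.length());
    while (j < ll - 1 && line[j] == splitchar)
        ++j;
    return j;
}

// Variant 2: postfix increment on the left-hand side of &&.
int skip_postfix(const std::string& line, int j, char splitchar)
{
    const int ll = static_cast<int>(line.length());
    while (j++ < ll - 1 && line[j] == splitchar)
        continue;
    return j;
}

// Variant 3: prefix increment inside the subscript.
int skip_prefix(const std::string& line, int j, char splitchar)
{
    const int ll = static_cast<int>(line.length());
    while (j < ll - 1 && line[++j] == splitchar)
        continue;
    return j;
}
```

Compiling this file on its own (no main is needed for `-S`) keeps the three loops side by side in the generated assembly.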


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#6 wintertime   Members   -  Reputation: 1640

Like
-2Likes
Like

Posted 27 March 2013 - 12:45 PM

Oh, those quirky optimization attempts could well go wrong, because it can be undefined behavior when a variable that's used twice inside a statement is incremented. The last one always differs from the original, as j would be increased before accessing the array, which was not the case in the original.

Also, abusing the continue statement in this way when no jump is needed looks ugly, and a few years ago some compilers without good optimization could even have added a useless jump instruction.

 

Edit: slight clarification for language lawyers


Edited by wintertime, 28 March 2013 - 05:55 AM.


#7 Ravyne   Crossbones+   -  Reputation: 7116

Like
1Likes
Like

Posted 27 March 2013 - 03:05 PM

Nick C., on 27 Mar 2013 - 07:54, said:


Performance is not very important in this program, but I really want to know the little details, because it WILL matter in future projects.

It really won't.

 

QFE. It won't.

 

It'd be better[note] to use the std::string member function find_first_of like so:

size_t pos = line.find_first_of(splitchar);

if(pos != string::npos)
{
    //do your stuff here.
}

 

[Note] Err. Sorry, reading comprehension fail. Nonetheless, the above is good advice for finding the first occurrence of splitchar that I'll build on in a bit, so I'll leave it be.

 

 

It'd be better[really, this time] to use the std::string member function find_first_not_of like so (if you know that line begins with splitchar):

size_t pos = line.find_first_not_of(splitchar);

if(pos != string::npos)
{
    //do your stuff here.
}

 

If you don't know that line begins with splitchar, then you can combine these two member functions like so:

 

size_t pos = line.find_first_not_of(splitchar, line.find_first_of(splitchar));

if(pos != string::npos)
{
    //do your stuff here.
}
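
To make the combined call concrete, here is a minimal sketch that wraps it in a function. The name first_after_run is hypothetical; the two std::string member functions are used exactly as in the snippet above. If the character never occurs, find_first_of returns npos, and searching from npos also yields npos, so a single check on the result covers both cases.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first character that follows the first run of
// splitchar, or std::string::npos if there is no such character.
// (Hypothetical helper, written only to illustrate the combined call.)
std::size_t first_after_run(const std::string& line, char splitchar)
{
    return line.find_first_not_of(splitchar, line.find_first_of(splitchar));
}
```

For "ab..cd" and '.', the first run starts at index 2 and the function returns 4, the index of 'c'.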


Edited by Ravyne, 27 March 2013 - 03:21 PM.


#8 wintertime   Members   -  Reputation: 1640

Like
2Likes
Like

Posted 28 March 2013 - 05:30 AM

while (j < ll - 1 && line[j] == splitchar)
++j;


while (j < ll - 1 && line[++j] == splitchar)
continue; 

Even though I got -2 for pointing out how this is a failed optimization attempt, that code is not equivalent and you should stop trying to microoptimize such irrelevant things when you are likely to introduce bugs.
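
A small sketch can show where the two loops above part ways. The helper name final_indices is made up for illustration; it runs both loops from the same starting index and returns both final values of j. The difference appears when the loop starts on a character that is not splitchar: the first loop leaves j alone, while the second increments it before looking at anything.

```cpp
#include <string>
#include <utility>

// Runs the body-increment loop and the prefix-increment loop from the same
// starting index and returns the final indices as {body version, prefix version}.
// (Hypothetical helper for comparison only.)
std::pair<int, int> final_indices(const std::string& line, int start, char splitchar)
{
    const int ll = static_cast<int>(line.length());

    int a = start;
    while (a < ll - 1 && line[a] == splitchar)
        ++a;

    int b = start;
    while (b < ll - 1 && line[++b] == splitchar)
        continue;

    return {a, b};
}
```

With "a.b" and a start of 0, the first loop stops immediately at 0 and the second ends at 2. With "..b" and a start of 0, where the starting character is splitchar, both end at 2.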


Edited by wintertime, 28 March 2013 - 05:31 AM.


#9 BitMaster   Crossbones+   -  Reputation: 3882

Like
0Likes
Like

Posted 28 March 2013 - 07:53 AM

wintertime said:

Even though I got -2 for pointing out how this is a failed optimization attempt, that code is not equivalent and you should stop trying to microoptimize such irrelevant things when you are likely to introduce bugs.

I don't think you were downvoted for suggesting the optimization was a useless and irrelevant micro-optimization. You were downvoted because you said it's invoking undefined behavior. It is well defined. I made the same mistake more than four hours before you, because I didn't think it through properly, and I had already been corrected.

#10 Ravyne   Crossbones+   -  Reputation: 7116

Like
0Likes
Like

Posted 28 March 2013 - 12:05 PM

wintertime said:

Also, abusing the continue statement in this way when no jump is needed looks ugly, and a few years ago some compilers without good optimization could even have added a useless jump instruction.

It's hardly abusing continue. The fact that a compiler might have mishandled it in the past is not an indication of some terrible practice going on. It's probably good to avoid using looping structures solely for side effects, as someone else pointed out, and there are better tools, as I pointed out myself, but if you did it anyway, using continue is probably a better option than empty braces or a semicolon. At least it stands out and says precisely what the intent was. Empty braces might invite the thought that the programmer forgot to fill in the loop body, and an empty statement (a single semicolon) is first of all very easy to miss, and causes really strange errors if you should ever forget or mistakenly delete it.



#11 Yrjö P.   Crossbones+   -  Reputation: 1412

Like
0Likes
Like

Posted 28 March 2013 - 05:16 PM

Ravyne said:

Empty braces might invite the thought that the programmer forgot to fill in the loop body, and an empty statement (a single semicolon) is first of all very easy to miss, and causes really strange errors if you should ever forget or mistakenly delete it.

I pretty much never find myself needing a loop without a body, but when I do, I put a single semicolon indented on the next line. It stands out pretty well and doesn't look like you forgot to write a body.
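
For comparison, here are the three spellings of a body-less loop discussed in the last few posts, written side by side. This is only an illustrative sketch with made-up function names; it uses a for loop so that the increment lives in the loop header rather than in the condition. All three functions are meant to compute the same thing, and the choice between them is purely one of readability.

```cpp
#include <cstddef>
#include <string>

// Each function returns the index of the first character that is not c,
// or line.size() if every character is c. Names are hypothetical.

// Spelling 1: explicit continue.
std::size_t skip_with_continue(const std::string& line, char c)
{
    std::size_t i = 0;
    for (; i < line.size() && line[i] == c; ++i)
        continue;
    return i;
}

// Spelling 2: empty braces.
std::size_t skip_with_braces(const std::string& line, char c)
{
    std::size_t i = 0;
    for (; i < line.size() && line[i] == c; ++i)
    {}
    return i;
}

// Spelling 3: a lone semicolon, indented on its own line.
std::size_t skip_with_semicolon(const std::string& line, char c)
{
    std::size_t i = 0;
    for (; i < line.size() && line[i] == c; ++i)
        ;
    return i;
}
```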

#12 Nick C.   Members   -  Reputation: 176

Like
0Likes
Like

Posted 02 April 2013 - 09:45 AM

Okay, thanks everyone for the replies (and sorry for my late reply).
It seems that I should have included my entire function to prevent confusion here, as some people don't really know what I want to achieve.

So: the object of this function was to split a string on a character. However, the function that Ravyne showed here (yes, I know there are even more possible solutions) doesn't really do what I want, as I have a few more requirements xd. That function was actually one of the first I came up with.
Let me explain a few other things.
- If the string doesn't contain that character, the function stores the entire string in a vector (size 1)
- If the string is empty, the function stores an empty string ("") in a vector (also size 1)
- If the character occurs as the first character, an empty string will be stored as the first element in the vector
- If there are multiple duplicates of that character in a row, it ignores all of them.

An example:
vector<string> splittedString;
char splitChar = '.';
string stringToSplit = ".test...12..3.";

After the instruction
SplitLine(stringToSplit, splittedString, splitChar);
splittedString contains the strings
0.  (empty)
1. test
2. 12
3. 3
4.  (empty)

Not trying to argue, just saying how it is... Like wintertime said, those two loops don't exactly do the same thing, but they do if you actually see the entire function. As I said before, I've tested all the possibilities comprehensively. So, without further ado, my entire function. You can use all my possibilities, they all do the same thing.

inline void DaeToAniConverter::SplitLine(string line, vector<string>& splittedline, char splitchar)
{
	//vector with string segments should contain at least one string
	if (line == "")
	{
		splittedline.push_back("");
		return;
	}
	int ll = line.length();
	splittedline.clear();
	int prevIndex(0);
	int  j(0);
	while(j < ll)
	{
		int tempJ = line.find_first_of(splitchar, j);
		j = tempJ >= 0 ? tempJ : ll;
		//skip multiple equal characters
		splittedline.push_back(line.substr(prevIndex, (j-prevIndex)));
		while (j < ll - 1 && line[++j] == splitchar)
		{}
		prevIndex = j;
	}

}
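
For readers who want something to test against the rules listed above, here is one possible sketch of the same behaviour as a free function. It is not the original implementation: the name split_line and the return-by-value signature are assumptions made for illustration. It compares against std::string::npos instead of converting the result to int, and it ends explicitly when the line finishes on a separator, a case worth testing in any variant. The empty line needs no special branch here, because it falls under the "no separator found" path.

```cpp
#include <string>
#include <vector>

// Hypothetical free-function sketch of the splitting rules described above.
std::vector<std::string> split_line(const std::string& line, char splitchar)
{
    std::vector<std::string> parts;
    std::string::size_type begin = 0;

    for (;;)
    {
        const std::string::size_type end = line.find(splitchar, begin);
        if (end == std::string::npos)
        {
            // No further separator: the rest of the line is the last segment.
            // For an empty line this stores a single empty string.
            parts.push_back(line.substr(begin));
            return parts;
        }

        parts.push_back(line.substr(begin, end - begin));

        // Treat a run of separators as a single separator.
        begin = line.find_first_not_of(splitchar, end);
        if (begin == std::string::npos)
        {
            // The line ends with a separator: store a trailing empty segment.
            parts.push_back("");
            return parts;
        }
    }
}
```

With the example input ".test...12..3." and '.', this sketch produces the five segments listed earlier: empty, test, 12, 3, empty.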

 

As you can see here, the main loop is only executed as many times as there will be string segments (in my last example, 5 times), so not character per character. I actually do use line.find_first_of.

And about that performance issue, you are all right :). It really doesn't make a big difference, and a good structure is sometimes more important than performance.
Thanks to everyone else for your constructive remarks.
Again (but now you know what I want to achieve here), if there is a better way to achieve this functionality, I would gladly hear about it.

I'm just a 19 year old student, programming in my free time. Everyone makes mistakes, right? :)

gz
Nick

Edit: This may be important. This function is part of a program that reads a file, converts it to a custom format, and writes it to a new file. I'm talking about pretty big files, so this function is usually executed at least a hundred times (usually a few hundred). Right now it only takes 2-5 seconds for the program to do its thing, so that's not too bad :).


Edited by Nick C., 02 April 2013 - 10:01 AM.


#13 Ravyne   Crossbones+   -  Reputation: 7116

Like
0Likes
Like

Posted 02 April 2013 - 12:57 PM

Nick C. said:

Let me explain a few other things.

  1. If the string doesn't contain that character, the function stores the entire string in a vector (size 1)
  2. If the string is empty, the function stores an empty string ("") in a vector (also size 1)
  3. If the character occurs as first character an empty string will be stored as first element in the vector
  4. If there are multiple duplicates of that character following, it ignores all those.

 

The second rule is redundant. An empty string won't contain the character, and therefore rule 1 applies, and since the string is empty, rule 1 will store an empty string.

 

I get why you want to eat multiple occurring characters, but I don't get why you want an empty string if it's the first character (and last, by your example?)





