Gage64

Unity Parameter value optimization?


This thread got me thinking. Suppose I do this:
#include <vector>

struct Obj {
    int a;
    char b;

    Obj(int a, char b) : a(a), b(b) {}
};

int main() {
    std::vector<Obj> vec;
    Obj obj(5, 'b');
    vec.push_back(obj);
}
In this code, Obj's constructor is called first, then its copy constructor is called to copy obj into the vector, and finally, when obj goes out of scope, its destructor is called. But suppose I have no need for obj; I only want to create an object "inside" the vector, so to speak. If it were possible to construct the object directly in the vector's memory, the copy constructor and the destructor would not need to be called.

This situation reminds me of the return value optimization, so my question is: is there something like this for parameters? That is, if I change the above code to vec.push_back(Obj(5, 'b')); would the compiler be able to optimize the temporary away?

Also, I've heard of the named return value optimization (NRVO), where the compiler can elide the copy even when the returned object has a name (i.e., not just a temporary created in the return statement). Can the compiler do something like this with the first code snippet? Thanks in advance.
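To make the question concrete, here is a small instrumented sketch. The global counters, the explicit copy constructor, and the helper functions are hypothetical additions for illustration only. Because this Obj declares its own copy constructor, no move constructor is generated for it, so both forms of push_back copy the element exactly once; reserve is called first so that reallocation copies don't muddy the count.

```cpp
#include <vector>

// Hypothetical instrumentation: count how often each constructor runs.
int g_ctor_calls = 0;
int g_copy_calls = 0;

struct Obj {
    int a;
    char b;

    Obj(int a, char b) : a(a), b(b) { ++g_ctor_calls; }
    // A user-declared copy constructor suppresses the implicit move
    // constructor, so every copy into the vector goes through here.
    Obj(const Obj& other) : a(other.a), b(other.b) { ++g_copy_calls; }
};

void reset_counters() {
    g_ctor_calls = 0;
    g_copy_calls = 0;
}

// The original snippet: build a named object, then copy it into the vector.
int copies_named() {
    reset_counters();
    std::vector<Obj> vec;
    vec.reserve(1);               // no reallocation copies during push_back
    Obj obj(5, 'b');
    vec.push_back(obj);
    return g_copy_calls;
}

// The variant from the question: pass a temporary instead.
int copies_temporary() {
    reset_counters();
    std::vector<Obj> vec;
    vec.reserve(1);
    vec.push_back(Obj(5, 'b'));   // the temporary is still copied in
    return g_copy_calls;
}
```

Both helpers report one copy, which is exactly the cost the question asks about.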

In theory, yes. You could write code such as:

void foo(Bar a);
int main() { foo(Bar()); }


And no temporary value would be created, skipping the copy construction and instantiating the argument directly in the function.
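A minimal sketch of that elision, assuming a Bar with hypothetical counters attached. In C++98/03 the compiler was permitted, but not required, to construct the argument directly in the parameter; since C++17 this is guaranteed when the argument is a prvalue such as Bar(). Passing a named object still copies.

```cpp
// Hypothetical counters, added only to observe which constructors run.
int g_bar_default = 0;
int g_bar_copies = 0;

struct Bar {
    Bar() { ++g_bar_default; }
    Bar(const Bar&) { ++g_bar_copies; }
};

// The callee takes its parameter by value, as in foo(Bar a).
void foo(Bar a) { (void)a; }

// foo(Bar()): the temporary initializes the parameter directly.
int copies_when_passing_temporary() {
    g_bar_default = 0;
    g_bar_copies = 0;
    foo(Bar());
    return g_bar_copies;
}

// foo(b) with a named object: the parameter must be copy-constructed.
int copies_when_passing_named() {
    g_bar_default = 0;
    g_bar_copies = 0;
    Bar b;
    foo(b);
    return g_bar_copies;
}
```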

In practice, the push_back function usually looks something like this (simplified):

void push_back(const value_type &t)
{
    if (size() == capacity()) { reserve(capacity() == 0 ? 1 : capacity() * 2); }
    new (this->_data + size()) value_type(t);
    ++this->_size;
}


This means the calling function cannot know where the object should be instantiated, so no such optimization can be applied. Of course, some compilers may apply additional optimizations to their associated SC++L implementations, but I don't think this is one of them.
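The push_back shape above can be fleshed out into a self-contained toy container. TinyVec is a hypothetical name and this is only a sketch, not how any particular standard library implements std::vector: it starts at capacity 1 and doubles, copy-constructs into raw storage with placement new, and ignores exception safety as well as the case where t refers to an element of the same container. It shows why the copy is unavoidable in this shape: by the time push_back picks the destination address, t already exists.

```cpp
#include <cstddef>
#include <new>

// Hypothetical minimal growable array following the push_back shape above.
template <class T>
class TinyVec {
public:
    TinyVec() : data_(nullptr), size_(0), capacity_(0) {}
    ~TinyVec() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
        ::operator delete(data_);
    }
    TinyVec(const TinyVec&) = delete;
    TinyVec& operator=(const TinyVec&) = delete;

    void push_back(const T& t) {
        if (size_ == capacity_) {
            // Start at 1 so that an empty container can grow at all.
            reserve(capacity_ == 0 ? 1 : capacity_ * 2);
        }
        // Placement new: copy-construct the element from t in raw storage.
        ::new (data_ + size_) T(t);
        ++size_;
    }

    void reserve(std::size_t new_cap) {
        if (new_cap <= capacity_) return;
        T* fresh = static_cast<T*>(::operator new(new_cap * sizeof(T)));
        for (std::size_t i = 0; i < size_; ++i) {
            ::new (fresh + i) T(data_[i]);  // copy each old element across
            data_[i].~T();                  // then destroy the original
        }
        ::operator delete(data_);
        data_ = fresh;
        capacity_ = new_cap;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    T* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Pushing five ints grows the capacity 1, 2, 4, 8.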

It's tricky. Very tricky. Mostly because for your very simple example, it may behave differently than a *real* example.

Compilers are actually surprisingly stupid when you really confront them with something that is semantically obvious but syntactically complicated, like what you are putting forward here. It is clear what your intent is. However, in any programming language there isn't always a direct mapping between what a programmer means and what a programmer says [hence logically different code that is *meant* to be the same], and what you have here is an example.

Consider this:
Obj c;
c = Obj(a,b);
vs:
Obj c(a,b);
or even:
Obj c = Obj(a,b);
What you MEAN is clear: you mean for these all to be the same. What you SAY isn't the same, and the compiler can only work with what you say. The first case resolves into a complete call to the default (no-argument) constructor, then construction of the temporary Obj(a,b), followed by an assignment. The second is the (int,char) constructor, and it turns out that the third case is also the (int,char) constructor [though in a rather strange form].
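The three spellings can be told apart with an instrumented Obj. The default constructor, the user-written copy operations, and the counters are hypothetical additions for this sketch (the original Obj has no default constructor). Since C++17 the third form is guaranteed to call only the (int,char) constructor; before that the copy was allowed to be elided, and compilers routinely did so.

```cpp
// Hypothetical counters recording which special members run.
int g_default = 0, g_value = 0, g_copy = 0, g_assign = 0;

struct Obj {
    int a;
    char b;

    Obj() : a(0), b('\0') { ++g_default; }
    Obj(int a, char b) : a(a), b(b) { ++g_value; }
    Obj(const Obj& o) : a(o.a), b(o.b) { ++g_copy; }
    Obj& operator=(const Obj& o) {
        a = o.a;
        b = o.b;
        ++g_assign;
        return *this;
    }
};

void reset_counts() { g_default = g_value = g_copy = g_assign = 0; }

// Obj c; c = Obj(a,b);  default ctor, (int,char) ctor for the temporary, assignment
void default_then_assign() {
    reset_counts();
    Obj c;
    c = Obj(5, 'b');
    (void)c;
}

// Obj c(a,b);  (int,char) ctor only
void direct_init() {
    reset_counts();
    Obj c(5, 'b');
    (void)c;
}

// Obj c = Obj(a,b);  (int,char) ctor only; the copy is elided
void copy_init() {
    reset_counts();
    Obj c = Obj(5, 'b');
    (void)c;
}
```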

So... std::vector<T>::push_back(const T&) will almost certainly be inlined. In all likelihood, so will your constructors and your assignment in this simple case. Once everything is inlined, the compiler will see the redundant assignments to variables and eliminate them. It will optimize the code, but it will probably only end up being expressed the way you describe in the most simplistic cases: cases where everything gets inlined, everything gets chewed up, and everything means the same thing [which very frequently turns out to be semantically equivalent, but not structurally equivalent]. So this isn't a specific optimization with its own fancy name [at least not one I'm familiar with]; it's an effect of function inlining plus a few expression-simplification tricks.

But again, as always, it depends heavily on the individual compiler.

And yes, a compiler can optimize away a return value pretty easily in many cases.

*EDIT* ToohrVyk brings up a good point about the complications that come with using the vector class. On a side note, though, the compiler doesn't have to know exactly where the data will go in order to grind through the assignments and simplify the expressions, as long as everything has been inlined successfully.

I think this can be solved if you could do something like this:

void push_back(arguments)
{
if (size() == capacity()) { reserve(capacity() * 2); }
new (this->_data + size()) value_type(arguments);
++ this->_size;
}


Then this:

vec.push_back(Obj(5, 'b'));

would become this:

vec.push_back(5, 'b');

That is, if push_back could accept a variable number of arguments, then it could construct the object directly in memory, and then there would be no need to create a temporary object.

I think this is what's being requested in the other thread that I linked. I guess doing this in C++ is very complicated, if it's even possible.
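For what it's worth, this idea was later standardized: C++11 added variadic templates and perfect forwarding, and with them std::vector::emplace_back, which forwards its arguments to the element's constructor and builds the element directly in the vector's storage. A minimal sketch, again with hypothetical counters attached to Obj:

```cpp
#include <vector>

// Hypothetical counters for observing constructor calls.
int g_obj_ctor = 0;
int g_obj_copy = 0;

struct Obj {
    int a;
    char b;

    Obj(int a, char b) : a(a), b(b) { ++g_obj_ctor; }
    Obj(const Obj& o) : a(o.a), b(o.b) { ++g_obj_copy; }
};

// emplace_back forwards (5, 'b') to Obj(int, char) and constructs the
// element in place: no temporary Obj, no copy.
int copies_with_emplace() {
    g_obj_ctor = 0;
    g_obj_copy = 0;
    std::vector<Obj> vec;
    vec.reserve(1);            // keep reallocation out of the count
    vec.emplace_back(5, 'b');
    return g_obj_copy;
}
```

This is essentially the vec.push_back(5, 'b') wished for above, spelled emplace_back.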

Quote:
Original post by Gage64
I think this can be solved if you could do something like this:


There are hundreds of ways this could be solved, with small patches to the language. But ultimately, it's the entire language philosophy and semantics that cause the issue: C++ timidly supports the distinction between initialized and uninitialized memory, meaning it's good enough to cause problems but not good enough to solve them elegantly.

btw, sometimes, to avoid the temporary you can just grow your vector first and take a reference to the last element. The resize will internally call the default constructor of T, though.


std::vector<T> myVector;
...
size_t size = myVector.size();
myVector.resize( size + 1 ); // default constructor called
T& myElement = myVector[size]; // initialize the element afterwards

Quote:
Original post by fboivin
The resize will internally call the default constructor of T though.
Which is exactly why it doesn't help the situation any. You are also breaking encapsulation in doing this, and requiring that your objects are designed with entirely public fields for this to have any performance impact at all. In short, you're trading one performance problem that is likely rather mild, for a design problem that can be more significant.
Quote:
Original post by ToohrVyk
There are hundreds of ways this could be solved, with small patches to the language. But ultimately, it's the entire language philosophy and semantics that cause the issue: C++ timidly supports the distinction between initialized and uninitialized memory, meaning it's good enough to cause problems but not good enough to solve them elegantly.
I'm not going to go into it too much for fear of starting some sort of language war, but you bring up a good point about C++ in particular. This issue is really only the tip of the iceberg of reasons that compiler writers hate C and C++. There are many small details of the C and C++ specifications that make certain compile-time optimizations intensely difficult or even impossible, even though the same optimizations are rather simple in other languages. Couple this with the adoption of common practices that are not actually part of the standard, resulting in code that works fine when the optimizations that rely on standards compliance are turned off but breaks horribly when they are turned on, and you have a monster of a mess on your hands.

[Edited by - Drigovas on July 8, 2008 6:29:29 AM]

Quote:
Original post by fboivin
btw, sometimes, to avoid the temporary you can just grow your vector first and take a reference on the last element. The resize will internally call the default constructor of T though.



No, in that case an unnecessary assignment operator call will also be made, unless the compiler is "smart enough" to optimize it away. Ideally we should be able to construct the element *directly* in the vector.
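The extra work in the resize-then-assign approach can be counted. Elem and the counters are hypothetical; the sketch assumes C++11 or later, where resize(n) default-constructs the new element in place (in C++03, resize took a value argument and copied it). With capacity reserved up front, it performs one default construction, one construction of the temporary, and one assignment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counters for each special member.
int g_def = 0, g_val = 0, g_cpy = 0, g_asn = 0;

struct Elem {
    int a;
    char b;

    Elem() : a(0), b('\0') { ++g_def; }
    Elem(int a, char b) : a(a), b(b) { ++g_val; }
    Elem(const Elem& o) : a(o.a), b(o.b) { ++g_cpy; }
    Elem& operator=(const Elem& o) {
        a = o.a;
        b = o.b;
        ++g_asn;
        return *this;
    }
};

// The grow-first idiom from the post above.
void grow_then_assign(std::vector<Elem>& v) {
    std::size_t size = v.size();
    v.resize(size + 1);        // default constructor runs here
    v[size] = Elem(5, 'b');    // then a temporary plus a copy assignment
}
```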

Hell, why doesn't C++ have constructor inheritance? If it did, I could write this:
template <class T> struct _init_ : T
{
    void* operator new( size_t, T* Ptr )
    {
        return Ptr;
    }

    void operator delete( void* )
    {
    }

    void operator delete( void*, T* )
    {
    }
};

#define CONSTRUCT( type, ptr, args ) ( (void) new(ptr) _init_<type> args )
#define DESTRUCT( type, ptr ) ( delete static_cast< _init_<type>* >(ptr) )

class Foo
{
public:
    Foo( int, int );

    ....
};

....

CONSTRUCT( Foo, pFoo, (1,2) );





Is there any way around this? Or do I have to write this stupid code?
template <class T> struct _init_
{
T Data;

_init_() : Data() {}

template <class A1>
_init_( A1 a1 ) : Data(a1) {}

template <class A1, class A2>
_init_( A1 a1, A2 a2 ) : Data(a1, a2) {}

template <class A1, class A2, class A3>
_init_( A1 a1, A2 a2, A3 a3 ) : Data(a1, a2, a3) {}

....

void* operator new( size_t, T* Ptr )
{
return Ptr;
}

void operator delete( void* )
{
}

void operator delete( void*, T* )
{
}
};
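C++11 eventually addressed both halves of this complaint: inheriting constructors (using Base::Base;) and variadic templates. The latter removes the need for the overload ladder altogether. construct_in_place and destroy_in_place below are hypothetical helper names for this sketch; C++20's std::construct_at and C++17's std::destroy_at in <memory> play the same roles.

```cpp
#include <new>
#include <utility>

// One variadic template replaces the A1 / A1,A2 / A1,A2,A3 ... ladder.
// ::new selects the standard placement form from <new>; std::forward keeps
// each argument's value category, so rvalues can be moved rather than copied.
template <class T, class... Args>
T* construct_in_place(void* where, Args&&... args) {
    return ::new (where) T(std::forward<Args>(args)...);
}

// Explicit destructor call; the storage itself is left untouched.
template <class T>
void destroy_in_place(T* p) {
    p->~T();
}
```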





[Edited by - 1hod0afop on July 8, 2008 9:55:25 AM]

push_back does *a lot* of work, including calling a custom allocator.

As such, the compiler cannot perform in-place construction here. This is a problem of value semantics, not something that can be solved by the compiler.


If you find that this in particular is a problem, the solution is this:
#include <vector>

struct Obj {

int a;
char b;

Obj() {}
Obj(int a, char b):a(a), b(b) {}
};


int main(int argc, char**argv)
{
std::vector<Obj> vec;

int last = vec.size();
vec.resize(last+1);
vec[last] = Obj(5, 'b');

return 0;
}



It results in the following (x86 disassembly with the source lines interleaved):
int last = vec.size();
vec.resize(last+1);
00401F1C lea ecx,[eax+1]
00401F1F lea edx,[esp+8]
00401F23 mov dword ptr [esp+24h],eax
00401F27 call std::vector<Obj,std::allocator<Obj> >::resize (401130h)
vec[last] = Obj(5, 'b');
00401F2C mov eax,dword ptr [esp+0Ch]
00401F30 mov ecx,5
00401F35 mov dword ptr [eax],ecx
00401F37 mov byte ptr [esp+4],62h
00401F3C mov ecx,dword ptr [esp+4]



It should be noted that this is exactly what push_back does, but it allows you to control resizing more precisely. Once that is done, however, the assignment can be performed directly.

One solution I've found so far is this:
struct cust_init_tag {};
void* operator new( size_t, void* Ptr, cust_init_tag )
{
return Ptr;
}
void operator delete( void*, void*, cust_init_tag ) {}

template <class T> struct _destruct_
{
T Value;
void operator delete( void* ) {}
};
template <class T> void destruct( T* p )
{
delete reinterpret_cast< _destruct_<T>* >(p);
}



And now I can invoke constructors and destructors like normal functions:
struct Foo
{
Foo( int, int );
~Foo();

...

};

Foo* pFoo = (Foo*) ::malloc( sizeof (Foo) );

new(pFoo, cust_init_tag()) Foo( 1, 2 );
destruct( pFoo );

::free( pFoo );




But it looks messy. Please tell me if there's a better way.

And my boss still wants me to implement the CRAZY arraylist. =(
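As for a better way: standard C++ already has this built in. Placement new from <new> constructs an object in storage you already own, and an explicit destructor call (p->~Foo()) destroys it without freeing the memory, so neither the tag type nor the _destruct_ wrapper is needed. A minimal sketch of the same malloc/free example, with a hypothetical alive-counter added to Foo so the lifecycle can be checked:

```cpp
#include <cstdlib>
#include <new>

// Hypothetical counter: how many Foo objects are currently alive.
int g_foo_alive = 0;

struct Foo {
    int x, y;
    Foo(int a, int b) : x(a), y(b) { ++g_foo_alive; }
    ~Foo() { --g_foo_alive; }
};

int lifecycle_demo() {
    // malloc returns storage suitably aligned for any fundamental type.
    void* raw = std::malloc(sizeof(Foo));
    if (!raw) return -1;

    Foo* pFoo = new (raw) Foo(1, 2);   // standard placement new, no tag type
    int sum = pFoo->x + pFoo->y;

    pFoo->~Foo();                      // run the destructor, keep the memory
    std::free(raw);                    // release the raw storage separately
    return sum;
}
```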

I'll again recommend that you take a look at boost::object_pool. It allows you to do this:


#include <boost/pool/object_pool.hpp>

void func() {
boost::object_pool<Foo> pool;
Foo *ptr = pool.construct(1, 2);
// Destructor for Foo is called when the pool goes out of scope
}

Quote:
Original post by ToohrVyk
In theory, yes. You could write code such as:

void foo(Bar a);
int main() { foo(Bar()); }


And no temporary value would be created, skipping the copy construction and instantiating the argument directly in the function.

In practice, what happens is that the push-back function usually looks like:

void push_back(const value_type &t)
{
if (size() == capacity()) { reserve(capacity() * 2); }
new (this->_data + size()) value_type(t);
++ this->_size;
}


Meaning the calling function cannot know where the object should be instantiated, and thus no optimization can be applied. Of course, some compilers may apply some additional optimizations to their associated SC++L implementations, but I don't think this one happens.


Couldn't it inline the push_back implementation and then delay the object construction until after the desired location is known, constructing in place?

Quote:
Original post by Gage64
I'll again recommend that you take a look at boost::object_pool.


I saw it already. It's basically like this:

template <class T0> element_type* construct( T0 ) { ... }
template <class T0, class T1> element_type* construct( T0, T1 ) { ... }
template <class T0, class T1, class T2> element_type* construct( T0, T1, T2 ) { ... }
template <class T0, class T1, class T2, class T3> element_type* construct( T0, T1, T2, T3 ) { ... }

...
...





And my boss doesn't like it. It doesn't seem genuine to me, either.
Anyways, thanks very much for your effort. The boost library gave me a lot of ideas.

Quote:
Original post by Zahlman
Couldn't it inline the push_back implementation and then delay the object construction until after the desired location is known, constructing in place?


It can only swap the initialization and the memory test-and-reallocate if it can prove that the two are independent (otherwise, it would risk delaying a constructor call that manipulates the vector or the underlying allocator), which can only happen if the constructor's full definition is visible to the compiler. And if the constructor is simple enough for that analysis, construct-then-copy usually isn't costly enough to warrant the optimization effort.

Quote:
Original post by DevFred
Isn't C++0x going to solve this issue with move semantics? Hooray, another meaning for the double-ampersand :)
I can't recall how likely this is at the moment. It would be great if it did, and I would hope that it would remove the copy here then.

However, I don't think it's common for a constructor call to be cheap while the corresponding copy constructor is expensive. When copying is expensive, you can get around it by wrapping your type in a smart pointer and storing that instead, or sometimes by providing an efficient std::swap specialisation and using it on back() after a push_back() of a default-constructed object.
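Both routes can be sketched with std::string standing in for an expensive-to-copy type (an assumption for illustration; append_by_swap and append_by_move are hypothetical names). The swap idiom works in C++03; the second function uses the rvalue-reference overload push_back(T&&) that C++11 did end up adding, which moves rather than copies.

```cpp
#include <string>
#include <utility>
#include <vector>

// Pre-C++11 idiom: append a cheap default-constructed element, then swap
// the payload into it instead of copying it. Swapping two std::strings is
// constant time; afterwards `value` holds the empty string.
void append_by_swap(std::vector<std::string>& v, std::string& value) {
    v.push_back(std::string());
    using std::swap;
    swap(v.back(), value);
}

// C++11 equivalent: push_back(T&&) move-constructs the new element.
void append_by_move(std::vector<std::string>& v, std::string&& value) {
    v.push_back(std::move(value));
}
```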

OP:
Many would frown upon the use of the oxymoron "arraylist", btw. No kind of container could have the full advantages of both a list and an array, and typically wherever a concocted abomination like this is used, people end up suffering the disadvantages of both.
Let's just say I'm glad you think your boss's idea is CRAZY.

Quote:
Original post by iMalc
Many would frown upon the use of the oxymoron "arraylist", btw. No kind of container could have the full advantages of both a list and an array, and typically wherever a concocted abomination like this is used, people end up suffering the disadvantages of both.


"ArrayList" is not an oxymoron—a list is just an ordered sequence of elements, which is perfectly compatible with the concept of array. Consider for instance the Java standard library, which provides a base List interface implemented by ArrayList and LinkedList classes (each with their own specific performance constraints).
