How do I properly return a private class member vector using move semantics?

Started by
80 comments, last by Oberon_Command 7 years, 3 months ago

Hi

I have a class named "Window" that has a private vector of messages called m_windowMessages.

This class has a public method named "moveMessages" and I want it to return the private vector of messages using move semantics without messing up the private vector so that I can continue to push new messages on it after "moveMessages" returns.

But I don't know what solution to use, I find some of them buggy, I think m_windowMessages get mangled and it has to renew its internals. Could the std::tuple ble the problem, should i just use a struct?


class Window
{
public:
	Window();
	~Window();
	bool initialize(HINSTANCE hInstance);

	std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> moveMessages();

	bool postMessage(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);

private:
	bool filterMessage(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);

	HWND m_hWnd = nullptr;
	HINSTANCE m_hInstance = nullptr;
	WNDCLASSEX m_wc;

	std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> m_windowMessages;
};

Window::Window()
{
	ZeroMemory(&m_wc, sizeof(WNDCLASSEX));
	
	m_wc.lpfnWndProc = (WNDPROC)[](HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)->LRESULT		// non-capture lambda
	{
		Window* p_Window = nullptr;
		if (message == WM_NCCREATE)
		{
			p_Window = reinterpret_cast<Window*>(((LPCREATESTRUCT)lParam)->lpCreateParams);
			if (p_Window == nullptr)
				throw;

			SetLastError(0);
			if (SetWindowLongPtr(hWnd, GWLP_USERDATA, reinterpret_cast<LONG_PTR>(p_Window)) == 0 && GetLastError() != 0)
				throw;
		}

		if ((p_Window = reinterpret_cast<Window*>(GetWindowLongPtr(hWnd, GWL_USERDATA))) == nullptr)
			return DefWindowProc(hWnd, message, wParam, lParam);

		if (p_Window->postMessage(hWnd, message, wParam, lParam))
			return 0;

		return DefWindowProc(hWnd, message, wParam, lParam);
	};
}

std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> Window::moveMessages()
{
	MSG msg;

	while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
	{
		TranslateMessage(&msg);
		DispatchMessage(&msg);
	}

	// What to do??

	return std::move(m_windowMessages);		// Mangles m_windowMessages causing application to be laggy

	std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> windowMessages;
	std::swap(windowMessages, m_windowMessages);
	return windowMessages;				// Kinda laggy

	return std::move(windowMessages);		// App also stutters here
}

bool Window::postMessage(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
	if (filterMessage(hWnd, message, wParam, lParam))
	{
		m_windowMessages.push_back(std::make_tuple(hWnd, message, wParam, lParam));
		return true;
	}

	return false;
}

bool Window::filterMessage(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
	if (hWnd != m_hWnd) return false;

	// return true for messages I want keep on m_windowMessages for the caller of moveMessages() to handle later on
	switch (message)
	{
		case WM_MOUSEMOVE: return true;
		case WM_LBUTTONDOWN: return true;
		case WM_LBUTTONUP: return true;
		case WM_CHAR: return true;
		case WM_SYSKEYUP: return true;
		case WM_SYSKEYDOWN: return true;
		case WM_DESTROY: return true;
		case WM_SYSCOMMAND:
			switch (wParam)
			{
				case SC_KEYMENU: return true;		// Disable system menu (F10 Key)  
				case SC_MONITORPOWER: return true;	// Disable power saving monitor shutdown
			}
			break;
	}
	return false;
}

Window MyWindow;
MyWindow.initialize(hInstance);

while(true)
{
	auto newMessages = MyWindow.moveMessages();

	/.../
	// processing newMessages
	/.../
}
Advertisement
I question how you are seeing lines as 'laggy' however...

When you tell it to 'move' the vector you are saying "I no longer want the vector you are moving from to be valid" which means it gives up everything, including the memory buffer it was pointing at.

When you next 'push back' it has to reserve new memory to fill all over again.

Assuming you've profiled this properly NOT in debug and found this to be a problem then I would suggest either;
a) reserving at the start of the processing loop a sane amount of memory
b) not moving the vector out, instead return const_iterators and processing the messages using that to iterate before clearing the buffer at the start of the next frame.

Hi, thx for reply, but didn't help :(

Reserving does not fix the issue.

I want the messages out of Window without deep copy. I do not want interators pointing into private members, what if the Window is destroyed while I use the iterators?

I have changed some of the code for more testing, such that m_windowMessages is now a pointer to a vector, then I assign the pointer the address of a local vector, run PeekMessage loop which will post messages using the pointer, then try to return by rvalue. But no luck! :(

Funny enough, if I do a deep copy there is no lag hah, but then m_windowMessages just builds up to have thousands of messages (mostly WM_MOUSEMOVE) just in a few seconds.

I think the problem is the "return std::move(.....)" is not working properly, something I got wrong. Is it even possible to return rvalue? How do the caller properly fetch rvalue to invoke the move constructor: "auto windowMessages = m_Window->moveMessages();"?

This one is correct:


std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> windowMessages;
std::swap(windowMessages, m_windowMessages);
return windowMessages;				// Kinda laggy

The reason why it is better than returning an rvalue, is because while you are avoiding a COPY by doing a MOVE, you are actually (possibly) getting in the way of the compiler from doing neither a move nor a copy - RVO (return-value-optimization) permits compilers to make "windowMessages" actually not exist, and assign directly to the variable that is receiving the return value.


i.e. 'meow' in:    meow = blah.moveMessages();

You say it's "kinda laggy", but it shouldn't be laggy - it should be very fast (internally it should swap a few pointers and integers). So you need to profile and debug to see why it's laggy. Likely your lag is unrelated to that particular piece of code (though I'd move "auto newMessages" to outside the loop, so you can help hint to the compiler to re-use it over - the compiler should be able to figure it out, but force of habit on my part, and a good sanity check).

Also, I'd rename the function "takeMessages()", since I believe "take" is the more commonly-used word for this kind of scenario (take = get and claim ownership / remove from current owner), but maybe that's just in my neck of the woods. Occupied solely by me. :P

Your swap (two moves) might be unnecessary there. I'd do it like this:


std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> windowMessages = std::move(m_windowMessages);
m_windowMessages.clear();

return windowMessages;

That should be kosher, but if std::vector has a stupid implementation, it might not be - in which case you can go back to the one above.

Oh, and in answer to your other question, yes you can return r-values, but unless you have a reason to, you should prefer returning l-values, because the compiler's RVO is superior and returning by r-value might trigger the compiler to disable its RVO to explicitly follow your request.

Two examples of standard library functions that return by r-value are the r-value casting functions: std::move() and std::forward().

Hi, thanks for your input.

I might change the name to "takeMessages()" :) , I was thinking of maybe using "getMessages()" but that could be interpreted/confused as a deep copy, leaving copies behind.

If I do manual variable hoisting if I don't trust the compiler to do it, then I still would have to clear it each time inside anway, so trust the compiler to do it and I save 1 line of code :lol:

So, if I stay away from std::move and && return type declarations, the compiler might return vector by rvalue anway right?

Is there a chance the compiler will return by value(deep copy) in that case?

What is the guarantee it will never do deep copy in this case?

Also, is there a size where the return vector is large enough for rvalue to outperform RVO? In this case, will this be resolved run-time and selected as appropiate, or is it compiled either as RVO or rvalue move?

I might change the name to "takeMessages()" , I was thinking of maybe using "getMessages()" but that could be interpreted/confused as a deep copy, leaving copies behind.


Yes, getX() would be misleading; forming and adopting common nomenclature for functions is helpful in coding. get() implies a deep copy (usually just for small types) or the return of a handle/reference, whereas take() implies getting and removing from the original owner, often with whatever optimizations can benefit.

If I do manual variable hoisting if I don't trust the compiler to do it, then I still would have to clear it each time inside anway, so trust the compiler to do it and I save 1 line of code

Since you are assigning to it, you don't have to clear it.

So, if I stay away from std::move and && return type declarations, the compiler might return vector by rvalue anway right?

No, it'll very likely do something better than rvalue.

Is there a chance the compiler will return by value(deep copy) in that case?


Sure, if you are using a terrible compiler, or a compiler version more than five years old, or if your good modern compiler decides not to perform RVO in this specific function for some reason - which you then handle on a case-by-case basis.

What is the guarantee it will never do deep copy in this case?

No guarantee. :P The same no guarantee that C++ intentionally has available that permits C++ to be faster than every other language out there. :wink:
The standard intentionally does not dictate how compilers do things, the standard merely says what the result should be, and intentionally gives compilers the elbow-room to do optimizations.

An example would be v-tables or empty-base optimization. Or small-string optimization. The standard doesn't dictate any of them, afaik.

There are optimizations I generally trust because they are so widespread, and optimizations I generally don't until I test to make sure. RVO is one I assume is present until (on a case-by-case basis) it is proven otherwise.

Also, is there a size where the return vector is large enough for rvalue to outperform RVO?

Nope, never. Provided a basic level of sanity in the implementation of your compiler's RVO.

In fact, that's an odd question, because even between two rvalue moves, the size of a vector is mostly irrelevant. Provided a basic level of sanity in the implementation of std::vector, it is equally fast to move a million-element vector as it is to move a single element vector - unlike with copying.
But RVO will be faster than the move. If the compiler does RVO (which is common if you have a single return in your function), it is the fastest possible solution. The fastest code is no code, and RVO is a way for the compiler to remove as much code as possible and still get the same result as returning an rvalue or lvalue.

NOTE: "provided a basic level of sanity in the implementation" is an assumption I always have to make or reject for anything in C++; with RVO, we're talking about the compiler, with std::vector's move-constructor/move-assignment, we're talking about whichever standard library implementation you are using (hint: three or four widely-used different implementations of the standard library exist).

In this case, will this be resolved run-time and selected as appropiate, or is it compiled either as RVO or rvalue move?

At compile-time, because compile-time is faster than run-time. Further, no decision at run-time is needed - RVO is faster. Always. :P

The question is only, "is the compiler choosing to do it, or not?".

Well, no, the question really is, "is my function running fast enough, or not?", and if the answer is, "No, it's too slow", then the next question is, "Why is it slow?".

Returning a vector by RVO, or by move-semantics, should be very fast! Especially since you're only doing it once per frame. Heck, even if you were doing a hundred thousand times a frame, it should still be fast. Because you are saying it's lagging, something in your code somewhere is wrong, and I can't see it in this:


std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> windowMessages;
std::swap(windowMessages, m_windowMessages);
return windowMessages;

So I'd look elsewhere first.

I'd check in here first:


while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
{
	TranslateMessage(&msg);
	DispatchMessage(&msg);
}

(Also, since you are doing event processing inside a function that, by its name, implies it's just getting something and quickly returning, neither move() nor take() works as a good name in that regard. More like "processMessages()". I hadn't realized that the first time around)

If I do manual variable hoisting if I don't trust the compiler to do it ...
Compilers are engineered to optimize for the common case that everybody writes. By trying to write smart code, you're writing unusual code, thereby disabling a lot of the heuristics in the compiler for recognizing cases that can be optimized by doing something completely different, you'd never ever invent by yourself. (Or if you could, you may not be able to express it in C++.)

In about 99% of the cases, writing simple straight-forward code is the way to go, giving the compiler optimal room for performing its smart tricks.

RVO is faster. Always.

I think I remember watching a video where Scott Meyers or the MS compiler guy said something about RVO, where if the return size is larger than 4k~ then RVO could be worse then rvalue since the caching/paging mechanisms of the stack would not be large enough for RVO to function without the CPU fetching more pages to the cache. Afaik the retern value has to be at the top of the stack after the function returns, and if the pre-fetcher don't know how much memory the RVO will use on the stack then it would have to reallocate for every page it has to grab. Do you think that applies to returning a std::vector since internals is pointing to heap?

I made a mistake in my first post, where I said: "I'd move "auto newMessages" to outside the loop, so you can help hint to the compiler to re-use it"
I had forgotten that for RVO to take effect, you have to construct the variable from the result of the function (not simply assign), so the variable can't be brought outside the loop (oops) if you also want RVO - in that case, I'd still bring it outside the loop (saving the repeated constructions/destructions).

RVO is faster. Always.



I think I remember watching a video where Scott Meyers or the MS compiler guy said something about RVO, where if the return size is larger than 4k~ then RVO could be worse then rvalue since the caching/paging mechanisms of the stack would not be large enough for RVO to function without the CPU fetching more pages to the cache. Afaik the retern value has to be at the top of the stack after the function returns, and if the pre-fetcher don't know how much memory the RVO will use on the stack then it would have to reallocate for every page it has to grab. Do you think that applies to returning a std::vector since internals is pointing to heap?


Seeing that the compiler can do whatever it wants, as long as it gets the same result, we can't say for sure what it's doing in your case without seeing the assembly. (and in my case, I haven't yet taken time to learn assembly. :ph34r:)

That said, we can theoretically walk through both cases of RVO and move-semantics. Let's say your vector contains 1 MB of data:


struct MyElement
{
   uint32_t x,y,z,w; //16 bytes.
};

template<typename Type = MyElement>
class MyVector //Our pretend implementation of std::vector
{
    size_t elementCount = {67 million and change};
    Type *data = {1 MB of data};

public:
    MyVector(MyVector &&other) //Move constructor.
    {
        std::swap(elementCount, other.elementCount);
        std::swap(data, other.data); //Simply swaps the *pointer*, does *not* move around any data! Doesn't even deference the pointer (so that particular cache line isn't even called up).
    }

}; //Likely 16 bytes on a 64 bit OS (a 64 bit integer and a 64 bit pointer).

(Some implementations of std::vector use two pointers (begin and end) rather than 'begin' and 'count', because this is more optimized CPU-wise, but that doesn't affect this discussion, and the class size would end up being pretty much the same (two 64bit pointers vs a 64 bit pointer and a 64 bit integer)).

So now let's examine two theoretical straight-forward implementations of returning r-values vs RVO.

Returning r-value:


//------------------------
//Example situation:
//------------------------

MyVector&& MyFunc()
{
    MyVector localVector;
    std::swap(localVector, this->memberVector);
    
    return std::move(localVector);
}

MyVector finalVector = MyFunc();

//------------------------
//Example implementation:
//------------------------

Allocate finalVector //16 bytes
Call MyVector() default-constructor on finalVector

Allocate MyFunc_localVector //16 bytes
Call MyVector() default-constructor on newly allocated MyFunc_localVector

Swap MyFunc_localVector and MyFunc_memberVector
.....(three moves* and an MyVector temporary allocation) *[by 'move' I mean a call to move-assignment or move-constructor)
..........(each MyVector move is two swaps)
................(each swap is three moves and a temporary allocation)

Call Function std::move() on MyFunc_localVector
.....(std::move() is just a cast from l-value reference to r-value reference)
.....[remember: the function takes a universal reference, not a r-value reference - that can look confusing]
.....[this std::move() will very likely get optimized out entirely, unless I'm confused]

Return the result and assign to finalVector
.....Move [unnamed MyVector r-value] into finalVector
..........(each MyVector move is two swaps)
................(each swap is three moves and a temporary allocation)

Destruct [unnamed MyVector r-value]

//Later:
Destruct finalVector

If RVO takes affect, then it (possibly) looks like this:


Allocate finalVector //16 bytes
//Skipped: Call MyVector() default-constructor on finalVector

//Skipped: Allocate MyFunc_localVector //16 bytes
Call MyVector() default-constructor on [[pre-existing]] [[finalVector]]

Swap [[finalVector]] and MyFunc_memberVector
.....(three moves* and an MyVector temporary allocation) *[by 'move' I mean a call to move-assignment or move-constructor)
..........(each MyVector move is two swaps)
................(each swap is three moves and a temporary allocation)

//Skipped: (in the other case, it'd be likely optimized out anyway)
//   Call Function std::move()
//   .....(std::move() is just a cast from l-value reference to r-value reference)
//   .....[remember: the function takes a universal reference, not a r-value reference - that can look confusing]
//   .....[this std::move() will very likely get optimized out entirely, unless I'm confused]

//Skipped:
//   Return the result and assign to finalVector
//   .....Move [unnamed MyVector r-value] into finalVector
//   ..........(each MyVector move is two swaps)
//   ................(each swap is three moves and a temporary allocation)

//Skipped: Destruct [unnamed MyVector r-value]

The only problem in this case is that RVO might not get triggered.

But here's good news: If RVO isn't triggered, code like this:


std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> Window::moveMessages()
{
	//...blah...

	std::vector<std::tuple<HWND, UINT, WPARAM, LPARAM>> windowMessages;
	std::swap(windowMessages, m_windowMessages);
	return windowMessages;
}

...that's used like this:


auto newMessages = MyWindow.moveMessages();

...has newMessages' move-constructor getting called anyway, because the result of that function is still an rvalue!


MyType x = (y + z); //This is returning an rvalue and giving it to 'x'.

As you know, '(y + z)' is really a function call: 'operator+(y, z)'. Yet the result is an rvalue, because once the function exits, there's no existing variables connected to the value.
In the same way, MyWindow.moveMessages()'s result is an rvalue. It's an unnamed temporary variable that the compiler knows isn't being used by anything else.

Returning by value will either RVO (when possible), or rvalue (if move-constructor/move-assignment operators are available for that type), or deep copy if your compiler doesn't support move-semantics (and if your compiler doesn't support move semantics, calling std::move() ain't gonna make a difference! :wink:).

But anyways, regardless of whether the compiler uses RVO is used or move-semantics, the 1MB data in the vector isn't touched.

So this shouldn't be the location of your slowdown, unless I'm missing something (which is very possible! :P).

Have you really confirmed through profiling that your slowdown is in how you return from that function?

For the record; returning a local variable via move semantics is, per Effective Modern C++ (for C++11 & 14) by Scott Meyers, a bad idea and will more than likely disable RVO/NRVO and force the compiler in to a move instead of doing clever in place initialisation of the returned value.

This topic is closed to new replies.

Advertisement