# Copy-on-Write - How?? [subject name changed]

This topic is 4408 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hey all, I need a sugestion or two. Say I had a matrix class, Mat. It is an arbitrary-size 2D matrix, using a std::vector<float> as its internal representation. These objects will be of very large size, so I'd like to avoid any unnecessary construction / destruction, while keeping their use simple. Now, here is the syntax of what I would like to do:
Mat A, B, C;
A = fft( B );
// or
A = B = fft( C );
// or
A = fft( fft(B) );


Very simple. I am getting stuck in the details however. Should fft() return by value? If so, there are unnecessary copies going on. Should it allocate on the heap and return a pointer? If so, the syntax gets a bit uglier (unless Mat::operator=( Mat* ) is defined, but then the user must save the pointer, and handle deleteing). Should there be a global vector<Mat*> to keep track of all Mat's? Should I pass around handles as a level of indirection in this case? The real problem seems to be fft( Mat ), I'm not sure how to allocate the result-of-the-fft's memory. Would inlining the fft() function get rid of the return-by-value copying cost? Thanks for tips. [Edited by - discman1028 on September 17, 2006 4:59:29 PM]

##### Share on other sites
Compiler optimisation may or may not take care of it, depending on the compiler.

#include <stdio.h> // don't have stl port for one of the compilers I testedclass a{public:    a(int v) : value(v) { printf("constructor\n"); }    a(const a &v) : value(v.value) { printf("copy\n"); }    int value;};a func(int x){    a ret(x); return ret;}int main(){    a x=func(10);}

If the above code is compiled with Borland BCC55, it outputs:

constructor
copy

If the same code is compiled with DigitalMars DMC, it just outputs:

constructor

To be fair to Borland, there is probably an option to turn on optimising that would take care of this, but my point is you shouldn't be relying on optimisations like this.

A method that is sometimes used with string classes to address the issues you describe is to have the string class contain a pointer to a private implementation class. The implementation class holds a reference count of all the string objects that are pointing to it and when a string is copied, the implementation is just pointed to by another string class. Only when the string class actually tries to modify its data is a check made to see if the reference count of the implemetation pointed to is greater than one and, if so, the modifying string first creates its own copy of the data before modifying that.

Stroustrup describes such a string in The C++ Programming Language. I don't know if this approach would be useful in the situation you describe.

##### Share on other sites
Quote:
 Original post by EasilyConfusedTo be fair to Borland, there is probably an option to turn on optimising that would take care of this, but my point is you shouldn't be relying on optimisations like this.

Quote:
 Original post by EasilyConfusedA method that is sometimes used with string classes to address the issues you describe is to have the string class contain a pointer to a private implementation class. The implementation class holds a reference count of all the string objects that are pointing to it and when a string is copied, the implementation is just pointed to by another string class. Only when the string class actually tries to modify its data is a check made to see if the reference count of the implemetation pointed to is greater than one and, if so, the modifying string first creates its own copy of the data before modifying that.

This does sound good. So the fft() would return a Mat by value, which is a tiny copy. I'll get back to you with any more questions I have (probably on how to decide when a Mat's about to be modified), but this sounds good, thanks. Other comments welcome.

##### Share on other sites
Quote:
 Original post by discman1028I'll get back to you with any more questions I have (probably on how to decide when a Mat's about to be modified)

That certainly is the tricky bit [smile]. In the example Stroustrup uses for a copy-on-write string, obviously operator= and operator+= and so on are quite straightforward, but he does some very clever stuff with operator[] (accessing characters) whereby on the non-const operator[], a class wrapping a char but binded with a reference to the owning string is returned.

If memory serves, this class then overloads all the operators for modification and can then defer any modifications to an individual character back to the string, which can then check to see if it needs its own copy first before modifying.

Been a while since I read that book though. Googling "copy-on-write c++" seems to return quite a lot of results.

##### Share on other sites
In my case, I think it will be an acceptable assumption to assume that non-const functions will modify the data, and as such, a copy is okay to do. As a detail, I will only be using Mat::operator(...), probably not Mat::operator[].

And thanks, "copy-on-write", very cool, a new thing to learn. ;)

##### Share on other sites
Quote:
 Original post by discman1028In my case, I think it will be an acceptable assumption to assume that non-const functions will modify the data, and as such, a copy is okay to do.

Only problem with that is that if you have a const and non-const accessor, say, then even if you are using the accessor in a non-modifying context, if the object is non-const, the non-const accessor will be called anyway.

Maybe this daft little snippet will explain what I mean better:

class a{private:    int val;public:    a(int v) : val(v) { }    int &value(){ return val; }    int value() const { return val; }};int main(){    a x(10);    int z=x.value();}

That could be a typical example of where your accessor returns a reference for the non-const, and by value for the const.

The trouble comes in the last line, where x.value() is assigned to z so the accessor is used in the non-modifying sense. Because x is a non-const object the non-const member is still the one selected.

I don't know if this will present problems in the context you plan to use your class but I thought I'd mention it anyway.

##### Share on other sites
Yeh, it will, and it's a good point -- I hadn't thought it through that well yet. I am looking at boost::shared_ptr currently. I guess I will look into some "copy-on-write" then, first.

EDIT: Well, actually, considering my simple case:

Mat A, B, C;A = fft( B );// orA = B = fft( C );// orA = fft( fft(B) );

The following two functions should suffice (I don't need to check for a modify, I know which mat's will need to be modified):

Mat& Mat::operator= (Mat const& mat) <== no copy, just change pointer
Mat& Mat::operator= (Mat mat) <== no copy, just change pointer
Mat fft( Mat const& mat ) <== new Mat, return by value

Again, I havent thought this through completely.

##### Share on other sites
I wouldn't bother with both those operator='s. The first will suffice. Since it is a const reference it will happily take temporaries as well as properly addressable objects.

You also want a copy constructor:

Mat::Mat(const Mat &Copy){ /* ... */ }

That has to take a const reference otherwise you get an infinte recursion.

By the way, I should probably also suggest that you first write all your code forgetting all this copy-on-write complexity and just copy everything by value. If profiling then reveals that this is actually causing a problem, this copy-on-write stuff is an optimisation that is better performed then.

Unless you are writing Mat for a library and you can't forsee all its possible future uses, anyway.

##### Share on other sites
Quote:
 Original post by EasilyConfusedI wouldn't bother with both those operator='s. The first will suffice. Since it is a const reference it will happily take temporaries as well as properly addressable objects.

Oops, brain fart.

Quote:
 Original post by EasilyConfusedUnless you are writing Mat for a library and you can't forsee all its possible future uses, anyway.

Yeh, that's actually the case - it is to be designed for large problems. Thanks, and I'll post some code when I have it!

##### Share on other sites
Okay, I have a test program made. I have a Mat class, a MatImplementation class, and a templated cow_ptr (copy-on-write) class which uses boost::shared_ptr for ref-counting.

I decided to make fft() a member function of Mat for now, as it makes it easier to understand.

Here is main.cpp:

#include <cstdlib>#include <vector>#include <iostream>#include "cow_ptr.h"using namespace std;/// The actual fft.void thefftimplementation( vector<float>& rData ){	// just do something dumb like multiply by 2.0f for now	for( int i = 0; i < rData.size(); i++ )	{		rData *= 2.0f;	}}class MatImplementation{public:	/// Constructors.	MatImplementation( int nRows, int nCols )	{		m_data.resize( nRows*nCols );		for( int i = 0; i < nRows*nCols; i++ )		{	// Fill with random data for now			m_data[ i ] = 1.0f*rand();		}	}	MatImplementation( MatImplementation const& m ) : m_data( m.m_data ) {}	/// = operator.	MatImplementation& operator=( MatImplementation const& m )	{		m_data = m.m_data;		return *this;	}	/// Destructor.	~MatImplementation() {}	/// Honest-to-goodness matrix data.	vector<float> m_data;};class Mat{public:	/// Constructors.	Mat( int nRows, int nCols ) : pMatrixData( 0 )	{		pMatrixData = new MatImplementation(nRows, nCols);	}	Mat( Mat const& m ) : pMatrixData( m.pMatrixData ) {}	/// = operator.	Mat& operator=( Mat const& m )	{		pMatrixData = m.pMatrixData;		return *this;	}	/// FFT operation.	Mat fft()	{		Mat returnMat( *this );		thefftimplementation( returnMat.pMatrixData->m_data );		return returnMat;	}	/// Destructor.	virtual ~Mat() {}	protected:	/// Copy-on-write matrix.	cow_ptr<MatImplementation> pMatrixData;};int main(int argc, char* argv[]){	Mat a( 10, 10 );	Mat b( 10, 10 );	a = b.fft();	return 0;}

And here is cow_ptr.h, adapted from an online example.

#ifndef _COW_PTR_H_#define _COW_PTR_H_#include <boost/shared_ptr.hpp>template <typename DataType>class cow_ptr{public:	/// Constrcutors.	cow_ptr() {}	cow_ptr( DataType* ptr ) : m_pPtr(ptr) {}	/// = operator.	cow_ptr& operator=( cow_ptr const& rPtr )	{		m_pPtr = rPtr.m_pPtr;		return *this;	}	/// Destructor.	virtual ~cow_ptr() {}	/// Const ops.	DataType const* get() const		{ return m_pPtr.get(); }	DataType const* operator->() const	{ return m_pPtr.get(); }	DataType const& operator*() const	{ return *m_pPtr; }	/// Non-const ops.	DataType* get()				{ MakeCopy(); return m_pPtr.get(); }	DataType* operator->()			{ MakeCopy(); return m_pPtr.get(); }	DataType& operator*()			{ MakeCopy(); return *m_pPtr; }protected:	/// Make a new copy on the heap of the data currently pointed to.	void MakeCopy()	{		// If we are pointing to unique data, there is no need to copy. Else, copy.		if( !m_pPtr.unique() )		{			m_pPtr.reset( new DataType( *m_pPtr ) );		}	}	/// The ref-counted pointer.	boost::shared_ptr<DataType> m_pPtr;};#endif

PLEASE, comments! Soon, the matrix data type will be templated as well, as well as some other functionality, so abstracting this COW as best as possible is important!

EasilyConfused, I believe your comment about the non-const function always being called still holds here, but perhaps the const version of get() can be forced to be called by passing const references into a matrix get function, and just a reference into a set function.

Thanks.

1. 1
Rutin
30
2. 2
3. 3
4. 4
5. 5

• 13
• 54
• 11
• 10
• 14
• ### Forum Statistics

• Total Topics
632967
• Total Posts
3009554
• ### Who's Online (See full list)

There are no registered users currently online

×