References and Copying

4 comments, last by ToohrVyk 17 years, 2 months ago
I've been working on my own language lately and I want it to be "purely OO". So I was planning on making all types reference types, even "primitive types" (such as int, float, char, etc.). But many problems seem to come up at that point, such as pass by reference vs. pass by value and shallow vs. deep copy.

In C#, if you do something like:

myClass a = new myClass();
myClass b = a;

then you have copied the reference. I don't want my language to behave like that; I want it to do a deep copy (clone) at that point. But I also want the language to be able to pass the reference as in the example (though maybe only when explicitly asked to).

I have come up with a few "solutions" for this:
1) An explicit operator/keyword to use when you want to pass by reference (like C#'s "ref" keyword, or a pointer or reference in C++).
2) An explicit operator/keyword to use when you want to pass by value (the opposite of C#'s "ref" keyword; conceptually it would be "val").
3) By default, the = operator passes by value.

I'm not sure which solution(s) to take, or whether I'm even approaching this correctly, though it looks like #3 combined with #1 is the best bet. Any advice, thoughts, knowledge, or corrections? Thanks
How about mocking up some test code using each of the three methods and seeing which one you prefer the look of? Personally (and for no real reason) I'd lean towards 1 or 2, but which one would be better probably depends on which usage is going to be more common.
I could also have a copy constructor like in C++ or something similar to it.

Perhaps use = to do a deep copy but use the copy constructor to pass the reference?

I think what I will end up doing is using = to perform a deep copy and have an operator for doing a shallow copy.
OK, what you need is syntax that is essentially identical to C++'s. Of course you can use different operators or keywords so that it looks nicer to you. Here is the difference: every variable is the same size, because it holds the reference you are talking about. The terminology can get confusing because people use these words in so many ways, so here are some definitions:

pointer: a variable that refers to an object rather than holding an object. This is the same as a C++ pointer and also the same as a Java reference.

alias: a new name for an old variable. This is the same as a C++ reference

handle: this is an implementation detail. Programmers cannot create handles. All variables hold handles rather than objects. Thus on a 32-bit system, handles are 32-bit addresses.

Ok so now when you do the following in a function:

char c;

It creates a 32-bit handle on the stack and allocates an 8-bit (or whatever size) char on the heap. The handle points to the heap-allocated char. When c leaves scope, the char is freed.

Ok now we do this:

double* dp;

This creates a 32-bit handle on the stack and allocates a 32-bit pointer on the heap. The pointer can point to a handle, which in turn points to a double.

Now let's say you do this:

int x;
int y;

x=y;

No problem here. The assembly will dereference the handle to y and copy that value into the memory for x.

Well... no it won't.

You see, what if we have a type B and derived types D1 and D2?
B is 128 bytes, D1 is 164, D2 is 256


B b; //makes a variable for holding b; the default constructor is called, allocating 128 bytes

b=d1; //the object held in b is destructed, its memory freed, then 164 bytes are allocated and d1 is copied into it. So you get a deep copy without slicing.

b=d2; //the object held in b is destructed, its memory freed, then 256 bytes are allocated and d2 is copied into it. So you get a deep copy without slicing.

So what is the big difference? You get polymorphic behavior with no slicing, and you still get the niceness of C++. The downside? An efficiency hit. The compiler could optimize away the handles when a class is not subclassed, so you would get the best of both worlds. Hmm... I might do this for my language.
I think you may have missed the point, or maybe I don't understand you. But one thing I was really asking was: should a = b pass b's reference to a, or its value?

If we pass by reference and do shallow copying, and say:
a=5;
b=a;
//at this point, a and b are BOTH references to the SAME object...

I don't think I'm understanding you correctly; it seems you typed it quickly.
But as far as I can tell, something such as a tree class wouldn't be possible using references with your idea...?
A few thoughts about how this is done in other languages. ML languages pass all objects by non-reseatable reference. For example, in OCaml:

let a = new foo in
let b = a in (* No copy *)
b (* Returns the original *)


This has an interesting consequence:

let a = 5 in
let b = a in


Here, both a and b reference the same object, which is the integer 5. Since the value of 5 cannot be changed, all is well and the code works. To have a reseatable reference, you have to declare it manually:

let a = ref (new base) in
a := (new derived :> base); (* explicit coercion, since a has type base ref *)
let b = a in (* Reference copy *)
let c = ref !a in (* Reference content copy *)


Here, b is the same reference as a, so changing a changes b. Meanwhile, c is a new independent reference (which happens to initially reference the same object, but can change independently).

In C++, references are non-reseatable, kind of like in ML. So the first use of = (the initialization) binds the reference to an object, which amounts to a shallow copy (pass-by-reference), while every later assignment goes through the type's operator= (the argument is passed by reference and used to modify the referenced object in place).

Other languages implement always-reseatable references. In these cases, a=b is always a shallow copy, and deep copy is implemented as a = b.clone().

In the end, looking at your semantics, it seems that you're saying one thing, but doing another. Implementing deep copy on assignment is a fundamental property of value types, and having this property in your language definitely eliminates its claims to having reference semantics. So, you are left with a funny language that pretends to handle types by reference, but actually handles them by value. This is a consequence of a simple fact you've overlooked: primitive types should be handled either by value, or by constant reference, never by reference. So, to keep up with your decision that "primitive types are treated like other types", and trying to manage the usability issues that come from primitive types being non-const references, you are effectively removing the entire "reference types" property from your language.

So, you have three options there:
  1. Do it the C++ way: give up reference-typing, since it screws up the so-called "value types".
  2. Do it the C# way: use value semantics for those value types, and reference types for the rest.
  3. Do it the ML way: use constant references for primitive types, with explicit mutable references.
