Azh321

References and Copying


Azh321    569
I've been working on my own language lately and I want it to be "purely OO". So I was planning on making all types reference types, even "primitive types" (such as int, float, char, etc.). But many problems come up at that point, such as pass by reference versus pass by value, and shallow versus deep copy.

In C#, if you do something like:

myClass a = new myClass();
myClass b = a;

then you have copied the reference. I don't want my language to behave like that; I want it to do a deep copy (clone) at that point. But I also want the language to be able to pass the reference as in the example (though maybe only when explicitly told to).

I have come up with a few "solutions" for this:

1) An explicit operator/keyword to use when you want to pass by reference (like C#'s "ref" keyword, or even like a pointer or reference in C++).
2) An explicit operator/keyword to use when you want to pass by value (unlike C#'s "ref" keyword, this would conceptually be "val").
3) By default, the = operator passes by value.

I'm not sure exactly which solution(s) to take, or whether I'm even approaching this correctly, though it looks like #3 combined with #1 is the best bet. Any advice, thoughts, knowledge, or corrections? Thanks.
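For anyone following along, here is a rough C++ sketch of the two behaviours being contrasted: copying the reference (what the C# snippet does) versus making a deep copy. The names Counter, copyReference, deepCopy and demoCopies are made up for illustration, and std::shared_ptr only stands in for a C#-style reference; this is not meant as how the new language would implement it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for "myClass" in the post.
struct Counter {
    int value = 0;
};

// Reference copy: the returned handle shares the original object.
std::shared_ptr<Counter> copyReference(const std::shared_ptr<Counter>& src) {
    return src;
}

// Deep copy: the returned handle owns a freshly allocated duplicate.
std::shared_ptr<Counter> deepCopy(const std::shared_ptr<Counter>& src) {
    return std::make_shared<Counter>(*src);
}

// Small demonstration; the asserts document the expected behaviour.
void demoCopies() {
    auto a = std::make_shared<Counter>();

    auto b = copyReference(a);   // like "myClass b = a;" in C#
    b->value = 42;
    assert(a->value == 42);      // a sees the change: one shared object

    auto c = deepCopy(a);        // the behaviour the post asks for on "="
    c->value = 7;
    assert(a->value == 42);      // a is unaffected: two separate objects
}
```

Calling demoCopies() from main runs both cases. Solution #3 combined with #1 amounts to making deepCopy the meaning of = and giving copyReference its own explicit syntax.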

OrangyTang    1298
How about mocking up some test code using each of the three methods and seeing which one you prefer the look of? Personally (and for no real reason) I'd lean towards 1 or 2, but which one would be better probably depends on which usage is going to be more common.

Azh321    569
I could also have a copy constructor like in C++ or something similar to it.

Perhaps use = to do a deep copy but use the copy constructor to pass the reference?

I think what I will end up doing is using = to perform a deep copy and have an operator for doing a shallow copy.

Glak    315
Ok what you need is for the syntax to be essentially identical to C++. Of course you can use different operators or keywords or whatever so that it looks nicer to you. Now here is the difference: every variable is the same size. It holds the reference that you are talking about. Now the terminology might get a little confusing because people use words in so many ways so I am going to throw out some definitions:

pointer: a variable that refers to an object rather than holding an object. This is the same as a C++ pointer and also the same as a Java reference.

alias: a new name for an old variable. This is the same as a C++ reference

handle: this is an implementation detail. Programmers cannot create handles. All variables hold handles rather than objects. Thus on a 32-bit system handles are 32-bit addresses.

Ok so now when you do the following in a function:

char c;

it creates a 32-bit handle on the stack and allocates an 8-bit (or whatever) char on the heap. The handle points to the heap-allocated char. When c leaves scope, the char is freed.

Ok now we do this:

double* dp;

this creates a 32-bit handle on the stack and allocates a 32-bit pointer on the heap. The pointer can point to a handle, which would in turn point to a double.

Now lets say you do this:

int x;
int y;

x=y;

No problem here. The assembly will dereference the handle to y and copy that value into the memory for x.

Well... no it won't.

You see what if we have type B and derived types D1 and D2?
B is 128 bytes, D1 is 164, D2 is 256


B b; //makes a variable for holding b, the default constructor is called allocating 128 bytes

b=d1; //the object held in b is destructed, its memory freed, then 164 bytes are allocated and then d1 is copied into it. So you get a deep copy without slicing.

b=d2; //the object held in b is destructed, its memory freed, then 256 bytes are allocated and then d2 is copied into it. So you get a deep copy without slicing.

So what is the big difference? You get polymorphic behavior and no slicing and you still get the niceness of C++. The downside? An efficiency hit. The compiler could optimize away the handles in cases where a class is not subclassed so you would get the best of both worlds. Hmm.... I might do this for my language.
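Here is a rough sketch of that idea in today's C++, in case it helps to see it concretely. The Slot class plays the role of the hidden handle, and clone() is my assumption about how the runtime would copy the right number of bytes for the dynamic type; Slot, clone, slicedName and slotName are all invented names, and D1 here carries no extra data, so the sizes from the example are not reproduced.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct B {
    virtual ~B() = default;
    // Copies the object using its dynamic type, so nothing is sliced off.
    virtual std::unique_ptr<B> clone() const { return std::make_unique<B>(*this); }
    virtual std::string name() const { return "B"; }
};

struct D1 : B {
    std::unique_ptr<B> clone() const override { return std::make_unique<D1>(*this); }
    std::string name() const override { return "D1"; }
};

// Hypothetical handle: every variable of "type B" would really be one of these.
class Slot {
public:
    Slot() : obj_(std::make_unique<B>()) {}               // "B b;"
    Slot(const Slot& other) : obj_(other.obj_->clone()) {}
    // Assignment frees the old object and stores a deep copy of the new one.
    Slot& operator=(const Slot& other) { obj_ = other.obj_->clone(); return *this; }
    Slot& operator=(const B& value) { obj_ = value.clone(); return *this; }
    const B& get() const { return *obj_; }
private:
    std::unique_ptr<B> obj_;
};

// Plain C++ value assignment slices: only the B part of d1 is copied.
std::string slicedName() {
    D1 d1;
    B plain = d1;
    return plain.name();
}

// Assignment through the handle keeps the derived part.
std::string slotName() {
    Slot b;
    D1 d1;
    b = d1;
    return b.get().name();
}
```

With these definitions slicedName() reports the base type while slotName() reports the derived one, which is the "deep copy without slicing" behaviour described above. The cost is the extra heap allocation and indirection on every variable.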

Azh321    569
I think you may have missed the point, or maybe I don't understand you. One thing I was really asking was: should a = b pass b's reference to a, or its value?

If we pass by reference and do shallow copying and do say:
a=5;
b=a;
//at this point, a and b are BOTH references to the SAME object...

I don't think I'm understanding you correctly; it seems you typed it quickly.
But as far as I can tell, doing something such as a tree class wouldn't be possible using references with your idea...?

ToohrVyk    1596
A few thoughts about how this is done in other languages. ML languages pass all objects by non-reseatable reference. For example, in OCaml:

let a = new foo in
let b = a in (* No copy *)
b (* Returns the original *)


This has an interesting consequence:

let a = 5 in
let b = a in


Here, both a and b reference the same object, which is the integer 5. Since the value of 5 cannot be changed, all is well and the code works. To have a reseatable reference, you have to declare it manually:

let a = ref new base in
a := new derived;
let b = a in (* Reference copy *)
let c = ref !a in (* Reference content copy *)


Here, b is the same reference as a, so changing a changes b. Meanwhile, c is a new independent reference (which happens to initially reference the same object, but can change independently).

In C++, references are non-reseatable, kind of like in ML. So the = in a reference's initialization binds the reference to the object without copying it (a shallow copy, or pass-by-reference), while every later = applied to that reference is processed through the type's defined operator= (the argument is passed by reference and used to alter the referred-to object in place).
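A small sketch of that C++ behaviour, with a made-up Point type and function name chosen for illustration:

```cpp
#include <cassert>

// Hypothetical value type for the demonstration.
struct Point {
    int x = 0;
};

// Returns true if the reference is still bound to the first object after
// assignment and the first object now holds the second object's value.
bool referenceStaysBound() {
    Point first;
    first.x = 1;
    Point second;
    second.x = 2;

    Point& r = first;   // initialization: binds r to first, nothing is copied
    r = second;         // assignment: copies second's value into first

    return &r == &first && first.x == 2 && second.x == 2;
}
```

The assignment never makes r refer to second; it only overwrites the object that r was bound to at initialization.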

Other languages implement always-reseatable references. In these cases, a=b is always a shallow copy, and deep copy is implemented as a = b.clone().

In the end, looking at your semantics, it seems that you're saying one thing, but doing another. Implementing deep copy on assignment is a fundamental property of value types, and having this property in your language definitely eliminates its claims to having reference semantics. So, you are left with a funny language that pretends to handle types by reference, but actually handles them by value. This is a consequence of a simple fact you've overlooked: primitive types should be handled either by value, or by constant reference, never by reference. So, to keep up with your decision that "primitive types are treated like other types", and trying to manage the usability issues that come from primitive types being non-const references, you are effectively removing the entire "reference types" property from your language.

So, you have three options there:
  1. Do it the C++ way: give up reference-typing, since it screws up the so-called "value types".
  2. Do it the C# way: use value semantics for those value types, and reference types for the rest.
  3. Do it the ML way: use constant references for primitive types, with explicit mutable references.
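To make option 3 a bit more concrete, here is a rough C++ approximation of the ML model: values are shared but immutable, and mutation only happens through an explicit cell. Val, Ref, val and demoRefs are names I made up for this sketch, and std::shared_ptr is just a convenient stand-in for what a language runtime would do itself.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Immutable shared value: copying the handle shares the object, which is
// safe because nobody can change it (like the integer 5 in the OCaml example).
template <typename T>
using Val = std::shared_ptr<const T>;

// Helper that allocates an immutable value.
template <typename T>
Val<T> val(T x) {
    return std::make_shared<T>(std::move(x));
}

// Explicit mutable cell, roughly OCaml's "ref". Copying a Ref shares the cell.
template <typename T>
struct Ref {
    std::shared_ptr<Val<T>> cell;
    explicit Ref(Val<T> v) : cell(std::make_shared<Val<T>>(std::move(v))) {}
    void set(Val<T> v) { *cell = std::move(v); }   // like a := v
    Val<T> get() const { return *cell; }           // like !a
};

// Mirrors the a / b / c bindings from the OCaml snippet.
bool demoRefs() {
    Ref<int> a(val(5));
    Ref<int> b = a;          // reference copy: b and a are the same cell
    Ref<int> c(a.get());     // reference content copy: a new, independent cell
    a.set(val(6));
    return *b.get() == 6 && *c.get() == 5;
}
```

Under this model plain assignment never needs a deep copy, because the only things that can be shared are either immutable or explicitly declared as mutable cells.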
