Dynamic Memory Allocation and Implementing Objects

7 comments, last by LordofDhara 20 years, 3 months ago
I'm building a scripting engine (with assembler, compiler, and VM) and all in all it's been rather fun and not too hard... until now. I've hit some roadblocks and I was curious to see what others think. The VM uses a runtime stack of a Value struct with a type field and a union - so obviously this is a pretty standard 'typeless' language.

First is the issue of dynamic memory. My language supports pointers, which work fine for pointing to other variables on the stack. But I'm not sure how to go about getting them to point to arbitrary amounts of dynamically allocated memory as C/C++ pointers can. I've thought of perhaps having the VM distinguish between stack and heap pointers in the type field (and having the dynamic memory stored in a void* in the union), which would allow the VM to determine how to handle the script language's pointer while keeping this distinction invisible to the script writer. This means a pointer can switch from being a stack pointer to a heap pointer and vice versa. Any thoughts?

Second is how to represent objects. I can represent simple structures underneath the hood as arrays of 'normal' variables. The structures can hold both data and functions, but with no access specifiers or inheritance capability.

Finally, do the benefits of being able to overload functions, or even to allow inheritance and other OO features, outweigh the added complexity and ultimate slowdown of a scripting language? I'm debating whether to allow objects at all (structs stay, their utility is immeasurable) or to stick to a more C feel to the language. It would certainly be faster, but the question stands. Some fresh perspective would certainly be appreciated. Thanks in advance for any help.
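
A minimal sketch of the stack-pointer versus heap-pointer idea described above, written in Python for brevity rather than the C++ the VM is built in. Every name here (Tag, Value, TinyVM, alloc, load, store) is hypothetical and chosen for illustration; the heap is modelled as a dictionary of handles, which is an assumption and not the void* layout from the post.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    INT = auto()
    STACK_PTR = auto()  # payload is an index into the VM stack
    HEAP_PTR = auto()   # payload is a (handle, offset) pair into the VM heap


@dataclass
class Value:
    tag: Tag
    payload: object


class TinyVM:
    """Hypothetical VM: the script sees one pointer kind, the VM sees two."""

    def __init__(self):
        self.stack = []        # runtime stack of Value
        self.heap = {}         # handle -> list of Value
        self._next_handle = 0

    def push(self, value):
        # Push a value and return a stack pointer to its slot.
        self.stack.append(value)
        return Value(Tag.STACK_PTR, len(self.stack) - 1)

    def alloc(self, count):
        # Allocate `count` zeroed slots and return a heap pointer to the first.
        handle = self._next_handle
        self._next_handle += 1
        self.heap[handle] = [Value(Tag.INT, 0) for _ in range(count)]
        return Value(Tag.HEAP_PTR, (handle, 0))

    def load(self, ptr):
        # Dereference: the tag decides where the pointee lives.
        if ptr.tag is Tag.STACK_PTR:
            return self.stack[ptr.payload]
        if ptr.tag is Tag.HEAP_PTR:
            handle, offset = ptr.payload
            return self.heap[handle][offset]
        raise TypeError("not a pointer")

    def store(self, ptr, value):
        if ptr.tag is Tag.STACK_PTR:
            self.stack[ptr.payload] = value
        elif ptr.tag is Tag.HEAP_PTR:
            handle, offset = ptr.payload
            self.heap[handle][offset] = value
        else:
            raise TypeError("not a pointer")


vm = TinyVM()
stack_ptr = vm.push(Value(Tag.INT, 7))
heap_ptr = vm.alloc(3)
vm.store(heap_ptr, Value(Tag.INT, 42))
```

Script code would only ever call load and store; whether the target is a stack slot or a heap block stays an internal detail of the VM.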
First, a random suggestion: study how other languages do it.

Second, another random suggestion: pointers in a scripting language are a waste of time. Why would you want random indirection in a scripting language (or have we mixed up our levels of abstraction here)?

quote:Original post by LordofDhara
First is the issue of dynamic memory. My language supports pointers, which work fine for pointing to other variables on the stack. But I'm not sure how to go about getting them to point to arbitrary amounts of dynamically allocated memory as C/C++ pointers can.
If you must go through with this, realize that type isn't simply an identifier which instructs the VM how to interpret the subsequent data. You still need to define types - Integer, Float, List, MemoryPointer, File, etc - which will provide the necessary methods and structure for dealing with various situations. All of these could derive from a single type, say <Language>Object, and support a query mechanism to determine which operations they provide. This is basically how Python works.

quote:Second is how to represent objects. I can represent simple structures underneath the hood as arrays of 'normal' variables. The structures can hold both data and functions, but with no access specifiers or inheritance capability.
You need a <Language>Class type.

quote:Finally, do the benefits of being able to overload functions, or even to allow inheritance and other OO features, outweigh the added complexity and ultimate slowdown of a scripting language? I'm debating whether to allow objects at all (structs stay, their utility is immeasurable) or to stick to a more C feel to the language. It would certainly be faster, but the question stands. Some fresh perspective would certainly be appreciated.
Supporting OO features such as inheritance doesn't have to be complicated, but it will be if you do it the C++ way. In Python (and this is why you should study the language references of existing languages) attribute lookup searches the class's dictionary and then those of its base classes in a fixed order, left to right as the bases are specified. This sidesteps diamond inheritance problems, among others, because the first definition found wins and later ones are ignored.
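
For reference, a small sketch of how current Python resolves a diamond: bases are searched left to right in the order they are listed, and the shared base appears only once in the search order. The class names (Entity, Door, Lockable, LockedDoor) are made up for illustration.

```python
class Entity:
    def describe(self):
        return "entity"


class Door(Entity):
    def describe(self):
        return "door"


class Lockable(Entity):
    def describe(self):
        return "lockable"


# Diamond: both bases derive from Entity.
class LockedDoor(Door, Lockable):
    pass


# The method resolution order lists each class once, leftmost base first.
mro_names = [cls.__name__ for cls in LockedDoor.__mro__]
description = LockedDoor().describe()  # first match in the MRO wins
```

A scripting VM could get the same effect by precomputing such a search list per class and walking it on every member lookup.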

Similarly, Python supports operator overloading by providing specially named functions which, when implemented, are invoked for the corresponding operators and built-ins at runtime: __repr__ supplies the printable representation (print uses __str__, which falls back to __repr__); __len__ backs the len function; __mul__ is the multiplication operator; and so forth. This simplifies parsing and scanning, and is less error-prone. Since C++ features like const are unavailable, there are fewer overload variants to compare against as well.
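
A short sketch of those specially named functions in use. The Inventory class is a made-up example; only __len__, __mul__, and __repr__ are real Python hooks.

```python
class Inventory:
    """Hypothetical container used to show Python's special method names."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        # Called by len(inventory).
        return len(self.items)

    def __mul__(self, count):
        # Called for inventory * count.
        return Inventory(self.items * count)

    def __repr__(self):
        # Called by repr(inventory); print falls back to it when
        # no __str__ is defined.
        return f"Inventory({self.items!r})"


doubled = Inventory(["key"]) * 2
```

The parser only needs to know that `a * b` means "call the multiplication hook on a"; which code runs is decided by the object's type at runtime.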

Why you would want a C-like feel to your language is beyond me. If you want C, use C. Or use one of its derivatives (C++, Java). A C-like language is another thing entirely; Perl and Python are C-like languages, but absolutely do not feel like C (Perl feels rather like line noise sometimes, and like being omnipotent at others).
I've never taken a serious look at Python because the syntax has always put me off (a stupid reason, I realize), but I'll take a closer look. Thanks for the suggestion.

quote:
Second, another random suggestion: pointers in a scripting language are a waste of time. Why would you want random indirection in a scripting language (or have we mixed up our levels of abstraction here)?


My main reason for thinking pointers would be useful is not for the sake of random indirection to stack locations (that's behavior from C I felt must be preserved for consistency) but to facilitate less memory waste. If I have the scripting language hold an array of door objects, for instance, it must be of some pre-determined maximum size. This could be rather wasteful - this is why I thought pointers to dynamic memory would be a good idea.

quote:Why you would want a C-like feel to your language is beyond me. If you want C, use C. Or use one of its derivatives (C++, Java). A C-like language is another thing entirely; Perl and Python are C-like languages, but absolutely do not feel like C (Perl feels rather like line noise sometimes, and like being omnipotent at others).


I probably didn't word myself correctly but I meant a language with C/C++ syntax. My app is written in C++, and the toolset in Java, so it seemed like keeping to the syntax of this family of languages would allow the most efficiency since programmers wouldn't have to 'switch gears' so to speak.
quote:Original post by LordofDhara
I've never taken a serious look at Python because the syntax has always put me off (a stupid reason, I realize), but I'll take a closer look. Thanks for the suggestion.
My pleasure.

quote:My main reason for thinking pointers would be useful is not for the sake of random indirection to stack locations (that's behavior from C I felt must be preserved for consistency) but to facilitate less memory waste. If I have the scripting language hold an array of door objects, for instance, it must be of some pre-determined maximum size. This could be rather wasteful - this is why I thought pointers to dynamic memory would be a good idea.
That places the burden of dynamic memory management on the programmer working in the scripting language, though, which is both counter-intuitive and counter-productive. There are well-known and well-documented ways to provide reasonable-to-high levels of efficiency for sequences without burdening the script writer with the details (lots of people leave the details to std::vector, std::list and other sequential containers even in C++); geometric growth (double the capacity when the container is full, which keeps appends amortized constant-time) and garbage collection (if you insist, you can even allow users to tune the garbage collector) are some of them.
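
The capacity-doubling scheme can be sketched as follows. This is an illustrative Python model of what the VM's array type might do internally, not a description of any particular implementation; GrowableArray and its reallocations counter are hypothetical names.

```python
class GrowableArray:
    """Array that doubles its backing storage whenever it fills up."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.reallocations = 0  # bookkeeping for the demonstration only

    def append(self, item):
        if self._size == self._capacity:
            self._grow(self._capacity * 2)
        self._slots[self._size] = item
        self._size += 1

    def _grow(self, new_capacity):
        # Copy the live elements into a larger backing store.
        new_slots = [None] * new_capacity
        new_slots[:self._size] = self._slots[:self._size]
        self._slots = new_slots
        self._capacity = new_capacity
        self.reallocations += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._slots[index]


doors = GrowableArray()
for door_id in range(100):
    doors.append(door_id)
```

Appending 100 elements triggers only 7 reallocations here (capacities 2, 4, 8, 16, 32, 64, 128), which is why the script writer never needs to pick a maximum size up front.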

Always keep in mind that the objectives of a scripting language value simplicity and robustness over efficiency. Most of the code in a game doesn't have to run anywhere near peak (it's nice, but not essential), and that kind of code is exactly the sort a scripting language would be used to implement.

quote:I probably didn't word myself correctly but I meant a language with C/C++ syntax. My app is written in C++, and the toolset in Java, so it seemed like keeping to the syntax of this family of languages would allow the most efficiency since programmers wouldn't have to 'switch gears' so to speak.
The target audience is different. Don't prematurely constrain the abilities of your tools through poor presumptions. Programmers smart enough to have learned C++ can switch to simpler languages easily; the other benefit would be that non-programmers would be able to contribute to development, scripting behaviors for models and levels, etc.

Good luck!
Thanks for the input - I have some interesting new directions to take this.
Oluseyi says very wise things here.

I'd just like to add a slightly different angle on the subject - one of the nicest things about Perl (and I presume about Python and Lua and all the other scripting languages that I don't actually know - yeah, I'm stuck in the Dark Ages with Perl, shh :x ) is the containers it makes available to you. Basically you have:

1) dynamically-resized arrays ("arrays"), which you can implement with the exponential growth thing. If you try to insert into an out-of-bounds location, the array magically becomes long enough to contain the requested location.
2) associative containers ("hashes"), which are basically hash tables.
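
The "array magically becomes long enough" behaviour from item 1 might look roughly like this. AutoArray is a hypothetical name, and returning None for unset or out-of-range reads is an assumption modelled on Perl's undef.

```python
class AutoArray:
    """Array that extends itself when written past its current end."""

    def __init__(self):
        self._items = []

    def __setitem__(self, index, value):
        if index < 0:
            raise IndexError("negative index")
        if index >= len(self._items):
            # Pad with None up to and including the requested slot.
            self._items.extend([None] * (index + 1 - len(self._items)))
        self._items[index] = value

    def __getitem__(self, index):
        # Reads never grow the array; missing slots read as None.
        if 0 <= index < len(self._items):
            return self._items[index]
        return None

    def __len__(self):
        return len(self._items)


doors = AutoArray()
doors[4] = "oak door"  # slots 0..3 are created implicitly
```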

This is all you really need; don't expect people to mess around with pointers or specialized data structures in a scripting language. If they really want to build a binary tree, they can (and will) do it on top of arrays or hashes, by storing references in the array/hash elements.
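
As an illustration of building a binary tree on top of hashes, here is a sketch using Python dictionaries in place of Perl hashes. Each node is a hash whose "left" and "right" entries hold references to other hashes; the function names are made up for this example.

```python
def make_node(key):
    # A node is just a hash; children are references to other hashes.
    return {"key": key, "left": None, "right": None}


def insert(node, key):
    """Insert key into the subtree rooted at node and return the root."""
    if node is None:
        return make_node(key)
    side = "left" if key < node["key"] else "right"
    node[side] = insert(node[side], key)
    return node


def in_order(node):
    """Return the keys of the subtree in sorted order."""
    if node is None:
        return []
    return in_order(node["left"]) + [node["key"]] + in_order(node["right"])


root = None
for key in [5, 2, 8, 1]:
    root = insert(root, key)
sorted_keys = in_order(root)
```

No pointer arithmetic or manual allocation is involved; the only indirection is a reference that is either a valid hash or the null value.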

Oh, I said a magic word there, "reference". That's not the C++ concept of it that I'm referring to; I mean "reference" in the Java sense of a pointer which can never be invalid unless it's "null". That is, you let people make a reference to an array or hash, but not to a specific element of it - they have to go through the []'s or {}'s (Perl's equivalent of [] when dealing with hashes rather than arrays). Pointer arithmetic is right out.

Heck, you don't even need arrays really, just the hash tables. All you need is a way to use any value as a key.

Might I suggest: in the Type field, instead of storing just a magic number identifying the type, store a "vtable pointer" (well, something like that... pointer into your VM's memory? Or maybe the main executable's memory... I haven't thought this through very well...) to a set of methods (written in your VM's bytecode probably) which implement common tasks for each type. And have one of those be "hashCode()" or something like that (see, now I've abandoned Perl as inspiration and switched to Java). So you have some statement in the scripting language like

foo = bar{baz};

and it becomes bytecodes something like

barsize CALL (bar.Type + LENGTH, bar.Value) ; invoke 'length' method for bar's type, where bar presumably is a hash
bazloc CALL (baz.Type + HASHCODE, baz.Value) ; calculate a hash code for the value of baz, depending on its type. So for example 'true' might have the same hash code as '1231' or something. Actually, 1231 is the hashCode value for 'true' in MIDP 1.0 :) I have no idea why though, that's just what it says in my JavaDoc.
bazloc = bazloc % barsize ; cuz you always do a mod with these hash thingies, ya know?
; don't forget collision checking or whatever!
foo.Value = DEREF(bar.Value, bazloc) ; Find the specified value in the bar hash, and copy its value into foo's value field.
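
The same lookup can be sketched in Python, with a per-type method table standing in for the "vtable pointer". Everything below is hypothetical: the table layout, the names (Value, new_hash, hash_put, hash_get), the string hash, and the use of chained buckets for the collision checking the bytecode leaves out. The 1231/1237 values mirror Java's Boolean.hashCode.

```python
class Value:
    def __init__(self, vtype, payload):
        self.type = vtype        # reference to a method table, not a magic number
        self.payload = payload


# Hypothetical per-type method tables.
INT_TYPE = {"name": "int", "hash_code": lambda v: v}
BOOL_TYPE = {"name": "bool", "hash_code": lambda v: 1231 if v else 1237}
STR_TYPE = {"name": "str", "hash_code": lambda v: sum(ord(c) for c in v)}
HASH_TYPE = {"name": "hash", "length": lambda buckets: len(buckets)}


def new_hash(bucket_count):
    return Value(HASH_TYPE, [[] for _ in range(bucket_count)])


def _bucket_for(table, key):
    barsize = table.type["length"](table.payload)   # CALL bar.Type + LENGTH
    bazloc = key.type["hash_code"](key.payload)     # CALL baz.Type + HASHCODE
    return table.payload[bazloc % barsize]


def _same_key(a, b):
    # Collision check: keys match only if type and payload both match.
    return a.type is b.type and a.payload == b.payload


def hash_put(table, key, value):
    bucket = _bucket_for(table, key)
    for i, (stored_key, _) in enumerate(bucket):
        if _same_key(stored_key, key):
            bucket[i] = (key, value)
            return
    bucket.append((key, value))


def hash_get(table, key):
    for stored_key, stored_value in _bucket_for(table, key):
        if _same_key(stored_key, key):
            return stored_value
    return None


bar = new_hash(8)
hash_put(bar, Value(STR_TYPE, "baz"), Value(INT_TYPE, 42))
hash_put(bar, Value(BOOL_TYPE, True), Value(STR_TYPE, "yes"))
hash_put(bar, Value(INT_TYPE, 1231), Value(STR_TYPE, "number"))
foo = hash_get(bar, Value(STR_TYPE, "baz"))   # foo = bar{baz};
```

Here the boolean true and the integer 1231 land in the same bucket, and the type comparison in _same_key is what keeps them apart.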


But yes - the hash or array should be smart enough to do the reallocation and cleanup by itself, and not be overly concerned with space efficiency. If you use the techniques Oluseyi is talking about then it will be Good Enough(TM). Simple and robust. Yes.
Very interesting, Zahlman. I've been approaching this all from the wrong angle.
quote:Original post by LordofDhara
I've never taken a serious look at Python because the syntax has always put me off (a stupid reason, I realize), but I'll take a closer look. Thanks for the suggestion.


Python is very good, but its C implementation is very complicated because of some optimizations needed to run the interpreter faster.
You can look at the Ruby source code if you want. I find it simpler.

