Tim Sweeney on programming languages

Quote:Original post by CoffeeMug
Quote:Original post by Promit
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Does that in any way imply that he's a programming language design guru and cannot possibly make statements that he pulled out of his ass?


Could you give an example of a ridiculous statement that he made?
Quote:Original post by Xeee
using a couple of asserts and good programming practices, most of the problems he talked about will be gone.


Yes, but human errors are possible (that is why we have bugs in the first place). It's very unlikely that the compiler will make an error and forget to check the bounds, etc. Please don't try to tell me that you NEVER just assumed your index would be in bounds.

Also, assertions won't fix all those problems. For a start, they introduce a minor problem of their own: a slightly slower debug build (nothing big, but some time is spent evaluating assertions).

It is also possible that the out-of-bounds access is very rare, and therefore you might not even catch it during debugging; for example, the error might only show up on systems with three or more cores because it is a multithreading bug.

Another problem is that you will have to compile the whole program and run it (maybe many times) before triggering the assertion, which can add a great deal to development time.

Imagine this code:
#include <cassert>
#include <cstddef>

class dynamic_array
{
    int *arr_ = nullptr;
    std::size_t size_ = 0;
public:
    void set_size(std::size_t size)
    {
        size_ = size;
        delete[] arr_;        // delete[] to match new[]
        arr_ = new int[size];
    }
    int operator[](std::size_t index)
    {
        // Do you always make this check? I avoid it sometimes, but that can introduce errors.
        assert(index < size_);
        return arr_[index];
    }
};

Now imagine we have two threads, the first one running:
arr.set_size(5);
arr[4]; // just access it; we don't need the value

The other running:
arr.set_size(3);

Now suppose the scheduler first runs arr.set_size(5) in thread 1, which then enters operator[] and is suspended right AFTER the assert. Thread 2 now sets the size to three, and when thread 1 resumes it accesses an element outside the array. This is an error, but it can be hard to see, and it can take many runs before it's discovered.
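For what it's worth, here is a minimal sketch of the conventional fix, assuming C++11's std::mutex is available: the resize and the indexed access take the same lock, so the bounds check and the read can no longer be separated by a resize from another thread.

#include <cassert>
#include <cstddef>
#include <mutex>

class locked_array
{
    int *arr_ = nullptr;
    std::size_t size_ = 0;
    std::mutex mutex_; // guards arr_ and size_ together
public:
    void set_size(std::size_t size)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        delete[] arr_;
        arr_ = new int[size];
        size_ = size;
    }
    int at(std::size_t index)
    {
        // The check and the read happen under one lock, so no other
        // thread can shrink the array between them.
        std::lock_guard<std::mutex> lock(mutex_);
        assert(index < size_);
        return arr_[index];
    }
};

Of course, this only removes the race; it does nothing for the deeper complaint that the bounds violation is caught at run time rather than compile time.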
Quote:Original post by Roboguy
Quote:Original post by CoffeeMug
Quote:Original post by Promit
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Does that in any way imply that he's a programming language design guru and cannot possibly make statements that he pulled out of his ass?

Could you give an example of a ridiculous statement that he made?

Actually, I apologize. I found the link to the presentation through a blog and the blog had a list that summarized his points. Once I read the presentation I realized that the summary in the blog was very misleading.
Sweeney made excellent points, and I agree with everything he said. Why not include dependent types? They go deeper than Tim's examples, and constraining a variable to a set of values as well as a type makes sense in statically typed languages. Just using asserts is not the same.
Quote:Original post by CoffeeMug
How does he propose to statically verify this:
int n = random();
arran[n];


In a way it is statically verified.

A fixed-size array like that can only be indexed with a variable of a dependent type: an int n with n >= 0 and n < 3. At the time of access, this variable is statically guaranteed to lie in that range.

When constructing this variable from a literal, all checks are done statically; when constructing it dynamically (in your example, the call to random), you have to explicitly cast from int to the dependent type, and this cast will only succeed if the value is one of the allowed ones. When you use the newly cast variable later, it is statically verified that it cannot cause an out-of-range error.
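To make that concrete, here is a rough C++ sketch of the idea, with a hypothetical bounded_index wrapper standing in for the dependent type: the one runtime check lives in the explicit "cast", and every later use needs no further test.

#include <cassert>
#include <cstddef>

// Hypothetical sketch only: bounded_index<N> plays the role of nat<N>.
template <std::size_t N>
class bounded_index {
    std::size_t value_;
    explicit bounded_index(std::size_t v) : value_(v) {}
public:
    // The explicit "cast" from a plain integer; fails loudly if out of range.
    static bounded_index checked(std::size_t v)
    {
        assert(v < N && "cast to bounded_index failed");
        return bounded_index(v);
    }
    std::size_t get() const { return value_; }
};

// An accessor that only accepts indices proven to be in range.
template <typename T, std::size_t N>
T &element(T (&array)[N], bounded_index<N> i)
{
    return array[i.get()]; // no check needed: i.get() < N by construction
}

int main()
{
    int values[3] = { 10, 20, 30 };
    bounded_index<3> i = bounded_index<3>::checked(2); // runtime check here
    return element(values, i);                         // unchecked, safe use
}

The analogy is loose (C++ performs at runtime what a dependently typed compiler would track statically), but it shows the shape of the contract: one explicit, fallible conversion, then unchecked use.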
Quote:Original post by CoffeeMug
Link.

Some of his statements lead me to believe he has no idea what he is talking about. For example, he suggests that array bounds access should be entirely verified at compile time.

Wrong. That's what cedric thinks Tim is saying, but it isn't actually what he's saying. For example, when he's talking about arrays, he actually says that the language should provide features so that array bounds can be verified at compile time, if it is possible to do so.

Take the following example from his slides: Transform accepts a list of vertices and indices and a matrix, and transforms each indexed vertex by the matrix. I've modified it slightly to make it safe for all inputs.

Vertex[] Transform(Vertex[] Vertices, int[] Indices, Matrix m)
{
  if (!Vertices || !Indices)
    return null;
  if (!m)
    m = new Matrix;
  Vertex[] Result = new Vertex[Indices.length];
  for (int i = 0; i < Indices.length; ++i)
    if (Indices[i] < 0 || Indices[i] >= Vertices.length)
      Result[i] = new Vertex;
    else
      Result[i] = Transform(m, Vertices[Indices[i]]);
  return Result;
}


This code has to check that the arguments are not null. It also has to check that the indices in the index array are not outside the bounds of the vertex array.

Here's Tim's "safe" version of the code:

Transform{n:nat} (Vertices: [n]Vertex, Indices: []nat<n, m: Matrix) : []Vertex =
  for each (i in Indices)
    Transform(m, Vertices[i])


Now we know that Vertices is an array of 'n' Vertex objects. It cannot be null. Indices is an arbitrary-length array of integers, but more specifically, an array of integers which are each less than the length of the Vertices array. Hence, we know that no index can possibly be out of bounds. The for loop is cleaned up into a form which cannot suffer from a bounds error (such as the common "off by one" error).

Tim doesn't say you can't have dynamically sized arrays. Just that you can provide annotations which allow the compiler to detect when something might go wrong:

Transform{n:nat} (Vertices: [n]Vertex, Indices: []nat<n, m: Matrix) : []Vertex =
  for each (i in Indices)
    Transform(m, Vertices[i + 1])

Here the compiler will be able to determine that "i + 1" could be greater than one minus the length of the Vertices array, and issue a warning or, depending upon how paranoid you've told the compiler to be, an error.

Quote:Original post by CoffeeMug
How does he propose to statically verify this:
int n = random();
arran[n];

Okay, I'll bite. The example is, apparently unknown to you, a particularly good one for Tim's case, because it might well break at run-time, so it may be useful to be warned about this fact. We'll say it's an error if we can't be sure that the access is valid at compile time.

int n = random();
array[n]; // Error: Possible out of bounds array access.

The first thing is to ensure that the code can't ever break at runtime. This is achieved by ensuring that the index is always less than the length of the array.

int n = random() % array.length;
array[n]; // Error: Possible out of bounds array access.

I've assumed that the compiler 'forgets' that n must be in the range [0, array.length) when it is assigned to an 'int' variable. We need to give a more precise type:

int<0, array.length> n = random() % array.length;
array[n]; // Okay.

Note that it doesn't matter that we don't know the length of the array yet. What's important is that there's a contract which says that it's always okay to access an array element if the index can be proven to be between 0 and one minus the length of the array, which n's type guarantees.
Quote:Original post by CoffeeMug
Link.

Some of his statements lead me to believe he has no idea what he is talking about. For example, he suggests that array bounds access should be entirely verified at compile time. How does he propose to statically verify this:
int n = random();
arran[n];

Even if the language has dependent typing, where you can define the range of values for a type explicitly (int<1..10>), this check still couldn't possibly be done statically.


Technologies for performing static analysis are a lot more advanced nowadays than they used to be. I DO think it's a stretch to believe that a compiler can catch complicated versions of these types of problems - the process can take too long. However, a process that can perform partial modeling and speculative analysis while doing complete data- and control-path analysis will find issues like this.

Depending on what random returns, the first issue is obviously that n can be a negative number (rand() returns an int between 0 and RAND_MAX, but it's still best practice to use an unsigned int when indexing or declaring an array). If it were possible for random to return -1 (just for discussion), then the lack of boundary checking before the access would flag an issue for a possible underflow.

In the case that:

int n = random();
foo = arran[n];

Then you can also extend the argument to determining what the legal bounds of arran are, and that n may extend past that boundary in the positive direction as well.
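To illustrate the signedness point in C++ terms (element_of is a made-up helper, not anything from the thread): with signed operands the remainder can be zero or negative, so converting to an unsigned type before taking the remainder is the usual defence.

#include <cstddef>

// Sketch of the underflow hazard: with signed operands, n % length is zero
// or negative whenever n is negative, so "wrapping" a signed index can still
// walk off the front of the array. Converting to unsigned first avoids that.
int element_of(const int *arran, int length, int n)
{
    std::size_t i = static_cast<std::size_t>(n) % static_cast<std::size_t>(length);
    return arran[i]; // i < length is now guaranteed (assuming length > 0)
}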

However, it is important to note that it CANNOT be determined WHEN or HOW it will fail, only that the possibility exists (quantum reference not intended... lol). Complete modeling ends up being a much harder problem to solve and really doesn't provide additional benefit.



just my 10 cents..

#dth-0

<sorry - a little late getting this in, the discussion has evolved since I started this...>
"C and C++ programmers seem to think that the shortest distance between two points is the great circle route on a spherical distortion of Euclidean space."Stephen Dewhurst
I'm right with Sweeney on this. What we really need is a language that allows highly expressive compile-time contracts to be established. The DBC (design by contract) model is well known to be extremely effective (some might say invaluable) when working on large-scale systems, so why do we only have obscure, unpopular languages (Eiffel et al.) that understand the notion?

I'd love to see a language with the library support and rapid-development potential of the .Net family, but with the contract expressivity of something like Eiffel. Contracts are easy to verify at compile-time in many cases, and run-time contract violations are beautifully easy to track and catch. Type-level contractual semantics in a high-level language would totally remove an entire category of run-time bugs, like out-of-bounds array access.

IMHO the real problem is that languages are not expressive enough about abstract semantics. We've got great tools for expressing technical semantics (like "this number requires 16 bits of storage and is always positive"), but that's a weak basis. DBC is just one of dozens of areas where the expressivity of most languages in the abstract layers is pathetic.
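As a rough illustration of what even a runtime-only version buys you, here is a minimal C++ emulation of Eiffel-style contracts (REQUIRE and ENSURE are made-up macros; a real DBC language would let the compiler discharge many of these checks statically):

#include <cassert>

// Crude runtime emulation of design by contract. REQUIRE states the
// caller's obligation, ENSURE the function's promise.
#define REQUIRE(cond) assert((cond) && "precondition violated")
#define ENSURE(cond)  assert((cond) && "postcondition violated")

// A percentage computation with its contract spelled out.
int percent_of(int value, int total)
{
    REQUIRE(total > 0);
    REQUIRE(value >= 0 && value <= total);
    int result = value * 100 / total;
    ENSURE(result >= 0 && result <= 100);
    return result;
}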

Unbounded array access should be easy to catch, sure. But in my view, this should be causing a build failure, too:

int milliseconds = timeGetTime();
int playerhealth = milliseconds;


I'd love to be able to write code like this, and have it enforced at compile time:

DataType milliseconds Basis:Integers Storage:WhoCares
DataType hitpoints Basis:Naturals Storage:WhoCares
DataType flatpercentage Basis:Naturals Storage:WhoCares Validation:{x < 100}
DataType somebitmask Basis:BitField Storage:Bits32


Assigning a milliseconds value into a somebitmask variable should die at compile time. Tossing 120 into a flatpercentage should die as soon as possible (compile time if it can be statically evaluated, runtime otherwise). Trying to assign the result of a hitpoints function into a flatpercentage should vomit at compile time.
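A hedged sketch of how far you can already get in C++ with strong wrapper types (the type names here are invented for illustration): the cross-assignment fails to compile, and validation that can't be evaluated statically runs in the constructor.

#include <cassert>

// Invented wrapper types: distinct types make cross-assignment a compile error.
struct Milliseconds { int value; explicit Milliseconds(int v) : value(v) {} };
struct Hitpoints    { int value; explicit Hitpoints(int v)    : value(v) {} };

// Validation that cannot be checked statically runs in the constructor.
struct FlatPercentage {
    int value;
    explicit FlatPercentage(int v) : value(v) { assert(v >= 0 && v < 100); }
};

void example()
{
    Milliseconds ms(1500);
    // Hitpoints hp = ms;      // error: no conversion from Milliseconds
    Hitpoints hp(ms.value);    // any crossover must be spelled out explicitly
    FlatPercentage p(120);     // dies at runtime (in a debug build), as asked for
}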

Of course, that brings up another question: what if we need to convert between these things? In pretty much any nontrivial application that's going to be a real need. So we do something like this:

DataType inches Basis:Integers Storage:WhoCares
DataType centimetres Basis:Integers Storage:WhoCares
Conversion(inches i Into centimetres c) { c = i * 2.54 }
Conversion(centimetres c Into inches i) { i = c / 2.54 }


The compiler should be able to do simple chaining of these as well. If I have a centimetres-to-kilometres conversion and an inches-to-centimetres conversion, and code implicitly asks for a munging between inches and kilometres, the compiler should in most cases be able to construct a graph that reveals the appropriate chain, inches -> centimetres -> kilometres, and then invoke that chain transparently. Maybe if the chain is really long it throws a warning and says "maybe there's a smarter way to do this." Maybe it caches the chain someplace and optimizes it separately (noting that certain trivial algebraic manipulations can be done to get a direct conversion from inches to kilometres).
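Here is a hand-written approximation of that chaining in C++ (types and function names invented): the inches-to-kilometres rule is composed from the two direct rules, which is what the compiler would synthesise from the conversion graph.

// Invented wrapper types and direct conversion rules.
struct Inches      { double value; };
struct Centimetres { double value; };
struct Kilometres  { double value; };

Centimetres to_centimetres(Inches i)     { return { i.value * 2.54 }; }
Kilometres  to_kilometres(Centimetres c) { return { c.value / 100000.0 }; }

// The chained conversion a compiler could derive from the graph
// inches -> centimetres -> kilometres:
Kilometres  to_kilometres(Inches i)      { return to_kilometres(to_centimetres(i)); }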

Damn, that'd be a nice world to live in...



Parallelism is something I haven't spent much time worrying about yet, although in another year or so I'll probably have some opinions. It sounds like Sweeney has some good thoughts on the subject, but I have a feeling there are a lot of alternatives that need to be explored. Language expressivity of abstract notions is currently poor in general; expressivity of concurrency possibilities barely even exists.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Quote:Original post by xiuhcoatl
Complete modeling ends up being a much harder problem to solve and really doesn't provide additional benefit.

You misspelled "impossible".
Proving these kinds of statements or code properties isn't possible in the general case (it amounts to solving the halting problem).
Quote:Original post by ApochPiQ
Unbounded array access should be easy to catch, sure. But in my view, this should be causing a build failure, too:

int milliseconds = timeGetTime();
int playerhealth = milliseconds;


I'd love to be able to write code like this, and have it enforced at compile time:

DataType milliseconds Basis:Integers Storage:WhoCares
DataType hitpoints Basis:Naturals Storage:WhoCares
DataType flatpercentage Basis:Naturals Storage:WhoCares Validation:{x < 100}
DataType somebitmask Basis:BitField Storage:Bits32


This is called dimensional analysis.

Some math-related programming languages support it natively, and there is research in other areas. Microsoft Research has looked into it. Zerksis D. Umrigar implemented it in C++, as did some other people. The Emacs calculator comes with dimensional analysis.

Google's calculator performs dimensional analysis and is surprisingly good at parsing expressions. It calculates that "the speed of sound squared times 1 kilogram in kilojoules" is "115.797284 kilojoules".

However, I do see a flaw in your approach. Strictly, the dimensionality of an expression is distinct from its domain. You can measure metres with integers, but also with rationals.

unit metre;
unit centimetre = metre / 100;
unit inch = centimetre * 2.54;


Now you can use metre, centimetre and inch as normal type specifiers, along with the traditional type specifiers. We'd also allow numbers to be suffixed with a type specifier, to make things look more natural.

float acceleration a = Physics.G * planet.mass /
                       length(object.location - planet.location);


Were there to be a logical error in the above formula, dimensional analysis gives us an increased chance of it being detected at compile-time.

Dimensional analysis has wider applications than just physical simulation. Currencies can be dimensions in their own right. If your application needs to deal with Dollars and Euros, you obviously don't want to get them mixed up. Dimension types mean you can draw a clear line between them, without losing the flexibility to have integer, floating-point or fixed-point Dollars/Euros, as need dictates.
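For the curious, here is a minimal C++ sketch of dimension checking with templates (only length and time exponents, names invented): quantities with different dimensions simply have different types, so mixing them is a compile error, while multiplication adds the exponents.

// Each dimension gets an integer exponent in the type itself.
template <int Length, int Time>
struct Quantity {
    double value;
};

template <int L, int T>
Quantity<L, T> operator+(Quantity<L, T> a, Quantity<L, T> b)
{
    return { a.value + b.value };   // addition requires identical dimensions
}

template <int L1, int T1, int L2, int T2>
Quantity<L1 + L2, T1 + T2> operator*(Quantity<L1, T1> a, Quantity<L2, T2> b)
{
    return { a.value * b.value };   // multiplication adds the exponents
}

using Metres  = Quantity<1, 0>;
using Seconds = Quantity<0, 1>;

int main()
{
    Metres d{ 3.0 };
    Seconds t{ 2.0 };
    auto area = d * d;     // fine: Quantity<2, 0>, i.e. square metres
    // auto oops = d + t;  // compile error: dimensions don't match
    return static_cast<int>(area.value);
}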

This topic is closed to new replies.
