Tim Sweeney on programming languages


Link. Some of his statements lead me to believe he has no idea what he is talking about. For example, he suggests that array accesses should be bounds-checked entirely at compile time. How does he propose to statically verify this:
int n = random();
arran[n];
Even if the language has dependent typing, where you can define the range of values for a type explicitly (int<1..10>), this check still couldn't possibly be done statically.

Yet at the same time, it's completely inexcusable that this is perfectly good C code:
int arr[10];
arr[11] = 2346;

You may not be able to check everything statically, but you should check as much as you possibly can.

Also, if we assume that random() returns an arbitrary 32-bit int, some stricter type checking for boundedness may be statically resolvable as well. For example:
int arr[10];
int n = random() % 10;
arr[n] = 5;
Compiles, but:

int arr[10];
int n = random();
arr[n] = 5;

Does not compile. That would be another potential static check to do.

EDIT: Damn, I missed your last sentence, where you gave the same idea I just wrote up. Anyway, could you give an example where it couldn't be checked statically?

Well, that might not be checkable at compile time in C, C++, C#, Java etc., but this isn't about adding his features to existing languages. Say a function could be defined like this:

int[0,2^32-1] random()
{
stuff.
}

This would mean that random will return a value between 0 and 2^32-1 (0xFFFFFFFF), so in your example the compiler would see that there is a possibility that the index will fall outside the array.

In the same way you might for input have:

int[0,MAX_KEYS] get_key()
{
stuff.
}
key_state keys[MAX_KEYS-1];

keys[get_key()].pressed = true;

Now here the compiler would also be able to see that there is a possibility that we will access an element outside keys.

Now defining such a return type could be a big problem, especially if, for example, 1, 6, 15 and a lot of other numbers can be returned (and especially if programmers can define "wrong" ranges, which could easily introduce hard-to-catch errors).

A better approach would be to make the compiler determine the range of the return value. The functions provided by the compiler could have the return value range specified and then the compiler could figure out the types itself. Take this simple example:

// random is defined to return an int between 0 and 2^32-1
int special_random()
{
return random()*random();
}

Here the compiler sees that both calls to random have a minimum of 0 and a maximum of 2^32-1, so the minimum of special_random is 0*0 = 0 and the maximum is (2^32-1)*(2^32-1), which is just under 2^64. If the language can do the same as assembly language, I can see no case where the compiler can't figure out the return-value range. I can still see a couple of problems with this, though neither seems like a big one.

This was just something I thought about, so there are most likely better alternatives.

Quote:
Original post by CoffeeMug
Some of his statements lead me to believe he has no idea what he is talking about.


lol, I didn't like it either...

Using a couple of asserts and good programming practices, most of the problems he talked about go away.

Quote:
Original post by Xeee
Quote:
Original post by CoffeeMug
Some of his statements lead me to believe he has no idea what he is talking about.


lol, I didn't like it either...

Using a couple of asserts and good programming practices, most of the problems he talked about go away.


Yes, but wouldn't it be better if the compiler did all that?

Quote:
Original post by Xeee
Quote:
Original post by CoffeeMug
Some of his statements lead me to believe he has no idea what he is talking about.


lol, I didn't like it either...

Using a couple of asserts and good programming practices, most of the problems he talked about go away.
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Quote:
Original post by Promit
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Does that in any way imply that he's a programming language design guru and cannot possibly make statements that he pulled out of his ass?

Quote:
Original post by CoffeeMug
Quote:
Original post by Promit
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Does that in any way imply that he's a programming language design guru and cannot possibly make statements that he pulled out of his ass?


Could you give an example of a ridiculous statement that he made?

Quote:
Original post by Xeee
Using a couple of asserts and good programming practices, most of the problems he talked about go away.


Yes, but human errors are possible (that is why we have bugs in the first place). It's very unlikely that the compiler will make an error and forget to check the bounds. Please don't tell me you NEVER just assumed your index would be in bounds.

Also, assertions won't fix all of those problems, and they introduce a minor problem of their own: a slightly slower debug build (nothing big, but some time is spent evaluating assertions).

It is also possible that the out-of-bounds access is very rare, so you might not even catch it during debugging; for example, the error might only show up on systems with 3+ cores because it is a multithreading bug.

Another problem is that you have to compile the whole program and run it (maybe many times) before triggering the assertion, which can add a great deal to development time.

Imagine this code:

class dynamic_array
{
int *arr_;
size_t size_;
public:
void set_size(size_t size)
{
size_ = size;
delete[] arr_;
arr_ = new int[size];
}
int operator[](size_t index)
{
assert( index >= 0 ); // Do you always make this check? I avoid it sometimes, but that could introduce errors
assert( index < size_ );

return arr_[index];
}
};


Now imagine we have two threads, the first one running:

arr.set_size(5);
arr[4]; // just access it; we don't use the result

The other running:

arr.set_size(3);

Now suppose the scheduler first runs arr.set_size(5) in thread 1, then thread 1 enters operator[] and is suspended right AFTER the asserts. Thread 2 now sets the size to three, thread 1 resumes, and this time it accesses an element outside the array. This is an error, but it can be hard to see, and it can take many runs before it's discovered.

Quote:
Original post by Roboguy
Quote:
Original post by CoffeeMug
Quote:
Original post by Promit
Because god knows Tim Sweeney wouldn't know anything about designing scalable, robust, extensible, and stable software that can survive major changes in technology and requirements.

Does that in any way imply that he's a programming language design guru and cannot possibly make statements that he pulled out of his ass?

Could you give an example of a ridiculous statement that he made?

Actually, I apologize. I found the link to the presentation through a blog and the blog had a list that summarized his points. Once I read the presentation I realized that the summary in the blog was very misleading.

Sweeney made excellent points, and I agree with everything he said. Why not include dependent types? They go deeper than Tim's examples, and constraining a variable to a set of values as well as a type makes sense in statically typed languages. Just using asserts is not the same.

Quote:
Original post by CoffeeMug
How does he propose to statically verify this:

int n = random();
arran[n];


In a way it is statically verified.

A fixed-size array of length 3, say, can only be indexed with a value whose type guarantees n >= 0 and n < 3. At the point of access, the index is statically known to be in range.

When constructing such a value from a literal, all checks are done statically; when constructing it dynamically (in your example, the call to random), you have to explicitly cast from int to the dependent type, and this cast only succeeds if the value is one of the allowed ones. When you use the newly cast value later, it is statically verified that it cannot cause an out-of-range error.

Quote:
Original post by CoffeeMug
Link.

Some of his statements lead me to believe he has no idea what he is talking about. For example, he suggests that array bounds access should be entirely verified at compile time.

Wrong. That's what cedric thinks Tim is saying, but it isn't actually what he's saying. For example, when he's talking about arrays, he actually says that the language should provide features so that array bounds can be verified at compile time, if it is possible to do so.

Take the following example from his slides: Transform accepts a list of vertices and indices and a matrix, and transforms each indexed vertex by the matrix. I've modified it slightly to make it safe for all inputs.


Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m)
{
if (!Vertices || !Indices)
return null;
if (!m)
m = new Matrix;
Vertex[] Result = new Vertex[Indices.length];
for (int i = 0; i < Indices.length; ++i)
if (Indices[i] < 0 || Indices[i] >= Vertices.length)
Result[i] = new Vertex;
else
Result[i] = Transform(m, Vertices[Indices[i]]);
return Result;
}


This code has to check that the arguments are not null. It also has to check that the indices in the index array are not outside the vertex limit.

Here's Tim's "safe" version of the code:


Transform{n:nat} (Vertices: [n]Vertex, Indices: []nat<n, m: Matrix):[]Vertex=
for each (i in Indices)
Transform(m, Vertices[i])


Now we know that Vertices is an array of 'n' Vertex objects. It cannot be null. Indices is an arbitrary-length array of integers; more specifically, an array of integers which are each less than the length of the Vertices array. Hence, we know that no index can possibly be out of bounds. The for loop is cleaned up into a form which cannot suffer from a bounds error (such as the common "off by one" error).

Tim doesn't say you can't have dynamically sized arrays. Just that you can provide annotations which allow the compiler to detect when something might go wrong:


Transform{n:nat} (Vertices: [n]Vertex, Indices: []nat<n, m: Matrix):[]Vertex=
for each (i in Indices)
Transform(m, Vertices[i + 1])


Here the compiler will be able to determine that "i + 1" could be greater than one minus the length of the Vertices array, and issue a warning or, depending upon how paranoid you've told the compiler to be, an error.

Quote:

How does he propose to statically verify this:

int n = random();
arran[n];


Okay. I'll bite. The example is, apparently unknown to you, a particularly good one for Tim's case because it might well break at run-time, so it may be useful to be warned about this fact. We'll say it's an error if we can't be sure that the access is valid at compile-time.


int n = random();
array[n]; // Error: Possible out of bounds array access.


The first thing is to ensure that the code can't ever break at runtime. This is achieved by ensuring that the index is always less than the length of the array.


int n = random() % array.length;
array[n]; // Error: Possible out of bounds array access.


I've assumed that the compiler 'forgets' that n must be in the range [0, array.length) when it is assigned to an 'int' variable. We need to give a more precise type:


int<0, array.length> n = random() % array.length;
array[n]; // Okay.


Note that it doesn't matter that we don't know the length of the array yet. What's important is that there's a contract which says that it's always okay to access an array element if the index can be proven to be between 0 and one minus the length of the array, which n's type guarantees.

Quote:
Original post by CoffeeMug
Link.

Some of his statements lead me to believe he has no idea what he is talking about. For example, he suggests that array bounds access should be entirely verified at compile time. How does he propose to statically verify this:

int n = random();
arran[n];

Even if the language has dependent typing where you can define the range of values for a type explicitly, (int<1..10>) this check still couldn't possibly be done statically.


Technologies for performing static analysis are a lot more advanced nowadays than they used to be. I DO think it's a stretch to believe that a compiler can catch complicated versions of these types of problems - the process can take too long. However, a process that performs partial modeling and speculative analysis while doing complete data- and control-path analysis will find issues like this.

Depending on what random returns, the first issue is obviously that n can be a negative number ( rand() returns an int between 0 and RAND_MAX, but it's still best practice to use an unsigned int when indexing/declaring an array ). If it were possible for random to return -1 (just for discussion), then the lack of bounds checking before the access would flag an issue for a possible underflow.

In the case that:

int n = random();
foo = arran[n];

Then you can also extend the argument to determining what the legal bounds of arran are, and that n may extend past that boundary in the positive direction as well.

However it is important to note that it CANNOT be determined WHEN or HOW it will fail, only that the possibility exists (quantum reference not intended... lol). Complete modeling ends up being a much harder problem to solve and really doesn't provide additional benefit.



just my 10 cents..

#dth-0

<sorry - a little late getting this in, the discussion has evolved since I started this...>

I'm right with Sweeney on this. What we really need is a language that allows highly expressive compile-time contracts to be established. The DBC model is well known to be extremely effective (some might say invaluable) when working on large-scale systems, so why do we only have obscure, unpopular languages (Eiffel, et al.) that understand the notion?

I'd love to see a language with the library support and rapid-development potential of the .Net family, but with the contract expressivity of something like Eiffel. Contracts are easy to verify at compile-time in many cases, and run-time contract violations are beautifully easy to track and catch. Type-level contractual semantics in a high-level language would totally remove an entire category of run-time bugs, like out-of-bounds array access.

Really IMHO the problem is that languages are not really expressive enough about abstract semantics. We've got great tools for expressing technical semantics (like "this number requires 16 bits of storage and is always positive") but that's a weak basis. DBC is just one of dozens of areas where the expressivity of most languages in the abstract layers is pathetic.

Unbounded array access should be easy to catch, sure. But in my view, this should be causing a build failure, too:

int milliseconds = timeGetTime();
int playerhealth = milliseconds;


I'd love to be able to write code like this, and have it enforced at compile time:

DataType milliseconds Basis:Integers Storage:WhoCares
DataType hitpoints Basis:Naturals Storage:WhoCares
DataType flatpercentage Basis:Naturals Storage:WhoCares Validation:{x < 100}
DataType somebitmask Basis:BitField Storage:Bits32


Assigning a milliseconds value into a somebitmask variable should die at compile time. Tossing 120 into a flatpercentage should die as soon as possible (compile time if it can be statically evaluated, runtime otherwise). Trying to assign the result of a hitpoints function into a flatpercentage should vomit at compile time.

Of course, that brings up another question: what if we need to convert between these things? In pretty much any nontrivial application that's going to be a real need. So we do something like this:

DataType inches Basis:Integers Storage:WhoCares
DataType centimetres Basis:Integers Storage:WhoCares

Conversion(inches i Into centimetres c) { c = i * 2.54 }
Conversion(centimetres c Into inches i) { i = c / 2.54 }


The compiler should be able to do simple chaining of these as well. If I have a centimetres to kilometres conversion, and an inches to centimetres conversion, and code implicitly asks for a munging between inches and kilometres, the compiler should in most cases be able to construct a graph that reveals the appropriate chain: inches -> centimetres -> kilometres and then invokes that chain transparently. Maybe if the chain is really long it throws a warning and says "maybe there's a smarter way to do this." Maybe it caches the chain someplace and optimizes it separately (noting that certain trivial algebraic manipulations can be done to get a direct conversion from inches to kilometres).

Damn, that'd be a nice world to live in...



Parallelism is something I haven't spent much time worrying about yet, although in another year or so I'll probably have some opinions. It sounds like Sweeney has some good thoughts on the subject, but I have a feeling there are a lot of alternatives that need to be explored. Language expressivity of abstract notions is currently poor in general; expressivity of concurrency possibilities barely even exists.

Quote:
Original post by xiuhcoatl
Complete modeling ends up being a much harder problem to solve and really doesn't provide additional benefit.

You misspelled "impossible".
Proving these kinds of statements or code properties isn't possible for the general case (reducible to the halting problem).

Quote:
Original post by ApochPiQ
Unbounded array access should be easy to catch, sure. But in my view, this should be causing a build failure, too:

int milliseconds = timeGetTime();
int playerhealth = milliseconds;


I'd love to be able to write code like this, and have it enforced at compile time:

DataType milliseconds Basis:Integers Storage:WhoCares
DataType hitpoints Basis:Naturals Storage:WhoCares
DataType flatpercentage Basis:Naturals Storage:WhoCares Validation:{x < 100}
DataType somebitmask Basis:BitField Storage:Bits32


This is called dimensional analysis.

Some math-related programming languages support it automatically, and there is research in other areas. Microsoft Research has looked into it. Zerksis D. Umrigar implemented it in C++, as did some other people. The Emacs calculator comes with dimensional analysis.

Google's calculator performs dimensional analysis and is surprisingly good at parsing expressions. It calculates that "the speed of sound squared times 1 kilogram in kilojoules" is "115.797284 kilojoules".

However, I do see a flaw in your approach. Strictly, the dimensionality of an expression is distinct from its domain. You can measure metres with integers, but also with rationals.


unit metre;
unit centimetre = metre / 100;
unit inch = centimetre * 2.54;


Now you can use metre, centimetre and inch as normal type specifiers, along with the traditional type specifiers. We'd also allow numbers to be suffixed with a type specifier, to make things look more natural.


float acceleration a = Physics.G * planet.mass /
length(object.location - planet.location);


Were there to be a logical error in the above formula, dimensional analysis gives us an increased chance of it being detected at compile-time.

Dimensional analysis has wider applications than just physical simulation. Currencies can be dimensions in their own right. If your application needs to deal with Dollars and Euros, you obviously don't want to get them mixed up. Dimension types mean you can draw a clear line between them, without losing the flexibility to have integer, floating-point or fixed-point Dollars/Euros, as need dictates.

I would see it using mathematical set notation.

// where [n] <=> [0, n)
// note the "Indices:[0, n)", which means all the indices in the array fall under
// the [0, n) range. I could have written "Indices:[n]", but chose not to for show
set<n> Vertex[n] Transform (Vertex[n] Vertices, int[n] Indices:[0, n), Matrix m)
{
if (!Vertices || !Indices) return NULL;
if (!m) m = new Matrix;

Vertex[n] Result = new Vertex[n];

for (int i = 0; i < n; ++i)
{
// we have already (at compile time) verified that all the indices in Indices
// lie in [0, n), so the below check is superfluous
// if (Indices[i] < 0 || Indices[i] > Vertices.length)
Result[i] = m * Vertices[ Indices[i] ];
}

return Result;
}



Edit: On second thought, that would make it terribly hard to parse. An easier way could be:

set< n | n >= 0 > Vertex:n Transform (Vertex:n Vertices, int:n Indices:n, Matrix m)

Quote:
Original post by ApochPiQ
I'd love to see a language with the library support and rapid-development potential of the .Net family, but with the contract expressivity of something like Eiffel.
I wonder if, just maybe, Eiffel.NET would fit the bill [grin]

Quote:
Original post by Nathan Baum

This is called dimensional analysis.

Some math-related programming languages support it automatically, and there is research in other areas. Microsoft research have looked into it. A Zerksis D. Umrigar implemented it in C++, as did some other people. The Emacs calculator comes with dimensional analysis.


Bah. If I read that ZDU link right, his library is non-standard C++.

Meh.


Guest Anonymous Poster
Quote:
Original post by Anonymous Poster
Yeah it would be nice if "utopia programming language" existed.


/me dreams on...


It's called Ada.

