Automatic checking and explicitness

I'm still designing a programming language (same old, same old). My question of the day is about automatic (static) checking, and assertions in particular. First, an assertion is a stated property of the program which must always hold (unlike exceptions, which correspond to conditions that may legitimately arise in exceptional cases). In this language, one can make three kinds of assertions:
  1. Invisible assertions coming from the constraints of functions. In the example below, assert (bar(i) != 0) is implied.
    def foo(i:int) {
      return 100 / bar(i);
    }
  2. The usual assertion of a property inside imperative code. The example below is exactly the same as above, except that now the assertion has been made explicit.
    def foo(i:int) {
      assert (bar(i) != 0);
      return 100 / bar(i);
    }
  3. The universal assertion of a declarative property over the entire code. The example below is, again, identical to those above.
    
    for (i:int) assert(bar(i) != 0);
    def foo(i:int) {
      return 100 / bar(i);
    }
Some assertions can be checked statically. For instance, if def bar(i:int) = 1;, then all three situations above could be statically determined to be perfectly safe. Many others, though, cannot be resolved, because of well-known undecidability theorems. And so, the options are:
  • Whine. This basically asks the programmer for a proof that the assertion really holds. The programmer can then rely on the type system (moving the assertion to the calling code), use a conditional (because if(a) assert(a) is trivially true) for some special treatment, or provide intermediate assertions that help out the automatic proof system. (The first two workarounds are sketched just after this list.)
  • Use static analysis to determine that, even though the assertion in itself may be false for some inputs (for instance, bar(1) = 0), the call foo(1) will never actually occur, and so everyone is happy.
  • Build the assertion into the code. This is the expected approach when coming from a C* background. The assertion will be verified whenever necessary: whenever encountered in imperative code (for case 2) or whenever used to prove another assertion (for case 3).
  • Automatically generate a unit test that verifies that no assertion is broken.
  • Take the assertion for granted. I mean, if the programmer says it, it must be true!
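To make the first option ("whine") concrete, here is a minimal sketch in Python of the first two workarounds: restating the obligation as a precondition the caller must satisfy, and guarding with a conditional so the assertion becomes trivially provable. The names bar and foo are purely illustrative stand-ins, not part of the proposed language.

  def bar(i: int) -> int:
      return i * i + 1              # illustrative callee; happens never to return 0

  # Workaround 1: move the obligation to the calling code by stating it as a
  # precondition, so the division itself needs no further proof.
  def foo_with_precondition(i: int) -> int:
      assert bar(i) != 0, "caller must ensure bar(i) != 0"
      return 100 // bar(i)

  # Workaround 2: guard with a conditional. Inside the branch the checker may
  # take bar(i) != 0 for granted, because "if a: assert a" is trivially true.
  def foo_with_guard(i: int) -> int:
      b = bar(i)
      if b != 0:
          return 100 // b
      raise ValueError("bar(i) == 0, division not defined")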
My questions are: how should I choose which behaviour to adopt for each of these assertion types? How do I factor in modules (asserting over your own module versus asserting over external things), release/debug approaches, types? Are there any factors that you think need to be taken into account when determining these?
The decision is pretty much arbitrary, and depends on how much static reliability you want your language to provide. You can be as lax as C or C++, as anal-retentive as Java with checked exceptions, provide a really rich contracting system, or go so far as to build a fully featured static dimensional-analysis facility into your type system that makes the vast majority of your correctness tests verifiable at compile time. Obviously, the only real issues of concern here are A) how potent you want your language to be in the verifiability department, and B) how hard you're willing to work to accomplish that.

As far as modules: my personal wish is that modules permanently go away. I don't want to think about modules, ever again. I don't want to have to remember if I put some code in this file or that file, and how it will affect my compilation process. I did more than enough of that in C and then C++, and one of the enduring victories of the .Net languages in my opinion is the near-perfect escape from the hell of worrying about how the compiler will process my code on a physical (file) level. So IMHO the right answer is that you, as a language/tool creator, do whatever magic you need to do so that I, as a programmer, never worry about it - no matter how much work that means for you.


Quote:Original post by ApochPiQ
Obviously, the only real issues of concern here are A) how potent you want your language to be in the verifiability department, and B) how hard you're willing to work to accomplish that.


The answers are A) as potent as possible and B) I have a lot of time and knowledge available for this. However, there is an additional issue: I want to design a language that caters to its users, so I'm asking you (the programmers) for what you think would make the most sense. Because even the most flawless language sucks if using it requires constantly fighting your intuition. So, my question here is more about what you expect as a programmer in this situation.

Quote:As far as modules: my personal wish is that modules permanently go away. I don't want to think about modules, ever again. I don't want to have to remember if I put some code in this file or that file, and how it will affect my compilation process.


There's been a misunderstanding on this point. I didn't even know that C and C++ had a module system integrated into them (this certainly wasn't my impression last time I checked, and it certainly wouldn't be my inspiration for a module system). What I call a module is a cohesive set of semantically related entities (functions, types, assertions, etc.), but these need not be in the same file at all (nor is a single file restricted to containing at most one module). Objective Caml modules are one example of this, although not necessarily the only one.

I'd say that if static analysis demonstrates that the assertion always holds, then stay silent. If static analysis doesn't prove the assertion, then issue a warning, with the ability for the programmer to silence the warning by making an explicit statement that the assertion will hold, while adding a dynamic code check to debug code.
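A rough Python analogue of that suggestion, assuming a hypothetical assume() helper (not part of any proposal in this thread): the programmer's explicit statement silences the checker, the property is still verified dynamically in debug runs, and the check disappears in release builds (python -O).

  def assume(condition: bool, message: str = "assumed property violated") -> None:
      if __debug__:                  # stripped out under -O, i.e. "release" builds
          if not condition:
              raise AssertionError(message)

  def bar(i: int) -> int:
      return i + 1                   # illustrative callee

  def foo(i: int) -> int:
      assume(bar(i) != 0)            # programmer vouches for the property
      return 100 // bar(i)           # a static checker could now stay silent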

That'd be the "all of the above" answer. :)
I would suggest the following levels of checking:

Unsafe. C++ release mode. Implicit assertions are not even checked. Explicit assertions are compiled out, and the compiler should only report errors if they *will* be violated regardless of input.

Fast. C++ debug mode. Implicit assertions are not even checked. Explicit assertions throw exceptions if they fail, and no static checking is done at all.

Weak. Java-style. Implicit assertions are generated conservatively but simply (i.e. without any intra-function analysis). These are not statically checked, but are compiled in as runtime checks that throw exceptions if they fail. Explicit assertions also throw exceptions in the same way; no static checking is done except to generate the implicit assertions.

Normal. Implicit assertions are generated conservatively but simply (i.e. without any intra-function analysis), and thereafter treated the same as explicit ones. All of these are statically checked; if an assertion can never succeed, an error is emitted, and if it merely isn't provable, a warning is emitted and an exception-throwing check is generated.

Strict. Implicit assertions are generated wherever possible, and thereafter treated the same as explicit ones. The static checker attempts to remove all assertions which are redundant (and does "chain" through logic, e.g. checking whether Foo(1) can ever be called if that would lead to a divide by zero: effectively, an implicit assertion 'i is not an x such that Bar(x) == 0' is added to Foo), and then removes those which can be guaranteed for all input. If anything is left, it emits errors: "I can't prove XXX for all input".
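A small worked version of that "chaining" idea, in Python with illustrative names (Bar and Foo follow the post; the propagated precondition is shown as a comment, since it would be inferred by the checker rather than written by hand):

  def Bar(x: int) -> int:
      return x - 1                   # Bar(1) == 0, so Bar can return zero

  # Implicit assertion inside Foo: Bar(i) != 0. Under "Strict" the checker cannot
  # discharge it for all i, so the obligation chains outward: Foo acquires the
  # precondition "i is not an x such that Bar(x) == 0" (here, simply i != 1),
  # and every caller of Foo must now satisfy it.
  def Foo(i: int) -> int:
      return 100 // Bar(i)

  print(Foo(3))                      # fine: Bar(3) == 2
  # print(Foo(1))                    # would violate the chained precondition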

Unit test generation, if offered, should be done by a separate tool. Also, I consider the two types of explicit assertion to be equivalent at all levels.
Quote:Original post by JasonBlochowiak
I'd say that if static analysis demonstrates that the assertion always holds, then stay silent. If static analysis doesn't prove the assertion, then issue a warning, with the ability for the programmer to silence the warning by making an explicit statement that the assertion will hold, while adding a dynamic code check to debug code.


You raise an interesting point which I hadn't seen in the first place, and that is the interaction between "what could happen to today's code" and "what could happen to tomorrow's code". Static analysis only proves today's code to be correct, while the user might want to prove the code to be always correct (for instance, he intends to distribute a module on the web, so he wants to make sure that no bugs will appear and that all constraints are expressed in the function contracts).

I don't really know how to make this differentiation. I could say, for example, that descriptive assertions must always be proven true before being distributed, and that internal (imperative) assertions must be proven true using only the contract of the function (if it's a public function), or through static analysis (if it's a private function). Does this make sense? And also, would this fit with the open-closed principle?
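Here is a tiny Python sketch of that split, with purely illustrative names and a docstring standing in for the function contract: a public function's internal assertion should follow from its published contract alone, while a private function may lean on whole-module analysis of its known call sites.

  def scale(i: int) -> int:
      """Public. Contract: requires i > 0."""
      assert i > 0                   # dischargeable from the contract alone
      return 100 // i                # divisor is nonzero because i > 0

  def _helper(i: int) -> int:
      # Private: no published contract, so static analysis of its (internal,
      # fully known) call sites is allowed to prove the assertion instead.
      assert i != 0
      return 100 // i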

Quote:Original post by Zahlman
Unsafe.
Fast.
Weak.


Interesting levels, although the static checker will run by default: it's more or less required to ensure type safety and code generation, but it's made quite fast by the language structure.

Quote:
Normal. Implicit assertions are generated conservatively but simply (i.e. without any intra-function analysis), and thereafter treated the same as explicit ones. All of these are statically checked; if an assertion can never succeed, an error is emitted, and if it merely isn't provable, a warning is emitted and an exception-throwing check is generated.


Interesting. This sounds to me like a good "development-in-progress" mode, with warnings and safety scaffolds and thorough verification.

Quote:Strict. Implicit assertions are generated wherever possible, and thereafter treated the same as explicit ones. The static checker attempts to remove all assertions which are redundant (and does "chain" through logic, e.g. checking whether Foo(1) can ever be called if that would lead to a divide by zero: effectively, an implicit assertion 'i is not an x such that Bar(x) == 0' is added to Foo), and then removes those which can be guaranteed for all input. If anything is left, it emits errors: "I can't prove XXX for all input".


I think my answer to JasonBlochowiak applies here as well: what about making sure that a module, in isolation, is correct? What rules do I apply to decide when static checking is applicable (XXX is never violated in this program) and when I'm looking for more strict verification (XXX can never be violated by any program)?

Quote:Unit test generation, if offered, should be done by a separate tool.


Good point. How do you suggest, though, that the compiler use the results from the unit tests to "prove" assertions?

Quote:Also, I consider the two types of explicit assertion to be equivalent at all levels.


I'm actually wondering if this is reasonable. Using static checking, an error is reported only if an assertion contradicts the actual usage of the object. For imperative assertions, that would make sense, but I'm not certain it would for descriptive assertions. If I write a couple of functions, and then assert properties over these functions, I would like to be warned about a violation right away, and not only when an object is finally created and used in a way that contradicts these assertions.

Basically, when I state for (i:int) assert (bar(i) != 0); I'd rather have this statement be checked and cleared when I'm wrapping up the module that defines bar, and not when I'm in the middle of writing a module that uses bar correctly (but bar itself happens to be incorrect). What do you think?
Quote:Original post by ToohrVyk
Quote:Strict. Implicit assertions are generated wherever possible, and thereafter treated the same as explicit ones. The static checker attempts to remove all assertions which are redundant (and does "chain" through logic, e.g. checking whether Foo(1) can ever be called if that would lead to a divide by zero: effectively, an implicit assertion 'i is not an x such that Bar(x) == 0' is added to Foo), and then removes those which can be guaranteed for all input. If anything is left, it emits errors: "I can't prove XXX for all input".


I think my answer to JasonBlochowiak applies here as well: what about making sure that a module, in isolation, is correct? What rules do I apply to decide when static checking is applicable (XXX is never violated in this program) and when I'm looking for more strict verification (XXX can never be violated by any program)?


Ugh. Umm... off the top of my head: when compiling modules instead of the whole program, downgrade errors to warnings, and push the remaining assertions to the interface. (Anything which can't be pushed to the interface must be an error.)

When the program is linked (assuming you are doing static linking :) ) you can then follow the same process of removing things (e.g. Foo.wibble(i) requires i != 0; no module resolving a Foo::wibble(i) call could pass i == 0; therefore this assertion checks out and we remove our mark-up), and report errors for incompatible module use. Basically, the compiler is now identifying *module preconditions* for you and doing static checking on them :)
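One way to picture "pushing the assertion to the interface", as a purely illustrative Python sketch (a real implementation would record and check this in the compiler/linker, not at runtime): the unproven assertion inside wibble becomes a recorded precondition on the module interface, which a whole-program pass can later verify against every call site.

  from typing import Callable

  PRECONDITIONS: dict[str, Callable[[int], bool]] = {}

  def precondition(name: str, check: Callable[[int], bool]):
      def wrap(fn):
          PRECONDITIONS[name] = check   # exported along with the module interface
          return fn
      return wrap

  @precondition("Foo.wibble", lambda i: i != 0)
  def wibble(i: int) -> int:
      return 100 // i                   # implicit assertion i != 0, now lifted

  # A "link-time" pass would walk every call of Foo.wibble in the other modules
  # and try to prove the recorded predicate for the arguments they pass.
  assert PRECONDITIONS["Foo.wibble"](5)       # a caller passing 5 checks out
  assert not PRECONDITIONS["Foo.wibble"](0)   # a caller passing 0 is an error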

Quote:
Quote:Unit test generation, if offered, should be done by a separate tool.


Good point. How do you suggest, though, that the compiler use the results from the unit tests to "prove" assertions?


Mu :) Unit tests are for programmers. They'll want to modify the tests anyway, and/or keep only interesting ones, and/or mix them in with the ones they wrote themselves, etc. (Some people write unit tests *instead of* coding explicit assertions, yeah?)

Quote:
Quote:Also, I consider the two types of explicit assertion to be equivalent at all levels.


I'm actually wondering if this is reasonable. Using static checking, an error is reported only if an assertion contradicts the actual usage of the object. For imperative assertions, that would make sense, but I'm not certain it would for descriptive assertions. If I write a couple of functions, and then assert properties over these functions, I would like to be warned about a violation right away, and not only when an object is finally created and used in a way that contradicts these assertions.

Basically, when I state for (i:int) assert (bar(i) != 0); I'd rather have this statement be checked and cleared when I'm wrapping up the module that defines bar, and not when I'm in the middle of writing a module that uses bar correctly (but bar itself happens to be incorrect). What do you think?


That makes my brain hurt, and I can only hope it's at least partially addressed by the first part of this post. :)

