Requesting comments on language syntax

30 comments, last by Nathan Baum 18 years, 10 months ago
I'm putting together an imperative programming language. The syntax is not going to be the main "selling point" of the language but I'd still like to try and make the language pleasant to read and write. I've taken some things I like from C, Pascal, Java and Python and come up with the following:

module Example in package example using
	otherpackage.OtherModule,
	otherpackage.AnotherModule:

function foo(integer i, float f): integer
	/* Basics: */
	if i < f: // A colon opens a scope.
		if i == 0:
			do:
				bar();
			while !quit();
		else:
			foobar(1);
		end;
	end;
	
	if i != 0:
		bar();
	end if; // an if with one statement still requires an end :(
	/* ALTERNATIVE: use 'then':
	if i != 0 then
		bar();
	//or, if the user wants
	if i != 0 then bar();
	*/
	
	/* Explicit scope: */
	scope: // or, more pascalishly, 'begin:' ?
		const integer i = 0;
		
	end;
	
	/* C++-style for */
	for i = 0; i < 10; ++i:
		bar();
	end for;

	/* Exception handling */
	try:
		bar();
	catch SomeError ex:
		
	catch SomeOtherError ex:
		
	finally:
		foobar();
	end try;

	return 0;
end function;

function bar() // Implicit void return type
	// do something...
end;

end module;


I've tried to find a balance between the verbosity of Pascal and the intensity of C and Python. Every block begins with a colon and ends with the keyword 'end', optionally followed by the block's type (if, for, module...). The end is necessary since the language is free-form (unlike Python) and IMHO it is cleaner (than Python) this way. I wanted to have every block start explicitly to avoid parentheses in ifs and loops. Parentheses are IMO tedious to type and don't contribute much to readability in this case.

As you can see I've used a Pascal-like notation for the function's return type. I've found it much more natural to think of parameters before the result and the big 'function' keyword, when properly highlighted, is easy to recognize (unlike some three-lined template function monstrosities in C++). The function header is, however, inconsistent with the rest since it doesn't use the colon to begin its scope. I'd like to fix this but I'm not sure how.

Some other topics:

- integer vs. int? (all integers are signed)
- float vs. real? (there is only one floating-point type)
- Integer vs. integer?
- const vs. constant vs. final vs. immutable vs. (some character)??
- end vs. end;?

All comments and suggestions are welcome!

[Edited by - 255 on June 12, 2005 5:37:19 PM]
I think the reason Python gets away with lack of braces is that the whitespace defines scope. What you've done there is merely convert "{" to ":" and "}" to "end;". I gotta say - "end" is also much more tedious to type than "}" (or "shift-tab" for Python).

Really, this makes the language somewhat less readable than C-style languages. Braces can easily be scanned, as can whitespace-based scoping. Your language can't be quickly scanned because "end" is very similar to other statements (while "}" is not, and Python has nothing in the first place to be confused).

As for the location of the return type: in your current usage it doesn't make much sense (because, after the colon, it is now "in scope"). At a bare minimum, you should move it before the colon. Whether it goes at the end or the beginning of the function definition probably doesn't matter.

Removal of brackets (as in "(" and ")") from control statements is an interesting stylistic choice. I think that if you do decide to use braces ("{" and "}"), it'd probably just look nicer with the brackets there; otherwise it probably looks equally good with them either in or out. If you give them brackets, they look somewhat like functions - whether this is a positive or a negative is up to you.

Personally, I'd select a proven and known style (C or Python or something else) and stick with it, rather than inventing a totally new one.
On a functional level, I'd suggest a change in design for "finally". C++ doesn't need it, because it uses the RAII idea. If your language supports RAII nicely, then you probably don't need it either.

If you do have it, instead of associating it with a "try" block, consider associating it with any scope at all (including try blocks), or even associating it with specific variables. You could even make it act like an object itself (a function object of sorts) - this might make for simple implementation.
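
To make the "finally on any scope, as an object" idea a bit more concrete, here is a rough sketch in Python. Python has no C++-style destructors, so this is not RAII proper; the `with` statement and the standard `contextlib.ExitStack` are just the nearest analogue I know of. The function name `run` and the log strings are made up for illustration.

```python
from contextlib import ExitStack


def run(fail):
    """Run a block with a scope-bound cleanup action and return the event log."""
    log = []
    try:
        with ExitStack() as scope:
            # The cleanup is registered as an ordinary callable tied to this
            # scope, rather than written as a 'finally' clause on a 'try'.
            scope.callback(log.append, "cleanup")
            log.append("body")
            if fail:
                raise ValueError("boom")
            log.append("body finished")
    except ValueError:
        log.append("handled")
    return log
```

Whether the block finishes normally or throws, the cleanup runs when the scope is left, and it runs before the handler gets control.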
Just some random personal nitpicks:

- end is inconsistent. Sometimes you say what is ending, other times not.
- As you noted, the function declaration is inconsistent and unwieldy
- If you're defining a new syntax, don't be scared to break out of conventions. C-style for loops are hideous and unintuitive.
- As Andrew noted, "finally" is superfluous if you use RAII. I personally can't say enough good things about the RAII idiom, so I'd suggest favoring it (or even explicitly designing the entire language syntax around it).
- I personally hate the module/package paradigm. I'd far prefer a C++ style namespace scheme.


Here's some suggested tweaks I'd make:

// The function problem is easy to fix:
// move the return type before the colon.
// The parser can still find this easily
// (just look for the end of the parameter
// list) and it feels more consistent.
function foo(integer i, float f) integer:
	/* Basics: */ // Can we PLEASE kill this comment style already?
	if i < f:
		if i == 0:
			do:
				bar();
			while !quit();
		else:
			foobar(1);
		end;
	end;
	
	if i != 0:
		bar();
	end; // Use end consistently
	// Redundant syntax is bad - no "then" construct please.
	
	// Explicit limited scope
	scope: // Call it a scope. That's what it is.
		const integer i = 0;
	end;
	
	// Please kill the C++ style for.
	// A BASIC style for is much more intuitive and
	// much more difficult to abuse with Bad Code.
	// Anything complicated you need to do with a C
	// style for can be done better and more cleanly
	// with a different type of loop.
	for i from 1 to 10:
		bar();
	end;
	// As a bonus, adding non-trivial iterators is
	// a breeze with this style of syntax:
	for i from 10 to 1 iterate -2:
		baz();
	end;
	for element from List.Start to List.End iterate List.Iter:
		element.boz();
	end;
	// Exceptions should promote RAII ideally
	try:
		bar();
	catch SomeError:
		barf();
	end;
	return 0;
end;

// Implicit return types are the tool of Satan.
function bar() nothing:		// or void, if you prefer
	// do something...
end;



I can't say it enough: if you're going to the distance to make a new language syntax, don't feel encumbered by what other languages do. Learn from their mistakes, sure, but don't blindly follow their less obvious problems. Opening scopes with "begin" in Pascal is a bit counterintuitive IMO because the English word carries baggage that the syntax does not. Java-style finally blocks are pointless if you favor RAII (which, IMO, you should favor, because it has benefits far beyond just exceptions). Module/package is an annoying encumbrance that a lot of people misuse or outright undermine (cf. the ridiculous practice of "using namespace std;" in C++).

Be minimalistic. Your catch syntax had a dangling "ex" on the end of the specifier. This isn't necessary; no decent parser needs it, and the coder probably doesn't want to have to type it.

Be consistent. If you want a scope to simply close with "end" then close everything with a simple end. If you want to clarify what block just closed, then fully qualify all ends, and require that they are always qualified in the syntax rules. Similarly, rearrange things that may look odd at first to make them fit better with the rest of the syntax (like function return type specifiers).

Final word on functions: I personally prefer return information at the front of the function identifier. You have at least marginally strong typing from the looks of it (no JavaScript/PHP style weak typing) so there's no need for the function keyword - I know it's a function just by looking at it. I also hate having to scroll around through a long parameter list to find the function's return type (curse you VB6!). You might want to preserve the "function" keyword if you decide to require qualified "ends" after every block, but other than that I personally don't like it at all. But that's just my anal opinion - it's your language [wink]


Quick one after reading ApochPiQ's post: Now the return type is on the inside of the colon, it bunches up with the arguments (reduced scanability). Suggest putting it directly before or after the "function" keyword.

I really like ApochPiQ's improved "for" syntax too.


Anyway, I think you really need to consider stepping back and looking at what your language actually does and what it's for.

Perhaps you can better achieve these things with a different feature set (instead of what is in C). Things like classes/objects, states, serialization, (network) replication, latent functions, for-each blocks, exception handling, finally blocks, built-in iterators, singletons, lambdas, threads, string manipulation, package/module management, versioning, templates, generics, inheritance, polymorphism, etc, etc, etc.

At the moment, your language looks like an improved C with slightly different syntax. There are already plenty (far too many) scripting languages that are basically slightly-different versions of C.

I had to come up with my own scripting language recently, because I needed serializable and replicatable objects with states, events and latent functions (and basically none of the existing C-like scripting languages have those). I didn't really need "functions" at all.

So yeah - what is the "point" of your language? What kind of application are you aiming at? What things will your language do that the vast number of C-like languages don't do? Perhaps you can choose a set of features and syntax that complements your domain.
end

I think that the end keyword has advantages for novice programmers. With the more conventional {}, it is easy to forget to close a block. Where blocks of a particular type are closed by a matching end statement, it is easier for a compiler to communicate to the programmer exactly where the error may occur.

if (foo) {
  while (bar) {
    if (baz) {
      d = 12;
    }
    x = 13;
  }


if foo:
  while bar:
    if baz:
      d = 12;
    end while;
    x = 13;
  end if;


With the first example, the error could be anywhere: we know a } is missing, but the compiler can't provide guidance upon which one. From the indentation, it's implied that the outer-most if is unclosed.

With the second example, the error is obviously that the inner-most if is unclosed. The error in indentation might confuse the programmer, but it doesn't confuse the compiler.

But, if the optional keyword is not used, the end statement offers no clear benefits over {}.

One policy might be to require that certain blocks always have complete end statements. For example: end function. That way, any confusion about a missing end would always be confined to the function that contains it.
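
Here is a rough sketch, in Python, of the kind of check I have in mind. It is a toy line-based matcher and not a real parser: the name `check_blocks`, the keyword list and the message wording are all invented for illustration, and it only understands headers ending in ":" and closers of the form "end;" or "end <keyword>;".

```python
import re

# Block keywords this toy checker tracks (an assumption, not the real grammar).
OPENERS = ("if", "while", "for", "function")


def check_blocks(lines):
    """Return a list of error strings for unclosed or unexpected blocks."""
    errors = []
    stack = []  # (keyword, line number) for every block that is still open
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        closer = re.match(r"end(?:\s+(\w+))?;$", line)
        opener = re.match(r"(\w+)\b.*:$", line)
        if closer:
            wanted = closer.group(1)  # None for a bare 'end;'
            if wanted is not None and any(kind == wanted for kind, _ in stack):
                # A qualified end names its block, so every block sitting
                # above that one on the stack must be missing its own end.
                while stack[-1][0] != wanted:
                    kind, opened = stack.pop()
                    errors.append(
                        f"line {number}: '{kind}' opened on line {opened} is unclosed"
                    )
                stack.pop()
            elif wanted is None and stack:
                # A bare end can only close whatever was opened last.
                stack.pop()
            else:
                errors.append(f"line {number}: unexpected '{line}'")
        elif opener and opener.group(1) in OPENERS:
            stack.append((opener.group(1), number))
    for kind, opened in stack:
        errors.append(f"end of input: '{kind}' opened on line {opened} is unclosed")
    return errors
```

Fed the second snippet above, it blames the if on line 3 as soon as it reaches "end while;". Swap every end for a bare "end;" and the only thing it can say is that the if on line 1 never closed, which is the same unhelpful answer you get with braces.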

On the other hand, Python's technique is undeniably elegant. Unless there is a good reason why you can't use whitespace to delimit blocks, I would go with indentation-based blocking. Actually, I'd use a combination:

Modules and Packages

How is a module different from a package?

If this aspect of the language is Python-like, then perhaps a module is a file, and a package is a directory.

If that's true, then why do you have (1) a block-like module declaration and (2) a clause for specifying which package a module belongs to?

As it stands, it looks like you should be able to put more than one module in a file. If module was just a declaration with no members, then it would make sense that you could only have one per file.

If a package is a directory, then it looks like the right way to add a module to a particular package is to put its file in that directory.

On the other hand, if modules and packages are not simply files and directories then is there really a distinction between the two?

module

It seems rather verbose.

Is in package really needed? It seems like either keyword by itself would suffice. Or, from the syntax you use in the using declaration, why not module package.Example? Or a separate package declaration to select the package the module should be inserted into.

Your using clause is probably doing two things. Firstly, it's importing the symbols exported from the specified module. Secondly, it's exporting them from the module being defined.

It would be a good idea if those were separate functions. Module foo might well need to use functions exported by module bar, but you don't necessarily want to make those functions visible to clients of foo. Consider what would happen if foo was refactored and didn't use bar any more: clients of foo which were using those functions would need to be changed to reflect the changes in the implementation of foo.

A possibility here is having using do the symbol importing (or, better, import) and have export do the symbol exporting.

Another possibility is to use C++'s terminology and use private or public to control whether or not the imported symbols are exported.

module Example in package example using
	otherpackage.OtherModule,
	otherpackage.AnotherModule:
  export otherpackage.OtherModule;
end;

module Example in package example using
	public otherpackage.OtherModule,
	private otherpackage.AnotherModule:
  ;
end;
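
To pin down what I mean by public versus private using, here is a small model of the symbol rules in Python. It is only a sketch of the semantics being proposed, not how any real module system works; the `Module` class and its field names are invented for the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    own: set = field(default_factory=set)              # symbols defined here
    public_uses: list = field(default_factory=list)    # imported and re-exported
    private_uses: list = field(default_factory=list)   # imported for internal use only

    def exported(self):
        """Symbols that clients of this module can see."""
        symbols = set(self.own)
        for module in self.public_uses:
            symbols |= module.exported()
        return symbols

    def visible_inside(self):
        """Symbols that code inside this module can see."""
        symbols = self.exported()
        for module in self.private_uses:
            symbols |= module.exported()
        return symbols
```

With bar used privately by foo, bar's functions are usable inside foo but never leak to foo's clients, so refactoring foo to drop bar cannot break them.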


I'd also suggest promoting using to a full declaration. Then, it could be placed exactly where it is needed without polluting the namespace elsewhere.

function wobble ():
  // This module's functions are only needed by this function.
  using somepackage.Module:
    ..
  end;
end;


There comes the question of what to do with public functions which use privately imported types. You could either say such functions are invalid: because they'd expose implementation details, or say they are valid but cannot be used unless the client also imports the module that the type is from, or say they are valid but the return type cannot be named unless the client also imports the module, although they can be called:

module foo:
  type SomeType;
end;

module bar using private foo:
  // This could be invalid: no public functions with private types.
  public function doSomething (): SomeType
  end;
end;

using bar:
  // Or this could be invalid: no calls to functions with inaccessible types.
  doSomething ();
end;

using bar, foo:
  // If public functions with private types are valid, then this would be valid:
  SomeType o = doSomething ();
end;


Compound blocks

If using is a declaration of its own, yet can also be part of module, that could be generalised. You could say that each block header is a set of clauses rather than a fixed-syntax statement:

for x from 1 to 10
for y from 1 to 10:
  print $"$x $y";
end;


This would iterate over x and y in step, outputting "1 1", "2 2", "3 3", etc. I've imagined a $ operator which gives Perl-style variable substitution inside strings.

To iterate x and y separately, you'd need to nest one block inside another, as usual.
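
For comparison, this is roughly how the two behaviours look in Python today, where iterating in step is spelled with the built-in `zip`. The function names `in_step` and `nested` are mine, and the output format just mimics the "$x $y" string above.

```python
def in_step(limit):
    """Advance x and y together: one pair per iteration."""
    numbers = range(1, limit + 1)
    return [f"{x} {y}" for x, y in zip(numbers, numbers)]


def nested(limit):
    """Advance y through its whole range for every value of x."""
    numbers = range(1, limit + 1)
    return [f"{x} {y}" for x in numbers for y in numbers]
```

The in-step form visits `limit` pairs; the nested form visits `limit` squared.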

You could combine different kinds of iteration constructs. Each would do its thing on each iteration.

for x from 1 to 10 while x < y:
  ...;
end;


This iterates x from 1 to 10, but terminates when x is not less than y, even if x isn't 10 yet. Note that this isn't easily expressed using nested loops: you'd need to break out of two levels.

Syntax sugar allows each construct to be separated with and or then.

using quux if x == 12 then for y in list:
  ...;
end;
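
The nearest thing I can point to in an existing language is Python's `itertools.takewhile`, which bolts a while condition onto any iteration. A minimal sketch, with `bounded` as a made-up name:

```python
from itertools import takewhile


def bounded(y):
    """Iterate x from 1 to 10, stopping early once x is no longer less than y."""
    return list(takewhile(lambda x: x < y, range(1, 11)))
```

The range and the condition are stated side by side, and no break out of an inner loop is needed.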


This uses the module quux in this block only and, if x is equal to 12, iterates y over the contents of list.

Type declarations

These are inconsistent.

I think you should use either name: type or type name, not both.

Type declarations should be optional. These days, every new general-purpose language worth its salt should have a common type which can be used if no explicit type is specified.

Scope

I think this should be called either begin or do (terminated with end rather than while).

If-Then

You say an if with one statement still requires an end, but it doesn't if you use then.

Empty statement

Both your catch statements are empty. Rather than allowing that, I'd require that any block contain at least a ;.

For

You should support other kinds of for loop. Python has a for which iterates through a sequence for a good reason: it's one of the most common uses of for. This might be a for-in loop -- for x in list:.

If your language supports a distinct lvalue/rvalue concept, then being able to for-in loop through a sequence as lvalues would be useful. Supposing your language used ref to denote a reference type (this might also be used by function arguments), you might use for ref x in list.

Another common use of for is iterating through a range of numeric values. A concise way of expressing that might be useful: for x from y to z:

Finally

I like Andrew's suggestion about allowing finally on any type of block. I'd suggest a couple of other ideas in this vein.

catch break is triggered when a block is exited using break, assuming you have break. Similarly catch continue is triggered upon a continue.

catch break and finally differ in that finally is triggered when control leaves the try whether it is via break, return, an exception or just reaching the end of the block. Conversely, catch break is only triggered upon a break.
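
Python makes a handy reference point for the finally half of this. It has no catch break; the closest built-in is the else clause on a loop, which fires in the opposite case (only when the loop was not left by break). The sketch below, with the invented name `leave_by`, records which clauses fire for each way of leaving the block.

```python
def leave_by(how):
    """Leave a try block inside a loop in the given way and log what fires."""
    log = []

    def body():
        for _ in range(1):
            try:
                if how == "break":
                    break
                if how == "return":
                    return
                if how == "raise":
                    raise RuntimeError("failed")
            finally:
                # Runs on every exit path out of the try block.
                log.append("finally")
        else:
            # Runs only when the loop ended without a break.
            log.append("no break")

    try:
        body()
    except RuntimeError:
        log.append("caught")
    return log
```

finally shows up in every log, while the loop's else clause only shows up when control simply reaches the end.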

Things you didn't mention

Presumably strings can look like "...". What are strings made out of? Are they made out of single-character strings, Python-style, or is there a separate character type? If there is, does it use C/C++-style '...' syntax?

Are functions first-class objects? Are there anonymous functions? Can you have references to member functions both in a bound form (i.e. their this already has a value) and unbound (i.e. their this doesn't have a value -- this is like a C++ pointer-to-member-function)?

Quote:Original post by 255
integer vs. int? (all integers are signed)

I vote for integer. That way, I can define integer str, dex, con, int, wis, cha without the compiler complaining at me.
Quote:
float vs. real? (there is only one floating-point type)

Whilst all floats are real, not all reals are floats. For a mathematically-oriented programmer, the distinction is quite important. If real exists as a type, it should be the abstract type of which float is a subtype.
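
Python's standard numeric tower happens to be laid out exactly this way: `numbers.Real` is an abstract type, and `float` is one of several types registered under it. A small sketch, where `classify` is a name I made up:

```python
import numbers
from fractions import Fraction


def classify(value):
    """Say whether a value is a float, some other kind of real, or neither."""
    if isinstance(value, float):
        return "float"
    if isinstance(value, numbers.Real):
        return "real, not float"
    return "not real"
```

An exact fraction such as `Fraction(1, 3)` counts as a real without being a float, which is the distinction I'm getting at.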
Quote:
Integer vs. integer?

Probably Integer. A more pertinent question may be whether or not it must be one or the other.

If declared type names are required to begin with an uppercase letter, then type names with lowercase letters can be treated as type variables: which are roughly equivalent to C++'s typename template parameters. That's useful in ways that might be more sophisticated than you intend for your language to be.
Quote:
const vs. constant vs. final vs. immutable vs. (some character)??

final has a particular meaning which isn't quite the same as const.

final denotes a member of an inheritable object which cannot be overridden.

const denotes an object whose value cannot be changed. In C++, it also denotes a function whose this implied argument is a pointer to a const object.

A more interesting question is whether or not constness/mutability will be a compile-time or runtime feature. In C++, constness is not part of an object's runtime type: if you take a pointer to a mutable object and cast it to a const pointer, there is no way to later determine that the object pointed to is mutable.

This has a number of effects. Because constness doesn't exist at runtime, there's no way to define a virtual method which behaves differently depending upon whether or not the object it's called on should logically be const.

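#include <iostream>

struct Foo
{
  // Chosen when the call goes through a const object, pointer or reference.
  virtual void bar () const
  {
    std::cout << "const" << std::endl;
  }
  // Chosen when the call goes through a non-const one.
  virtual void bar ()
  {
    std::cout << "non-const" << std::endl;
  }
};

int main ()
{
  Foo f;
  const Foo *fp = &f; // f itself is mutable; only this view of it is const
  fp->bar();
}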


This outputs const, even though it's obvious to the reader that f isn't really constant. I think that's the wrong thing to do.

[Edited by - Nathan Baum on June 13, 2005 12:56:21 AM]
Quote:Original post by ApochPiQ
Be minimalistic. Your catch syntax had a dangling "ex" on the end of the specifier. This isn't necessary; no decent parser needs it, and the coder probably doesn't want to have to type it.

On the other hand, once you've caught SomeError, how are you going to refer to that error in code?
Quote:
I also hate having to scroll around through a long parameter list to find the function's return type (curse you VB6!).

But that's really a problem with Visual Basic's attitude to newlines. I hate having to scroll to the end to find the type of the last parameter. If it was nice and easy to use newlines wherever in VB, then programmers would put the return type where you wouldn't need to scroll to get to it, unless there were so many parameters they filled the entire screen. In which case you probably have more problems than that function's return type.
Quote:Original post by Andrew Russell
Quick one after reading ApochPiQ's post: Now the return type is on the inside of the colon, it bunches up with the arguments (reduced scanability). Suggest putting it directly before or after the "function" keyword.

I'm not seeing that. It's easy for me to see it there. If it was syntax highlighted and the parameter names were usefully long, it'd stand out as the only type on the line without a parameter name following it.
Quote:Original post by ApochPiQ
for i from 10 to 1 iterate -2:
  baz();
end;


I think iterate is a rather C++-centric keyword to use here, and a rather inappropriate one at that. I think by would be a better keyword.
Quote:Original post by ApochPiQ
for element from List.Start to List.End iterate List.Iter:
  element.boz();
end;


What's List.Iter for? Perhaps it's a function which accepts an iterator and returns the next one? If for behaved this way when given a function, that could be quite useful, particularly if combined with a concise syntax for closures, here inspired in parts by Ruby and Perl:
for i from 1 to 100 by {2 * $1}:
  print i;
end;

This then outputs "1 2 4 8 16 32 64".

If for is to be completely consistent, the logical conclusion is that the numbers are single-argument functions which add themselves to their argument. Which is odd. But could it be useful?
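
A quick way to try the idea out is a Python generator whose step is either a number or a function. This is only a sketch under my own assumptions: the name `stepped` is invented, the bounds are inclusive as in the from/to examples, and it only handles ascending ranges.

```python
def stepped(start, stop, by=1):
    """Yield values from start up to and including stop.

    'by' is either a number to add or a one-argument function that maps
    the current value to the next one. Ascending ranges only.
    """
    if callable(by):
        advance = by
    else:
        # Treat a plain number as "the function that adds this number".
        def advance(value):
            return value + by

    value = start
    while value <= stop:
        yield value
        value = advance(value)
```

`list(stepped(1, 100, lambda v: 2 * v))` gives the doubling sequence from the example, and a plain number falls out as the special case of a function that adds itself to its argument.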
Quote:Original post by Nathan Baum
Quote:Original post by ApochPiQ
Be minimalistic. Your catch syntax had a dangling "ex" on the end of the specifier. This isn't necessary; no decent parser needs it, and the coder probably doesn't want to have to type it.

On the other hand, once you've caught SomeError, how are you going to refer to that error in code?

Perhaps an "exception" (or "ex") keyword (similar to "this")?
Quote:Original post by Andrew Russell
Quote:Original post by Nathan Baum
Quote:Original post by ApochPiQ
Be minimalistic. Your catch syntax had a dangling "ex" on the end of the specifier. This isn't necessary; no decent parser needs it, and the coder probably doesn't want to have to type it.

On the other hand, once you've caught SomeError, how are you going to refer to that error in code?

Perhaps an "exception" (or "ex") keyword (similar to "this")?

Why introduce more keywords, and potential difficulty if you've got a try nested within a catch (no, I can't imagine a situation where you'd need to do that), when you could use the existing syntax for variable declarations to declare a variable to keep the exception in?
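
Python is one existing language that does it this way: the catch clause declares an ordinary variable, so a try nested inside a handler can still see both errors. A minimal sketch, with `describe` and the messages invented for illustration:

```python
def describe():
    """Catch one error, then catch a second one inside the first handler."""
    try:
        raise KeyError("outer")
    except KeyError as outer:
        try:
            raise ValueError("inner")
        except ValueError as inner:
            # Both errors have their own names, so neither hides the other.
            return f"{type(outer).__name__} then {type(inner).__name__}"
```

With a single implicit "exception" keyword, the inner handler would have no obvious way to refer to the outer error.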

This topic is closed to new replies.
