Idea: User-defined keywords

Started by
8 comments, last by asgeirb 13 years, 1 month ago
Hey!

It has been a while! I was just cleaning my HDD again and I found this article for Angelscript that I wrote a while ago. I decided to finish and share it.

-------------------------------------------------

There are many scripting language libraries around today, and a lot of them have certain unique features, some of which are specific to the application that implements the library. While playing with a script of another video game, I had this idea for a new feature.

User-specified keywords:
These basically act like flags that can be applied to a variable, an object or a data type inside a script by adding a specific keyword in front of the declaration. This allows the script writer to signal how a particular variable/object/type should be treated by the application. Here's an example of a variable declaration with and without user-specified keywords:

int health; // without
reliable shared int score; // with


User-specified keywords are always located to the left of the variable type keyword, i.e. [one or more user-specified keywords][type name keyword][variable name];. In the second case above, our variable has the "reliable" and "shared" flags set by the script writer. This means that the application should treat this variable according to whatever behaviour is defined for each keyword. When the engine scans the script, it can look for these flags. When a flag is found, the engine does some extra work that is specific to the given keyword. It is up to the developer to define the keywords for the application and to code the actions to be taken for each keyword.

// Keywords could be set on global vars.
triggered int gem_counter;

// I'm unsure about function return type or parameters, probably not.
indexed int MyFunc(special bool MyBool) {
verified int MyInt; // How about local vars?
... stuff ...
if (MyBool == true) {
if (Callback(MyInt) == 6)
return 0;
}
... stuff ...
}

managed class Player { // Class declarations?
dynamic float x, y, z; // Vars defined in a class.
};

tracked Player pl; // Object instances go as well. These can be global or class members. Not sure about local.


Personally, I think these can be used best on global variables and persistent objects as well as class members i.e. in an object-oriented environment where the engine needs to know how certain objects should be treated (like serialization).
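To make the [keywords][type][name] layout concrete, here is a minimal standalone sketch in C++. It is not AngelScript code: the Declaration struct and the ParseDeclaration function are hypothetical names made up for this illustration, and the splitting is deliberately naive (whitespace only, one declaration per string).

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of splitting a declaration into its parts.
struct Declaration {
    std::vector<std::string> keywords; // user-specified keywords, in order
    std::string type;                  // type name, e.g. "int"
    std::string name;                  // variable name, e.g. "score"
};

// Splits "[keywords...] type name;" on whitespace. Leading tokens found in
// `registered` are treated as user keywords; the next two tokens are the
// type and the name. Returns false if the shape does not match.
bool ParseDeclaration(const std::string& text,
                      const std::set<std::string>& registered,
                      Declaration& out) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);

    Declaration decl;
    std::size_t i = 0;
    while (i < tokens.size() && registered.count(tokens[i]) > 0)
        decl.keywords.push_back(tokens[i++]);

    // Exactly two tokens must remain: the type and the name.
    if (tokens.size() - i != 2) return false;
    decl.type = tokens[i];
    decl.name = tokens[i + 1];

    // Drop the trailing semicolon from the name, if present.
    if (!decl.name.empty() && decl.name.back() == ';') decl.name.pop_back();
    if (decl.name.empty()) return false;

    out = decl;
    return true;
}
```

Given the set {"reliable", "shared"}, the string "reliable shared int score;" would split into two keywords, the type "int" and the name "score", while "int health;" would yield an empty keyword list.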

Keyword exclusion:
Since certain keywords may exclude each other, there could also be keyword lists/sets/enums where only one keyword from a given enum is allowed on a single variable. Combinations are prohibited. Suppose we have a set called "Priority" with the following keywords: important, normal, and unimportant.

class Monster {
important int health; // ok
normal int ammo; // ok
unimportant Vector location; // ok
important unimportant float score; // Raises an error!
};
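The exclusion rule could be checked on the application side along these lines. This is only a sketch under my own assumptions: KeywordSets and FindConflict are hypothetical names, and a keyword is assumed to belong to at most one set.

```cpp
#include <map>
#include <string>
#include <vector>

// Maps each keyword to the name of the exclusive set it belongs to.
// Keywords absent from the map are not part of any set.
using KeywordSets = std::map<std::string, std::string>;

// Returns the name of the first set from which more than one distinct
// keyword was used, or an empty string if the combination is allowed.
std::string FindConflict(const std::vector<std::string>& used,
                         const KeywordSets& sets) {
    std::map<std::string, std::string> seen; // set name -> keyword already used
    for (const std::string& kw : used) {
        auto it = sets.find(kw);
        if (it == sets.end()) continue; // keyword is not in any set

        auto result = seen.emplace(it->second, kw);
        bool inserted = result.second;
        // A second, different keyword from the same set is a conflict.
        if (!inserted && result.first->second != kw) return it->second;
    }
    return "";
}
```

With important, normal and unimportant all mapped to "Priority", the keyword list {important, unimportant} would report a conflict in "Priority", while a single keyword from the set would pass.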


You're probably asking yourself how the heck those keywords are applicable to AngelScript. Well, that's exactly the new feature I'm describing here. The keywords here are just a mockup, but the idea is to create new robust APIs that developers can use to register their own keywords per library implementation. It should be the job of a developer to tell the AS engine which keywords are available to the script. Registering a keyword that matches any of the AngelScript built-in keywords or existing data types throws an error. So does loading and parsing a script that contains variables or types/typedefs with the same name as a registered keyword. There are probably more cases where an error should be raised, but you get the idea...

Keyword registration:
I've mocked up a number of APIs for this task. These could probably use a better syntax, but they're only here to demonstrate the idea. I've used a syntax similar to the existing registration APIs.

// What should these return on fail/success?
r = engine->RegisterVariableKeyword("shared", asBEHAVE_BLABLABLA, etcetc); assert(r >= 0);
r = engine->RegisterObjectKeyword("reliable", asBEHAVE_BLABLABLA, etcetc); assert(r >= 0);

// Alternate syntax
r = engine->RegisterKeyword("shared", asKEYWORD_OBJECT, etcetc); assert(r >= 0); // Can only be used on objects
r = engine->RegisterKeyword("reliable", asKEYWORD_VARIABLE, etcetc); assert(r >= 0); // Can only be used on variables
r = engine->RegisterKeyword("reliable", asKEYWORD_VARIABLE | asKEYWORD_OBJECT, etcetc); assert(r >= 0); // Can be used on both

// Keyword sets: Only one of these keywords can be used on a given variable.
// These three keywords belong to a keyword set named "priority".
r = engine->RegisterKeyword("high", "priority", asKEYWORD_OBJECT, etcetc); assert(r >= 0);
r = engine->RegisterKeyword("med", "priority", asKEYWORD_OBJECT, etcetc); assert(r >= 0);
r = engine->RegisterKeyword("low", "priority", asKEYWORD_OBJECT, etcetc); assert(r >= 0);

// This one is not part of any set.
r = engine->RegisterKeyword("low", NULL, asKEYWORD_OBJECT, etcetc); assert(r >= 0);

// I'm not sure if there should be a separate API to register keyword sets and another to add
// keywords to those sets, but it seemed more robust to include a set parameter in an existing
// API, and simply pass a NULL when you don't want a keyword to be a part of any set.

// There should be API to enumerate which keywords have been applied to a given variable
r = asIScriptObject->GetKeyword("shared"); // Check for single keyword
list = asIScriptObject->GetKeywordList(); // Get a list of all user-defined keywords for this object

// There should probably be API's for global enumeration as well i.e. for
// enumerating keywords, sets, and what set a keyword belongs to (if any).
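As a rough illustration of how registration with a set parameter and the error cases might behave, here is a standalone sketch. None of this exists in AngelScript: KeywordRegistry, the flag names and the error codes are all hypothetical, the reserved-word list is a tiny illustrative subset, and rejecting a second registration of the same keyword is my own assumption.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical flags and error codes, loosely named after the mock-up above.
enum KeywordTarget { KEYWORD_VARIABLE = 1, KEYWORD_OBJECT = 2 };
enum KeywordError { KW_OK = 0, KW_RESERVED = -1, KW_ALREADY_REGISTERED = -2 };

class KeywordRegistry {
public:
    // `set` may be empty, meaning the keyword belongs to no exclusive set.
    // Returns a negative value on failure, like the existing registration APIs.
    int Register(const std::string& keyword, const std::string& set, int targets) {
        if (Reserved().count(keyword) > 0) return KW_RESERVED;
        if (entries_.count(keyword) > 0) return KW_ALREADY_REGISTERED;
        entries_[keyword] = Entry{set, targets};
        return KW_OK;
    }

    bool IsRegistered(const std::string& keyword) const {
        return entries_.count(keyword) > 0;
    }

    // Returns the set name, or an empty string for no set or an unknown keyword.
    std::string SetOf(const std::string& keyword) const {
        auto it = entries_.find(keyword);
        return it == entries_.end() ? std::string() : it->second.set;
    }

private:
    struct Entry {
        std::string set;
        int targets;
    };

    // A small illustrative subset of reserved words, not the full language list.
    static const std::set<std::string>& Reserved() {
        static const std::set<std::string> words = {
            "int", "float", "bool", "class", "if", "return"};
        return words;
    }

    std::map<std::string, Entry> entries_;
};
```

Registering "high" in the "priority" set would succeed, registering "int" would fail with KW_RESERVED, and SetOf("high") would answer the "which set does this keyword belong to" question from the enumeration note above.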


Final notice:
While I realize there is an existing addon for parsing script metadata, which allows you to perform similar actions, I decided to give this idea a go anyway because it seems more practical to have a feature like this be a part of the engine itself rather than an addon. I may have given this idea a lot of attention, but realize that it's still just an idea.

Thanks.

EDIT: Cleaned up code after HTML got messed up.
While this is an interesting feature, it is one that I do not want to add to the core library. I would rather see you improving the CScriptBuilder add-on to add this feature there. The reason is that most developers probably do not want or need this feature, and they shouldn't be forced to include it if not absolutely necessary. This is one of my main guidelines for the library: keep the core engine as simple as possible, while still allowing high-level features like this through add-ons.

Regards,
Andreas





AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

While reading this topic I got another idea: it would be good to have more influence over the parsing and structuring steps, e.g. callbacks that are invoked during those steps, plus some additional methods for influencing what happens. It's probably easier to show than to tell, so here is some pseudo-code:

[...]
mEngine->SetStructuringCallback(asFUNCTION(StructuringCallback), asCALL_CDECL);
[...]

void StructuringCallback(asIStructuringContext* context)
{
    if (context->GetKind() == AS_STRUCTURE_Declaration)
    {
        int numTokens = context->GetTokenCount();
        for (int i = 0; i < numTokens; i++)
        {
            if (context->GetTokenType(i) == AS_TOKEN_Classifier)
            {
                std::string token = context->GetTokenString(i);
                if (token == "important")
                {
                    // Class members are marked together with their owning class.
                    if (context->IsDeclarationOfMember())
                        SerializationManager::MarkAsImportant(context->GetCurrentClass(), context->GetDeclarationID());
                    else
                        SerializationManager::MarkAsImportant(context->GetDeclarationID());
                }
            }
        }
    }
}


That would make implementing such things much easier. And it could be also used for many other customizations.
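For anyone who wants to play with the callback idea without touching the library, here is a self-contained toy version. It is only a sketch: DeclarationInfo, DeclarationCallback and ScanDeclarations are hypothetical names, and the "parser" just splits on semicolons and whitespace.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the context a parser could hand to the application.
struct DeclarationInfo {
    std::vector<std::string> tokens; // whitespace-separated tokens of one declaration
};

using DeclarationCallback = std::function<void(const DeclarationInfo&)>;

// Splits `script` into ';'-terminated declarations and reports each non-empty
// one to the callback. A real parser would do this while building its tree.
void ScanDeclarations(const std::string& script,
                      const DeclarationCallback& onDeclaration) {
    std::istringstream in(script);
    for (std::string statement; std::getline(in, statement, ';');) {
        DeclarationInfo info;
        std::istringstream words(statement);
        for (std::string tok; words >> tok;) info.tokens.push_back(tok);
        if (!info.tokens.empty()) onDeclaration(info);
    }
}
```

A callback that collects the last token of every declaration starting with "important" would pick out "health" from the script "important int health; normal int ammo;". Since std::function accepts free functions, lambdas and function objects alike, the same entry point would work for both the class style and the plain function style.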




Now this is something I would like to see. User keywords can create an opportunity to provide optimizations where stock AngelScript would just slow it down. Unreal Script has many features built in to it that allow for modifying things such as state in an agent, and numerous other things I can't think of at the moment. Things that make it specialized for the Unreal Engine. Callbacks would be a great way to provide this without becoming obtrusive.

And also, Andreas, you have to think that this could be another selling point for the library and one more thing that sets it apart from other languages.

I would, however, wrap the callback in a class, asIStructureHelper, rather than use a raw C callback. AngelScript already requires more globals than many would like to tolerate.
Spell:

Providing callbacks during the parsing does indeed sound better. It wouldn't require AngelScript to keep track of all the keywords, and would still make it easier for the application to implement the desired functionality. I can see the CScriptBuilder taking advantage of this kind of feature to implement the pre-processing, conditional programming, and meta-data support.

I'll add this to the to-do list so I can evaluate it for a future release.

_orm_:

On the topic of the callbacks being classes or global functions, I guess this is mostly a matter of taste. I try to provide the flexibility of using either where possible (like the message callback), but in some cases it just doesn't make sense to implement a class to do something that a simple function can do. If you have any particular function you would rather see as a class, just let me know the reasoning behind it and I'll see if I can't make a change to support both.


Another option would be to allow user-supplied parsers (possibly even written in AngelScript as a module) instead of the standard asCParser.
Combined with something like Parsec or better yet Metalua you could add any syntax macros you like.
This would require exposing the AST api but could still use the same asCCompiler.

Let's say you need a DDL to describe npcs and conversations with them.
So you define your own parser, either to extend the AngelScript parser or to override it for the compilation of this module.
npc Robot
{
mesh = "models/robot.collada";
dialog
{
say_hello {
say = ("Hi, there", null);
reply = ("<beep><bop><boop>", "sounds/robot/beepbopboop.ogg");
}
push {
say = ("<push the robot>", null);
action {
setFlag("Hostile", true);
dialog.say_sorry.visible = true;
}
}
say_sorry(visible = false) {
say = ("I'm sorry", null);
reply = ("<beep><bop><boop>", "sounds/robot/beepbopboop.ogg");
action {
setFlag("Hostile", false);
dialog.say_sorry.visible = false;
}
}
}
}


Could then yield the same AST as if you had written this:

funcdef void DIALOGFUNC();


class DialogEntry
{
string conversationLine;
bool visible;
DIALOGFUNC @func;

DialogEntry(string say, bool v, DIALOGFUNC @f)
{
conversationLine = say;
visible = v;
@func = @f;
}
}

typedef array<DialogEntry> DialogArray;

// The above code would obviously be in some library that you would import

class Robot : NpcAgent
{
DialogArray dialog;

Robot()
{
super("models/robot.collada");
_dialog__initialize();
}

void _dialog__initialize()
{
dialog.resize(3);
dialog[0] = DialogEntry("Hi, there", true, @this._dialog_say_hello);
dialog[1] = DialogEntry("<push the robot>", true, @this._dialog_push);
dialog[2] = DialogEntry("I'm sorry", false, @this._dialog_say_sorry);
}

void _dialog_say_hello()
{
Gui.printNpcConversationLine("<beep><bop><boop>");
Audio.playSound(this.transform, ResourceManager.getHandle("sounds/robot/beepbopboop.ogg"));
}

void _dialog_push()
{
setFlag("Hostile", true);
dialog[2].visible = true; // say_sorry is the third entry (index 2)
}

void _dialog_say_sorry()
{
Gui.printNpcConversationLine("<beep><bop><boop>");
Audio.playSound(this.transform, ResourceManager.getHandle("sounds/robot/beepbopboop.ogg"));
setFlag("Hostile", false);
dialog[2].visible = false; // matches dialog.say_sorry.visible = false in the DSL
}
}


This would also allow adding preprocessor directives, conditional compilation, class and function metadata. You name it.
Yo dawg, I put a language in your language so you can program while you program.

Why not just use a separate language then? Seems a bit redundant.
One reason could be that you want your game designers or level editors to be able to make changes to the game without having to learn programming, while still maintaining the power of programming natively in the scripting language of the game.
The point is not necessarily replacing the grammar of the language, but merely adding macros as syntactic sugar. Like property accessors in AngelScript are syntactic sugar for associated get_ and set_ methods.
The power of combinatory parsing lies in the fact that you can have an extensible language so you can write macros that are syntactic sugar for your boilerplate code but it still compiles down to the same bytecode as if you had written that boilerplate code yourself.
One trivial example is adding an assert macro, which expands to an if statement, that emits no bytecode when compiled for a release build: the assert macro watches for a command line flag and prunes the if statement node when it sees that you are doing a release build of the script.
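To show what pruning a node before compilation means, here is a tiny standalone sketch in C++. The Node struct and PruneAsserts are hypothetical and have nothing to do with AngelScript's real syntax tree classes; the point is only that a filter pass can remove subtrees so the compiler never sees them.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified syntax tree node.
struct Node {
    std::string kind; // e.g. "block", "assert", "call"
    std::vector<std::unique_ptr<Node>> children;
};

// Removes every "assert" node (and its subtree) when `releaseBuild` is true,
// so a later compile step never sees it and emits nothing for it.
// In a debug build the tree is left untouched.
void PruneAsserts(Node& node, bool releaseBuild) {
    if (!releaseBuild) return;
    auto& kids = node.children;
    for (auto it = kids.begin(); it != kids.end();) {
        if ((*it)->kind == "assert") {
            it = kids.erase(it); // erase returns the next valid iterator
        } else {
            PruneAsserts(**it, releaseBuild); // recurse into nested blocks
            ++it;
        }
    }
}
```

A block holding one assert node and one call node would keep both in a debug build and only the call node after PruneAsserts(root, true).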
The best part is that there is already a separation between the parser and the compiler via an AST, unlike in many other scripting languages, so allowing a custom parser to be plugged in should be relatively painless.
Then couldn't we do that with the CScriptBuilder's metadata feature and keep the parser separate from the script engine? I think I had already brought up the possibility of a macro system to Andreas.
It could be done with CScriptBuilder but if we can take over the construction and filtering of the AST we can get better integration with cleaner code.

Let's do a comparison. Here is conditional compilation from ScriptBuilder:

int pos = 0;
int nested = 0;
while( pos < (int)modifiedScript.size() )
{
int len;
asETokenClass t = engine->ParseToken(&modifiedScript[pos], modifiedScript.size() - pos, &len);
if( t == asTC_UNKNOWN && modifiedScript[pos] == '#' )
{
int start = pos++;

// Is this an #if directive?
asETokenClass t = engine->ParseToken(&modifiedScript[pos], modifiedScript.size() - pos, &len);

string token;
token.assign(&modifiedScript[pos], len);

pos += len;

if( token == "if" )
{
t = engine->ParseToken(&modifiedScript[pos], modifiedScript.size() - pos, &len);
if( t == asTC_WHITESPACE )
{
pos += len;
t = engine->ParseToken(&modifiedScript[pos], modifiedScript.size() - pos, &len);
}

if( t == asTC_IDENTIFIER )
{
string word;
word.assign(&modifiedScript[pos], len);

// Overwrite the #if directive with space characters to avoid compiler error
pos += len;
OverwriteCode(start, pos-start);

// Has this identifier been defined by the application or not?
if( definedWords.find(word) == definedWords.end() )
{
// Exclude all the code until and including the #endif
pos = ExcludeCode(pos);
}
else
{
nested++;
}
}
}
else if( token == "endif" )
{
// Only remove the #endif if there was a matching #if
if( nested > 0 )
{
OverwriteCode(start, pos-start);
nested--;
}
}
}
else
pos += len;
}


And here is conditional compilation using Metalua:
mlp.lexer:add "#if"
mlp.lexer:add "#else"
mlp.lexer:add "#endif"


-- This function determines which block is emitted for a conditional statement
-- If the identifier has been defined we emit the first block otherwise we emit the second block
local function cond_if_builder(x)
local identifier, if_block, else_block = x[1], x[2], x[3]
if preprocessorDefines[identifier[1]] then -- assumes a table keyed by name; an Id node holds its name at index 1
return if_block
else
-- else_block[1] is the tag "#else" and else_block[2] is the conditional block
return else_block[2]
end
end


-- An <#if> token is followed by an identifier and a block and is terminated by the <#else> and <#endif> tokens
local cond_elsifs_parser = gg.list {
gg.sequence { mlp.id, mlp.block },
separators = "#else",
terminators = { "#else", "#endif" }
}


-- A conditional statement looks like this
-- <#if> <identifier> <block> (<#else> <block>)? <#endif>
local cond_statement = gg.sequence {
"#if", cond_elsifs_parser, gg.onkeyword { "#else", mlp.block }, "#endif", builder = cond_if_builder
}


-- Add the conditional statement to the existing list of statements
mlp.stat:add cond_statement


N.b. I haven't tried this code out but it should give you a hint of how it works.
The only changes this would require to AngelScript are allowing users to plug in custom parsers and exposing the AST API. The rest can be an external project.

This topic is closed to new replies.
