cr88192

Posted 11 February 2013 - 03:18 AM

FWIW, for my project I am using my own scripting language (sort of like JavaScript and ActionScript mixed with C, Java, and C#).


Generally, the way it is run works in a number of stages.

First, there is a parser, which:
breaks the code down into individual "tokens", for example a brace, an identifier (variable name), or a number, each according to its specific rules (numbers contain digits, ...);
matches the tokens against various syntax patterns, descending into whichever parts of the grammar match;
and, in doing so, produces a tree-like structure (an "AST") representing the code it has seen along the way.
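
For illustration, the tokenizing step might look roughly like this (a minimal C sketch with hypothetical token kinds, not the actual code):

#include <ctype.h>
#include <stdio.h>

/* hypothetical token kinds; a real lexer has many more */
enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

typedef struct { int kind; char text[64]; } Token;

/* read one token from *src, advancing the pointer past it */
Token next_token(const char **src) {
    Token t = { TOK_EOF, "" };
    const char *s = *src;

    while (isspace((unsigned char)*s)) s++;          /* skip whitespace */

    if (*s == '\0') {
        t.kind = TOK_EOF;
    }
    else if (isalpha((unsigned char)*s) || *s == '_') {
        /* identifier: a run of letters, digits, or underscores */
        int n = 0;
        while ((isalnum((unsigned char)*s) || *s == '_') && n < 63)
            t.text[n++] = *s++;
        t.text[n] = '\0';
        t.kind = TOK_IDENT;
    }
    else if (isdigit((unsigned char)*s)) {
        /* number: a run of digits (real rules also handle '.', 'e', hex, ...) */
        int n = 0;
        while (isdigit((unsigned char)*s) && n < 63)
            t.text[n++] = *s++;
        t.text[n] = '\0';
        t.kind = TOK_NUMBER;
    }
    else {
        /* single-character punctuation: braces, operators, ... */
        t.text[0] = *s++;
        t.text[1] = '\0';
        t.kind = TOK_PUNCT;
    }

    *src = s;
    return t;
}

int main(void) {
    const char *src = "x = y + 42;";
    Token t;
    while ((t = next_token(&src)).kind != TOK_EOF)
        printf("kind=%d text='%s'\n", t.kind, t.text);
    return 0;
}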

The next stage is the front-end compiler, which:
walks along this structure, figuring out various pieces of information (where is this variable declared? does this operator have a known type? ...) and spits out appropriate globs of bytecode.
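
Conceptually, the emission part is a recursive walk over the AST. A minimal sketch (with made-up node kinds and opcodes, and printing in place of a real bytecode buffer):

#include <stdio.h>

/* hypothetical AST node kinds and opcodes */
enum { NODE_NUMBER, NODE_VAR, NODE_ADD };
enum { OP_PUSH_CONST, OP_LOAD_VAR, OP_ADD };

typedef struct Node {
    int kind;
    int value;                  /* for NODE_NUMBER */
    int var_slot;               /* for NODE_VAR: resolved variable index */
    struct Node *left, *right;  /* for NODE_ADD */
} Node;

/* emit an opcode plus operand; here it just prints,
   a real compiler appends bytes to a buffer */
void emit(int op, int arg) { printf("op=%d arg=%d\n", op, arg); }

/* walk the tree, emitting code for children first, then the
   operation that combines them (stack-machine style) */
void compile_expr(Node *n) {
    switch (n->kind) {
    case NODE_NUMBER: emit(OP_PUSH_CONST, n->value);    break;
    case NODE_VAR:    emit(OP_LOAD_VAR,   n->var_slot); break;
    case NODE_ADD:
        compile_expr(n->left);
        compile_expr(n->right);
        emit(OP_ADD, 0);
        break;
    }
}

int main(void) {
    /* compile "x + 3", assuming 'x' was resolved to variable slot 0 */
    Node three = { NODE_NUMBER, 3, 0, NULL, NULL };
    Node x     = { NODE_VAR,    0, 0, NULL, NULL };
    Node add   = { NODE_ADD,    0, 0, &x, &three };
    compile_expr(&add);
    return 0;
}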

At this point, bytecode may be saved for later, pulled from files, or simply passed to the back-end.


Then we get to the interpreter backend, which in my case currently:
begins by converting this bytecode into a list-like structure (representing the individual bytecode operations), usually the first time a function or method is called;
splits this structure apart into a collection of "traces", each representing a non-branching sequence of bytecode instructions (linked together, these traces effectively form a "Control Flow Graph", or "CFG");
and then either executes these traces directly (via embedded function pointers), or, once an execution counter runs out, passes them off to the JIT, which spits out a mix of function calls back into the interpreter and directly compiled machine-code sequences for various operations.
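
The sort of data structures involved might look roughly like this (a hypothetical, declarations-only sketch, not the actual layout):

/* declarations-only sketch of a trace-based interpreter's core structures */

typedef struct VMFrame VMFrame;   /* execution state: operand stack, locals, ... (omitted) */
typedef struct VMOp    VMOp;
typedef struct VMTrace VMTrace;

/* one decoded bytecode operation: a handler plus its operand */
struct VMOp {
    void (*exec)(VMFrame *frm, VMOp *op);
    int   arg;                      /* immediate value, variable slot, ... */
};

/* a "trace": a straight-line run of ops ending in a branch or return */
struct VMTrace {
    VMOp    *ops;                   /* decoded operations in this trace */
    int      n_ops;
    VMTrace *next;                  /* fall-through successor */
    VMTrace *jump;                  /* taken-branch successor; these links
                                       are what make the whole thing a CFG */
    int      hit_count;             /* bumped each run; past a threshold,
                                       the trace is handed to the JIT */
    void   (*jitted)(VMFrame *frm); /* compiled machine code, if any */
};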


Granted, these last stages aren't strictly required, and there are many interpreters which instead directly execute bytecode (decoding and invoking the logic for each bytecode instruction one at a time).

And some much simpler interpreters will simply walk over the AST, or even operate directly on the input text.
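
An AST-walking evaluator can be very small; a minimal sketch (hypothetical node kinds, expressions only):

#include <stdio.h>

/* hypothetical expression nodes for a tiny AST-walking evaluator */
enum { AST_NUMBER, AST_ADD, AST_MUL };

typedef struct Ast {
    int kind;
    int value;                  /* for AST_NUMBER */
    struct Ast *left, *right;   /* for AST_ADD / AST_MUL */
} Ast;

/* evaluate the tree directly; no bytecode involved */
int eval(Ast *n) {
    switch (n->kind) {
    case AST_NUMBER: return n->value;
    case AST_ADD:    return eval(n->left) + eval(n->right);
    case AST_MUL:    return eval(n->left) * eval(n->right);
    }
    return 0;
}

int main(void) {
    /* evaluate (2 + 3) * 4 */
    Ast two   = { AST_NUMBER, 2, NULL, NULL };
    Ast three = { AST_NUMBER, 3, NULL, NULL };
    Ast four  = { AST_NUMBER, 4, NULL, NULL };
    Ast add   = { AST_ADD, 0, &two, &three };
    Ast mul   = { AST_MUL, 0, &add, &four };
    printf("%d\n", eval(&mul));   /* prints 20 */
    return 0;
}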


The tradeoff mostly has to do with performance, and with the relative amount of resources "invested" into a piece of code.
Simpler strategies are better when code will likely only ever be seen once and never run again; typically, execution speeds are "very slow" (often 1000x-10000x or more slower than native).

Directly interpreting bytecode is generally better when code will be run more than once (so it hopefully isn't "dead slow") but doesn't need to be particularly fast. Bytecode interpretation can usually get within about 100x-200x of native speeds (IME, using "while()" loops and large "switch()" blocks).
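
That "while + switch" dispatch style looks roughly like this (a minimal sketch with made-up opcodes for a tiny stack machine, not any particular VM):

#include <stdio.h>

/* hypothetical opcodes for a tiny stack machine */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

void run(const int *code) {
    int stack[256];
    int sp = 0;      /* stack pointer */
    int pc = 0;      /* program counter */

    /* classic dispatch: loop, fetch the opcode, switch on it */
    while (1) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];        break;
        case OP_ADD:   sp--; stack[sp-1] += stack[sp];  break;
        case OP_PRINT: printf("%d\n", stack[--sp]);     break;
        case OP_HALT:  return;
        }
    }
}

int main(void) {
    /* bytecode for: print(2 + 3) */
    int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}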

(Some people have also had good luck getting good performance out of "computed goto" and "label pointers", but these are generally GCC-specific.)
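
For comparison, the computed-goto version of the same loop might look like this (uses the GCC "labels as values" extension, also supported by Clang; same made-up opcodes as above):

#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

void run(const int *code) {
    int stack[256];
    int sp = 0, pc = 0;

    /* table of label addresses, indexed by opcode */
    static void *dispatch[] = { &&do_push, &&do_add, &&do_print, &&do_halt };

    /* each handler jumps straight to the next one, avoiding the central switch */
    #define NEXT() goto *dispatch[code[pc++]]

    NEXT();
do_push:  stack[sp++] = code[pc++];        NEXT();
do_add:   sp--; stack[sp-1] += stack[sp];  NEXT();
do_print: printf("%d\n", stack[--sp]);     NEXT();
do_halt:  return;

    #undef NEXT
}

int main(void) {
    int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}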


Trace-graphs can get a little faster, IME generally around 50x-60x of native (though some experimental interpreter designs have run much faster than this, like 8x-10x of native, but it seems problematic to replicate this effect in a "real" interpreter). Granted, some of this could have to do with architecture: my experimental interpreter was more Dalvik-like, using untagged registers and three-address operations, whereas my main interpreter is a stack machine and currently still uses tagged references.

This is partly because a lot of the "figuring out what to do" work can be moved out of the execution path, so execution largely becomes a matter of calling through function pointers (generally using a "trampoline loop").
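
A self-contained sketch of that trampoline style (hypothetical; in a real trace interpreter the ops would come from decoded bytecode rather than being built by hand):

#include <stdio.h>

typedef struct VM VM;
typedef struct Op Op;

struct VM { int stack[256]; int sp; };

/* each op carries its handler and operand; the handler returns
   the next op to execute (NULL meaning "stop") */
struct Op {
    Op *(*exec)(VM *vm, Op *op);
    int arg;
    Op *next;
};

Op *op_push(VM *vm, Op *op)  { vm->stack[vm->sp++] = op->arg;                      return op->next; }
Op *op_add(VM *vm, Op *op)   { vm->sp--; vm->stack[vm->sp-1] += vm->stack[vm->sp]; return op->next; }
Op *op_print(VM *vm, Op *op) { printf("%d\n", vm->stack[--vm->sp]);                return op->next; }
Op *op_halt(VM *vm, Op *op)  { (void)vm; (void)op;                                 return NULL; }

/* the "trampoline": keep calling whatever the previous handler returned */
void run(VM *vm, Op *op) {
    while (op)
        op = op->exec(vm, op);
}

int main(void) {
    /* print(2 + 3), pre-decoded into linked ops */
    Op halt  = { op_halt,  0, NULL };
    Op print = { op_print, 0, &halt };
    Op add   = { op_add,   0, &print };
    Op push3 = { op_push,  3, &add };
    Op push2 = { op_push,  2, &push3 };
    VM vm = { {0}, 0 };
    run(&vm, &push2);
    return 0;
}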


A JIT can still be faster, reaching speeds comparable to native C or C++ code (and doing pretty much anything else a traditional compiler can do), though this isn't always the case. For example, my current JIT is fairly naive: it can get within about 3x of native C speeds in some cases, but typically runs a bit slower than this (as much of its operation is still handled via interpreter machinery). Often this doesn't matter much, as I am not usually using the scripting language for speed-critical tasks. (Basically, it goes fastest if treated like a C-like subset of Java, but using it this way is lame, and is not really what it was designed for...) (Reaching C-like speeds generally requires things like figuring out how to cache variables in registers, performing register allocation, ..., which some of my past JITs have done, but my current JIT doesn't.)
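
At its core, JIT compilation just means generating machine code into executable memory and then calling into it like a normal function. A toy illustration (x86-64 Linux only; assumes mmap, and is nowhere near a real JIT back-end):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

int main(void) {
    /* x86-64 machine code for: mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* allocate a readable/writable/executable page
       (real JITs usually map it writable first, then flip to executable) */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;

    memcpy(buf, code, sizeof(code));   /* "emit" the code */

    jit_fn fn = (jit_fn)buf;           /* call into the generated code */
    printf("%d\n", fn());              /* prints 42 */

    munmap(buf, 4096);
    return 0;
}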


Note that, while interpreters can be interesting, writing them can also end up eating a large amount of time and effort, so they are not necessarily a good idea if a person has other things they want to get done (and it may take effectively years of time and effort invested into them before they stop looking like a joke compared with more mature pieces of technology).
