This topic is now archived and is closed to further replies.

flyguy

Scripting Engines...

flyguy    122
Hi... I am writing a script editor for the scripting engine in my RPG. I was wondering: how can I set up the script files so that I can write them in plain English (in the language I wrote), but if someone else loads one, they can't make heads or tails of it and can't change it? Should I have the script editor convert the scripts to binary, and then have a converter in my RPG turn them back into the scripting language? It seems kind of slow. Please help!

Tom

"And I invented the internet!" - Al Gore

Edited by - flyguy on June 26, 2001 3:23:06 PM

Guest Anonymous Poster
I would recommend converting them to binary. It'll probably knock the file size down by about 75%, and it will make things a lot easier for your engine. The language thing would be pretty cool too. I would probably have an instruction list and a mnemonic list; you could switch the mnemonics based on the language.
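A minimal sketch of the instruction list plus swappable mnemonic list, written in Python for brevity. The opcode numbers, the mnemonic words, and the `assemble_line` helper are all made up for illustration; nothing here comes from the original poster's engine.

```python
# Hypothetical opcode numbers shared by every mnemonic table.
OP_SAY, OP_MOVE, OP_GIVE = 1, 2, 3

# One mnemonic table per human language; every table maps onto the same opcodes.
MNEMONICS = {
    "en": {"say": OP_SAY, "move": OP_MOVE, "give": OP_GIVE},
    "de": {"sage": OP_SAY, "gehe": OP_MOVE, "gib": OP_GIVE},
}

def assemble_line(line, language):
    """Turn 'mnemonic arg arg ...' into (opcode, [args]) using one table."""
    word, *args = line.split()
    return MNEMONICS[language][word.lower()], args
```

Because only the table changes, `assemble_line("say hello", "en")` and `assemble_line("sage hello", "de")` produce the same instruction.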

flyguy    122
So you'd suggest having the script editor encode the file into binary, and then having my RPG decode the binary file, store it in memory, and execute the commands? You don't think it would be too slow?

"And I invented the internet!" - Al Gore

Midnight Coder    122
I don't see how it could be slow... it would be faster, in my opinion. Rather than the game doing a lot of language parsing, the editor would take care of all that. Basically, the editor would 'interpret' your script and put it into binary form. Then the game wouldn't have to do any interpretation; it could execute the commands right away as it reads them from the binary file. It would be a whole lot easier.

JmarsKoder    122
OK, none of these people know what they are talking about; just do this. Use a small, simple XOR cipher algo with a basic randomizer built in. You select the key and bam: unless there is a leet cracker who really wants to break your script, you will have nothing to worry about. Plus, if your key is the same size as the script text, and you use the randomizer, you would need a f*in supercomputer to crack it.

Reality Makes Me XIC
I hack code: passion is my fuel. Use my programs, experience genius.
http://www.x-i-c.com/

afterburn    124
OK, what is with the last post? Does he think this is RSA? His method would require the editor to have the same key no matter what, so if you sent the script to me, I could not open the file because I would have no key. There is no need for that.

I would write it in C++, but there is no need to get into language details. I would use the binary file, but also just shift the data left or right on every other line.

JmarsKoder    122
Maybe my suggestion was... misunderstood.

Expand your mind outside of the box, the box being the "if it's encrypted, the user must use a password" bull. The script could be encrypted against a randomly generated file placed in some intricate location on the user's HD. Then the program would just reference that file, and that would be the key. Look, it's low security, but for his apparent target audience, imho it's pretty good.

Reality Makes Me XIC
I hack code: passion is my fuel. Use my programs, experience genius.
http://www.x-i-c.com/

Epolevne    175
You could just "compile" the scripts. That means you'd have both the "source code" scripts and the compiled ones, and you just don't distribute the "source code." You'd also get some of the benefits of "compiled" code (speed, size, etc).

This could be as simple as writing out the parsed in-memory structure as binary, then loading the binary file directly back into that structure, so you don't have to parse again.
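A rough sketch of "write the parsed structure out as binary, then load it straight back", in Python. The record layout (one opcode byte plus two 16-bit operands) is an assumption chosen for the example, not a format anyone in the thread specified.

```python
import struct

# Hypothetical fixed-size record: opcode (1 byte) + two signed 16-bit operands.
RECORD = struct.Struct("<Bhh")

def save_compiled(commands):
    """Pack already-parsed (opcode, a, b) tuples into one bytes object."""
    return b"".join(RECORD.pack(op, a, b) for op, a, b in commands)

def load_compiled(blob):
    """Read the records straight back into tuples; no text parsing involved."""
    return list(RECORD.iter_unpack(blob))

parsed = [(1, 10, 20), (2, -3, 0)]   # what the editor's parser would produce
blob = save_compiled(parsed)         # what would be shipped with the game
```

The game side only needs `load_compiled`; the parser stays in the editor.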


Epolevne

Zaei    122
Go the route of assembly, and use simple opcodes with one or two parameters. Each opcode has a number assigned to it, and the editor "compiles" the source into the resulting code. Your engine just has to read in a number, use a switch() statement on it, decide whether it should grab 0, 1, or 2 more numbers, and do stuff. Then go a bit further, and write your own scripting language on top of the "assembly".
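Here is one way the read-a-number, dispatch, grab-0-to-2-operands loop might look. It is a sketch in Python (an if/elif chain stands in for the switch() statement), and the opcode set HALT/SET/ADDI/OUT and the four registers are invented for the example.

```python
HALT, SET, ADDI, OUT = 0, 1, 2, 3   # hypothetical opcode numbers

def run(code):
    """Execute a flat list of integers; returns the values emitted by OUT."""
    regs = [0] * 4                  # four hypothetical integer variables
    output = []
    pc = 0
    while True:
        op = code[pc]
        if op == HALT:              # no operands
            return output
        elif op == OUT:             # one operand: register index
            output.append(regs[code[pc + 1]])
            pc += 2
        elif op == SET:             # two operands: register, value
            regs[code[pc + 1]] = code[pc + 2]
            pc += 3
        elif op == ADDI:            # two operands: register, amount
            regs[code[pc + 1]] += code[pc + 2]
            pc += 3
        else:
            raise ValueError(f"unknown opcode {op} at position {pc}")

program = [SET, 0, 40, ADDI, 0, 2, OUT, 0, HALT]
```

Running `run(program)` sets register 0 to 40, adds 2, and emits the result.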

Z.

TrIaD    122
use opcodes...
store the data in raw format...
have each function retrieve its parameters...
this will allow you to pass strings, multiple arguments (first arg=how many to read), whatever...
store everything in raw format (i.e. write out data types in binary, strings direct to file with null terminator, and when you read a string just start reading bytes until you hit the null)...

it can do control statements like ASM: use a JMP (JE, JL, JNE, etc) that executes fseek()...
start it with a binary zero (perhaps after a few header bytes) and most text-editors will choke trying to read it...
you'll be able to see the strings, but you need a hex-editor to read numbers, and you have to simultaneously figure out what the opcodes are and how they read their parameters, because there's nothing in the file that says, "okay this is an opcode"; you rely on each function pulling its arguments correctly to get the file position cursor set to the next opcode...
not very error-friendly, but it should work if the script is clean...
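A small sketch of that layout, using Python's `io.BytesIO` as a stand-in for the file and `seek()` in place of fseek(). The opcode bytes, the single leading zero byte, and the 16-bit jump target are assumptions for the example.

```python
import io
import struct

OP_END, OP_SAY, OP_JMP = 0x01, 0x02, 0x03   # hypothetical opcode bytes

def read_cstring(f):
    """Read bytes up to (not including) the null terminator."""
    out = bytearray()
    while (b := f.read(1)) not in (b"", b"\x00"):
        out += b
    return out.decode("ascii")

def run(blob):
    """Each opcode handler pulls its own parameters from the stream."""
    f = io.BytesIO(blob)
    said = []
    f.seek(1)                       # skip the leading binary-zero header byte
    while True:
        op = f.read(1)[0]
        if op == OP_SAY:            # parameter: null-terminated string
            said.append(read_cstring(f))
        elif op == OP_JMP:          # parameter: 16-bit absolute file offset
            (target,) = struct.unpack("<H", f.read(2))
            f.seek(target)          # a jump is just a seek
        elif op == OP_END:
            return said
        else:
            raise ValueError(f"bad opcode {op:#x}")

script = (b"\x00"
          + bytes([OP_SAY]) + b"hi\x00"
          + bytes([OP_JMP]) + struct.pack("<H", 17)   # offset of the last SAY
          + bytes([OP_SAY]) + b"skipped\x00"
          + bytes([OP_SAY]) + b"bye\x00"
          + bytes([OP_END]))
```

The jump skips the middle SAY, so `run(script)` collects only the first and last strings. As the post says, one handler misreading its parameters leaves the cursor in the wrong place for everything after it.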

if you REALLY want security, use some form of encryption (XOR could work) on the entire file, then load it into a memory file and decrypt it... that way you could support both encrypted and non-encrypted scripts and theoretically only care which it is when you load and unload it

--Tr][aD--

Chris Hargrove    256
Two comments, one related to scripting languages, and one to Jmars' references to encryption:

One... I'd recommend going the route of Epol and Zaei's suggestions regarding compilation.

Specifically, the five standard "stages" of a compiler and interpreter are ordered as follows:

1. Lexical Analysis (aka "Lexing" or "Scanning"): converts a stream of raw text into a sequence of tokens, which filter out whitespace (including comments) and tag the tokens as "identifier", "integer", "keyword" and so forth. See tools such as Lex.

2. Syntactic Analysis (aka "Parsing"): converts the token sequence from the lexing stage into a hierarchical concrete syntax tree (CST) based on the language grammar. This can then be converted to an abstract syntax tree (AST), where the root of the tree is a source file, progressing downward all the way to the terminal symbols stored at the leaves. For example, the expression A+B*C would be converted to a tree where the root is an add operation node, whose first child is the A identifier node, and whose second child is a multiply operation node, with child identifier nodes B and C. See tools such as Yacc with regard to building the CST; the AST can be built manually from the CST.

3. Semantic Analysis: Analyzes and modifies a syntax tree for correct meaning, including symbol generation (for types, variables, functions etc), type checking and coercion (like int to float promotion), and other such tasks. Some optimizations suitable to be performed on trees can also be done here (such as constant folding).

4. Intermediate Code Generation: Converts a semantically-augmented tree into a linear form, using simple primitive machine-independent operations. The ICode can be in any convenient form, although something that's not register-based is recommended (since this is meant to be machine-independent, and different machines have different register counts, so let the object code generator handle this). Common forms include memory-based systems for static single assignment, and abstract stack machines. The former is good for real compilers that intend to generate object code, while the latter is easier to generate from an AST but doesn't convert as quickly to a real machine architecture, so it's more applicable to a bytecode VM. Optimizations such as basic block reordering and dead code elimination can be done here.

5. Object Code Generation: Converts the intermediate code into machine-specific object code, performing register allocation and other conversions required for adaptation to a particular platform. Final machine-level optimizations (like peephole optimization) can be performed here.
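To make stages 1 and 2 concrete, here is a toy lexer and recursive-descent parser in Python. It goes straight to a small AST of nested tuples rather than building a CST first, and the grammar (only `+` and `*` over identifiers and integers) is an assumption chosen to keep the sketch short.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def lex(text):
    """Stage 1: raw text -> list of (kind, value) tokens, whitespace dropped."""
    tokens = []
    for number, ident, other in TOKEN.findall(text):
        if number:
            tokens.append(("integer", number))
        elif ident:
            tokens.append(("identifier", ident))
        elif other.strip():
            tokens.append(("operator", other))
    return tokens

def parse(tokens):
    """Stage 2: expr := term ('+' term)* ; term := atom ('*' atom)*"""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def atom():
        nonlocal pos
        kind, value = peek()
        if kind not in ("integer", "identifier"):
            raise SyntaxError(f"unexpected token {value!r}")
        pos += 1
        return (kind, value)

    def binary(operand, symbol):
        nonlocal pos
        node = operand()
        while peek() == ("operator", symbol):
            pos += 1
            node = (symbol, node, operand())
        return node

    tree = binary(lambda: binary(atom, "*"), "+")
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

With the usual precedence, `parse(lex("A+B*C"))` puts `+` at the root, `A` as its first child, and a `*` node over `B` and `C` as its second child.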

Now, with regards to a scripting language, #5 generally isn't applicable. #4 is though, and if you compile your scripts to a simple stack machine VM bytecode system, you'll get pretty decent execution speed and reverse engineering the source from the bytecode will be about as useful/useless as trying to reverse engineer C++ code from assembly.

If you're looking for more information on how to write a compiler/interpreter like this, I suggest reading a standard compiler textbook, such as the "dragon book", aka "Compilers: Principles, Techniques, and Tools" by Aho, Sethi, and Ullman.
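A sketch of stage 4 for a stack machine, in Python: a post-order walk over a tuple-based AST emits the instructions, and a tiny VM runs them. The instruction names (PUSH, LOAD, ADD, MUL) and the AST shape are assumptions for illustration, not a real bytecode format.

```python
def emit(node, code):
    """Post-order walk: emit both operands first, then the operator."""
    if node[0] == "integer":
        code.append(("PUSH", int(node[1])))
    elif node[0] == "identifier":
        code.append(("LOAD", node[1]))
    else:
        op, left, right = node
        emit(left, code)
        emit(right, code)
        code.append(("ADD",) if op == "+" else ("MUL",))
    return code

def run(code, variables):
    """Minimal stack VM: operands are pushed, operators pop two and push one."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(variables[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

# Hand-built AST for A + B * 4
ast = ("+", ("identifier", "A"), ("*", ("identifier", "B"), ("integer", "4")))
bytecode = emit(ast, [])
```

The emitted sequence is LOAD A, LOAD B, PUSH 4, MUL, ADD. Variable names and expression structure survive only as that flat sequence, which is why recovering readable source from it is tedious.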


Two... I'm not recommending encryption for the scripts in this case since it's really not necessary, but regardless, with regards to JmarsKoder's encryption post:

quote:
OK, none of these people know what they are talking about; just do this. Use a small, simple XOR cipher algo with a basic randomizer built in. You select the key and bam: unless there is a leet cracker who really wants to break your script, you will have nothing to worry about. Plus, if your key is the same size as the script text, and you use the randomizer, you would need a f*in supercomputer to crack it.


This is false. It would be advised that you please read a book such as "Applied Cryptography" by Bruce Schneier before recommending things like this (I'm talking specifically to Jmars here).

XOR ciphers protect against naive folks trying to read a block of data directly, but anyone beyond that can break them in seconds by running a frequency analysis on them. XORing every value in a block of data with the same key byte does not change the shape of the frequency distribution; it only relabels the values. Anyone even thinking of using an XOR cipher for anything at all should at least add some kind of additional position-dependent operation, to hinder simple frequency checks... and even then, this is still pretty minimal encryption.
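A quick Python check of that claim for a single-byte key. The sample sentence and the key value 0x5A are arbitrary choices for the demonstration.

```python
from collections import Counter

def xor_single(data, key):
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)

plain = b"the quick brown fox jumps over the lazy dog"
cipher = xor_single(plain, 0x5A)

# Sort the counts so that only the shape of the histogram is compared.
plain_shape = sorted(Counter(plain).values(), reverse=True)
cipher_shape = sorted(Counter(cipher).values(), reverse=True)
```

`plain_shape` and `cipher_shape` come out identical: the most common plaintext byte is still the most common ciphertext byte, just under a different label.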

Don't factor in target audience when choosing an encryption scheme, ever. Remember that one good cryptanalyst can release his solutions (in the form of algorithms, code, or even actual tools) to the public, and that's the end of it. Either do it right or don't do it at all.

Regarding the "if the key is the same size as the script, you'd need a supercomputer to crack it" comment, that is also false if you use a "basic randomizer" (your words) to generate the key. The only unbreakable cryptosystem in existence (according to the laws of information theory) is the one-time pad. This uses a key as long as the plaintext, and adds the key and the plaintext together to produce the ciphertext. However, this is unbreakable *only if the key is truly random*, and that requires something other than a computer-based PRNG. The best cryptographic PRNGs can still never be as good as a one-time pad, because they're deterministic if you know the input conditions.
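The mechanics of a one-time pad are tiny; the hard part is the key. In this Python sketch the "addition" is bitwise XOR (addition modulo 2), and `secrets.token_bytes` is only a stand-in for the key source: it draws from the operating system's generator, which is not the physical randomness the argument above calls for.

```python
import secrets

def otp_encrypt(plaintext):
    """Make a fresh key exactly as long as the plaintext and combine them."""
    key = secrets.token_bytes(len(plaintext))   # stand-in for a truly random pad
    cipher = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, cipher

def otp_decrypt(key, cipher):
    """Applying the same pad again undoes the encryption."""
    return bytes(c ^ k for c, k in zip(cipher, key))

key, cipher = otp_encrypt(b"attack at dawn")
```

The pad must be as long as the message, used once, and delivered to the other side securely, which is why the scheme is rarely practical for shipping game data.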

Having a random number generator that has a decent distribution is not sufficient. For cryptography, you must have one that is unpredictable. For example, MSVC++'s implementation of the C stdlib "rand()" function uses an LCG (linear congruential generator). It's not a very good LCG, and there are plenty of better ones out there with a better number distribution. Unfortunately, *ALL* LCGs are 100% predictable if you know the seed and the constants used. So are LFSRs (linear feedback shift registers) and many other PRNGs you'll find out there. Good for random distribution, but very predictable. And it doesn't matter whether your key is as long as your plaintext... if you use a PRNG to generate the key (rather than a truly "random" process taken from nature), and someone figures out the seeds for that PRNG (often an easy task), then they can reproduce that same key stream no matter how long your plaintext is.
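To illustrate, here is a Python sketch of a keystream built from an LCG and a brute-force search for its seed. The multiplier and increment are the ones commonly cited for MSVC's rand(); treat them, the 16-bit seed range, and the assumption that the attacker knows how the file starts as illustrative choices rather than facts about any particular game.

```python
def lcg_stream(seed, count):
    """Key bytes from an LCG (constants commonly cited for MSVC rand())."""
    state = seed
    out = bytearray()
    for _ in range(count):
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        out.append((state >> 16) & 0xFF)
    return bytes(out)

def crypt(data, seed):
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, lcg_stream(seed, len(data))))

def recover_seed(cipher, known_prefix, max_seed=1 << 16):
    """Try every seed until the keystream explains the known plaintext start."""
    want = bytes(c ^ p for c, p in zip(cipher, known_prefix))
    for seed in range(max_seed):
        if lcg_stream(seed, len(want)) == want:
            return seed
    return None

secret = b"SCRIPT v1: give player 100 gold"
cipher = crypt(secret, 31337)                 # key is as long as the plaintext
found = recover_seed(cipher, b"SCRIPT v1")    # attacker guesses the file header
```

Once `found` is known, `crypt(cipher, found)` regenerates the whole keystream, so the length of the key buys nothing.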

Incidentally, this exact issue was the cause of one of Netscape's bigger public boo-boos. For a while their "secure" encryption scheme used session keys that were based on a good PRNG, but the inputs to the PRNG were few and rather predictable, so when it was broken shortly afterwards it caused some seriously bad PR against them.

In general, cryptography is one of those fields that a lot of people think they know more about than they do. I'm constantly seeing people come up with "unbreakable" crypto schemes for parts of their game projects, when in reality almost all of them could be torn to shreds by a competent cryptanalyst (and plenty of crackers out there have done their homework in terms of cryptanalysis). I'm able to break half of them myself, and I'm not a good cryptanalyst whatsoever.

If you ever attempt to do anything with crypto, make sure to do some reading first; it'll save you a lot of headache later down the road. "Applied Cryptography" is a great place to start.

--
Chris Hargrove
Senior Programmer, Legend Entertainment

TrIaD    122
...and why would an XOR crumble so easily on a freq search? what exactly is going to be in these scripts that appears with such frequency?

particularly if you use a long key, your "frequent" hits won't show up, because each occurrence could be encrypted as several different things, particularly if you use a strange length for the key (say 5 or 7 bytes, as opposed to something like 4), and if you're using a system like the one I described, that has absolutely no particular alignment of where each opcode is stored...

But, to take this back down to earth, let's be honest. The goal of writing a scripting engine should be to keep the average user from poking around. It's extremely unlikely that you'd ever be able to write something that someone won't crack, if they're good; rather, the objective is to make it difficult enough that you have to be very persistent to crack the system. In many cases, simply compiling the script is sufficient, since probably 99.9% of your end-users aren't going to decompile the program and study its inner workings (in which case, they're toying with a whole lot more than your scripts, anyway). Even storing the scripts in some pack format so they aren't just lying around may be sufficient, even if they're still stored in ASCII! (Note that if you do this, you should probably implement at least a simple CRC system to prevent random tampering.) Compiling and encrypting your scripts (using the XOR method described several times) should be sufficient. If you do anything more complex, chances are anyone that would break your code would be just as likely (if not more so) to decompile and tinker with your actual program.
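The simple CRC check could look something like this in Python, using `zlib.crc32` from the standard library. Storing the checksum as a 4-byte prefix is an assumption for the example; a real pack format would put it wherever its header lives.

```python
import struct
import zlib

def pack_script(body):
    """Prefix the script bytes with a little-endian CRC-32 of those bytes."""
    return struct.pack("<I", zlib.crc32(body)) + body

def unpack_script(blob):
    """Return the body, or raise if the stored CRC no longer matches."""
    (stored,) = struct.unpack_from("<I", blob)
    body = blob[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("script failed CRC check")
    return body

blob = pack_script(b"say hello")
tampered = blob[:-1] + b"X"   # someone edits the last character
```

This catches casual edits and corruption only; anyone who works out the layout can recompute the CRC after changing the script.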

--Tr][aD--

freshya    122
Hmm... explain to me why you need cryptography again? Wait... before you do, tell me how successful Quake would have been if you couldn't edit it.

... it wouldn't have been? Okay!

In my games I set the security level at a bit past "chimpanzee" level, not only because I couldn't encrypt anything better than that (I am a chimpanzee, after all), but because editing is a part of the whole computer games scene.

The level of security should also vary with the game, e.g. for a Pac-Man game make them .txt files so people are encouraged to edit them, poke around etc... For a slightly more complex game, rename the files, add one to each character, something small. Anyone who wants to can still edit them. The user will know they are cheating, so it doesn't really matter.

On the other hand, you may be coding a MMORPG (as opposed to a single player one, and let's face it, which self respecting enthusiastic startup video game company isn't?) in which case you can ignore my post.


cheerio

Kylotan    9852
quote:
Original post by TrIaD
...and why would an XOR crumble so easily on a freq search? what exactly is going to be in these scripts that appears with such frequency?


The letter 'E'.

And others.

It's plain English, as stated in the original question.

Mithrandir    607
quote:
Original post by TrIaD
...and why would an XOR crumble so easily on a freq search? what exactly is going to be in these scripts that appears with such frequency?




First off, XORing ASCII text gives the key away almost instantly. Not only is the most frequent character a space, but all ASCII letter codes start off the same way.

capital letters: 41h-5Ah
binary: 01000001b - 01011010b
lowercase letters: 61h-7Ah
binary: 01100001b - 01111010b

Notice that the first two bits of any letter are '01'.
Frequency analysis on ASCII text will immediately show that most characters are letters, and thus you'd be able to figure out part of the key right away.

The most frequent characters, as I said above, are spaces, which are 20h (32d). If you assume that the most frequent ciphertext byte is an encrypted space, derive the key that XORs that byte into 20h, and then apply it to the rest of the document, 90% of the time you will have a valid key.
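The space trick takes only a few lines. This Python sketch assumes a single-byte key and a text where space really is the most common character; the sample sentence and the key 0xA7 are arbitrary.

```python
from collections import Counter

def xor_single(data, key):
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)

def guess_key(cipher):
    """Assume the most frequent ciphertext byte is an encrypted space (20h)."""
    most_common_byte, _count = Counter(cipher).most_common(1)[0]
    return most_common_byte ^ 0x20

plain = b"the quick brown fox jumps over the lazy dog and then the dog chases the fox"
cipher = xor_single(plain, 0xA7)
key = guess_key(cipher)
```

Here `key` comes back as the original key byte, and `xor_single(cipher, key)` reproduces the plaintext. For a repeating multi-byte key the same count can be run separately on every key-length-th byte, once the key length has been guessed.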


Now let's work with binary. Unless you are storing very large numbers, most of any binary integer (assuming 32 bits are used) will consist of 0 bits. The top part of the key for a 32-bit integer can be broken easily, since the highest bit is almost always 0, and you can work downwards from there.

Basically, you can assume that the lowest bit will have an equal chance of being 1 or 0, but the higher bits will have a much higher likelihood of being 0. Count the frequency of 1s and 0s in bit 31, and whichever value is more common most likely represents '0'.
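A sketch of that bit-counting attack in Python, assuming the data is a run of little-endian 32-bit integers XORed with a repeating 4-byte key. The sample values and the key are made up; all the values are below 65536, so their upper 16 bits are zero.

```python
import struct

def xor_repeat(data, key):
    """XOR data with a key that repeats every len(key) bytes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def guess_key_high_bits(cipher):
    """Majority vote per bit over the upper 16 bits of each 32-bit word."""
    words = [w for (w,) in struct.iter_unpack("<I", cipher)]
    key = 0
    for bit in range(16, 32):
        ones = sum((w >> bit) & 1 for w in words)
        if ones * 2 > len(words):   # plaintext bit is usually 0, so a
            key |= 1 << bit         # majority of 1s means the key bit is 1
    return key

values = [3, 250, 17, 4096, 9, 1, 65535, 42]      # typical small script numbers
key_word = 0xC3A55A3C                             # hypothetical secret key
cipher = xor_repeat(struct.pack("<8I", *values), struct.pack("<I", key_word))
guessed = guess_key_high_bits(cipher)
```

`guessed` equals the top half of the key, recovered without knowing anything about the script beyond "these are small numbers". The lower bits need other statistics, such as the known opcode distribution.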


quote:

particularly if you use a long key, your "frequent" hits won't show up, because each occurrence could be encrypted as several different things, particularly if you use a strange length for the key (say 5 or 7 bytes, as opposed to something like 4), and if you're using a system like the one I described, that has absolutely no particular alignment of where each opcode is stored...



You're talking about obfuscation. Sure, it's effective in the short term, but obfuscation has been known to be a very weak form of protection. It also has the side effect of making your loading/saving code much more difficult to work with.

quote:

But, to take this back down to earth, let's be honest. The goal of writing a scripting engine should be to keep the average user from poking around. It's extremely unlikely that you'd ever be able to write something that someone won't crack, if they're good; rather, the objective is to make it difficult enough that you have to be very persistent to crack the system. In many cases, simply compiling the script is sufficient, since probably 99.9% of your end-users aren't going to decompile the program and study its inner workings (in which case, they're toying with a whole lot more than your scripts, anyway). Even storing the scripts in some pack format so they aren't just lying around may be sufficient, even if they're still stored in ASCII! (Note that if you do this, you should probably implement at least a simple CRC system to prevent random tampering.) Compiling and encrypting your scripts (using the XOR method described several times) should be sufficient. If you do anything more complex, chances are anyone that would break your code would be just as likely (if not more so) to decompile and tinker with your actual program.



All it takes is one person to crack your code. Once they distribute the crack in an easy-to-use package, even the people who know nothing about cracking will use it.

Do you honestly think that the only people who use cheats and cracks are the people who can do it themselves? Go check out astalavista.box.sk then. It's funny seeing cracks for programs released to the general public days after a program is released. This is exactly why greater organized security is required.

TrIaD    122
Excuse me, but since people seem to be missing the point...
I was talking about throwing an XOR over COMPILED code...

so the only "e"s would be in whatever strings were given as arguments...
and if you used a string table, there wouldn't be ANY

--Tr][aD--

neonstar    122
don't put yourself through the bullshit trouble of encrypting the files, just do like Epolevne said: make a converter to convert your plaintext script files to a binary format that is easily readable for the engine.

if you are worried about ease of use of the scripting engine during development, then make the command loader two-sided so it can load both text script files and binary script files. when you load a script in, be it text or binary, place the functions and their required data into a function data structure, then execute them in the game when needed. your script functions will be hardcoded anyway, so using a function behavior structure wouldn't make things any harder.

if you are encrypting the files, then how will you decrypt them back to standard English when you are running the game? will you decrypt the script to a temporary file, which could be left out in the open if the game were to crash while the parser was running? even if the user couldn't edit the script file to make it change the game, they could still find out something about that part of the game.

just some thoughts.

dave



--
david@neonstar.net
neonstar entertainment

TrIaD    122
quote:

if you are worried about ease of use of the scripting engine during development, then make the command loader two-sided so it can load text script files and then binary script files.



Better yet, open Visual Basic, put a REALLY BIG textbox on your form, write some really simple load/save code, and either a: write your compiler with it, or b: write a "shell" command to call the compiler... do this whenever you save, or put it on a menu... that way compiling will only take a few seconds, minimal effort, and you'll see any errors BEFORE you start up your game and try to run the script for real... compile errors, anyway...

I sure hope if you went that route, you'd put all of the text loader code in "#ifdef _DEBUG" blocks... plus, you'd have to write a compiler twice

quote:

if you are encrypting the files, then how will you decrypt them back to standard English when you are running the game? will you decrypt the script to a temporary file, which could be left out in the open if the game were to crash while the parser was running? even if the user couldn't edit the script file to make it change the game, they could still find out something about that part of the game.



once again, THE SCRIPTING WOULD BE COMPILED! NOT IN ENGLISH!
...and I don't know why it wouldn't be loaded into memory... unless we're talking about some HUGE scripts...

if you load it into memory, and it's encrypted, you run a decryption pass over it before you leave the loader functions... that's all there is to it
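A sketch of a loader that does the decryption pass in memory and handles both encrypted and unencrypted scripts, in Python. The 4-byte header tags and the 5-byte XOR key are invented for the example, and the XOR pass is only the light obfuscation discussed in this thread, not real security.

```python
import io

MAGIC_PLAIN, MAGIC_XOR = b"SCR0", b"SCR1"   # hypothetical header tags
KEY = b"\x13\x37\xC0\xDE\x42"               # hypothetical 5-byte key

def xor_repeat(data, key):
    """XOR data with a key that repeats every len(key) bytes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def save_script(code, encrypted):
    """Write compiled bytes with a tag saying whether they are XORed."""
    if encrypted:
        return MAGIC_XOR + xor_repeat(code, KEY)
    return MAGIC_PLAIN + code

def load_script(blob):
    """Return a memory file of decrypted bytecode; nothing is written to disk."""
    magic, payload = blob[:4], blob[4:]
    if magic == MAGIC_XOR:
        payload = xor_repeat(payload, KEY)   # the decryption pass
    elif magic != MAGIC_PLAIN:
        raise ValueError("not a script file")
    return io.BytesIO(payload)

code = bytes([1, 0, 40, 0])                  # some compiled script bytes
```

Everything after `load_script` sees the same plain bytecode either way, so the rest of the engine never needs to know which kind of file it was.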

--Tr][aD--

Mithrandir    607
Encrypting the compiled code doesn't necessarily make it harder to crack.

Let us examine, for a moment, the purpose of a scripting engine: Externally loaded commands to be executed by a core game engine.
Reasons for using such a set-up: Almost exclusively to allow ease of use in mod-making.

This means that you will most likely release a compiler publicly, for mod-makers, correct? Once the cracker has a compiler, it is only a matter of time before they determine how it works: whether it is stack based or register based, whether it uses a complex or a simple command format, what its addressing modes are, etc.

Once the details are known about how it compiles, breaking a simple XOR encryption is amazingly simple. If you've studied ASM at all, you'd know that 90% of all commands in a program are either movs, pushes or pops (depending on whether it is register or stack based). Thus the encryption can be cracked with a simple frequency analysis, again.


The most difficult part of programming is knowing that there are scores of crackers out there, and most of them are probably smarter than yourself. It's tough, because I do not know of a single program that is uncrackable.

If someone out there wants to crack your program badly enough, it will happen.

Zaei    122
Why do we care if a hacker went in and changed our script files? If they did, and the game doesn't work anymore, all the better. If they want to mess around, their computer deserves to crash very hard. If they go messing around and change something so that, for instance, a mov destination is now 0x23 instead of variable "x", go ahead and delete the entire script (or others, for that matter). It's the least they deserve. Throw something in the registry that says "I TRIED TO MODIFY 's SCRIPTS. NOW I CAN'T PLAY!! WAAAAAAAA WAAAAAAA!!!". Think up fun stuff.

if (scriptModified) RebootWindows();

Z.
