Security

28 comments, last by GameDev.net 19 years, 7 months ago
Just a little question about the bytecode. How "close to the system" is the bytecode really? All pointers etc. are generated at link time, right?

The reason I'm asking is that I'm thinking about a system in my game engine where the server could send scripts to the client. If these are sent as source, I'm pretty sure I could prevent malicious code (simply by not registering potentially dangerous stuff for access by the scripts). I'm a bit worried about sending bytecode though (which would be nicer in many ways). If the bytecode can be constructed in such a way that it starts accessing stuff it shouldn't, it might be a pretty big problem.

So, basically the question is... Uhm... All of the above, formulated as a question somehow. :)

/Anders Stenberg
Short answer: loading precompiled bytecode is not safe.

For example, it is possible to insert the following bytecode:

SET4 0 // value
SET4 0 // address
WRT4 // *address = value

The virtual machine would gladly execute this code.

The bytecode generated by the compiler shouldn't contain any hardcoded addresses though. So if you can somehow guarantee that nobody tampers with the bytecode you should be able to pass the bytecode from the server to the client (assuming the engines have been configured with the exact same functions, in the exact same order).
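
One way to get the "nobody tampers with the bytecode" guarantee is to have the server attach an integrity tag that the client checks before loading. The sketch below uses HMAC-SHA256 from the Python standard library. It is not part of AngelScript; the function names and the blob layout (32-byte tag followed by the bytecode) are made up for illustration. Note that a shared key shipped inside the client can be extracted, so this only guards against tampering in transit, and it does nothing against a malicious server.

```python
import hashlib
import hmac

TAG_SIZE = 32  # length of an HMAC-SHA256 digest in bytes


def sign_bytecode(key: bytes, bytecode: bytes) -> bytes:
    """Return the blob the server would send: tag followed by bytecode."""
    tag = hmac.new(key, bytecode, hashlib.sha256).digest()
    return tag + bytecode


def verify_bytecode(key: bytes, blob: bytes) -> bytes:
    """Return the bytecode if the tag matches, otherwise raise ValueError."""
    tag, bytecode = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, bytecode, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bytecode failed integrity check")
    return bytecode
```

The client would call verify_bytecode first and only hand the returned bytes to the script engine if no exception was raised.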

I don't think I will be able to make the VM somehow validate the bytecode so that it doesn't do anything it shouldn't. If you have any suggestions on that I would be very interested in hearing them.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Quote: Original post by WitchLord
SET4 0 // value
SET4 0 // address
WRT4 // *address = value

The virtual machine would gladly execute this code.

But would the bytecode deserializer (I haven't really checked out the saving/loading of bytecode) allow such a construct to be generated? I thought you saved some metadata about what all pointers should point to, and relinked them to pointers that make sense in the current environment when loading? (I have to admit I haven't looked much at the bytecode stuff at all, so I might not be making sense. :)
Actually, when the bytecode is saved it's just a direct copy of the in-memory bytecode, with some extras so that the module can be rebuilt after loading (function declarations etc).

The only thing that guarantees that the bytecode doesn't do any bad stuff is the compiler. The VM verifies that an object pointer isn't null when accessing methods or properties, and it also verifies that no division by zero is made. That's about all the verification that is done after compilation. Any more and the performance would get really bad.

It would be easy to verify that

SET4 0 // value
SET4 0 // address
WRT4 // *address = value

isn't executed. But it would also be very easy to hide this code with a few more instructions:

SET4 0 -- Store 0 in a variable
PSF 1
WRT4
POP 1

SET4 0 -- Write 0 to an address stored in a variable
PSF 1
RD4
WRT4
POP 1

Both of these sequences are perfectly normal and could be found in correct code. But if they are run back to back as is, the result is the same as in the previous example: a write of 0 to address 0.
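
To make the point concrete, here is a small sketch of why matching on instruction patterns isn't enough. It only looks at opcode names, so it doesn't depend on how the VM actually executes them. The (opcode, operand) tuple encoding and the function name are assumptions for this example, not how AngelScript stores bytecode.

```python
# Each instruction is (opcode, operand); operand is None when unused.
DIRECT = [("SET4", 0), ("SET4", 0), ("WRT4", None)]

DISGUISED = [
    # Store 0 in a variable
    ("SET4", 0), ("PSF", 1), ("WRT4", None), ("POP", 1),
    # Write 0 to the address stored in that variable
    ("SET4", 0), ("PSF", 1), ("RD4", None), ("WRT4", None), ("POP", 1),
]


def has_constant_address_write(code):
    """Naive check: flag SET4, SET4, WRT4 appearing back to back."""
    opcodes = [opcode for opcode, _ in code]
    for i in range(len(opcodes) - 2):
        if opcodes[i:i + 3] == ["SET4", "SET4", "WRT4"]:
            return True
    return False
```

The scanner flags DIRECT but lets DISGUISED through, even though both end up writing to the same address. Catching the second form would mean tracking where every value on the stack came from, which is a much bigger job than pattern matching.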

The only way to make the bytecode safe is to have the application register the memory ranges that the VM should allow access to and then have the VM verify each instruction that accesses memory. As you can imagine this would be extremely inefficient.
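
As a rough illustration of what that would look like, here is a toy memory model where the application registers the ranges scripts may touch and every read or write is checked against them. This is a sketch under my own assumptions (a flat byte array, 4-byte little-endian accesses, the class and method names), not code from the library.

```python
class GuardedMemory:
    """Toy flat memory that only permits access inside registered ranges."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.allowed = []  # list of (start, end) half-open ranges

    def register_range(self, start, length):
        """Called by the application for each region scripts may touch."""
        self.allowed.append((start, start + length))

    def _check(self, address, width):
        # Linear scan over the registered ranges on every access
        for start, end in self.allowed:
            if start <= address and address + width <= end:
                return
        raise MemoryError(f"access to {address:#x} is outside registered ranges")

    def write4(self, address, value):
        self._check(address, 4)
        self.data[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def read4(self, address):
        self._check(address, 4)
        return int.from_bytes(self.data[address:address + 4], "little")
```

The cost is visible in _check: it runs for every memory instruction the script executes, which is where the inefficiency comes from.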


I still don't really understand how the bytecode could just be saved and loaded right off. Mustn't _all_ pointers _always_ be resolved when loaded? I mean, how could a stored pointer possibly make sense between two runs of a program?
I would assume the 'pointers' in the bytecode are relative to the AngelScript stack, not the system. So as long as the stack is the same size, it's all there.

You said you're sending code from the Server to the Client - if some client messes with it and runs it, they are screwing themselves. If someone writes a malicious server, that's altogether different. And, of course, it would be rather easy for the client to cheat.
It's really only global variables that would be accessed through memory pointers; the rest use offsets from the function stack frame or stack pointer. I solved the access to global variables by indexing the list of variable declarations. It's slightly slower than direct memory access, but it simplifies other things. The VM doesn't verify out-of-bounds access to these lists, though, as it relies on the compiler for that.




Couldn't the loader do a sanity check if it's in range when loading?
It could, and it's probably a good idea, but it would only protect against incompatible engine configurations. It still wouldn't solve the other issue above with malicious code.
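
For what it's worth, a load-time range check could be as simple as the sketch below. The opcode names PUSH_GLOBAL and STORE_GLOBAL are placeholders I made up for "instructions that take an index into the global variable list", and the (opcode, operand) tuple encoding is an assumption as well.

```python
# Hypothetical opcodes that take an index into the global variable list.
GLOBAL_INDEX_OPS = {"PUSH_GLOBAL", "STORE_GLOBAL"}


def check_global_indices(code, global_count):
    """Return positions of instructions whose global index is out of range."""
    bad = []
    for position, (opcode, operand) in enumerate(code):
        if opcode in GLOBAL_INDEX_OPS and not (0 <= operand < global_count):
            bad.append(position)
    return bad
```

A loader would reject the module if the returned list is non-empty. Since the indices are fixed operands, this is a one-time pass at load, not a per-instruction cost at run time.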


I guess I don't know enough about this to actually say anything about it, but I don't see why not. :)
If the loader/linker/whatever knows what indices make sense, shouldn't it be able to bail out if the indices in the bytecode are way off? Or maybe the bytecode is too low-level to easily know what's legal and what's not?

