Audio scripting works, with a variety of capabilities. Now to graphics. You might think that simple graphics functions like blitting would be the easiest part to code. That would be true, until you consider all the ways that even straight-blit graphics are used: in single-buffered or double-buffered mode, layered in various combinations, sometimes read from a file straight to the frame, and other times read or manipulated off-frame, where ideally they should be tightly packed together and maybe even compressed. Then there are all the conditions involved in choosing the optimal blitting routine and carrying out special blitting functions. I should have expected complexities, though; after all, even 1D audio presents more challenges than it displays.
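To give a rough idea of how quickly those conditions pile up, here is a minimal Python sketch of a clipped blit with two routines (a fast row copy and a slower color-keyed copy) drawing into a double-buffered frame. This is not the actual engine code. The names `Surface`, `DoubleBuffer`, `blit`, and `colorkey`, and the one-byte-per-pixel packed layout, are all assumptions made up for the illustration.

```python
class Surface:
    """Hypothetical tightly packed surface: one byte per pixel, row-major."""

    def __init__(self, width, height, fill=0):
        self.width = width
        self.height = height
        self.pixels = bytearray([fill]) * (width * height)


class DoubleBuffer:
    """Hypothetical double-buffered frame: draw to `back`, show `front`."""

    def __init__(self, width, height):
        self.front = Surface(width, height)
        self.back = Surface(width, height)

    def flip(self):
        # Swap roles instead of copying pixels.
        self.front, self.back = self.back, self.front


def blit(src, dst, x, y, colorkey=None):
    """Copy src onto dst at (x, y), clipped to dst.

    If colorkey is given, source pixels equal to it are skipped.
    """
    # Clip the source rectangle against the destination bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1 = min(x + src.width, dst.width)
    y1 = min(y + src.height, dst.height)
    if x0 >= x1 or y0 >= y1:
        return  # entirely off-frame, nothing to draw

    n = x1 - x0  # visible pixels per row
    for dy in range(y0, y1):
        s = (dy - y) * src.width + (x0 - x)  # start offset in source row
        d = dy * dst.width + x0              # start offset in destination row
        row = src.pixels[s:s + n]
        if colorkey is None:
            # Fast routine: copy the whole visible row in one slice assignment.
            dst.pixels[d:d + n] = row
        else:
            # Slow routine: test every pixel against the transparent value.
            for i, p in enumerate(row):
                if p != colorkey:
                    dst.pixels[d + i] = p


# Usage: draw a sprite that hangs off the bottom-right corner, then flip.
screen = DoubleBuffer(4, 3)
sprite = Surface(2, 2, fill=7)
blit(sprite, screen.back, 3, 2)
screen.flip()
```

Even this toy version already has to handle clipping, a choice between two copy routines, and the front/back swap. Layering, compressed off-frame storage, and other pixel formats would each add more branches.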
I'll create the network commands later. I've asked my old prof for permission to use our little cluster at his school to test the net scripting. Then there's input. If this script is going to support mobile devices, the input specs will need to be designed with that in mind.