finite amount of different screens you can store.
Reexamine the math of the problem you are proposing to solve.
Take a smallish game screen (by today's standards): 1024 x 768 = 786,432 pixels.
For a simple example, assume it's just black and white, one 0/1 value per pixel (color values would make the numbers much bigger still).
What is 2 to the power of 786,432? (Rough calc: about 3 decimal digits for every 10 bits.)
That is a number with roughly 236,000 digits (236,740 exactly), and that's how many different 'screen' states you have.
Finite, yes. Horribly huge, yes.
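If you want to check the arithmetic without building the giant integer (recent Python versions refuse to `str()` an int that long by default anyway), the digit count of 2^n is floor(n * log10(2)) + 1. A quick sketch; the 24-bit color depth is just an assumed example:

```python
import math

def state_count_digits(width: int, height: int, bits_per_pixel: int) -> int:
    """Decimal digits in 2**(width*height*bits_per_pixel), i.e. in the
    number of distinct screens of that size and bit depth."""
    total_bits = width * height * bits_per_pixel
    # digits(2**n) = floor(n * log10(2)) + 1, so no huge int is ever built
    return math.floor(total_bits * math.log10(2)) + 1

bw = state_count_digits(1024, 768, 1)      # black and white, 1 bit per pixel
color = state_count_digits(1024, 768, 24)  # assumed 24-bit color
print(bw, color)
```

The black and white case comes out at 236,740 digits, slightly more than the 3-digits-per-10-bits estimate, since log10(2) is about 0.301 rather than 0.3.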
You might want to think more along the lines of preprocessing your 'screens' with some radical data reduction, like edge detection followed by some kind of 'stroke' encoding scheme, to crush the state inputs to your NN down significantly (by a factor of many thousands).
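As a rough idea of the first step, here is a minimal edge detection sketch using plain finite-difference gradients and a threshold (a crude stand-in for a proper detector like Sobel or Canny). The toy screen and the threshold value are made up for illustration:

```python
import numpy as np

def edge_map(img: np.ndarray, threshold: float) -> np.ndarray:
    """Binary edge map from a grayscale image via simple finite differences."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:] = np.diff(img, axis=1)  # change from the pixel to the left
    gy[1:, :] = np.diff(img, axis=0)  # change from the pixel above
    return np.hypot(gx, gy) > threshold

# Hypothetical toy "screen": black background with one bright rectangle.
screen = np.zeros((768, 1024))
screen[200:400, 300:700] = 255.0

edges = edge_map(screen, threshold=50.0)
print(f"{screen.size:,} pixels in, {int(edges.sum()):,} edge pixels out")
```

Only the rectangle outline survives (about 1,200 of 786,432 pixels). A stroke encoder would then squash that outline further, into a handful of line segments.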
Be aware, though, that the bigger a NN is (of the type you seem to be referring to, even one reduced the way I've suggested), the less likely it is to form successfully. Forming the weights by backpropagation also requires many cycles through the training data, which is not something you can really do in anything like real time.
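To put rough numbers on that, here is a back-of-envelope sketch for a fully connected net with one hidden layer. The hidden size, output count, sample count, epoch count and the 1000x input reduction are all hypothetical values picked for illustration:

```python
def dense_weights(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Weights plus biases in a fully connected net with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def gradient_terms(n_weights: int, n_samples: int, epochs: int) -> int:
    """Rough count of per-weight gradient computations for plain per-sample
    backprop: each weight gets one gradient term per sample per epoch."""
    return n_weights * n_samples * epochs

# Hypothetical sizes, for illustration only.
HIDDEN, OUTPUTS = 1000, 10
SAMPLES, EPOCHS = 10_000, 100

raw = dense_weights(1024 * 768, HIDDEN, OUTPUTS)              # raw B/W pixels in
reduced = dense_weights(1024 * 768 // 1000, HIDDEN, OUTPUTS)  # ~1000x fewer inputs

for name, w in (("raw", raw), ("reduced", reduced)):
    print(f"{name}: {w:,} weights, {gradient_terms(w, SAMPLES, EPOCHS):.2e} gradient terms")
```

With raw pixels the first layer alone is over 786 million weights; even the reduced net still needs billions of gradient terms across the training passes.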
You also WON'T be able to recreate the original raw input 'state' (the video screen), since all the (hopefully) extraneous detail would be crushed out of the internal net data. The best you could reconstruct (for the edge/stroke example) would be crude vector lines.
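A small sketch of why: below, a hypothetical 'stroke' encoding turns each horizontal run of edge pixels into one segment (row, start, end), then rasterizes the segments back. This run-based scheme is only an assumed stand-in for whatever stroke encoding you'd actually use:

```python
import numpy as np

def edge_map(img, threshold=50.0):
    """Binary edges from simple finite-difference gradients."""
    gx = np.zeros(img.shape)
    gy = np.zeros(img.shape)
    gx[:, 1:] = np.diff(img, axis=1)
    gy[1:, :] = np.diff(img, axis=0)
    return np.hypot(gx, gy) > threshold

def encode_strokes(edges):
    """Each horizontal run of edge pixels becomes (row, col_start, col_end)."""
    strokes = []
    for r, row in enumerate(edges):
        padded = np.concatenate(([False], row, [False])).astype(np.int8)
        d = np.diff(padded)
        starts = np.flatnonzero(d == 1)     # run begins here
        ends = np.flatnonzero(d == -1) - 1  # inclusive end of the run
        strokes.extend((r, int(s), int(e)) for s, e in zip(starts, ends))
    return strokes

def decode_strokes(strokes, shape):
    """Rasterize the segments back into a binary image."""
    out = np.zeros(shape, dtype=bool)
    for r, c0, c1 in strokes:
        out[r, c0:c1 + 1] = True
    return out

# Hypothetical screen: a filled bright rectangle on black.
screen = np.zeros((768, 1024))
screen[200:400, 300:700] = 255.0

edges = edge_map(screen)
strokes = encode_strokes(edges)
decoded = decode_strokes(strokes, screen.shape)
print(len(strokes), "strokes;", int(decoded.sum()), "lit pixels vs",
      int((screen > 0).sum()), "bright pixels in the original")
```

The segments round-trip back to the edge outline, but the filled interior and the actual intensity values are gone for good.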