Glad to see interest in using more advanced AI for games :D. A few months ago I tried using a deep neural network for image recognition from a bot:
It used a web service (Clarifai), but responses took a long time to come back and there are quotas (which is why I fed it screenshots), so the web service approach is not commercially viable. More recently I tried running a deep neural network (AlexNet) for vision on my local machine, a laptop with a mid-2012 graphics card, using the GraphLab framework; inference took 0.4 seconds with a precision of around 80%. Since I am on a Mac and that GPU is not supported by the framework on Mac, it might be even faster with GPU acceleration. On a not-so-powerful GPU (Jetson TX1) it takes 0.007 sec with a well-optimised framework: https://developer.nvidia.com/gpu-inference-engine. On a new Pascal GTX 1080 it might be feasible to run continuous scene recognition for a few bots and enable more realistic vision than is possible with raycasts and tagging.
Also, vision is just a starting point: memories can be saved and retrieved from a familiar scene input using the same principles as a recommender system, etc. :-) I think there is definitely potential there, but the first challenge is the data. AlexNet, GoogLeNet and ResNet are all trained on cars, dogs and other common objects. It would be nice to build some kind of "rpgnet" with data from spaceships, dragons, etc. There is no need to cover every piece of fantasy content, though. For an image of a wizard, for example, the network could recognise a robe and a man/woman. A semantic engine running separately in the game would then do some reasoning like "robe is superimposed on man/woman --> man/woman wears robe", then "man/woman wears robe -> value=seen("wizard", 10 meters...)", and then apply a personality programmed in, e.g., a personality table for a barbarian: "key=wizardseen==true", "value=diminishmood(-5)".

See the potential? A player or bot can disguise itself based on assumptions about the knowledge and personality tables, and you could see behaviors emerging naturally. In my opinion that completely changes the player's experience, because the bots then seem kind of sentient. In the long term (say a few years at least) there could be video analysis to recognise actions being performed, with other machine learning algorithms (e.g. association rules) predicting what is going to happen next, etc.
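To make the robe/wizard reasoning a bit more concrete, here is a minimal Python sketch of how it could look. Everything in it is an assumption for illustration: the `Detection` class, the bounding-box format, the 0.5 overlap threshold, and the function and table names are all hypothetical, and the detections are hard-coded rather than coming from a real network.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple  # hypothetical format: (x_min, y_min, x_max, y_max) in pixels

def overlap_ratio(inner, outer):
    """Fraction of the `inner` box area that lies inside the `outer` box."""
    width = max(0, min(inner[2], outer[2]) - max(inner[0], outer[0]))
    height = max(0, min(inner[3], outer[3]) - max(inner[1], outer[1]))
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (width * height) / area if area > 0 else 0.0

PEOPLE = {"man", "woman"}

def infer_facts(detections, distance_m, threshold=0.5):
    """Rule: robe superimposed on man/woman -> wears robe -> seen("wizard", distance)."""
    robes = [d for d in detections if d.label == "robe"]
    people = [d for d in detections if d.label in PEOPLE]
    facts = []
    for person in people:
        for robe in robes:
            # The 0.5 threshold is an arbitrary assumption, not a tuned value.
            if overlap_ratio(robe.box, person.box) >= threshold:
                facts.append(("seen", "wizard", distance_m))
    return facts

# Personality table: what each kind of bot feels about what it has seen.
PERSONALITY = {"barbarian": {"wizard": -5}}

def apply_personality(kind, mood, facts):
    """Adjust mood by the table entry for every 'seen' fact."""
    table = PERSONALITY.get(kind, {})
    for _, what, _ in facts:
        mood += table.get(what, 0)
    return mood

# Hard-coded detections standing in for the output of a vision network.
detections = [
    Detection("man", (100, 50, 200, 300)),
    Detection("robe", (110, 100, 190, 290)),
]
facts = infer_facts(detections, distance_m=10.0)
mood = apply_personality("barbarian", 0, facts)
print(facts, mood)
```

The point of keeping the rules and the personality table as plain data is that a disguise just changes what the vision step reports, and the rest follows without any special-case code.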
Also, I assumed no batching when I said 0.007 sec. If batching + recognition + reasoning on the recognition results (e.g. embedded knowledge of rules like the ones above) can already take under, say, 0.2 sec on a high-end gaming GPU, it would open up possibilities earlier than expected. (On a Titan X, which is less capable than the GTX 1080, it can recognise more than 3000 images per sec with a batch of 128 images: https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf.) Has anyone tried image recognition with batching on a GPU? I don't have Linux and can't access services such as Amazon Web Services -_-, so if anyone interested can try running TensorFlow, Caffe or another similar framework on a batch of images on a modern GPU and see how long it takes, I would be very curious to know.
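For what it's worth, the back-of-the-envelope arithmetic behind those numbers can be written out as below. The throughput, batch size, per-image time and 0.2 sec budget are the figures quoted above; the 0.05 sec reasoning cost is a made-up placeholder, and the function names are just for illustration.

```python
def batch_latency(batch_size, images_per_sec):
    """Time in seconds to process one batch at a given throughput."""
    return batch_size / images_per_sec

def fits_budget(recognition_s, reasoning_s, budget_s):
    """True if recognition plus reasoning stays within the time budget."""
    return recognition_s + reasoning_s <= budget_s

# Titan X figure: more than 3000 images/sec with a batch of 128,
# so one batch takes at most about 0.043 sec.
titan_x_batch_s = batch_latency(128, 3000)

# Jetson TX1 figure: 0.007 sec per image without batching,
# which is roughly 143 images/sec.
tx1_images_per_sec = 1 / 0.007

# Hypothetical reasoning cost; not a measured value.
assumed_reasoning_s = 0.05

print(round(titan_x_batch_s, 4))
print(round(tx1_images_per_sec, 1))
print(fits_budget(titan_x_batch_s, assumed_reasoning_s, budget_s=0.2))
```

So under these assumptions a 128-image batch on a Titan X would leave most of the 0.2 sec budget free for the reasoning step, but real measurements on a modern GPU would still be needed to confirm it.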