Machine Vision

Published February 08, 2005
Some thoughts on machine vision.

Say we're making a game with strong stealth elements... Splinter Cell, Thief or something. We want the player to be able to sneak around, hide, and so on; we want the AIs looking for him to respond to him being hidden as realistically as possible.

Traditional approaches use a line-of-sight test. A line is traced from the AI agent's eyes to some point on the player. If the line intersects the environment, the AI agent cannot 'see' the player. The problem with this technique is that tracing to only a single point cannot possibly produce an accurate result (unless the player is a glowing ball of light or something). We need a technique that takes the whole of the player's geometry into account from the AI's point of view.

Enter differential rendering. Here's how it works:

  1. The camera is set to the AI agent's eye position.
  2. The game world is rendered (sans player) into a texture.
  3. The game world is rendered again (with player) into another texture.
  4. The two textures are bound to texture stages.
  5. The depth buffer is cleared to a value of 0.5.
  6. An occlusion query is issued.
  7. A quad is rendered. A pixel shader is in place which (a) samples both textures, (b) subtracts one from the other, (c) dots the result to collapse it to a scalar, (d) tests that against some threshold value, (e) writes the result of the test to the oDepth register. (A rough shader sketch follows the list.)
  8. The occlusion query is ended.
  9. The results of the occlusion query are retrieved.
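
To make step 7 concrete, here's a rough sketch of the sort of ps_2_0 shader I have in mind (sampler registers, names and the exact difference metric are all just illustrative):

  // The depth buffer was cleared to 0.5 in step 5. Writing 0.0 makes a pixel pass
  // the depth test and get counted by the occlusion query; writing 1.0 makes it fail.
  sampler2D sceneWithoutPlayer : register(s0);   // result of step 2
  sampler2D sceneWithPlayer    : register(s1);   // result of step 3
  float     threshold;                           // per-agent sensitivity

  float4 main(float2 uv : TEXCOORD0, out float oDepth : DEPTH) : COLOR
  {
      float3 diff = tex2D(sceneWithPlayer, uv).rgb
                  - tex2D(sceneWithoutPlayer, uv).rgb;
      float d = dot(diff, diff);                 // squared colour difference
      oDepth = (d > threshold) ? 0.0f : 1.0f;
      return float4(d, d, d, 1.0f);              // colour output isn't actually used
  }

The threshold constant gives you a second sensitivity knob alongside the render target size discussed below.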


The resulting value is the total number of pixels for which the player character caused a significant difference to what the AI can see. What's nice about this?

  • It takes all world and player geometry into account.
  • It handles transparent stuff - as well as stuff with specialised shaders - seamlessly (you just render things normally for steps 2 and 3).
  • It allows you to set the 'keen-eyed-ness' of your AI agents by varying the size of the render target. The smaller it is, the less sensitive the AI will be to small changes, and vice versa.
  • It allows camouflage. If I'm wearing a black ninja suit and I stand in a black area, I make a smaller difference than if I were standing against a white backdrop. And it's handled without any testing/calculation of light levels and what have you.


The big downside is that it requires pixel shader 2.0 (to write oDepth). There's also the fact that occlusion queries are asynchronous, which can make managing them a bit of a problem... using the results from the previous frame should be fine, though, because chances are you'll want your AI to pause for a split second before 'reacting' anyway.

Suggestions / comments? Otherwise I'll see about knocking up a demo of this...

Comments

rick_appleton
Seems like a very nice method. But I'm wondering, why the occlusion test, and the specific depth clear?

- Make the two textures as above.
- Render a quad to a new texture, using the two textures and a PS that subtracts the texel in the texture without the player from the one with the player:
out.Color = abs(texWith-texWithout)
- Let the GPU generate a mipmap for this third texture automatically.
- Only read back the lowest mipmap level of this third texture.

The color of that single pixel should give an indication of how large the difference between the two original textures is. If you can make this texture only a single byte instead of color (GL_LUMINANCE maybe?) you would only need to read back a single byte. Of course this can happen just before you swap the buffers, since the AI doesn't need to react immediately like you already said.
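
A rough sketch of that single-channel difference shader (names assumed; a luminance weighting is one way to collapse the difference to one channel):

  // Rough sketch (names assumed): write the per-pixel difference as a single
  // luminance value, so the smallest mip level ends up holding the average difference.
  sampler2D sceneWithout : register(s0);
  sampler2D sceneWith    : register(s1);

  float4 main(float2 uv : TEXCOORD0) : COLOR
  {
      float3 diff = abs(tex2D(sceneWith, uv).rgb - tex2D(sceneWithout, uv).rgb);
      float  lum  = dot(diff, float3(0.299f, 0.587f, 0.114f));
      return float4(lum, lum, lum, 1.0f);
  }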
February 08, 2005 10:40 AM
superpig
That's pretty damn smart right there, but I'm wondering about the value of that single pixel - it's the average difference from the norm, rather than a specific pixel count. Is it a better metric?
February 08, 2005 04:29 PM
rick_appleton
I'm not sure, I guess it'd need some testing. But the pixel count you mention is also in relation to the texture size, so that is actually also a percentage, as is the single byte. Although if the single byte is not detailed enough, you can always use a lower mipmap level (lower is more detailed right?).
February 09, 2005 01:29 AM
superpig
Yeah, it'd need to be a percentage. And yes a more detailed mipmap is a 'lower' level (level 0 being the most detailed).

I'm just pondering. Averaging out the pixel could mean that a large, mostly camouflaged object could produce the same value as a small, non-camouflaged object, no? While counting pixels gives you a percentage of the AI's FOV that is occupied by the target... not sure if that's more useful or not though. I guess it'd cause the large object to register more strongly. Perhaps that's desired behaviour - are you that much more likely to spot a large object that doesn't quite blend in than a small object that sticks out?
February 09, 2005 04:00 AM
FReY
Nice idea, but I think I see a flaw.

Most AI agents have to move: they patrol, they turn their heads, etc. I'd imagine that if the AI has a patrol cycle, this technique would not work during that cycle, because there will be a certain amount of image-space transformation caused by the AI's own movement anyway.

Your idea is brilliant, but I can't think of any ways around this flaw. Your thoughts? :)

[edit]:
Sorry, I got the wrong end of the stick there. I thought you were checking the difference between two subsequent frames, meaning that I thought your method was based on the AI being able to pick up movement rather than just presence :)
February 09, 2005 07:15 AM
rick_appleton
Very good point, though, about moving objects being more visible.

Maybe use two textures of the world with the player: one from the previous frame, and one from the current frame. Blend these two together to generate the texture used for comparison (sketched below). Although I think this will not be sufficient to raise the observability of moving objects.
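
As a very rough sketch (texture names assumed), the blend-then-difference shader might look like this:

  // Very rough sketch (names assumed): average the with-player images from the
  // previous and current frames, then difference against the empty scene.
  sampler2D sceneWithout  : register(s0);
  sampler2D sceneWithPrev : register(s1);
  sampler2D sceneWithCurr : register(s2);

  float4 main(float2 uv : TEXCOORD0) : COLOR
  {
      float3 blended = lerp(tex2D(sceneWithPrev, uv).rgb,
                            tex2D(sceneWithCurr, uv).rgb, 0.5f);
      return float4(abs(blended - tex2D(sceneWithout, uv).rgb), 1.0f);
  }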
February 09, 2005 07:28 AM
superpig
Perhaps combining my technique with Rick's would be a good idea? Take the average 'unusualness' using the mipmap approach to assess a static object, and compare pixel counts over multiple frames to... oh, I see. Hmm.

First and foremost, tests only need take place if the player is moving. We know that so we can skip many situations [smile]

Perhaps we could get somewhere by tracking the change in the AI's FOV. If a billboarded quad representing the player were projected into each FOV and the area of the resulting 2D primitive calculated, we could compare areas to see what kind of a change is 'expected' based on the AI's head movement alone. If the actual change in pixel counts is greater than that, we trigger suspicion?

Though that will only trigger things when the player is emerging from cover... won't handle the player standing in plain view and moving around, though we'd hope that by that time the AI has already spotted them... [smile]
February 09, 2005 08:50 AM
CJM
Hey,

Great idea, but there are 3 factors to whether the AI would 'see' you methinks.

First is the out-of-placedness: if you're wearing black on a white backdrop, you look out of place. That would be solved by your render-backdrop, copy-to-texture, render-character [into the rendered backdrop], copy-to-texture-and-compare algorithm.

Then you've got motion, which I'd solve by comparing the backdrop to the backdrop plus the character at his position on the last update. You'd be rendering stuff from a perspective the AI couldn't possibly have seen it from [ie, the player's previous position, with the AI's new position], but if you had enough updates it wouldn't have a noticeable effect.

The third aspect would have to be other stuff moving in the scene. To solve this, I'd probably render every moving object in its previous position compared to the scene, then render every moving object in its current position [including the character] [maybe just in the region around the character], and then test whether the ratio of the character's movement pixels to the world's movement pixels is greater than a certain threshold, to see if they'd actually notice anything; after all, in a room of moving objects, the AI should be less likely to recognise a moving target.

Anywho, now I'm just ranting.

CJM
February 10, 2005 05:09 PM
superpig
Quote:
after all in a room of moving objects, the AI should be less likely to recognise a moving target.


Good point. Hmm...

One of the ideas I've been pondering is to use more than one occlusion query. If you segmented your player character, it'd let you work out which parts of the player can be seen (which may be helpful - if the player is in disguise and the AI can only see his boots, perhaps he will not raise suspicion). In general it'd allow the AI to deal with multiple threats. Something like this:


  1. Render the static environment to a texture.
  2. Copy the first texture to a second (comparison) texture.
  3. Render the next moving object into the first texture.
  4. Issue occlusion query; render depth-fixed quad to generate pixel count in the usual way.
  5. Goto 2.


Because you copy the result to the comparison texture each time, it should work as if already-rendered objects are just 'part of the environment.'

Using this, you could obtain pixel-counts for all your moving objects, and then see what percentage of the total count the player makes up.
February 11, 2005 07:10 AM
Volte
Wow, that is amazing. What a sweet idea. How about this for an idea - I could be way off, but I think I know what you're talking about.

What if you added the difference for each of the player's pixels to a variable on the AI? So basically, if you think of it visually, there is a bar of concern, which is filled while the character is in the AI's eyesight (seeing the character is what is being determined). If the bar rises over some level linked to the AI's observance, a flag is thrown; if not, Mr. AI goes on his way. I think this would work too, because if the character has moved there is more of a difference, so more of the concern bar is filled. Also, what if over time, during the periods when Mr. AI is not looking towards the spot where the character is, the bar of concern 'deflates', or drains? This could be related to the AI's memory, or some other attribute. Would this provide enough realism?
February 11, 2005 10:49 PM
superpig
Sure thing - but that's getting more into general AI 'suspicion' behaviour. You'd feed other stimuli into that, like sound or 'clues' (bloodstains on the floor, things that are where they shouldn't be, etc). This technique is just a way of generating those stimuli - a suspicion system's definitely a good plan, but it's not really what we're discussing, I think.
February 12, 2005 01:18 PM