I got thinking about the actions one could track with a camera - in particular, going beyond the basic "splatting and wiping" of the Play games. I've come up with an idea, though I have no idea how one would go about implementing it.
Here's the idea. You're a poltergeist in a haunted house. You float your way around through a first-person camera; movement is done by grabbing 'bars' at the side of the picture and dragging them to rotate in that direction, or pulling/pushing to move yourself forwards/backwards.
You can pick things up in this house, quite literally. The camera tracks your hand movements as you move your hand towards a vase on a table and make a gripping motion. The vase gives off a slight glow to indicate that you have grabbed it. You lift your hand, and the vase floats up off the table. You throw it at the wall, and it flies into the wall and smashes.
Now, I imagine people could have a fair amount of fun just with an empty house like that, but here's the real showstopper: people. Being a haunted house, it is naturally quite popular with the neighbourhood's local teenagers; you could get small numbers of them turning up to explore, or make out, or whatever it is teenagers do in haunted houses.
What do you do with the people? It's completely up to you. You could make it your mission to scare the living daylights out of them, sending them running from the house, screaming for their mommies. You could try and entice them to stay, if you want the company. You could try and change the social dynamics of their group - give the kid who's been bullied into coming along a chance to see his bullies reduced to quivering wrecks. You could even kill them.
And naturally, they respond to your actions as people would respond to a ghost - fear, wonder, anger, curiosity... it's determined by the character of each visitor.
One of the other things I've been considering is sound analysis. We've got the camera; what if you want to speak to the characters, or make 'wooooo' noises at them? Full-scale speech recognition is not something I'd want to include, partly because the tech isn't robust and is expensive to run, but also because ghosts don't, in general, speak clearly to people. It's much more effective to have characters react to categories of sound, such as "voice," "dog barking," or "telephone ringing." Say you're playing the game and your friend comes in to ask you something - the characters on the screen would be going, "Hey, you hear that? Sounds like... voices..." Basic analysis of the properties of the sound would be more achievable, I suspect - for example, defining "telephone ringing" as sound within a certain pitch range that follows a certain on/off pattern. Mixing in a little bit of keyword spotting would be great for creating real scares, though - a short list of words like "die" and "kill" and "get out" would all be picked up, without needing to recognise full sentences.
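To make the "categories of sound" idea a bit more concrete, here is a minimal sketch in Python. It is not a worked-out design: the pitch bands in `CATEGORIES`, the silence threshold, and the function names are all assumptions made up for illustration, and it only covers the pitch-range half of the idea, not the ring pattern. It estimates a rough pitch from how often the signal crosses zero, then looks that pitch up in a table of categories.

```python
import math

# Hypothetical pitch bands in Hz. These numbers are placeholders;
# real values would have to be tuned against recorded audio.
CATEGORIES = [
    ("voice", 85.0, 300.0),
    ("telephone ringing", 400.0, 2000.0),
]


def rms(samples):
    """Root-mean-square level, used here as a crude loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate):
    """Rough pitch estimate from the zero-crossing rate.

    A pure tone crosses zero twice per cycle, so the frequency is
    approximately crossings / (2 * duration).
    """
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration)


def classify(samples, sample_rate, silence_threshold=0.01):
    """Map a chunk of audio to a sound category name."""
    if not samples or rms(samples) < silence_threshold:
        return "silence"
    pitch = estimate_pitch(samples, sample_rate)
    for name, low, high in CATEGORIES:
        if low <= pitch <= high:
            return name
    return "unknown"


def sine(freq, sample_rate=8000, seconds=0.5, amplitude=0.5):
    """Generate a test tone so the sketch can be tried without a microphone."""
    count = int(sample_rate * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(count)
    ]
```

Calling `classify(sine(150), 8000)` lands in the "voice" band and `classify(sine(1000), 8000)` in the "telephone ringing" band. Zero-crossing counts are easily fooled by noise and by sounds with strong overtones, so a real game would probably want a spectrum-based estimate plus a check on the on/off rhythm of the sound before the characters react.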
This does all, of course, sound like a bit much for the PS2. I suspect it is; even if the house isn't very large, you're still doing sound/image processing, hardcore physics, and hardcore AI each frame. Plus, the amount of content would be limited - you can only fit a certain number of vases and characters onto a PS2 disc.
Enter Le Eyetoy Alternatif - or as you probably know it, the webcam.