Has this ever been done? Image processing...

Hi guys, does anyone know what the current research status is on taking camera input and removing any objects that don't move from the scene? I.e., if there is a motionless room, remove everything but the person walking in it, keeping him/her visible when standing still, of course. Has anybody been working in this area or been confronted with it? -CProgrammer
I'm not familiar with any particular work in this area, but that doesn't mean much as I've spent virtually zero time thinking about machine vision [wink]

Just removing non-motion objects is pretty trivial: compare two images; if a given pixel value in the first image is within a few percent of its value in the second image, then you mark the pixel as non-movement and draw it as black or whatever.
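
For illustration, a minimal sketch of that idea in Python/NumPy (the function names and the exact threshold are mine, not from the post):

```python
import numpy as np

def motion_mask(prev_frame: np.ndarray, curr_frame: np.ndarray,
                threshold: float = 0.05) -> np.ndarray:
    """True where a pixel changed by more than `threshold` (as a
    fraction of the 8-bit range) between two frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff.max(axis=-1) > threshold * 255

def blank_static(curr_frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    out = curr_frame.copy()
    out[~mask] = 0  # draw non-moving pixels as black
    return out
```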

The tricky part would be to keep a person visible after they stop moving. This would require some actual object-recognition and AFAIK that technology is still fairly limited.

You might be able to get away with some kind of hack, though: if a pixel was marked as "moving" at any time in the last, say, 5 frames, and suddenly it goes to "non-moving", then you keep it visible for a while and assume that it was something moving that is now standing still. Resolution and accuracy may be problems, but there should be ways around that (reveal a circle around each displayed pixel instead of just that pixel; adjust the threshold function to handle noise in the video stream; etc. etc.).
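
Sketched out, that hack could look like this (the five-frame window follows the post; everything else is illustrative):

```python
import numpy as np

HOLD_FRAMES = 5  # "the last, say, 5 frames"

def update_hold(moving_now: np.ndarray, hold: np.ndarray) -> np.ndarray:
    """Per-pixel countdown: refreshed whenever the pixel is moving,
    decaying otherwise. Display every pixel whose counter is > 0."""
    hold = np.maximum(hold - 1, 0)
    hold[moving_now] = HOLD_FRAMES
    return hold
```

Start with `hold = np.zeros(frame.shape[:2], dtype=int)` and feed it the motion mask each frame.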

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

You can do as ApochPiQ stated, or you can compare each frame to a reference frame captured without the person/actor and remove all the similar pixels. This will retain the actor whether they move or not; just don't let the actor wear anything colored similarly to the background. This is the simplest way to perform digital/pixel green screening.
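
A rough sketch of that reference-frame comparison, assuming 8-bit RGB frames (the tolerance value is a guess and would need tuning per camera):

```python
import numpy as np

def keep_actor(frame: np.ndarray, reference: np.ndarray,
               tolerance: int = 20) -> np.ndarray:
    """Black out every pixel that matches the empty-scene reference."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    actor = diff.max(axis=-1) > tolerance  # True where the actor differs
    out = frame.copy()
    out[~actor] = 0
    return out
```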
Quote:Original post by ApochPiQ
Just removing non-motion objects is pretty trivial: compare two images; if a given pixel value in the first image is within a few percent of its value in the second image, then you mark the pixel as non-movement and draw it as black or whatever.


This works only for trivial synthetic images. As soon as any kind of transformation is applied, it fails. Just the noise introduced by the RGB-YUV conversion, let alone video compression, makes this next to impossible, since compression can distort individual values by 10% or more (an 8-bit channel will show variations of 20 or more, mostly due to YUV quantization of the Cr and Cb channels).

Usually it's simpler to work in feature space. Extract some useful features, such as edges or corner points, and compare those. This way you can build a space determined by those features, which makes them simpler to compare. Even for trivial algorithms, various statistical techniques are then used to detect changes.
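
As a concrete (if hedged) example of working in feature space, OpenCV's stock corner detector can supply the points to compare; the parameters below are arbitrary:

```python
import cv2
import numpy as np

def corner_features(gray: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of corner points for a grayscale frame;
    comparing these sets across frames replaces raw pixel comparison."""
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    return np.empty((0, 2)) if pts is None else pts.reshape(-1, 2)
```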

Various MPEG algorithms tackle this problem in a generic, but not necessarily semantically useful, manner. You get either motion vectors or a general transformation, but they cannot be directly classified.


Quote:Resolution and accuracy may be problems, but there should be ways around that (reveal a circle around each displayed pixel instead of just that pixel; adjust the threshold function to handle noise in the video stream; etc. etc.).


I've tried doing correlation using RMS comparison over sections of images, but for non-ideal streams it fails miserably. Another problem with real video is that you will need sub-pixel accuracy, or jitter becomes problematic.
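
For reference, that RMS comparison of image sections might look like this (the block size is an arbitrary choice, and note this has none of the sub-pixel handling mentioned above):

```python
import numpy as np

def block_rms(a: np.ndarray, b: np.ndarray, block: int = 16) -> np.ndarray:
    """RMS difference per block-by-block tile of two grayscale images."""
    h = a.shape[0] // block * block  # crop to a multiple of the block size
    w = a.shape[1] // block * block
    d = (a[:h, :w].astype(np.float64) - b[:h, :w].astype(np.float64)) ** 2
    tiles = d.reshape(h // block, block, w // block, block)
    return np.sqrt(tiles.mean(axis=(1, 3)))
```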

Once you have that, you can work out affine transforms and, if you are daring enough, use them to reconstruct camera parameters (reverse the view transform). But the math here gets complicated fast: the solutions will not be exact and need to be fitted, and they may also contain incorrect readings. But when it works, you get a full "model"-space representation of the area.
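
A least-squares affine fit over matched feature points is the usual starting point; a bare sketch (robust fitting against the "incorrect readings" mentioned above, e.g. RANSAC, is deliberately omitted):

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """src, dst: (N, 2) matched point sets, N >= 3.
    Returns the 2x3 affine matrix M with [x', y'] = M @ [x, y, 1]."""
    A = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)   # solve A @ M ~= dst
    return M.T
```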

IMHO, the easiest way is to find some trivial convolution filter for edge detection, build reference points, correlate those, and use them to split the image into polygons. Then do intra-polygon comparisons. This is quick and simple; nVidia was even demoing some GPU-accelerated classification at some point, and IIRC it can even be done in real time. This works better with video, since you can use optimistic algorithms and expect that two consecutive frames will be very similar.
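
The "trivial convolution filter" could be as simple as a Sobel pair; a sketch using SciPy (any convolution routine would do):

```python
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)

def edge_magnitude(gray: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a grayscale image; its peaks give the
    reference points to correlate between frames."""
    gx = convolve2d(gray, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(gray, SOBEL_X.T, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```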

Once you have that, you can then start classifying different regions.

But it's been several years since I dealt with the topic, so I imagine a lot of new things have popped up. I actually used it for scene reconstruction: shoot the scene with a camera, and out pops the 3D model. Last I checked on the area, a lot of progress had been made with respect to it as well.

Edit: Come to think of it, I last worked with this about 8 years ago. How time flies; most of the above is therefore very dated advice.
Quote:Original post by CProgrammer
does anyone know what the current research status is on taking camera input and removing any objects that don't move from the scene? I.e., if there is a motionless room, remove everything but the person walking in it, keeping him/her visible when standing still, of course.


I'm probably skirting the edge of NDA breakage, but yes, this is doable with a live stream of data.

Granted, I can't say how, but consider this a push in the direction of 'it isn't impossible on hardware around today' [smile]
There was a video of people using some sort of entropy-encoding technology (the same tech used to encode video) to do video painting, resurfacing, and erasing in real time (and it tracked movement too); that's probably the state of the art. I couldn't find the video again, but it's on YouTube.

It relies on tracking pixels in a higher-dimensional space and an energy-minimizing function of some sort, but I could be wrong; there was also a link to the paper.

Good Luck!

-ddn
Quote:Just removing non-motion objects is pretty trivial: compare two images; if a given pixel value in the first image is within a few percent of its value in the second image, then you mark the pixel as non-movement and draw it as black or whatever.


The first (reference) image needs to be an image without anything in it (i.e., a blank wall). This way you always see the motion of the person.

Also, using the blank wall, you need to compare a series of images to find the noise level of the camera, and apply this value to ignore noise 'spikes'. Everything above the noise level is probably a pixel of a moving object.
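
That calibration step might look like this (the 3-sigma cutoff is my assumption; the post doesn't specify one):

```python
import numpy as np

def calibrate(blank_frames) -> tuple:
    """blank_frames: a sequence of frames of the empty scene.
    Returns (reference, threshold): the mean image plus a per-pixel
    noise threshold; differences below it are treated as camera noise."""
    stack = np.stack([f.astype(np.float64) for f in blank_frames])
    return stack.mean(axis=0), 3.0 * stack.std(axis=0)
```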

The change in light is a tough issue that you will have to deal with. For example, if the wall is near a window, the sun could mess up the capture as its position changes relative to the reference image. The same goes for lights turned on/off in the room.

Check out Super Play, the SNES inspired Game Engine: http://www.superplay.info

Quote:Original post by cdoty
The first (reference) image needs to be an image without anything in it (i.e., a blank wall). This way you always see the motion of the person.

Also, using the blank wall, you need to compare a series of images to find the noise level of the camera, and apply this value to ignore noise 'spikes'. Everything above the noise level is probably a pixel of a moving object.

The change in light is a tough issue that you will have to deal with. For example, if the wall is near a window, the sun could mess up the capture as its position changes relative to the reference image. The same goes for lights turned on/off in the room.

I specifically avoided the idea of a static "empty" reference image precisely because it would be trivial to break the algorithm by simply changing the light levels in the scene.

However, as Antheus noted, my suggestion has plenty of other problems, so... [smile]

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Doesn't the Xbox 360 game "You're in the Movies" do something like that? I've never played it, but from what I've read it seems to do what you described.

Ok, that doesn't help you in any way when it comes to how to do it, but I thought I should mention it.
Scale the image down, apply a Gaussian convolution, and compare pixel values. Try to track the moving pixels, calculate both their speed and their acceleration, and keep a 'calculated new position' for each group of pixels, so you can see if an object is moving.
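
The preprocessing half of that suggestion is straightforward (the scale factor and blur width below are arbitrary); the tracking/prediction half is the hard part:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(gray: np.ndarray, scale: float = 0.25,
               sigma: float = 1.5) -> np.ndarray:
    """Downscale, then Gaussian-blur: cheaper comparisons, less noise."""
    small = zoom(gray.astype(np.float64), scale, order=1)  # bilinear shrink
    return gaussian_filter(small, sigma)
```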

