Approximation of Normals in Screen Space


12 replies to this topic

#1 LouisCastricato   Members   -  Reputation: 128


Posted 11 February 2013 - 06:57 PM

So, I have been writing a science paper (more of a book than anything else) on the subject of computer vision, and approximating 3D scenes through still images. I have almost all the algorithms down and working, except 1 piece.

 

I need to know how I could possibly calculate the normals of a mesh through a screen space approximation. Others seem to have done this before, but I can't find a decent explanation of how. Can someone link me to a few papers, or perhaps even just explain it to me?

 

As soon as I finish this, I'll be able to finish my paper (book).

 

 

PS:

Not entirely sure if this should go under AI, since it's more graphics than computer vision.

 

Thanks for your time!


Edited by LouisCastricato, 11 February 2013 - 06:58 PM.



#2 IADaveMark   Moderators   -  Reputation: 2509


Posted 12 February 2013 - 03:52 PM

Yeah, I'm thinking that this isn't necessarily in the AI realm. Moving.


Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC

Professional consultant on game AI, mathematical modeling, simulation modeling
Co-advisor of the GDC AI Summit
Co-founder of the AI Game Programmers Guild
Author of the book, Behavioral Mathematics for Game AI

Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

#3 David Neubelt   Members   -  Reputation: 794


Posted 13 February 2013 - 01:01 AM

Typically, you have more information than one image.

 

If you have two images from different viewpoints then you can reconstruct depth, and from depth you can differentiate to get normal information.

 

If you have one image and you can determine the light direction (via shadows) then you can determine normals via (n.l)*p = I (n is the surface normal, l is the light direction, p is the albedo and I is the intensity of the image).

 

If you have multiple images from one (the same) viewpoint and multiple known light directions then you can reconstruct the normals using the previous equation and a least squares regression. Additionally, you can use expectation maximization or other non-linear solvers. The term to search for is photometric stereo.
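
For reference, the least squares version of photometric stereo can be sketched for a single pixel as below. This is only a minimal sketch, not code from any particular paper: it assumes a Lambertian surface, no shadowed samples, and at least three known, non-coplanar light directions. The names solve3 and photometric_stereo_pixel are made up for illustration.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a * x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (coplanar)")
    x = []
    for col in range(3):
        m = [row[:] for row in a]      # copy, then swap in b as one column
        for r in range(3):
            m[r][col] = b[r]
        x.append(det(m) / d)
    return x


def photometric_stereo_pixel(lights, intensities):
    """Return (unit normal, albedo) for one pixel lit by >= 3 known lights.

    Model (assumption): I_k = albedo * dot(normal, light_k), no shadows.
    """
    # Normal equations (L^T L) g = L^T I, where g = albedo * normal.
    ltl = [[sum(l[i] * l[j] for l in lights) for j in range(3)]
           for i in range(3)]
    lti = [sum(l[i] * v for l, v in zip(lights, intensities))
           for i in range(3)]
    g = solve3(ltl, lti)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is black under every light")
    return [c / albedo for c in g], albedo
```

With exactly three lights this reduces to solving the 3x3 system directly; extra lights just make the fit more robust to noise.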

 

Lastly, if you can have user input and one image and have the user pick highlights then you can try to determine the light direction and then reconstruct normals.

 

-= Dave


Edited by David Neubelt, 13 February 2013 - 01:06 AM.

Graphics Programmer - Ready At Dawn Studios

#4 PolyVox   Members   -  Reputation: 708


Posted 13 February 2013 - 03:56 AM

If I understand you correctly then you can use the ddx/ddy instructions to compute a per-pixel normal in the fragment shader:

 

http://c0de517e.blogspot.nl/2008/10/normals-without-normals.html

 

I also have a code snippet here:

 

http://www.volumesoffun.com/polyvox/documentation/0.2.1/manual/Lighting.html#normal-calculation-for-cubic-meshes

 

Note that you will typically end up with a faceted appearance rather than smooth shading (which is fine for my application).
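
To show what the ddx/ddy trick computes, here is a CPU-side sketch in Python rather than shader code. It assumes you already have a 3D position for every pixel (the positions grid is a hypothetical input), and it uses plain forward differences, whereas GPU hardware typically evaluates derivatives per 2x2 pixel quad. The sign of the result depends on your handedness and on which way the y axis runs.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a non-zero 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def screen_space_normal(positions, x, y):
    """Face normal at pixel (x, y); positions[y][x] is the 3D point seen there.

    Requires x + 1 and y + 1 to be inside the grid.
    """
    p = positions[y][x]
    # Difference to the right-hand neighbour, analogous to ddx(position).
    dpdx = tuple(a - b for a, b in zip(positions[y][x + 1], p))
    # Difference to the neighbour one row down, analogous to ddy(position).
    dpdy = tuple(a - b for a, b in zip(positions[y + 1][x], p))
    return normalize(cross(dpdx, dpdy))
```

Because both differences lie in the plane of the triangle under the pixel, the result is constant per face, which is where the faceted look comes from.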



#5 LouisCastricato   Members   -  Reputation: 128


Posted 13 February 2013 - 12:40 PM

If you have one image and you can determine the light direction (via shadows) then you can determine normals via n.l*p = I (p is albedo and I is intensity of the image)

 

I do like how that sounds, since one of the algorithms that I developed finds shadows within the image and parents them to a light source. From that I can find an estimated light direction.

 

Do you mind elaborating on the technique you are explaining? Perhaps provide some links 


Edited by LouisCastricato, 13 February 2013 - 12:40 PM.


#6 LouisCastricato   Members   -  Reputation: 128


Posted 13 February 2013 - 12:44 PM

If I understand you correctly then you can use the ddx/ddy instructions to compute a per-pixel normal in the fragment shader:

 

http://c0de517e.blogspot.nl/2008/10/normals-without-normals.html

 

I also have a code snippet here:

 

http://www.volumesoffun.com/polyvox/documentation/0.2.1/manual/Lighting.html#normal-calculation-for-cubic-meshes

 

Note that you will typically end up with a faceted appearance rather than smooth shading (which is fine for my application).

 

 

I don't believe I can use this, since I would need to (at least at one point during the pipeline) know the world matrix and every vertex of the object.



#7 David Neubelt   Members   -  Reputation: 794


Posted 14 February 2013 - 12:08 PM


I apologize, this is the worst case you can find yourself in. In general, the solution is underdetermined because the surface gradient has two components and you have only one equation. Intuitively, this means the normal can be rotated freely around the light direction and still give the same lighting intensity. For example, imagine a ball with unit albedo lit from +Z: for a pixel with an intensity of 0.707, any normal tilted 45 degrees from the +Z axis satisfies the equation. General photometric stereo techniques require at minimum two equations, so if you can find two highlights in your image (from different light sources) then it's very easy to solve.
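
A quick way to see the ambiguity numerically (a sketch only, assuming a light along +Z, unit albedo and pure Lambertian shading; the helper names are made up): every normal on the 45 degree cone around the light shades to the same value, cos 45° ≈ 0.707.

```python
import math


def lambert(normal, light, albedo=1.0):
    """Lambertian intensity: albedo * max(0, n . l)."""
    return albedo * max(0.0, sum(n * l for n, l in zip(normal, light)))


def normal_on_cone(polar_deg, azimuth_deg):
    """Unit normal tilted polar_deg away from +Z, spun azimuth_deg around it."""
    t = math.radians(polar_deg)
    p = math.radians(azimuth_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))


light = (0.0, 0.0, 1.0)  # assumed light direction
# Twelve different normals, all 45 degrees away from the light.
intensities = [lambert(normal_on_cone(45.0, az), light)
               for az in range(0, 360, 30)]
# All twelve intensities are equal, so one intensity value cannot tell
# these normals apart.
spread = max(intensities) - min(intensities)
```

A second light from a different direction cuts this cone down to at most two candidate normals, which is why the extra equation matters.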

 

However, that doesn't stop an algorithm from working. Given enough ingenuity and some user input you can still solve it. There are published algorithms that accomplish what you are looking for, but it's a guided process and it generates depth. From depth, it's easy to get back to normals. If you still want to go this way, let me know and I'll dig up the paper when I get home from work.
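
The depth-to-normals step can be sketched like this. It is a simplification, not the method from any paper: it treats the depth map as a height field z(x, y) on a regular grid (effectively an orthographic camera), uses central differences, and leaves the border pixels at a default normal. The function name and the spacing parameter are made up for illustration.

```python
import math


def normals_from_depth(depth, spacing=1.0):
    """Per-pixel unit normals from a 2D list of depth (height) values.

    The normal of the surface z(x, y) is proportional to
    (-dz/dx, -dz/dy, 1). Border pixels keep the default (0, 0, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = [[(0.0, 0.0, 1.0)] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the two gradient components.
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / (2.0 * spacing)
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / (2.0 * spacing)
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[y][x] = (-dzdx / length, -dzdy / length, 1.0 / length)
    return normals
```

With a perspective camera you would first unproject each pixel to a 3D point and take the cross product of neighbouring differences instead.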

 

Do you have any other information? If you are working with computer vision then typically you have either 3d information or at least depth?

 

-= Dave


Graphics Programmer - Ready At Dawn Studios

#8 LouisCastricato   Members   -  Reputation: 128


Posted 14 February 2013 - 03:34 PM

 


I have no form of depth information, or 3D scene information.

 

My algorithm detects multi-level gradients by calculating an estimated rate of decay of each visible shadow within the room. That said, using this I can detect where shadows overlap, or where more than one shadow is visible.

 

The final objective is a bit on the sci-fi end, but it seems more and more practical every day that I work on this. I want to make a 3D scanner that can work on any existing mobile device, without any form of optical modification or user input. Out of all the issues that I have, the two largest are normal approximation without any form of depth or 3D data, and threshold approximation, so the AI can classify whether the image contains a pattern of interest.

 

PS: For the time being, let's pretend performance doesn't matter.

 

The reason I am trying to approximate normals is that ambient occlusion requires them. My idea is that, since AO gives depth perception to video games and special effects, why can't it give computer vision applications depth perception? I think it may come down to a matter of just solving for X.


Edited by LouisCastricato, 14 February 2013 - 03:41 PM.


#9 David Neubelt   Members   -  Reputation: 794


Posted 14 February 2013 - 04:16 PM


If you have a mobile device then it can record video. Video can w/out a doubt reconstruct 3d surfaces. Let me know if you're interested in this or if you want to stick with the static one picture approach.

 

-= Dave


Graphics Programmer - Ready At Dawn Studios

#10 LouisCastricato   Members   -  Reputation: 128


Posted 14 February 2013 - 05:07 PM


I think I wanna stay with the static approach, since I wouldn't have much of a science paper if I didn't (Mainly because, I wanna do something new, and extremely challenging)


Edited by LouisCastricato, 14 February 2013 - 05:21 PM.


#11 David Neubelt   Members   -  Reputation: 794


Posted 16 February 2013 - 11:14 PM


http://www.cse.ust.hk/~pang/papers/ID0225.pdf

 

It's a guided approach but it gives a baseline of this style of work. Honestly though multiple view points or multiple lighting setups can uniquely determine the solution.


Edited by David Neubelt, 16 February 2013 - 11:15 PM.

Graphics Programmer - Ready At Dawn Studios

#12 Sik_the_hedgehog   Crossbones+   -  Reputation: 1806


Posted 17 February 2013 - 12:25 AM

Um, I can see where the video suggestion comes from. One of the methods mentioned requires multiple images, and a user who's holding a phone is not going to have a steady aim (unlike e.g. a tripod), especially not when pressing the button. So instead of taking a photo, you could take a few consecutive frames of video and use them for the algorithm. The user would still probably think it's just like taking a pic since the amount of time is very short =P

 

Alternatively you could take e.g. a pic when the user presses the button and a pic when the user releases it. Both pics would be from different viewpoints and could achieve the same result.


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#13 LouisCastricato   Members   -  Reputation: 128


Posted 25 February 2013 - 01:05 PM


Thanks! That really helped. Based off my current system, I can do the method described in the paper without user input (besides the picture)



