12 replies to this topic

Posted 03 September 2012 - 12:35 AM

I think that the perfect algorithm for rendering geometry would have a few certain characteristics -

-detail on a per-pixel level (naturally because you can't have more than that!)

-no wasted information (anything that is outside the view frustum or covered by another object will never be calculated)

If you meet these two conditions, you should have exactly one vertex for every screen pixel.

In theory this would mean the rendering would take roughly the same amount of time regardless of your view point or how many objects you are rendering.

So I was curious if this has been done? I'm certain it can be done, just not yet in realtime.

Also I'd like to post my idea on how it could be done. Keep in mind I'm no expert and this is just a general idea.

And if anything is incorrect please forgive me.

You start by cycling through each pixel on the screen. For each pixel [x, y] you find the point in 3D space that lies within the near clipping plane and matches the pixel's coordinates when projected into 2D. We will call this point "a". There are an infinite number of 3D points that match up with a given screen pixel, but only one that is in the near clipping plane.

The 3D point inside the near clipping plane that matches the 2D pixel will be your starting point.

From here there is a simple way to define all 3D points within the camera's frustum that would match up in 2D screen space with the pixel you are currently on.

You have a starting point, now you need a direction to go in and a distance to travel to come up with a line segment that contains all possible points that match the pixel you are on.

The direction to go in is simply Normalize(a - CamPosition)

The distance to travel = far clipping plane - near clipping plane.

Now you can find the 3D point on the far clipping plane that corresponds to the 2D pixel you are on. We will call this point "b"

b = a + Direction * (farPlane - nearPlane)

Now draw a line from a to b and you pass through all points in the frustum that match the 2D pixel.

Somewhere along this line segment is the 3D point you need for calculating the current pixel.

So basically you have to find the first intersection of line segment (a, b) and the geometry, starting with a and moving towards b.

This is pretty tricky, but I have one idea on how to do it if you are rendering a height field or using a noise function.

You make a point "p" and start it out at a and move in a certain interval at a time towards b (similar to "ray marching"). You will start out using a large interval.

Every time you move p forward towards b, you check the height at that x and z of the point p. Something like height = getHeight(p.X, p.Z);

If the height returned is larger than the height of p, then you have found an intersection, else keep moving forward.

Once you find an intersection, set p back one jump and decrease the interval you move at and start again from there, marching towards b and looking for an intersection.

Repeat this a few times, depending on how much precision you want in finding the intersection of the line segment and the geometry.

Now you will need a normal for this vertex, but this form of rendering doesn't use triangles or any sort of polygons at all.

There are many solutions for this, and you end up with a lot of flexibility for how much precision you want your normals to have.

Here is a quick and simple solution though - Create a triangle centered around the vertex that needs a normal. Get the height of the 3 vertices of this triangle and then calculate its normal.

With this type of algorithm, there would be no poly budget or any problems with creating curves.

The distribution of vertices would always be uniform thoughout the screen. There are too many advantages to name.

It certainly sounds too slow for now, but I can see it being possible in the future because each pixel can be calculated independently of every other pixel, making it ideal for parallel computing.

I think if you had a GPU that could work on every pixel simultaneously you would be set.

-detail on a per-pixel level (naturally because you can't have more than that!)

-no wasted information (anything that is outside the view frustum or covered by another object will never be calculated)

If you meet these two conditions, you should have exactly one vertex for every screen pixel.

In theory this would mean the rendering would take roughly the same amount of time regardless of your view point or how many objects you are rendering.

So I was curious if this has been done? I'm certain it can be done, just not yet in realtime.

Also I'd like to post my idea on how it could be done. Keep in mind I'm no expert and this is just a general idea.

And if anything is incorrect please forgive me.

You start by cycling through each pixel on the screen. For each pixel [x, y] you find the point in 3D space that lies within the near clipping plane and matches the pixel's coordinates when projected into 2D. We will call this point "a". There are an infinite number of 3D points that match up with a given screen pixel, but only one that is in the near clipping plane.

The 3D point inside the near clipping plane that matches the 2D pixel will be your starting point.

From here there is a simple way to define all 3D points within the camera's frustum that would match up in 2D screen space with the pixel you are currently on.

You have a starting point, now you need a direction to go in and a distance to travel to come up with a line segment that contains all possible points that match the pixel you are on.

The direction to go in is simply Normalize(a - CamPosition)

The distance to travel = far clipping plane - near clipping plane.

Now you can find the 3D point on the far clipping plane that corresponds to the 2D pixel you are on. We will call this point "b"

b = a + Direction * (farPlane - nearPlane)

Now draw a line from a to b and you pass through all points in the frustum that match the 2D pixel.

Somewhere along this line segment is the 3D point you need for calculating the current pixel.

So basically you have to find the first intersection of line segment (a, b) and the geometry, starting with a and moving towards b.

This is pretty tricky, but I have one idea on how to do it if you are rendering a height field or using a noise function.

You make a point "p" and start it out at a and move in a certain interval at a time towards b (similar to "ray marching"). You will start out using a large interval.

Every time you move p forward towards b, you check the height at that x and z of the point p. Something like height = getHeight(p.X, p.Z);

If the height returned is larger than the height of p, then you have found an intersection, else keep moving forward.

Once you find an intersection, set p back one jump and decrease the interval you move at and start again from there, marching towards b and looking for an intersection.

Repeat this a few times, depending on how much precision you want in finding the intersection of the line segment and the geometry.

Now you will need a normal for this vertex, but this form of rendering doesn't use triangles or any sort of polygons at all.

There are many solutions for this, and you end up with a lot of flexibility for how much precision you want your normals to have.

Here is a quick and simple solution though - Create a triangle centered around the vertex that needs a normal. Get the height of the 3 vertices of this triangle and then calculate its normal.

With this type of algorithm, there would be no poly budget or any problems with creating curves.

The distribution of vertices would always be uniform thoughout the screen. There are too many advantages to name.

It certainly sounds too slow for now, but I can see it being possible in the future because each pixel can be calculated independently of every other pixel, making it ideal for parallel computing.

I think if you had a GPU that could work on every pixel simultaneously you would be set.

Posted 03 September 2012 - 12:50 AM

Actually, if you want to produce an image without aliasing, you need to evaluate multiple geometry samples per pixel.-detail on a per-pixel level (naturally because you can't have more than that!)

e.g. if the edge of some geometry cut a 2D line through a pixel's bounding box, so that only 40% of the pixel is filled with that geometry, then to produce a correct image you need to find out which geometry fills the other 60% of the pixel.

This is called ray-tracing - computerized versions of the idea were described in the 60's and has been researched to the current day.You start by cycling through each pixel on the screen. For each pixel [x, y] you find the point in 3D space that lies within the near clipping plane and matches the pixel's coordinates when projected into 2D. We will call this point "a".

Now you can find the 3D point on the far clipping plane that corresponds to the 2D pixel you are on. We will call this point "b".

Now draw a line from a to b and you pass through all points in the frustum that match the 2D pixel.

Somewhere along this line segment is the 3D point you need for calculating the current pixel.

This form of rendering doesn't [have to] use triangles or any sort of polygons at all ... there would be no poly budget or any problems with creating curves

The key to fast ray-tracing is using a very fast method for determining the "which piece of geometry does this ray collide with" question.In theory this would mean the rendering would take roughly the same amount of time regardless of your view point or how many objects you are rendering.

The above quote is assuming that your ray-collision algorithm is not all all affected by the amount of objects in the scene or visible to the camera (

e.g. In films, ray-tracing

**Edited by Hodgman, 03 September 2012 - 12:56 AM.**

Posted 03 September 2012 - 12:51 AM

All of this can be done using traditional ray tracing with a suitably high degree of geometry tessellation or by using analytic curved surfaces. You also need to consider the format for the geometry to be rendered - do you use triangles, or just height fields? Can anyone make anything useful to be rendered using your geometry representation? Triangles are so popular because they are very good at representing all kinds of surfaces when you have enough of them.

Posted 03 September 2012 - 02:12 AM

To expand on this, the speed of traditional ray-tracing is dependent on screen resolution by a linear factor (increase render width by two, work needed increased by two) by definition and the relationship to scene complexity depends on the underlying acceleration structure used to make collision tests faster. Existing algorithms (octrees, kd-trees, BVH's) have brought down this cost to O(log(n)) in the number of primitives in the scene, and that's the best you can do in the general case. Thought you also have to take into account how efficient it is to *create* the acceleration structure, which is often not a concern for static scenes provided it is reasonable, but is a problem when you have dynamic objects.The key to fast ray-tracing is using a very fast method for determining the "which piece of geometry does this ray collide with" question.

The above quote is assuming that your ray-collision algorithm is not all all affected by the amount of objects in the scene or visible to the camera (that is to say, collision has O(1) complexity with regards to scene size). This is usually not the case, although the snake-oil salesmen at "Unlimited Detail" do make such a claim.

Compare this to rasterization, which is O(1) with respect to scene size, because rasterization is done in normalized coordinates and can be interpolated while preserving correctness (the reason higher resolutions are more expensive is not really rasterization but pixel shading, i.e. drawing all the pixels), and O(n) in scene complexity, as each triangle fed to the pipeline has to be rasterized somehow before it can be culled by falling off-screen, which isn't great. But on the plus side, you don't need any kind of preprocessing for rasterization, you just feed all the triangles, in any order, and it just works.

Using your heightfield approach, the same complexity issue arises: the more accurate the heightfield, the more "ray-marching steps" you'll need to do to actually reach that accuracy - and that's about a log(n) relationship if you use proper bisection to converge on the intersection point (but I feel your method is not foolproof and generates artifacts when the heightfield is not well-behaved).

Also, with regard to "Unlimited Detail" claims, let me just make it clear right here: it is impossible (unless you feel like contradicting information theory) to render a general scene, be it voxels, triangles, or atoms, in O(1) time with respect to the number of voxels, triangles, or atoms. The absolute best you can do is O(log(n)), and it can be proven. Even if UD can pull it off with some very specific scene (perhaps using instancing, as they seem to love doing), it cannot be done in the general case.

You can look it up on Youtube - you'll see videos of people rendering so-called "billions" of triangles in realtime, then you're realize it's just the same model instanced dozens of times in every dimension. Try that with different models, and surprise! It doesn't work anymore.

**Edited by Bacterius, 03 September 2012 - 02:21 AM.**

*“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”*

Posted 03 September 2012 - 07:53 AM

actually hashing could accelerate raycasting to do better than log(n). such as pre-computing the visibility to some degree for certain angles.

In case we would have a bucket list for each position and each view-angle and each pixel on the screen, we could cast a ray in O(1). thats impossible of course, but gives the idea. The difficulty is: how to reduce the search space without consuming too much memory -> compressing the bucket list might also be an idea. such as moving the viewpoint often does not cause major changes, which allows a high compression ration between the frames. Its like compressing a video - but with much more dimensions. it would be sufficient to compress the depthmap though - that already would speed up a lot.

Basically a cubemap per xyz position would be sufficient to not require taking care of the view angle.

In case we would have a bucket list for each position and each view-angle and each pixel on the screen, we could cast a ray in O(1). thats impossible of course, but gives the idea. The difficulty is: how to reduce the search space without consuming too much memory -> compressing the bucket list might also be an idea. such as moving the viewpoint often does not cause major changes, which allows a high compression ration between the frames. Its like compressing a video - but with much more dimensions. it would be sufficient to compress the depthmap though - that already would speed up a lot.

Basically a cubemap per xyz position would be sufficient to not require taking care of the view angle.

**Edited by spacerat, 03 September 2012 - 08:07 AM.**

Posted 03 September 2012 - 11:13 AM

I don't see how this will be a problem. It will find an intersection with accuracy equal to how many times you want to repeat the intersection finding process.Actually, if you want to produce an image without aliasing, you need to evaluate multiple geometry samples per pixel.

e.g. if the edge of some geometry cut a 2D line through a pixel's bounding box, so that only 40% of the pixel is filled with that geometry, then to produce a correct image you need to find out which geometry fills the other 60% of the pixel.

And yes I know what ray tracing is, just not a whole lot about it. I usually hear it described as just moving in a fixed step from a to b looking for a ray-polygon intersection, so I assumed what I described was different.

I was thinking of the time it takes to find a collision being equal to the time it would take to to march from point a to point b in the smallest interval you allow it to use. But now that I think about it, if you add objects in the scene, you have to check each object per ray movement, so it would add up time. But without any objects and just rendering a height field, wouldn't there be a maximum time it could take per pixel?The above quote is assuming that your ray-collision algorithm is not all all affected by the amount of objects in the scene or visible to the camera

I found another hole in my method though. When moving along the ray at a large interval, you could potentially miss the first intersection of the geometry and find another intersection and think it is the first.

But my methods for doing this aren't really the point in this post. I honestly don't know a lot about this stuff. I just wanted to see if a "one vertex per pixel" algorithm has been made, or even something similar, because I find it very interesting.

**Edited by frobot, 03 September 2012 - 11:18 AM.**

Posted 03 September 2012 - 01:06 PM

I don't see how this will be a problem. It will find an intersection with accuracy equal to how many times you want to repeat the intersection finding process.

And yes I know what ray tracing is, just not a whole lot about it. I usually hear it described as just moving in a fixed step from a to b looking for a ray-polygon intersection, so I assumed what I described was different.

No. The point is that to get correct visual results, you may need information from more than one surface for a given pixel. Whenever the geometry has higher frequency information than your resolution, you need to do some kind of super-sampling or multi-sampling if you want the image to reflect what's actually there. For example: http://www.newscientist.com/data/images/ns/cms/dn10521/dn10521-2_650.jpg

In fact, once you realize this it's pretty obvious that there is *no* upper bound on the number of primitives that might need to be evaluated to shade one pixel.

Posted 03 September 2012 - 06:27 PM

again I see what you mean here but how is this a problem?

If a pixel is made 75% of one information source and 25% another, it would usually just use the 75%

You just lose some precision

If a pixel is made 75% of one information source and 25% another, it would usually just use the 75%

You just lose some precision

Posted 03 September 2012 - 06:44 PM

The "lost precision" is the difference between this and this. And that's just the two-dimensional case. Consider it in three dimensions, especially with perspective projection which means objects far away will need more samples by definition since you see more of the scene per pixel. Have you ever played vanilla minecraft? Move the mouse one pixel to the left, and the whole screen seems to jitter randomly, because at a distance, much more than one texel (texture pixel) of information is condensed into a single pixel, and depending on the view angle, any one texel is chosen randomly, which produces a weird color soup which is very straining on the eyes - that's what the lack of multisampling results in. If you used multisampling, you could weigh each texel according to its contribution to the pixel's color, and produce a correct pixel color which will not change just by looking at it from a slightly different angle. It adds stability to the render.If a pixel is made 75% of one information source and 25% another, it would usually just use the 75%

You just lose some precision

(though note that while ray-tracing solves this naturally by jittering the camera ray by random amounts to capture subpixel information, traditional graphics hardware does this by mipmapping, i.e. using less detailed textures at a distance, which is cheaper than full-on multisampling. unfortunately minecraft uses neither, but that's just an example).

**Edited by Bacterius, 03 September 2012 - 06:48 PM.**

Posted 05 September 2012 - 11:16 AM

So from what I understand, It might not be too bad for up close objects, but the need for more detail never stops growing with distance.

So at some distance it would become impossible to render something correctly without more information per pixel.

You could try something like taking a number of samples based on the distance.

But I'm starting to see why there's no way to actually make an algorithm that takes the same time regardless of view and amount of objects. Just too good to be true >.<

So at some distance it would become impossible to render something correctly without more information per pixel.

You could try something like taking a number of samples based on the distance.

But I'm starting to see why there's no way to actually make an algorithm that takes the same time regardless of view and amount of objects. Just too good to be true >.<

Posted 06 September 2012 - 02:51 AM

Well, if the geometry were completely static, you could bake the geometry into a buffer containing as many prerendered hash table images (images containing all the geometry needed for each pixel) as possible, rendered from as many directions on the hemisphere as possible (orthographic projection). You could than use this buffer to render the geometry from any point with any view direction in constant time (as long as the hash tables only contain 1 geometry intersection per bucket).

You'd still only have one sample per pixel though :/

Also this could only work in theory. The buffer would probably be multiple Terabytes to support rendering HD images in constant time.

You'd still only have one sample per pixel though :/

Also this could only work in theory. The buffer would probably be multiple Terabytes to support rendering HD images in constant time.

**Edited by CryZe, 06 September 2012 - 03:02 AM.**

Posted 11 September 2012 - 06:52 PM

I think that that is what Reyes renderer do in general. Micropolygon tesselation basicaly?

http://en.wikipedia.org/wiki/Reyes_rendering

http://en.wikipedia.org/wiki/Reyes_rendering