[Theory] Unraveling the Unlimited Detail plausibility

The reasons why this project is going to flop, guaranteed:

* the models you see aren't unique, they are just duplications of the exact same objects

* the models all have their OWN level of detail; the only way he gets 64 atoms a millimetre is by SCALING SOME OF THEM SMALLER, the rest have SHIT detail.

* he can't paint his world uniquely like what happens in MegaTexture

* he can't perform CSG operations; all he can do is soup yet more and more disjointed models together

* there's no way he could bake lighting at all, so the lighting all has to be dynamic and eat processing power

* this has nothing to do with voxels, you could get a similar effect just by rastering together lots of displacement-mapped models!!!

[quote name='zoborg' timestamp='1313243885' post='4848634']So anyone seriously interested in this should just start from the [Efficient SVO] paper or any of the other copious research that pops up from a quick google search.
That's not quite the same thing as what Chargh was pointing out, or what the title of this thread asks for though... The very first reply to the OP contains these kinds of existing research, but it would be nice to actually analyze the clues that UD have inadvertently revealed (seeing as they're so intent on being secretive...)

All UD is, is a data structure, which may well be something akin to an SVO (which is where the 'it's nothing special' point is true), but it's likely somewhat conceptually different -- having been developed by someone who has no idea what they're on about, and who started as long ago as 15 years.
[/quote]
Well, if you started 15 years ago from scratch, you'd have 15 years of experience in the topic. And it's not like you'd be doing that research in a complete vacuum. It's quite possible that he's invented something different, but I have no particular reason to believe that while he only shows things that could definitely be done using well-documented techniques.


There have been a few attempts in this thread to collect Dell's claims and actually try to analyze them and come up with possibilities. Some kind of SVO is a good guess, but if we actually investigate what he's said/shown, there are a lot of interesting clues. Chargh was pointing out that this interesting analysis has been drowned out by the 'religious' discussion about Dell being a 'scammer' vs 'marketer', UD being simple vs revolutionary, etc, etc...

For example, in bwhiting's link, you can clearly see aliasing and bad filtering in the shadows, which is likely caused by the use of shadow-mapping and a poor-quality PCF filter. This leads me to believe that the shadows aren't baked in, and are actually done via a regular real-time shadow-mapping implementation, albeit in software.
[/quote]
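(As an aside on what a "poor quality PCF filter" means in practice: a minimal 3x3 percentage-closer filter looks something like the sketch below -- my own illustration with hypothetical names, nothing from UD -- and with this few samples the shadow edges quantize into exactly the kind of stair-stepping visible in the video.)

[code]
// Minimal 3x3 percentage-closer filter (PCF), sketched from memory. The shadow
// map, its resolution and the bias are all hypothetical names.
#include <vector>

// Returns the lit fraction of the 3x3 neighbourhood around shadow-map texel
// (sx, sy) for a receiver at depth 'receiverDepth' (depths in light space).
float pcf3x3(const std::vector<float>& shadowMap, int mapSize,
             int sx, int sy, float receiverDepth, float bias) {
    float lit = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int x = sx + dx, y = sy + dy;
            if (x < 0 || y < 0 || x >= mapSize || y >= mapSize) {
                lit += 1.0f;               // treat off-map samples as lit
                continue;
            }
            // Lit if nothing in the map is closer to the light than we are.
            if (receiverDepth - bias <= shadowMap[y * mapSize + x])
                lit += 1.0f;
        }
    }
    return lit / 9.0f;                     // 0 = fully shadowed, 1 = fully lit
}
[/code]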
Where do you think baked-in shadows come from? They have to be rendered sometime, and any offline shadow baking performed can be subject to similar quality issues. I'm just saying there's no way to infer from a shot that the lighting is dynamic, because any preprocess could generate the lighting in exactly the same way, with exactly the same artifacts.

So I obviously don't know if it's baked or not, right? Well, there are several reasons to suspect this, and I prefer to take the tack that until given evidence otherwise, the simplest answer is correct.

Why do I think the shadows are baked?
1) First and foremost, the light never moves. This guy goes on and on about how magical everything else is, so why doesn't he ever mention lighting? Why doesn't he just move the light?
2) The light is top-down - the most convenient position for baked-in light and shadows because it allows for arbitrary orientation about the up axis. Why else would you choose this orientation since it makes the world so flat looking?
3) No specular. That's another reason the lighting looks terrible.
4) It fits in perfectly with the most obvious theory of the implementation.


Also, around this same part of the video, he accidentally flies through a leaf, and a near clipping-plane is revealed. If he were using regular ray-tracing/ray-casting, there'd be no need for him to implement this clipping-plane, and when combined with other statements, this implies the traversal/projection is based on a frustum, not individual rays. Also, unlike rasterized polygons, the plane doesn't make a clean cut through the geometry, telling us something about the voxel structure and the way the clipping tests are implemented.
[screenshot: the near clipping plane cutting through a leaf]

[/quote]
Well, when you're ray-casting you don't need to explicitly implement a clipping plane to get that effect. You'd get that effect if you projected each ray from the near plane instead of the eye. But an irregular cut like that just suggests to me that yes, they're using voxels and raycasting and not triangle rasterization, so any discontinuities would be at voxel instead of pixel granularity.
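For illustration, a ray caster gets that behaviour simply by choosing where its primary rays begin -- a minimal sketch, assuming a pinhole camera and hypothetical names (not anything UD has shown):

[code]
// Primary rays that start where they cross the near plane instead of at the eye.
// 'dir' is the normalized per-pixel ray direction, 'forward' the camera's unit
// view direction, and 'nearDist' the near-plane distance.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin; Vec3 dir; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Ray primaryRay(Vec3 eye, Vec3 dir, Vec3 forward, float nearDist) {
    // Distance along this particular ray to the plane 'nearDist' in front of the eye.
    float t = nearDist / dot(dir, forward);
    // Starting the traversal here means nothing in the [eye, near plane) gap is ever tested.
    return { { eye.x + dir.x * t, eye.y + dir.y * t, eye.z + dir.z * t }, dir };
}
[/code]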


It's this kind of analysis / reverse-engineering that's been largely drowned out.

The latter algorithm works for unlit geometry simply because each cell in the hierarchy can store the average color of all of the (potentially millions of) voxels it contains. But add in lighting, and there's no simple way to precompute the lighting function for all of those contained voxels. They can all have normals in different directions - there's no guarantee they're even close to one another (imagine if the cell contained a sphere - it would have a normal in every direction). You also wouldn't be able to blend surface properties such as specularity.
This doesn't mean it doesn't work, or isn't what they're doing, it just implies a big down-side (something Dell doesn't like talking about).
For example, in current games, we might bake a 1 million polygon model down to a 1000 polygon model. In doing so we bake all the missing details into texture maps. Every low-poly triangle is textured with the data of 1000 high-poly triangles. Thanks to mip-mapping, if the model is far enough away that the low-poly triangle covers a single pixel, then the data from all 1000 of those high-poly triangles is averaged together.

Yes, often this makes no sense, like you point out with normals and specularity, yet we do it anyway in current games. It causes artifacts for sure, but we still do it and so can Dell.
[/quote]
I think you're understating the potential artifacts. In their demo, a single pixel could contain ground, thousands of clumps of grass, dozens of trees, and even a few spare elephants. How do you approximate a light value for that that's good enough? We do approximations all the time in games, but we do that by throwing away perceptually unimportant details. The direction of a surface with respect to the light is something that can be approximated (e.g. normal-maps), but not if the surface is a chaotic mess. At best, your choice of normal would be arbitrary (say, up). But if they did that, you'd see noticeable lighting changes as the LoD reduces, whereas in the demo it's a continuous blend.
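To make that concrete, here is a toy sketch (my own illustration, not UD's or anyone's published format) of an octree cell that pulls a pre-filtered color and an "average" normal up from its children; the comments note why the averaged normal degenerates exactly in the chaotic case described above:

[code]
// Toy pre-filtered octree cell: colors average tolerably, normals do not.
#include <array>
#include <cmath>
#include <memory>

struct Vec3 { float x, y, z; };

struct SvoNode {
    std::array<std::unique_ptr<SvoNode>, 8> children; // null = empty space
    Vec3  avgColor  {0.0f, 0.0f, 0.0f};
    Vec3  avgNormal {0.0f, 0.0f, 0.0f};
    float coverage  = 0.0f;  // how many filled voxels this cell stands in for
};

// Pull averaged attributes up from the children (the 'baking' step).
void pullUp(SvoNode& node) {
    Vec3 c{0, 0, 0}, n{0, 0, 0};
    float total = 0.0f;
    for (auto& child : node.children) {
        if (!child) continue;
        c.x += child->avgColor.x  * child->coverage;
        c.y += child->avgColor.y  * child->coverage;
        c.z += child->avgColor.z  * child->coverage;
        n.x += child->avgNormal.x * child->coverage;
        n.y += child->avgNormal.y * child->coverage;
        n.z += child->avgNormal.z * child->coverage;
        total += child->coverage;
    }
    if (total <= 0.0f) return;
    node.coverage = total;
    node.avgColor = { c.x / total, c.y / total, c.z / total };
    // For a cell full of surfaces facing every direction (the sphere example),
    // the summed normal cancels toward zero length, and whatever we pick after
    // normalizing is essentially arbitrary -- which is why lighting driven by
    // this pre-filtered data would pop as the LoD changes.
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    node.avgNormal = (len > 1e-6f) ? Vec3{ n.x / len, n.y / len, n.z / len }
                                   : Vec3{ 0.0f, 0.0f, 1.0f };  // arbitrary fallback
}
[/code]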

That's not to say dynamic lighting can't be implemented, just that they haven't demonstrated it. Off hand, if I were to attempt dynamic lighting for instanced voxels, I would probably approach it as a screen-space problem. I.e.
  1. Render the scene, but output a depth value along with each color pixel.
  2. Generate surface normals using depth gradients from adjacent pixels (with some fudge-factor to eliminate silhouette discontinuities).
  3. Perform lighting in view-space, as with typical gbuffer techniques.

To render shadows, you could do the same thing, but first render the scene depth-only from the light's perspective (with some screen-based warp to improve effective resolution). Off hand, I couldn't say how good the results of this technique would be, as generating surface normals from depth may result in a lot of noise and/or muted detail. But it is something ideally suited to a GPU implementation (which they insist they don't use for anything other than splatting the results on-screen).
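For what it's worth, here is a rough CPU-side sketch of steps 2 and 3 of that outline -- my own guess at one way to do it, not anything UD (or anyone in this thread) has published. It assumes a linear depth buffer, a simple pinhole projection, and a hypothetical silhouette "fudge factor":

[code]
// Reconstruct view-space normals from a depth buffer and apply one directional light.
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)  { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return (len > 1e-8f) ? Vec3{v.x / len, v.y / len, v.z / len} : Vec3{0, 0, 1};
}

// Unproject pixel (x, y) with linear depth d into view space (pinhole camera).
static Vec3 viewPos(int x, int y, float d, int w, int h, float tanHalfFov) {
    float vx = (2.0f * (x + 0.5f) / w - 1.0f) * tanHalfFov * (float(w) / h);
    float vy = (1.0f - 2.0f * (y + 0.5f) / h) * tanHalfFov;
    return {vx * d, vy * d, -d};
}

// depth and lit are w*h buffers; lightDirView is the unit direction toward the light.
void lightFromDepth(const std::vector<float>& depth, std::vector<float>& lit,
                    int w, int h, float tanHalfFov, Vec3 lightDirView) {
    Vec3 L = normalize(lightDirView);
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            Vec3 p  = viewPos(x,     y,     depth[y * w + x],       w, h, tanHalfFov);
            Vec3 px = viewPos(x + 1, y,     depth[y * w + x + 1],   w, h, tanHalfFov);
            Vec3 py = viewPos(x,     y + 1, depth[(y + 1) * w + x], w, h, tanHalfFov);
            // "Fudge factor": skip steep depth steps rather than building bogus
            // normals across silhouettes.
            if (std::fabs(px.z - p.z) > 0.1f * -p.z || std::fabs(py.z - p.z) > 0.1f * -p.z) {
                lit[y * w + x] = 1.0f;   // leave silhouette pixels flat/ambient
                continue;
            }
            // Cross of the depth gradients, ordered so the normal faces the camera.
            Vec3 n = normalize(cross(sub(py, p), sub(px, p)));
            lit[y * w + x] = std::max(0.0f, dot(n, L));  // simple Lambert term
        }
    }
}
[/code]

A shadow pass as described would be the same unprojection done from the light's point of view; whether the reconstructed normals come out clean enough is exactly the open question raised above.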

But there's nothing in any of the demos to suggest they're doing this or any other form of dynamic lighting. I prefer to just take the simplest explanation: that his avoidance is intentional because he knows full well what the limitations of his technique are. They haven't shown anything that couldn't be baked-in, so I have no reason to believe they've done anything more complicated than that.
I did a little math earlier today to see what I could find out, and I hope you'll find the answer as interesting as I did (or I've made a fool of myself :P).

First off, the computer he runs the demo on has 8 GB of memory, the resolution is 64 voxels per mm^3 (4*4*4), I estimate the size of the base of each block to be 1 m^2, and let's assume that color is stored as a single byte (either through compression or by being palettized, which could actually even be the case). Since octrees are used, we very loosely assume that memory consumption doubles because of octree overhead for nodes, and that the shell of each block can be approximated by taking the 1 m^2 base block, multiplying by 6 for each side of the new 3D block, and then multiplying by 2 because the sides obviously aren't flat but have a rugged surface. (Yes, some are estimates, some may be high, some may be low, and some factor may be missing, but assume for now that it balances out.)


8 GB = 8 * 1024 * 1024 * 1024 = 8589934592 bytes
sqrt(8589934592) ≈ 92681 (side length in voxels of the entire square)
92681 / 4 / 1000 ≈ 23 m (4 from 4x4x4 = 64 voxels per mm^3, 1000 mm per m)
23 * 23 = 529 m^2 of surface
529 / 6 / 2 ≈ 44 blocks (converting the flat 1 m^2 base to a rugged 3D shell)
44 / 2 = 22 blocks (compensating for the octree overhead)

= 22 blocks (UD uses 24 blocks)
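For concreteness, here's the same back-of-envelope estimate as a tiny program (same assumptions as above; nothing here is based on real UD data):

[code]
// Re-run of the estimate: 1 byte per surface voxel, 4 voxels per mm,
// ~1 m^2 block base, 6 faces, 2x for rough surfaces, 2x octree overhead.
#include <cmath>
#include <cstdio>

int main() {
    const double bytes        = 8.0 * 1024 * 1024 * 1024;          // 8 GB of RAM
    const double voxelsPerMm  = 4.0;                                // 4*4*4 = 64 per mm^3
    const double sideVoxels   = std::sqrt(bytes);                   // one big square of surface voxels
    const double sideMeters   = sideVoxels / voxelsPerMm / 1000.0;  // ~23.2 m
    const double surfaceM2    = sideMeters * sideMeters;            // ~537 m^2 of surface budget
    const double areaPerBlock = 1.0 * 6.0 * 2.0;                    // 1 m^2 base, 6 faces, 2x rugged
    const double blocks       = surfaceM2 / areaPerBlock / 2.0;     // halve for octree overhead
    std::printf("~%.0f unique blocks fit in 8 GB\n", blocks);       // prints ~22 (UD shows 24)
    return 0;
}
[/code]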


Now, there are a bunch of approximations and guesses here... but the fact that I even came within an order of magnitude of the actual 24 known models UD shows in their demo... says to me that they have indeed not made any significant progress, and even if I've made an error it apparently balances out. They might not even have made anything at all, except possibly some optimizations to the SVO algorithm. Please correct me if I've made a serious mistake somewhere, but again, even if my calculation had said 2 or 200 (which would be bad for UD), it would still mean that they are flat-out lying and memory consumption is most definitely an issue they haven't solved, not even in the slightest.

EDIT: To clarify, this wasn't meant to show the potential of SVO memory optimizations, but rather that it is likely that UD is not using any fancy algorithms at all to minimize their memory consumption (I only assume the colors are palettized)... and that indeed, enormous memory consumption is the real reason why they only have 24 blocks, because those 24 blocks consume all 8 GB of memory. This is meant to debunk their "No no, memory is not the issue! Our artists are!"-ish statement.



[quote name='zoborg' timestamp='1313238605' post='4848610']
Agreed. There's definitely some good research being done in this area. One of the main things preventing it from becoming mainstream is that modern GPU hardware is designed to render triangles, very fast. Large voxel worlds (and ray-tracing for that matter) require non-linear memory access patterns that GPUs just weren't designed for. Any significant sea-change in how rendering is performed is going to require collaboration with the GPU vendors.

CUDA is a step in the right direction, but what we really need is some custom hardware that's good at handling intersections against large spatial databases (think texture unit, but for ray-casting). It's a shame Larrabee didn't work out, but it'll happen eventually. And it'll be a hardware vendor that does it, not some upstart with a magical new algorithm they can't describe or even show working well.


This reminds me of a question I have on the subject of hardware and ray casting. Isn't the new AMD Fusion chip what you describe? The GPU and CPU share memory, with the GPU being programmable in a C++-like way, if I'm not mistaken.
[/quote]

Yes, though we'll have to wait and see whether it gets anywhere near practical. But ray-tracing (voxels or otherwise) is bound just as much by memory accesses as by processor speed and quantity, if not more so.

The basic problem is O(N*K), where N is the number of pixels on screen, and K is the average cost of intersecting a ray with the world. Ideally, K is log(M), where M is the number of objects in the world. A spatial hierarchy such as an octree provides such a search algorithm.
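As a rough illustration of where that log(M) comes from (a toy pointer-based octree of my own, not UD's structure): locating the leaf that contains a point is one descent from the root, and a ray march is effectively a sequence of such lookups. Note that every level is a dependent pointer read, which is exactly the scattered-memory problem discussed next.

[code]
// Point location in a pointer-based octree: O(depth) = O(log M) per lookup.
#include <array>
#include <cstdint>
#include <memory>

struct Vec3 { float x, y, z; };

struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children; // null = empty space
    std::uint8_t color = 0;   // pre-filtered color for this cell (illustrative)
    bool isLeaf = false;
};

// Descend from the root toward the leaf containing 'p'. The root cell is
// centered at 'center' with half-extent 'halfSize'.
const OctreeNode* locate(const OctreeNode* node, Vec3 p, Vec3 center, float halfSize) {
    while (node && !node->isLeaf) {
        int idx = (p.x > center.x ? 1 : 0)
                | (p.y > center.y ? 2 : 0)
                | (p.z > center.z ? 4 : 0);
        halfSize *= 0.5f;
        center = { center.x + (p.x > center.x ? halfSize : -halfSize),
                   center.y + (p.y > center.y ? halfSize : -halfSize),
                   center.z + (p.z > center.z ? halfSize : -halfSize) };
        node = node->children[idx].get();  // dependent read: each level can be a cache miss
    }
    return node;  // leaf containing p, or nullptr for empty space
}
[/code]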

However, the larger the database, the more spread out the results in memory. In a naive implementation, each ray through each pixel could incur multiple cache misses as it traverses nodes through the tree. This effect gets even worse as you increase the data size such that it exceeds memory and has to be streamed off-disk (or even from the internet). (BTW, this is another issue UD conveniently sidesteps - there is so little unique content it easily fits in a small amount of memory).

This can be improved by using more intelligent data structures that are structured for coherent memory accesses (a rather huge topic in and of itself). But that alone is not enough. No matter how the data is structured, you will still have loads of cache misses (unless your whole world manages to fit into just your cache memory). You need some way to hide the cost of those misses.

On a modern GPU, cache misses are a common occurrence (to the frame-buffer, texture units, vertex units, etc.). It cleverly hides the cost of most of these misses by queuing up the reads and writes from a massive number of parallel threads. For instance, the pixel shader unit may be running a shader program for hundreds of pixel quads at a time. Each pixel unit cycle, the same instruction is processed for each in-flight quad. If that instruction happens to be a texture read, all the reads from all those hundreds of quad threads will be batched up for processing by the texture unit. Then hopefully, by the time the read results are needed in the next cycle or few, they'll already be in the cache and execution can continue immediately.

This latency-hiding is critical for the speed of modern GPUs. Memory latency doesn't go down very much compared to processing speed or bandwidth increases. In fact, in relative cycle terms, cache miss penalties have only increased over the last decade (or longer).

To get comparable performance from ray-tracing (and ray-traced voxels), we'll need a similar method of latency hiding. With a general purpose collection of cores, you can do a whole lot of this work in software. But current PC cores are designed more for flexible bullet-proof caching than for massively parallel designs. This is why GPUs still blow CPUs out of the water for any algorithm that can be directly adapted to a gather (as opposed to scatter) approach.

To my knowledge, AMD's Fusion just combines the CPU and GPU cores onto a single chip, but the two are still separate. That has the potential to greatly improve memory latency for certain things (such as texture update as mentioned by Carmack), and reductions in chip sizes and costs. But as long as the main latency-hiding hardware is still fixed-function designed for things like 2D/3D texture accesses, we can't optimally implement latency-hiding for custom non-linear things, such as ray collision searches. But all these changes designed to make GPUs more general-purpose get us closer to the goal.

[calculations quoted from above]


I'm not saying you're incorrect, but it's possible to do quite a lot better than that once you take into account recursive instancing.

Say you're right about each block of land being 1 meter on a side. If you were to fully populate the tree at that granularity, you'd get those results (or similar since it's an estimate). But now, imagine instead of fully populating the tree, you create a group of 100 of those blocks 10 meters on a side, then instance that over the entire world. Your tree just references that block of 100 ground plots rather than duplicating them. So now you've reduced the size requirement by approximately 100.

There's no limit to how far you can take this. The Sierpinski pyramid is an excellent example of this - you can describe that whole world to an arbitrary size with a simple recursive function. The only unique data storage required for that demo is the model of the pink monster thingy.

As someone mentioned earlier, the storage requirement is more appropriately measured by the entropy of the world (how much unique stuff there is, including relative placement). The repetitive nature of the demo suggests very little of that, and thus very little actual storage requirement.
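As a sketch of that recursive-instancing idea (again a toy illustration of my own, not UD's format): if interior nodes hold shared references, the same chunk can be placed eight times at every level, so the apparent world grows exponentially while storage grows only linearly. The Sierpinski demo is the same trick taken to the limit: the whole "world" is a recursion rule plus one unique monster model.

[code]
// Recursive instancing: one unique ground block repeated over a huge world.
#include <array>
#include <memory>

struct Chunk {
    // Either a leaf holding actual voxel data, or eight references one level down.
    std::array<std::shared_ptr<const Chunk>, 8> children{};
    bool isLeaf = false;
    // ...leaf voxel payload would live here...
};

// Build a chunk that is 2x2x2 copies of the same child: eight references,
// zero duplicated data.
std::shared_ptr<const Chunk> tile(std::shared_ptr<const Chunk> child) {
    auto node = std::make_shared<Chunk>();
    node->children.fill(child);
    return node;
}

// Example: a single unique 1 m ground block instanced out to ~1 km on a side.
std::shared_ptr<const Chunk> buildWorld(std::shared_ptr<const Chunk> groundBlock) {
    std::shared_ptr<const Chunk> world = groundBlock;
    for (int level = 0; level < 10; ++level)   // 2^10 = 1024 m per side
        world = tile(world);
    return world;                               // only 10 interior nodes were created
}
[/code]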

[quote of the recursive-instancing reply above]


I'm not doubting you even one bit; what I meant to show was that with some very basic assumptions, some reasonable approximations and no real optimizations... I computed the number of blocks they could be using in their demo, and arrived at the same number of blocks that they are using in their demo. My point being, unless I've made a serious mistake, they aren't using anything fancy at all... like I mention, for all we know, they might even be using an 8-bit palette for the blocks. If I had arrived at 2, then yeah, they would have used some fancy algorithms, but memory consumption most likely is the actual reason they aren't showing more unique blocks.



I'm not disagreeing with you one bit; what I meant to show was that with some very basic assumptions, some reasonable approximations and no real optimizations... I arrived at the same number of blocks that they are using in their demo. My point being, unless I've made a serious mistake, they aren't using anything fancy at all... like I mention, for all we know, they might even be using an 8-bit palette for the blocks. If I had arrived at 2, then yeah, they would have used some fancy algorithms, but sure enough, memory would still be a major issue regardless of what they say.


OK, then sorry for the misunderstanding. I do agree that there's no particular reason to assume they're doing anything fancy with compression. Likewise, if someone were to show me a ray-traced sphere above an infinite checkerboard plane, I wouldn't think "Wow! How did they manage to store an infinite texture in finite memory?!"
Also, instancing, compression, and unlimited detail are all just aspects of procedural content generation.

It's just a question of degree:
  1. A repeated texture. That's an incredibly simple function that's both obvious and boring, but it has unlimited detail (at least in the respect they're using the term, which is up to the precision constraints of the rendering system).
  2. A fractal image or environment. This function can be arbitrarily complex and the results can be spectacular. You just have very little input into the final results.
  3. Guided procedural content. The simplest example of this is just instancing. But it can be quite a bit more sophisticated, such as composing environments out of recursive functions in a 4k demo.
  4. Fully unique artist-modeled (or scanned) textures and environments, but with discretionary reuse of assets to save time and memory.
Procedural content saves us time and memory, allowing us to make things that wouldn't otherwise be possible. But the drawback is loss of control - you get what the procedure gives you. If that's a tiled texture, or a fractal, or a huge environment of repetitive chunks of land, you just have to live with it. Or write a new procedure closer to what you want. Or add new content, which consumes precious development time and hardware resources (thus making the content decidedly limited).

Again I want to point out this interview with Carmack, because I feel I'm just parroting him at this point. To paraphrase, "with proceduralism you get something, just not necessarily what you want."
It's a hacked-together piece of crud of an environment, and I don't see it getting much better; it just makes me want to use a truly unique world (like Atomontage), storage/scale problem and all, instead of this repetitive crap.

It's unlimited repetition, not unlimited detail.

The reasons why this project is going to flop, guaranteed:

[1]* the models you see aren't unique, they are just duplications of the exact same objects

[2]* the models all have their OWN level of detail; the only way he gets 64 atoms a millimetre is by SCALING SOME OF THEM SMALLER, the rest have SHIT detail.

[3]* he can't paint his world uniquely like what happens in MegaTexture

[4]* he can't perform CSG operations; all he can do is soup yet more and more disjointed models together

[5]* there's no way he could bake lighting at all, so the lighting all has to be dynamic and eat processing power

[6]* this has nothing to do with voxels, you could get a similar effect just by rastering together lots of displacement-mapped models!!!

1) Dell mentions that they resorted to scanning objects in to get content for the video. Is this for saving memory via instancing? I personally can't tell. I mean, they could have loaded a Sponza model in to show things off.
2) Not sure what you mean. Some of the objects are polygon-modeled and some are scanned in, which utilizes the full 64 atoms per cubic mm.
3) That's an assumption. Remember, most of this is just surface detail, meaning there is no data stored for the inside of the models. Which brings us to your next complaint.
4) He mentioned that, which is why he said he would like to work together with Atomontage. However, that's not to say implementing CSG is impossible with their model format. They just said it's not their goal.
5) A lot of engines choose to do dynamic lighting via SSAO along with other techniques (like Crytek's radiosity). However, if they did bake the lighting into the models, people would flip this around on them and go "they can't do dynamic lighting, it's all baked in", so it's a catch-22 unless they can do both, really. (They didn't even say whether they could bake the lighting.)
6) Probably. You need DX11 for that to run well, though. This technology is rendering the same detailed grooves a POM/QDM/tessellation renderer would be, except it's running on the CPU. QDM would probably run at about the same performance on the CPU as these effects, but the others require some serious hardware support.


[calculations]

lol, you did pretty much identical calculations to the ones I did a while ago when I saw the video. Yeah, that's a pretty good approximation for the amount of data in a lossless format. Compression and streaming the data in are probably where their method will excel.

It's a hacked-together piece of crud of an environment, and I don't see it getting much better; it just makes me want to use a truly unique world (like Atomontage), storage/scale problem and all, instead of this repetitive crap.

It's unlimited repetition, not unlimited detail.

You don't think their GPU implementation will be much better than a 15-20 fps CPU version? That's kind of pessimistic. I mean, the shading alone on the GPU would open up almost every deferred/forward-rendering post-processing effect. It's just a different way to populate the g-buffers. HDR alone would probably help, along with demoing specular objects. The reason for the repetition at the moment is mostly just speculation.

The problem I see with Atomontage is that his approach to rendering voxels when he lacks the detail is to blur them. This ends up looking really bad, even in his newer videos. The UD system, even when they went very close to objects, has very nice interpolation.
