Jump to content

  • Log In with Google      Sign In   
  • Create Account

Any fast voxel rendering algorithms?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
18 replies to this topic

#1 usbman3   Members   -  Reputation: 101

Like
1Likes
Like

Posted 11 March 2011 - 01:39 PM

I have a 3D array of voxels, which represents a landscape (sort of).

I wrote a simple function to render these voxels:

for each pixel on screen:
{
	calculate the direction and starting position of the ray
	Vec3 cur_pos = ray_start_pos
	Voxel voxel = Nothing
	
	for( temp=0; temp+=VOXEL_SIZE; temp<RAY_LENGTH )
	{
		voxel = get_voxel( cur_pos );
		
		if ( voxel != Nothing )
			break;

		cur_pos += ray_direction * VOXEL_SIZE
	}
	
	if ( voxel != Nothing )
		set_pixel( voxel.r, voxel.g, voxel.b );
	else
		set_pixel( background color );
}

Rendering 80x80x100 (=640k) voxels at resolution 700x444, it takes about 20 seconds to render one frame (on AMD Phenom II X6 @ 3.4 GHz).

Although I'm using just one thread for rendering, the performance is absolutely unacceptable! I need at least 25 frames per second.

And this algorithm has some other problems than performance as well:
  • Aliasing artifacts
  • Some voxels are processed twice
  • Lots of empty voxels are processed even if the ray doesn't hit anything
I have searched the internet but I didn't find any good articles about rendering voxels that don't involve octrees or cuda.

So, do you know of any good voxel rendering algorithms?

Sponsor:

#2 ericbeg   Members   -  Reputation: 108

Like
0Likes
Like

Posted 11 March 2011 - 01:53 PM

Hello,

Maybe you can use the marching cube algorithm to build a mesh from your voxels. Then you can use that mesh to render your terrain.

#3 PolyVox   Members   -  Reputation: 579

Like
1Likes
Like

Posted 11 March 2011 - 03:14 PM

Last year I published an article in Game Engine Gems about rendering volumetric landscapes, and I recently discovered that it has been made available on Google Books. I believe it's fairly easy to read, you can have a look at it here:

http://books.google....epage&q&f=false

You can find more information and source code at http://www.thermite3d.org/

#4 Sirisian   Members   -  Reputation: 1286

Like
0Likes
Like

Posted 11 March 2011 - 04:17 PM

If you target DX10/11 hardware which has enough shader power then you can just use SVO traversal algorithms. Even a naive 3DDDA grid traversal algorithm will render massive scenes. I do not recommend converting voxels to geometry. Rendering them that way is drastically slower than optimal traversal techniques since traversal techniques almost always have an increase in performance as the complexity of the scene increases. (The ray hits something sooner). I wrote this in flash's pixel bender which is basically GLSL and it ran at 60 fps easily on a 260 GTX. I had a WebGL example, but WebGL no longer supports the required shader model version for it to run. The GLSL wasn't optimal so I won't paste it, but it ran at 60 fps on my brother's 8400m card.

Oddly enough I just started writing a small example program a few weeks ago for beginners getting into voxel rendering, but I've been busy with studies. I'm just using a few lines of SlimDX and HLSL.

If you use the CPU I don't want to be "that guy", but you need to use SSE and straight assembly for the whole thing. Prefetching data and such is pretty much mandatory since:
for (int y = 0; y < 1920; ++y)
{
	for (int x = 0; x < 1080; ++x)
    {
    	// One instruction here is really 2073600 extra instructions
    }
}

Note that you should probably use condition move instructions and not branch instructions wherever possible as traversal branching code is chaotic and the predictor will guess wrong a lot! In other words you must write branchless code for the whole thing. You're in luck since modern CPUs do like billions of instructions per second!
50 billion ips / (1920 * 1080 * 60) = 401 instructions!

(I'm rushed I might edit this later)

#5 radioteeth   Members   -  Reputation: 459

Like
0Likes
Like

Posted 11 March 2011 - 05:28 PM

Incorporate the use of octrees, so that when your ray encounters a big empty node it calculates the position it would exit the node, so that it skips that whole empty space, and then continues. The whole point is that every empty node leading up to a solid voxel will be how your ray jumps through empty spaces, instead of slowly stepping through them one voxel at a time.

#6 leopardpm   Members   -  Reputation: 148

Like
0Likes
Like

Posted 13 March 2011 - 11:11 AM

Last year I published an article in Game Engine Gems about rendering volumetric landscapes, and I recently discovered that it has been made available on Google Books. I believe it's fairly easy to read, you can have a look at it here:

http://books.google....epage&q&f=false

You can find more information and source code at http://www.thermite3d.org/


Hi PolyVox! I have been following your Thermite site and am interested in discovering more... I will post some questions in a few days. Nice to find you here!


If you target DX10/11 hardware which has enough shader power then you can just use SVO traversal algorithms. Even a naive 3DDDA grid traversal algorithm will render massive scenes. I do not recommend converting voxels to geometry. Rendering them that way is drastically slower than optimal traversal techniques since traversal techniques almost always have an increase in performance as the complexity of the scene increases. (The ray hits something sooner). I wrote this in flash's pixel bender which is basically GLSL and it ran at 60 fps easily on a 260 GTX. I had a WebGL example, but WebGL no longer supports the required shader model version for it to run. The GLSL wasn't optimal so I won't paste it, but it ran at 60 fps on my brother's 8400m card.

Oddly enough I just started writing a small example program a few weeks ago for beginners getting into voxel rendering, but I've been busy with studies. I'm just using a few lines of SlimDX and HLSL.

If you use the CPU I don't want to be "that guy", but you need to use SSE and straight assembly for the whole thing. Prefetching data and such is pretty much mandatory since:

for (int y = 0; y < 1920; ++y)
{
	for (int x = 0; x < 1080; ++x)
    {
    	// One instruction here is really 2073600 extra instructions
    }
}

Note that you should probably use condition move instructions and not branch instructions wherever possible as traversal branching code is chaotic and the predictor will guess wrong a lot! In other words you must write branchless code for the whole thing. You're in luck since modern CPUs do like billions of instructions per second!
50 billion ips / (1920 * 1080 * 60) = 401 instructions!

(I'm rushed I might edit this later)


I am VERY interested in seeing your example program on how to accomplish what you are describing. I am currently rendering my voxels very similiar to the original poster, with a rigid cubic grid compression which empty space skips (not as efficient as an Octree, BUT, I am not ready to delve into the madness of raycasting through an octree!) - I am getting 7-15 fps raycasting a 320x160 window into a 256x256x128 voxel world.... faster than his, BUT, neither of us would be happy with that performance. I saw that newer hardware provides raycasting, but I have no clue as to how to access it (realizing that the hardware has like 250 tiny
simple processors - all you need to trace a ray - makes me drool).

What I decided to do, currently, is pre-render my world scenes/tiles and as well my sprites/objects in 16 rotations from a set perspective. I include the Z info so I can easily slip objects between each other for inclusion into the final scene. What this gets me is basically an Isometric engine on steriods, with 16 rotations, detailed terrain with hidden surface removal, deformable terrain, etc. Its fine for my current purposes, BUT, I would like to remove the rotation and perspective restrictions and get to full 3D with voxels - they have me totally entranced!

#7 Sirisian   Members   -  Reputation: 1286

Like
0Likes
Like

Posted 13 March 2011 - 05:20 PM

Here's the basic code http://pastebin.com/YKJb449i . It's not really optimal or coded right. The main concept though is 3DDDA. However if you're doing this for real you'll want to use chunks and stream them to the GPU and perform a 2+ level traversal. That's what my current project does basically.

Read laine's Nvidia papers on SVO traversal. A big part of making things work though is to design a voxel format that does what you want. However, once you have your rendering algorithm working you'd probably be using a deferred renderer. When you hit the final voxel you can pull back texture, normal, specular, etc really easily.

#8 glaeken   Members   -  Reputation: 294

Like
0Likes
Like

Posted 14 March 2011 - 09:59 AM

I've got a few intro tutorials on my blog on volume rendering with raycasting. Might be helpful to you.

http://graphicsrunne...e%20Ray-Casting


By the end of the set the algorithm is basically:

--Subdivide volume into chunks so we can skip the empty outside chunks.
--Render front face positions to Target0
--Render back face positions to Target1
--Draw full screen quad with Target0 and Target1 set to calculate ray position and direction
--Do front-back composite as you step through the volume.

The last step could be skipped for you if you know that you'll always have a solid surface. You could exit at the first intersection.

#9 Sirisian   Members   -  Reputation: 1286

Like
0Likes
Like

Posted 14 March 2011 - 04:03 PM

I've got a few intro tutorials on my blog on volume rendering with raycasting. Might be helpful to you.

http://graphicsrunne...e%20Ray-Casting

I don't want to seem like I'm jumping to conclusions since I just glanced over your site, but why are you doing the raycasting like that instead of just using the normal grid/SVO traversal algorithms?

You can calculate transparency exactly using normal traversal algorithms since you have the time when you enter the voxel region and when you left it for every voxel. Was this just a speed optimization that you made by assuming constant fixed time intervals?

Also if anyone is curious this is a 2 level 2DDDA traversal algorithm. I wrote it a few weeks ago and haven't had time to verify if the algorithm is optimal, but it is one of many solutions. It's interesting to point out that when programming multilevel traversal algorithms if you assume the size of the fine voxel level is large the algorithm changes to incorporate a loop whereas this isn't true if you had just say an SVO traversal since the loop adds too much cost for a 2x2x2 level.

#10 glaeken   Members   -  Reputation: 294

Like
0Likes
Like

Posted 14 March 2011 - 04:27 PM


I've got a few intro tutorials on my blog on volume rendering with raycasting. Might be helpful to you.

http://graphicsrunne...e%20Ray-Casting

I don't want to seem like I'm jumping to conclusions since I just glanced over your site, but why are you doing the raycasting like that instead of just using the normal grid/SVO traversal algorithms?

You can calculate transparency exactly using normal traversal algorithms since you have the time when you enter the voxel region and when you left it for every voxel. Was this just a speed optimization that you made by assuming constant fixed time intervals?


The tutorials are based on direct volume rendering in the pixel shader ( <= ps 3.0 ) with volume textures. This and the fact that they're an intro into DVR, is the reason it takes the approach it does and doesn't include something more advanced like an SVO.

#11 spinningcube   Members   -  Reputation: 136

Like
0Likes
Like

Posted 17 March 2011 - 06:38 AM

Have you tried Sparse Voxel Octress (SVO)?

http://www.tml.tkk.fi/~samuli/

¨Pablo
Pablo de Heras Ciechomski, PhD - Founder and CEO http://www.sciencevisuals.com

#12 SuperVGA   Members   -  Reputation: 880

Like
0Likes
Like

Posted 17 March 2011 - 09:06 AM

Doing that in real time on the cpu is not really that bad an idea.
See Ken Silvermans Voxlap page, and use 3D wave surfing. It's a very, very fast, column based approach.
I'm on the gpu, but quit both polygons and raycasting.

-that's right.

#13 spinningcube   Members   -  Reputation: 136

Like
0Likes
Like

Posted 17 March 2011 - 09:54 AM

Doing that in real time on the cpu is not really that bad an idea.
See Ken Silvermans Voxlap page, and use 3D wave surfing. It's a very, very fast, column based approach.
I'm on the gpu, but quit both polygons and raycasting.

-that's right.


WOW! That is insane. Just downloaded the source. The guy's a genius :-)

Pablo

EDIT - I can totally see where all the SIGGRAPH papers got their original source code ... ;-)
Pablo de Heras Ciechomski, PhD - Founder and CEO http://www.sciencevisuals.com

#14 SuperVGA   Members   -  Reputation: 880

Like
0Likes
Like

Posted 17 March 2011 - 02:37 PM


Doing that in real time on the cpu is not really that bad an idea.
See Ken Silvermans Voxlap page, and use 3D wave surfing. It's a very, very fast, column based approach.
I'm on the gpu, but quit both polygons and raycasting.

-that's right.


WOW! That is insane. Just downloaded the source. The guy's a genius :-)



Here's a link to a cached page of where Silverman explains the technique. (Not completely in-depth, but he answers his mail, and i think i got it, too, if you need help)
JonoF's Forum (Archived)



#15 stefanbanev   Members   -  Reputation: 98

Like
0Likes
Like

Posted 19 March 2011 - 12:43 PM

Currently, the best multi-core cpu VR ray-caster outperforms the best gpu by factor 10 (in the similar hi-end price range)
  • >Aliasing artifacts
keep adaptive sampling along ray; asses the error for each step based on current ray energy opacity/rgb contribution current etc... the logarithmic time complexity for low-opacity uniform fog or/and hit and stop high opacity cases are obtainable.
  • >Lots of empty voxels are processed even if the ray doesn't hit anything
use octree or others subdivision techniques
>I have searched the internet but I didn't find any good articles about rendering voxels that don't involve octrees or cuda.

There is none, just CUDA propaganda to make you invest in GPU, it works great for brute force SIMD friendly algorithms but apparently sucks once code path depends on data in array what is a mandatory to get a logarithmic t-complexity.

>So, do you know of any good voxel rendering algorithms?

This question is similar to "So, do you know of any good chess algorithms?" any good one is very complex and you will not find a complete description just some general ideas and concepts which are anyway a banal.

If your goal is to have fan writing your own engine then indeed you will have plenty of fun but do not expect to master a competitive engine unless you are going to invest years solely in this project...

Stefanbanev





#16 spinningcube   Members   -  Reputation: 136

Like
0Likes
Like

Posted 20 March 2011 - 06:13 AM

Currently, the best multi-core cpu VR ray-caster outperforms the best gpu by factor 10 (in the similar hi-end price range)

>I have searched the internet but I didn't find any good articles about rendering voxels that don't involve octrees or cuda.

There is none, just CUDA propaganda to make you invest in GPU, it works great for brute force SIMD friendly algorithms but apparently sucks once code path depends on data in array what is a mandatory to get a logarithmic t-complexity.


This. CUDA is great for brute force algos but as soon as you branch, need more flexibility, memory etc... Forget it.

Pablo
Pablo de Heras Ciechomski, PhD - Founder and CEO http://www.sciencevisuals.com

#17 stefanbanev   Members   -  Reputation: 98

Like
0Likes
Like

Posted 20 March 2011 - 12:09 PM


Currently, the best multi-core cpu VR ray-caster outperforms the best gpu by factor 10 (in the similar hi-end price range)

>I have searched the internet but I didn't find any good articles about rendering voxels that don't involve octrees or cuda.

There is none, just CUDA propaganda to make you invest in GPU, it works great for brute force SIMD friendly algorithms but apparently sucks once code path depends on data in array what is a mandatory to get a logarithmic t-complexity.


This. CUDA is great for brute force algos but as soon as you branch, need more flexibility, memory etc... Forget it.

Pablo


The branching does not impact performance in significant way as soon as ALL threads follow the SAME code path - means the same-single instruction for all threads; CUDA API allows divergence happen so if warp from 32 threads has even single thread going different path the speed goes down 2 times etc... Apparently such limitation makes GPU not practical for flexible general purpose ray-tracing since ray coherency is rather exception then a rule. CUDA API fits fine to massive MIMD architecture but current GPU is really good only for SIMD friendlily algos. It should be noted that memory/PCI-Ex limitations are relatively minor and not principal GPU handicaps besides it's steadily improving - THE major handicap of GPU is its SIMD architecture.


Stefan

#18 spinningcube   Members   -  Reputation: 136

Like
0Likes
Like

Posted 20 March 2011 - 06:01 PM



Currently, the best multi-core cpu VR ray-caster outperforms the best gpu by factor 10 (in the similar hi-end price range)

>I have searched the internet but I didn't find any good articles about rendering voxels that don't involve octrees or cuda.

There is none, just CUDA propaganda to make you invest in GPU, it works great for brute force SIMD friendly algorithms but apparently sucks once code path depends on data in array what is a mandatory to get a logarithmic t-complexity.


This. CUDA is great for brute force algos but as soon as you branch, need more flexibility, memory etc... Forget it.

Pablo


The branching does not impact performance in significant way as soon as ALL threads follow the SAME code path - means the same-single instruction for all threads; CUDA API allows divergence happen so if warp from 32 threads has even single thread going different path the speed goes down 2 times etc... Apparently such limitation makes GPU not practical for flexible general purpose ray-tracing since ray coherency is rather exception then a rule. CUDA API fits fine to massive MIMD architecture but current GPU is really good only for SIMD friendlily algos. It should be noted that memory/PCI-Ex limitations are relatively minor and not principal GPU handicaps besides it's steadily improving - THE major handicap of GPU is its SIMD architecture.


Stefan


No. The major bottleneck is not the SIMD architecture. The major bottleneck is inflexibility. The inflexibility in programming, loading multiple programs (having different shaders etc... requiring you to either sort them according to program or construct mega-kernels), memory inflexibility in terms of not enough, having to nanny the GPU by paging in the correct data thus stalling the entire pipeline. Basically if doesn't all fit in one program and in one memory update you are not going to use the GPU to its fullest. Then there are problems with different languages, driver versions, hardware quirks etc...

For example I am really looking forward to the new AVX instructions on the AMD and Intel (8 float instructions) and that will NOT slow down performance.

Pablo
Pablo de Heras Ciechomski, PhD - Founder and CEO http://www.sciencevisuals.com

#19 usbman3   Members   -  Reputation: 101

Like
0Likes
Like

Posted 21 March 2011 - 07:51 AM

Voxels are simple, so I though rendering them would be simple as well. Apparently not.

I don't want to spend my life on this, so I forget voxels and start doing something else instead.

Thanks for your replies, though. :)




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS