
OpenGL Raytracing for dummies



Fellow programmers,

With all the talk about Global Illumination, guys like Vlad woke up my interest in writing a (simple) raytracer. I believe raytracing will take over some day, so I'd better be prepared :) My learning curve is usually pretty long, and I don't have much free time (especially not with my other "normally rendered" hobby project), so the sooner I start, the better. Besides, I think it's fun and very refreshing to do something 'different'.

I know the basic principles of ray-tracing, and especially the (dis)advantages compared to the rasterization methods we use every day. But I wouldn't know how/where to start with RT. So, I made a list of questions. I must add that I'm looking for practical (read: fast) implementations. I'm not looking for cutting-edge or highest-quality graphics. Let's say I'd like to program games, not graphics alone. In other words, the technique should be able to produce realtime results in the next ~4 years, suitable for a game.

1.- So... which technique is the most practical (fastest) for realtime/games? I've read a little bit about ray-tracing and photon mapping. Are these 2 different things?

2.- Lights. From what I read so far, I shoot rays from my eyes. They collide somewhere, but at that point we don't know yet whether that point is lit or not. How do I find that out? Shoot a ray from that point to all possible light sources? And how about indirect lighting then? I could do it in reverse, starting at the lights, but then there is no telling whether their rays ever reach the camera.

3.- Does a RT still need OpenGL/DirectX/shaders? I guess you can combine both (for example, render a scene normally, and add special effects such as GI/caustics/reflections via RT). What is commonly used? I can imagine a shader is used on top to smooth/blur the somewhat noisy results of a RT-produced screen.

4.- How does a RT access textures? I suppose you can use diffuse textures and normal/specular/gloss maps just as well. You just access them via RAM and eventually write your own filtering method? If that is true, it would mean you have lots of 'texture memory' and can directly change a texture as well (draw on it, for example).

5.- Ray tracing has lots to do with collision detection. Now this is the part where I'm getting scared, since my math is not very good. I wrote octrees and several collision detection functions, but I can't imagine them being fast enough to run millions of rays... I mean, 800x600 pixels = 480,000 rays. And that number multiplies if I want reflections/refractions (and we most certainly want that!). Do I underestimate the power of the CPU(s), do I count way too many rays, or is it indeed true that VERY OPTIMIZED algorithms are required here?

6.- Overall, how difficult is writing a RT? Writing a basic OpenGL program is simple, but implementing billions of effects with all kinds of crazy tricks (FBOs, shaders for each and every material type, alpha blending, (cascaded) shadow maps, mirroring, cube maps, probes, @#%$#@$) is difficult as well. At least, it takes a long time before you know all of them. Shoot me if I'm wrong, but I think a raytracer is "smaller"/simpler because all the effects you can achieve are done in the same way, based on relatively simple physical laws. On the other hand, if you want to write a fast RT, you need to know your optimizations very well. Lousy programming leads to unacceptably slow rendering, I guess. Although this was probably also true for rasterization when writing Quake 1. As the hardware speeds up, the tolerance for "bad programming" grows. But at this point, would you say writing a raytracer is more difficult than a normal renderer (with all the special effects used nowadays)?

7.- I'm not planning to use RT for big projects anywhere soon. I just like to play around for now. But nevertheless, what can I expect in the nearby future (5 years)? I think some of the raytracers made are already capable of running simple games. But how does a RT cope with:
- Big/open scenes (Farcry)
- Lots of local lights (Doom3)
- Lots of dynamic objects (a race game)
- Sprites / particles / fog
- Post FX (blurring, DoF, tone mapping, color enhancement, ...)
- Memory requirements
Or maybe any other big disadvantage that I need to be aware of before using RT blindly?

8.- So, where to start? Is there something like a "ray-tracing for dummies", or a NeHe-tutorial kind of website?

Allrighty. Looking forward to your responses!
Rick

Quote:
Original post by spek

1.- So... which technique is the most practical (fastest) for realtime/games?
I've read a little bit about ray-tracing and photon mapping. Are these 2 different things?


Photon mapping is a technique used to solve the global illumination problem, especially the indirect contribution of lighting and things like caustics. Basic raytracing doesn't take indirect illumination into account, so the two techniques can be used together: PM to precompute indirect illumination and caustics, and RT to render the scene, using the photon map to add the indirect contribution.
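To give an idea of what the "gathering" step looks like: the indirect light at a point is estimated from the density of nearby photons. A minimal brute-force sketch (a real photon map stores the photons in a kd-tree and gathers the k nearest; the Photon struct and the names here are made up for illustration, they don't come from any particular renderer):

#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };
static float dist2(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz;
}

struct Photon {
    Vec3 position;   // where the photon landed
    Vec3 power;      // RGB flux carried by the photon
};

// Estimate indirect irradiance at 'p' by gathering photons within 'radius'.
// This brute-force loop just shows the density-estimation idea.
Vec3 EstimateIrradiance(const std::vector<Photon>& map, const Vec3& p, float radius)
{
    Vec3 sum = {0, 0, 0};
    float r2 = radius * radius;
    for (const Photon& ph : map) {
        if (dist2(ph.position, p) <= r2) {
            sum.x += ph.power.x;
            sum.y += ph.power.y;
            sum.z += ph.power.z;
        }
    }
    // Divide the gathered flux by the area of the gather disc.
    float area = 3.14159265f * r2;
    return { sum.x / area, sum.y / area, sum.z / area };
}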

Quote:

2.- Lights. From what I read so far, I shoot rays from my eyes. They collide somewhere, but at that point we don't know yet whether that point is lit or not. How do I find that out? Shoot a ray from that point to all possible light sources? And how about indirect lighting then? I could do it in reverse, starting at the lights, but then there is no telling whether their rays ever reach the camera.

You are right. You shoot a 'shadow ray' from your point to each light source (at least, those that you want to compute; you can discard those that are too far away or too weak if you want).
Indirect illumination, as said, is not included in standard RT, so you need to use something else (radiosity, photon mapping, path tracing and so on).

Of course you can also generate rays from the lights: this is what the original raytracing was about; what we call raytracing nowadays is actually backward raytracing. Anyway, bidirectional path tracing and photon mapping both shoot rays from light sources, and both use methods to ensure that rays are not wasted (in both, rays starting from the lights are only one step of the whole rendering: there are rays starting from the eye anyway).
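For reference, the shadow-ray test is only a few lines once you have a scene intersection query. A minimal sketch (SceneBlocked and the small vector helpers are placeholders, not from any real engine):

#include <cmath>

// Placeholder types; in a real raytracer these come from your own framework.
struct Vec3 { float x, y, z; };
struct Light { Vec3 position; Vec3 color; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static float length(const Vec3& v) { return std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z); }
static Vec3 scale(const Vec3& v, float s) { return {v.x*s, v.y*s, v.z*s}; }

// Assume the scene offers an "any hit along this ray before maxDist?" query.
bool SceneBlocked(const Vec3& origin, const Vec3& dir, float maxDist); // hypothetical

// Returns true if 'light' directly illuminates the surface point 'hit'.
bool LightVisible(const Vec3& hit, const Vec3& normal, const Light& light)
{
    Vec3 toLight = sub(light.position, hit);
    float dist   = length(toLight);
    Vec3 dir     = scale(toLight, 1.0f / dist);

    // Offset the origin slightly along the normal to avoid self-intersection
    // ("shadow acne") caused by floating point error.
    const float eps = 1e-4f;
    Vec3 origin = { hit.x + normal.x * eps,
                    hit.y + normal.y * eps,
                    hit.z + normal.z * eps };

    // If anything lies between the point and the light, the point is shadowed.
    return !SceneBlocked(origin, dir, dist - eps);
}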

Quote:

3.- Does a RT still need OpenGL/DirectX/shaders? I guess you can combine both (for example, render a scene normally, and add special effects such as GI/caustics/reflections via RT). What is commonly used? I can imagine a shader is used on top to smooth/blur the somewhat noisy results of a RT-produced screen.

Raytracing is sometimes used by game engines (IIRC) to achieve some effects, but GPUs are not well suited for these tasks. Many realtime raytracers (like Arauna) render to a texture and then use shaders to perform tone mapping and apply other filters.

Quote:

4.- How does a RT access textures? I suppose you can use diffuse textures and normal/specular/gloss maps just as well. You just access them via RAM and eventually write your own filtering method? If that is true, it would mean you have lots of 'texture memory' and can directly change a texture as well (draw on it, for example).

As long as you write a software raytracer you can do what you want with textures (images, procedural textures, functions of other parameters like distance from the camera and so on). In my raytracer I can use a texture to modulate another. LightWave lets you use a texture as a render target, so you can have a texture that displays the same scene from another point of view (as in a security camera).
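For instance, a texture in a software raytracer is just an array in ordinary RAM plus whatever filtering you feel like writing. A small sketch of a hand-rolled bilinear lookup (the Texture struct and method names are illustrative only):

#include <cmath>
#include <vector>

// A plain RGB texture living in ordinary RAM. Because we own the memory,
// we can also write into 'pixels' at any time (e.g. a security-camera feed).
struct Texture {
    int width = 0, height = 0;
    std::vector<float> pixels;          // width*height*3 floats, RGB

    void Fetch(int x, int y, float rgb[3]) const {
        // Wrap addressing (tiling).
        x = ((x % width) + width) % width;
        y = ((y % height) + height) % height;
        const float* p = &pixels[(y * width + x) * 3];
        rgb[0] = p[0]; rgb[1] = p[1]; rgb[2] = p[2];
    }

    // Hand-rolled bilinear filter: fetch the four nearest texels and blend.
    void SampleBilinear(float u, float v, float out[3]) const {
        float fx = u * width  - 0.5f;
        float fy = v * height - 0.5f;
        int x0 = (int)std::floor(fx), y0 = (int)std::floor(fy);
        float tx = fx - x0, ty = fy - y0;

        float c00[3], c10[3], c01[3], c11[3];
        Fetch(x0,     y0,     c00);
        Fetch(x0 + 1, y0,     c10);
        Fetch(x0,     y0 + 1, c01);
        Fetch(x0 + 1, y0 + 1, c11);

        for (int i = 0; i < 3; ++i) {
            float top    = c00[i] * (1 - tx) + c10[i] * tx;
            float bottom = c01[i] * (1 - tx) + c11[i] * tx;
            out[i] = top * (1 - ty) + bottom * ty;
        }
    }
};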

Quote:

5.- Ray tracing has lots to do with collision detection. Now this is the part where I'm getting scared, since my math is not very good. I wrote octrees and several collision detection functions, but I can't imagine them being fast enough to run millions of rays... I mean, 800x600 pixels = 480,000 rays. And that number multiplies if I want reflections/refractions (and we most certainly want that!). Do I underestimate the power of the CPU(s), do I count way too many rays, or is it indeed true that VERY OPTIMIZED algorithms are required here?

Download Arauna by Jacco Bikker from the web: it is most probably the fastest raytracer you can find, so you can see for yourself what you can get from raytracing and what you can't. Be warned that such speed can only be achieved with a VERY HUGE amount of work!

Quote:

6.- Overall, how difficult is writing a RT? Writing a basic OpenGL program is simple, but implementing billions of effects with all kinds of crazy tricks (FBOs, shaders for each and every material type, alpha blending, (cascaded) shadow maps, mirroring, cube maps, probes, @#%$#@$) is difficult as well. At least, it takes a long time before you know all of them. Shoot me if I'm wrong, but I think a raytracer is "smaller"/simpler because all the effects you can achieve are done in the same way, based on relatively simple physical laws.

Writing a raytracer is not all that hard: you must write everything from scratch, but if you have already written spatial structures and ray/triangle routines, then you can get to the interesting part soon. You will discover how easy it is to get new effects once the core is working. Of course, designing a full-featured raytracer is another beast...
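To show how small the core routines are: the classic ray/sphere test is about a dozen lines. A sketch, assuming a normalized ray direction (the Vec3 type and names are only for illustration):

#include <cmath>

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }

// Classic ray/sphere test: solve |o + t*d - c|^2 = r^2 for t.
// 'dir' is assumed to be normalized. Returns the nearest positive hit
// distance in 't', or false if the ray misses.
bool IntersectSphere(const Vec3& orig, const Vec3& dir,
                     const Vec3& center, float radius, float& t)
{
    Vec3 oc = sub(orig, center);
    float b = dot(oc, dir);                     // half of the usual 'b'
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return false;              // no real roots: miss

    float s = std::sqrt(disc);
    float t0 = -b - s;                          // nearer root
    float t1 = -b + s;                          // farther root
    t = (t0 > 1e-4f) ? t0 : t1;                 // allow starting inside the sphere
    return t > 1e-4f;
}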

Quote:

On the other hand, if you want to write a fast RT, you need to know your optimizations very well. Lousy programming leads to unacceptably slow rendering, I guess. Although this was probably also true for rasterization when writing Quake 1. As the hardware speeds up, the tolerance for "bad programming" grows. But at this point, would you say writing a raytracer is more difficult than a normal renderer (with all the special effects used nowadays)?

The main performance-critical points in raytracing are well known: ray/primitive intersections, bad spatial structures, cache misses, texture sampling and so on. Then there are higher and lower levels of optimization (ray packing to improve cache coherence and SSE usage, multiple importance sampling to reduce noise in Monte Carlo sampling and so on...).
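To illustrate the ray-packing idea: rays are stored "structure of arrays" so that one SSE register holds the same component of four rays, and all four are tested against a primitive at once. A rough sketch, not taken from Arauna or any real tracer (only the discriminant is checked; hit distances and the t > 0 test are left out for brevity):

#include <xmmintrin.h>  // SSE intrinsics

// Four rays stored "structure of arrays" so one SSE register holds one
// component of all four rays at once.
struct RayPacket4 {
    __m128 ox, oy, oz;   // origins
    __m128 dx, dy, dz;   // normalized directions
};

// Test all four rays of a packet against one sphere; returns a 4-bit mask
// with bit i set if ray i hits. Same math as the scalar ray/sphere test.
int IntersectSphere4(const RayPacket4& r,
                     float cx, float cy, float cz, float radius)
{
    __m128 ocx = _mm_sub_ps(r.ox, _mm_set1_ps(cx));
    __m128 ocy = _mm_sub_ps(r.oy, _mm_set1_ps(cy));
    __m128 ocz = _mm_sub_ps(r.oz, _mm_set1_ps(cz));

    // b = dot(oc, dir), c = dot(oc, oc) - radius^2
    __m128 b = _mm_add_ps(_mm_add_ps(_mm_mul_ps(ocx, r.dx), _mm_mul_ps(ocy, r.dy)),
                          _mm_mul_ps(ocz, r.dz));
    __m128 c = _mm_sub_ps(_mm_add_ps(_mm_add_ps(_mm_mul_ps(ocx, ocx), _mm_mul_ps(ocy, ocy)),
                                     _mm_mul_ps(ocz, ocz)),
                          _mm_set1_ps(radius * radius));

    // Discriminant b*b - c must be >= 0 for a hit (sign of t ignored here).
    __m128 disc = _mm_sub_ps(_mm_mul_ps(b, b), c);
    __m128 hit  = _mm_cmpge_ps(disc, _mm_setzero_ps());
    return _mm_movemask_ps(hit);
}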

Quote:

7.- I'm not planning to use RT for big projects anywhere soon. I just like to play around for now. But nevertheless, what can I expect in the nearby future (5 years)? I think some of the raytracers made are already capable of running simple games. But how does a RT cope with:
- Big/open scenes (Farcry)
- Lots of local lights (Doom3)
- Lots of dynamic objects (a race game)
- Sprites / particles / fog
- Post FX (blurring, DoF, tone mapping, color enhancement, ...)
- Memory requirements
Or maybe any other big disadvantage that I need to be aware of before using RT blindly?

IMHO RT should handle open scenes better than rasterization.
Raytracing handles local lights better than rasterization.
Raytracing (probably) doesn't handle dynamic objects as easily as rasterization.
Sprites and so on... no problem.
Post FX: everything you want and even more (you can make separate channels for everything :-)
Memory requirements: kd-trees can be a problem with very complex geometry, and a rasterized scene will probably require less memory.

Quote:

8.- So, where to start? Is there something like a "ray-tracing for dummies", or a NeHe-tutorial kind of website?

On DevMaster.net you can find a good RT tutorial series.


Good luck with your RT :-)

EDIT: when I say RT handles this better than rasterization, I don't mean that doing the same thing with RT is faster... there are other parameters (quality, special cases to take into account, efficiency, just to name some).

EDIT 2: there are not many good & free resources that cover RT on the web. There are a few tutorials (the one I linked is the best one IMHO) and many papers written by researchers. But if you want to avoid wasting hours, I suggest you write your first small RT following the tutorials, and then buy a book.
You can take a look at ompf.org, where there are highly skilled people working on RT and related techniques.

Quote:
Original post by nmi
Maybe you will also find this book interesting:
http://www.pbrt.org

Regarding realtime raytracing you may also find this interesting:
http://www.mpi-inf.mpg.de/~guenther/BVHonGPU/index.html


PBRT is a wonderful book, but I would never suggest it to someone who is going to write his first raytracer: it focuses on physically based rendering, and most of it covers advanced techniques like sampling, global illumination, sampling, BSDFs, sampling and design issues (sorry for the 3 'sampling's, but PBRT really devotes a LOT of pages to this subject)...

I never read 'Ray Tracing from the Ground Up', but from the table of contents it seems that it might be better suited for beginners (I'm thinking about buying it, since I sometimes find it very hard to understand PBRT)...

Merci beaucoup for this kickstart! I'm excited about writing my first RT. It doesn't have to be high quality at all; I have my other (rasterization) game/hobby project for that. I hope I can find time though. 24 hours per day is just not enough to work, please a girlfriend, hang out with friends, learn cooking, do some sports, and raise a little kid :)

Therefore I forgot one question. Are there already APIs like OpenGL/DirectX for RT? Writing everything yourself is more fun, but... I guess the answer is 'yes', but maybe there are no really high quality or 'universal' libraries (yet). I've seen the name "Arauna" flashing by several times. Is this based on an existing API/tools, or is it a library itself?


And then there is the hardware. From what I understand, Intel Larrabee is trying to give a boost. But what exactly is it? A specialized CPU, like the GPU? In which ways is it going to help, does it come with an API, ... And when is it available?

Probably there won't be any computers that get this piece of hardware by default. Just like the Ageia physics card. I have no idea if that card works well, but as long as the average user/gamer does not have this equipment, it's not really helping the developer, unless he/she is willing to write additional code that supports this hardware. So, I guess it's wise to just write a RT focusing on my current hardware (Intel dual core CPU, 2000 MHz).

Ok, time to click your link :)
Quote from devmaster Jacco
"And believe me, you haven't really lived until you see your first colors bleeding from one surface to another due to diffuse photon scattering…
"
That reminds me of seeing my first 2D sprite tank moving 8 years ago :) Pure magic

Thanks!
Rick

Quote:
Original post by spek
Merci beaucoup for this kickstart! I'm excited about writing my first RT. It doesn't have to be high quality at all; I have my other (rasterization) game/hobby project for that. I hope I can find time though. 24 hours per day is just not enough to work, please a girlfriend, hang out with friends, learn cooking, do some sports, and raise a little kid :)

Yeah, I know, that's why my RT is currently sitting and waiting on my HDD :-)

Quote:

Therefore I forgot one question. Are there already APIs like OpenGL/DirectX for RT? Writing everything yourself is more fun, but... I guess the answer is 'yes', but maybe there are no really high quality or 'universal' libraries (yet). I've seen the name "Arauna" flashing by several times. Is this based on an existing API/tools, or is it a library itself?

There is something around, but AFAIK nothing really interesting nor standard. There has been a lib named OpenRT somewhere, but I don't know if it is free, or still developed.
Arauna has been developed from scratch, and has already been used for two small games, so I suppose that it can be used as an engine (the author is a member of GameDev; chances are he will reply here as well). If what you want is a game, then you might use existing tools, but really, I think that you will feel happier doing it yourself :-)


Quote:

And then there is the hardware. From what I understand, Intel Larrabee is trying to give a boost. But what exactly is it? A specialized CPU, like the GPU? In which ways is it going to help, does it come with an API, ... And when is it available?

Larrabee will be an x86-based processor (up to 32 cores IIRC). It will be much like working with a standard CPU, just optimized for highly parallel tasks. Since Intel used to advertise it using the word 'raytracing' thousands of times, I suppose they will provide a RT API, but I'm not sure. We won't see Larrabee until late 2009 (perhaps 2010), so you still have time to learn RT :-)

Quote:

Probably there won't be any computers that get this piece of hardware by default. Just like the Ageia physics card. I have no idea if that card works well, but as long as the average user/gamer does not have this equipment, it's not really helping the developer, unless he/she is willing to write additional code that supports this hardware. So, I guess it's wise to just write a RT focusing on my current hardware (Intel dual core CPU, 2000 MHz).

Intel states that Larrabee will enter the market as a competitor to nVidia and ATI, and they will provide OpenGL and DX drivers. Selling it as a specialized device would be suicide. The only question is: will it be able to offer the same performance as nVidia and ATI?

Quote:

Ok, time to click your link :)
Quote from devmaster Jacco
"And believe me, you haven't really lived until you see your first colors bleeding from one surface to another due to diffuse photon scattering…
"
That reminds me of seeing my first 2D sprite tank moving 8 years ago :) Pure magic

Well, I haven't implemented GI yet, but even just looking at your first raytraced shaded sphere is a wonderful experience.

Quote:
Original post by spek
And then there is the hardware. From what I understand, Intel Larrabee is trying to give a boost. But what exactly is it? A specialized CPU, like the GPU? In which ways is it going to help, does it come with an API, ... And when is it available?


Here is a paper about the Larrabee architecture:
http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf

Basically they just put many Pentium processors into one chip. Single-threaded applications will not benefit from this, but parallel applications like a RT will.

Glad I am not the only one. I don't understand why the code snippets in the PBRT book recursively refer to other code snippets. Having code is already a distraction from understanding the concepts.

Do you recommend other real time ray tracers?

Quote:

I never read 'Ray Tracing from the Ground Up', but from the table of contents it seems that it might be better suited for beginners (I'm thinking about buying it, since I sometimes find it very hard to understand PBRT)...


Quote:
Original post by rumble
Glad I am not the only one. I don't understand why the code snippets in the PBRT book recursively refer to other code snippets. Having code is already a distraction from understanding the concepts.

Do you recommend other real time ray tracers?

Quote:

I never read 'Ray Tracing from the Ground Up', but from the table of contents it seems that it might be better suited for beginners (I'm thinking about buying it, since I sometimes find it very hard to understand PBRT)...


For code reference there are many open-source raytracers:
-PovRay
-YAFRAY
-PBRT
-WinOSI
-SunFlow
-Blender

Just to name a few off the top of my head. None of them aims at realtime though, and honestly I don't know of any other open-source realtime raytracer (RTRT) except Arauna (I won't be interested in RTRT until I feel comfortable with RT in the first place, because making a RT realtime is way beyond my abilities :-(
I don't think there are many other open-source RTRTs that can compete with Arauna (and that are still actively maintained), if any.

Quote:
Original post by LeGreg
There's a tutorial here (not yet complete, I know, I still need to translate the rest...):

Raytracer in C++

LeGreg


Yeah, this is the other RT tutorial I was thinking of when I said 'a few tutorials', but I wasn't able to remember where to find it. Thank you for posting :-)

Thanks for the references guys :) I'm working through the Jacco Bikker tutorial. Funny, that guy teaches computer science at a school near my place.

Got my first raytracer working. Just a plane and a sphere with simple direct dot(L,N) lighting; no reflections, shadows or other 'wow, I wet my pants' stuff, but it's a start. I love the amount of control you have over everything. Basically your entire computer becomes a shader now, with 'no' limitations.
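For what it's worth, generating the primary rays for a simple pinhole camera boils down to something like this sketch (the names and conventions are mine, not taken from the tutorial; camera orientation is left out for brevity):

#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

// Pinhole camera looking down -z. Maps pixel (px, py) of a width x height
// image to a primary ray direction.
Vec3 PrimaryRayDir(int px, int py, int width, int height, float fovDegrees)
{
    float aspect  = (float)width / (float)height;
    float tanHalf = std::tan(fovDegrees * 0.5f * 3.14159265f / 180.0f);

    // Convert the pixel centre to [-1, 1] screen space, y pointing up.
    float sx = (2.0f * (px + 0.5f) / width  - 1.0f) * aspect * tanHalf;
    float sy = (1.0f - 2.0f * (py + 0.5f) / height) * tanHalf;

    return normalize({ sx, sy, -1.0f });
}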

I wonder if my speed is correct though. It still takes ~150 ms to render one 800x600 frame. I don't have any optimizations of course, but I wonder what there is to optimize in a scene of 1 sphere, 1 light (sphere) and a plane...

Bad programming was my first thought. I tried the demo from the website I'm reading. A similar scene there takes ~2.3 seconds. That's even much slower! I read further: tutorial 3 from that website contains some Phong/reflections without any optimizations and takes ~9 seconds to update on his laptop. I assume that his 1700 MHz laptop from 2005 or before should not be faster than my dual core (2x 1.66 GHz). But guess what, my laptop needs ~2 minutes(!). Maybe only one core is used or something... but still, 2 minutes is really slow.

Something stinks here. But what could it be? Dual core processors used wrongly? Windows Vista? Any other particular setting wrong? The raytracer program gets ~50% CPU, and the other processes are sleeping, so that should not be a problem either.

Greetings,
Rick

Quote:
Original post by spek
I wonder if my speed is correct though. It still takes ~150 ms to render one 800x600 frame. I don't have any optimizations of course, but I wonder what there is to optimize in a scene of 1 sphere, 1 light (sphere) and a plane...


It's hard, and probably premature, to worry about optimizations at this stage.

Quote:

Bad programming was my first thought. I tried the demo from the website I'm reading. A similar scene there takes ~2.3 seconds. That's even much slower! I read further: tutorial 3 from that website contains some Phong/reflections without any optimizations and takes ~9 seconds to update on his laptop. I assume that his 1700 MHz laptop from 2005 or before should not be faster than my dual core (2x 1.66 GHz). But guess what, my laptop needs ~2 minutes(!). Maybe only one core is used or something... but still, 2 minutes is really slow.

Chances are you have some issues somewhere. Remember that your code makes no use of the two cores: if you multithread it you can get up to a 2x speedup. In the future you might also think about packing many rays and shooting them together (this does wonders, they say). But most important is implementing a good spatial partitioning scheme.
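For example, splitting the image into horizontal bands, one per core, is usually enough to start with, since every pixel is independent. A sketch with modern C++ threads (in 2008 you would use CreateThread or a threading library instead, but the idea is the same; TracePixel is a placeholder for your per-pixel trace):

#include <algorithm>
#include <thread>
#include <vector>

// Assume a per-pixel trace function exists somewhere in your raytracer.
void TracePixel(int x, int y); // hypothetical

// Split the image into horizontal bands and give each hardware thread one
// band. Each pixel is independent, so no locking is needed.
void RenderMultithreaded(int width, int height)
{
    unsigned numThreads = std::thread::hardware_concurrency();
    if (numThreads == 0) numThreads = 2;

    std::vector<std::thread> workers;
    int rowsPerThread = (height + (int)numThreads - 1) / (int)numThreads;

    for (unsigned i = 0; i < numThreads; ++i) {
        int y0 = (int)i * rowsPerThread;
        int y1 = std::min(height, y0 + rowsPerThread);
        workers.emplace_back([=]() {
            for (int y = y0; y < y1; ++y)
                for (int x = 0; x < width; ++x)
                    TracePixel(x, y);
        });
    }
    for (auto& t : workers) t.join();
}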

Quote:

Something stinks here. But what could it be? Dual core processors used wrongly? Windows Vista? Any other particular setting wrong? The raytracer program gets ~50% CPU, and the other processes are sleeping, so that should not be a problem either.

Greetings,
Rick


Without more information I can't help you much, really... You should debug it to check (for example) that rays are tested against primitives exactly once. Also (just to be sure), you are compiling in release mode, right?
How many primitives are you using? Without spatial structures, increasing the number of primitives drastically increases the time required.
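Once you do add a spatial structure (a grid, kd-tree or BVH), most of the win comes from cheap ray/box tests that let you reject whole groups of primitives at once. A sketch of the standard "slab" test (plain arrays instead of a vector class, just for illustration):

#include <algorithm>
#include <utility>

// Minimal "slab" test: intersect the ray with the three pairs of parallel
// planes that bound an axis-aligned box, and check whether the intervals
// overlap. BVHs, kd-trees and grids use this kind of test to skip whole
// groups of primitives. invDir is 1/direction, precomputed per ray.
bool IntersectAABB(const float orig[3], const float invDir[3],
                   const float boxMin[3], const float boxMax[3],
                   float tMax)
{
    float tNear = 0.0f, tFar = tMax;
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (boxMin[axis] - orig[axis]) * invDir[axis];
        float t1 = (boxMax[axis] - orig[axis]) * invDir[axis];
        if (t0 > t1) std::swap(t0, t1);       // handle negative directions
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar, t1);
        if (tNear > tFar) return false;       // intervals no longer overlap
    }
    return true;
}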

Well, luckily I can't blame my own code so far. I downloaded the tutorials from DevMaster.net. The very first tutorial already runs too slowly, I think. It's made of:
- 1 ground plane
- 2 spheres
- 2 light sources (spheres)
- No trees or optimizations used. Each ray is checked against all 5 primitives
- Pixels have simple diffuse lighting. No additional rays are cast so far

According to the paper, the very same scene should be rendered within a second, and that includes shadow rays, reflections and a specular term. However, the simple program without these effects needs 2 seconds, and with these effects it takes 4 seconds.

My copies of both programs run quite a lot faster: a 150 ms update interval for the simple scene, and 380 ms for the same scene with Phong/shadows/reflections. The tutorials have been re-compiled in Visual Studio 2008 C++ (the original ones were written in VC 6). My copies are written in Delphi.

It's strange that my program runs a lot faster (but still not really fast overall). But if we forget about my version, the tutorial code also runs a lot slower than the website suggests. Probably only 50% of my CPU is used indeed, so that would be ~1.66 GHz. A little bit slower than the author's 1.7 GHz.

Greetings,
Rick

Quote:
Original post by spek
Probably only 50% of my CPU is used indeed, so that would be ~1.66 GHz. A little bit slower than the author's 1.7 GHz.

Greetings,
Rick


Without explicit use of more threads, there is no way the raytracer uses more than one core, and this is true for the Delphi version too. It seems a bit strange to me that for the same scene the Delphi version runs so much faster than the C++ version. Have you turned all the optimizations on in the VC compiler settings (there are many of them)? I saw a 10x-15x difference moving from debug to release with all optimizations on.

Hmmm, that could be a reason. Basically the code of both versions is the same, so that shouldn't be a big problem. I'm not very familiar with Visual Studio; I downloaded the free version yesterday. I guess all the options are at their defaults.

The project properties/optimizations tab shows that all optimizations are disabled. If I switch to "maximum speed" or "full", I get a build error though ("'/Ox' and '/RTC1' command-line options are incompatible")

So I enabled "Favor Fast cpde /Ot" and Intrinsic functions. That didn't make a real difference though. Maybe there are other options I missed here...

I know it's hard to tell what the speed should be. But for a rather simple program like that first tutorial, would the 8 fps I get with the Delphi program be normal? I know the key to high speed is to avoid as many unnecessary intersection tests as possible. But if there are only a few objects...

Thanks for helping,
Rick

Go to the project property page (under the Project menu) and follow:
Configuration Properties -> C/C++ -> Code Generation

and set Basic Runtime Checks to Default.

Then go to the Optimization tab and set everything to full optimization.
You might also try turning SSE/SSE2 on in the Code Generation panel and see if it makes any difference.

That works. Speed gain of ~4x. Still slower than the website claims, but OK. And I can't optimize 100% though. The Full Optimization option gives this error:
'/Ox' and '/ZI' command-line options are incompatible

Got refractions, reflections, (hard) shadows and diffuse lighting working now :) It's not for practical game usage, but it's fun to do. At least I'm a little bit prepared if we ever switch over to raytracing.

I'm not sure if the RT-produced scenes look better than "normal renderings" though... I'm not only talking about my own little experiment here, but about the screenshots I've seen in general. The reflections and refractions beat the hell out of a normal renderer, but most of the scenes I've seen still look fake... too sharp, too reflective, too glossy, too clean, too noisy somehow... while a rasterizer makes "thicker", more dirty/blurred scenes. The RT results remind me a little bit of older games that used pre-rendered backgrounds (Myst, Phantasmagoria, 7th Guest). Beautiful for that time, but still not 100% realistic.

Of course, that has everything to do with the limited speed and therefore relatively simple environments/textures/effects. A RT follows physical laws (well, more or less) and therefore should be able to render truly photo-realistic images, in theory. I guess most cinematics and high-quality 3D renderings use raytracing at a higher level, while we programmers focus on simple scenes showing (too many) reflections and other technical stuff.

Oh well, that's another discussion :)
Rick

If you're just looking for some reference source code (as opposed to complete tutorials), then you could take a look at RayWatch (http://sourceforge.net/projects/raywatch). See screenshots here: http://www.gamedev.net/community/forums/topic.asp?topic_id=481216

RayWatch is a simple raytracer, written in (OS-portable) C++, for educational purposes. It uses SDL for loading images (textures). The source code is written for clarity (not performance), is Object Oriented, and is released under the GPL license.

The realism that RT can achieve is bound to many factors: for example, a very basic RT (like the one developed in the tutorial) usually does not implement the following features:
-Fresnel reflection/transmission
-reflection models other than Phong
-HDR rendering with tone mapping operators

Actually, RT can produce stunning images, even without GI. But you need to implement a few more features. You will see that you just need to implement area shadows, bump mapping (normal mapping) and Fresnel to get quite realistic images. Of course, you can't get photorealistic realtime raytracing on common hardware yet, but there is virtually no limit to the accuracy with which a raytracer can render a scene.
The main reason why most renders in RT tutorials look fake is that they are programmer art :-)
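For example, the Fresnel term is usually added with Schlick's approximation, which is only a few lines. A sketch (the function name and parameters are illustrative, not from any particular tracer):

#include <cmath>

// Schlick's approximation to the Fresnel reflectance: how much light a
// surface reflects as a function of the viewing angle. cosTheta is the
// dot product between the surface normal and the direction towards the
// viewer; n1/n2 are the refractive indices on each side (e.g. 1.0 -> 1.5
// for air to glass).
float FresnelSchlick(float cosTheta, float n1, float n2)
{
    float r0 = (n1 - n2) / (n1 + n2);
    r0 *= r0;
    float oneMinus = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * oneMinus * oneMinus * oneMinus * oneMinus * oneMinus;
}

// The result is then used to blend the reflected and refracted colours:
//   colour = kr * reflectedColour + (1 - kr) * refractedColour;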

EDIT:
This is what I can get right now with my raytracer. Far from photorealism, but a lot of features still have to be implemented...

[Edited by - cignox1 on September 19, 2008 8:57:17 AM]

You have textures working :) Also normal mapping (I noticed the light scattered a little bit on the wall behind). I'm moving onto that part as well; I can't live without textures.

You are right about programmer art and the missing effects such as HDR, DoF or tone mapping. I think it's rather easy though to draw the raytraced pixels into an OpenGL (or DX) buffer and then let the shaders do the rest. I was also thinking about interpolating to gain speed. Never tried it of course, but how would it look if I skip every other pixel? Each traced pixel would then be surrounded by empty pixels. Later on I can interpolate them on the CPU or GPU. I only have to shoot 25% of the rays then, and I get some sort of AA in return for it. I lose quality and sharpness of course, but ironically, lately much effort is spent on making heavy blur shaders (HDR bloom, DoF, more filtering) in the GPU world.

Another interesting application of raytracing might be GI. As far as I know, the LightsMark benchmark program renders everything with normal shader techniques & OpenGL, but... the indirect lighting and reflections are sampled via raytracing and stored into a lightmap. It's still a heavy task, but then again, that lightmap does not have to be fully updated every cycle. I think it's faster than rendering a realtime lightmap on the GPU (I have been doing that a lot lately :) ), and also more accurate. The downside is that the CPU gets occupied. That could be a problem if you want AI and physics as well...

Greetings & success with your raytracer!
Rick

Once I tried tracing one pixel out of every two, and unluckily it is not as good as one might think. I also tried tracing one pixel out of every two and then tracing the missing pixel only if the difference between the surrounding ones was above a threshold. Better, but the performance gain was not that high (I don't remember how much though). See the sketch after this list.
A few (perhaps) better approaches could be:
-If antialiasing is used, you can trace one pixel out of every two. Then, if the difference is high, you trace only a few extra rays, also taking the surrounding pixels into account. That is, you improve the interpolated value by shooting a few rays (i.e. instead of tracing 16 rays you trace only four).
-You don't interpolate the colors of the pixels, but instead the surface properties, if all four pixels belong to the same object (specifically the intersection positions and uv coordinates). This can become a bit tricky, for example when the uv coords were modified by a texture mapper (you would need the intrinsic surface uv), but it saves a lot of primary rays, which in typical cases make up the majority.
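A sketch of that adaptive idea (trace every other pixel in a row, interpolate the gaps, and shoot a real ray only where the two traced neighbours differ more than a threshold; TracePixel, Color and the names are placeholders, not taken from any real tracer):

#include <cmath>
#include <vector>

struct Color { float r, g, b; };

Color TracePixel(int x, int y);                   // hypothetical: full ray trace

// Simple per-channel difference metric between two colors.
float Difference(const Color& a, const Color& b)
{
    return std::fabs(a.r - b.r) + std::fabs(a.g - b.g) + std::fabs(a.b - b.b);
}

// Trace every other column, then for each skipped pixel either interpolate
// its two traced neighbours or, if they differ too much, trace it properly.
void RenderAdaptive(std::vector<Color>& image, int width, int height, float threshold)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; x += 2)                     // primary samples
            image[y * width + x] = TracePixel(x, y);

        for (int x = 1; x < width - 1; x += 2) {               // fill the gaps
            const Color& left  = image[y * width + x - 1];
            const Color& right = image[y * width + x + 1];
            if (Difference(left, right) > threshold) {
                image[y * width + x] = TracePixel(x, y);       // real ray
            } else {
                image[y * width + x] = { (left.r + right.r) * 0.5f,
                                         (left.g + right.g) * 0.5f,
                                         (left.b + right.b) * 0.5f };
            }
        }
        if (width % 2 == 0)                                    // last odd column
            image[y * width + (width - 1)] = TracePixel(width - 1, y);
    }
}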

There are tricks available for shadow rays as well, but I wouldn't design my RT around them (unless I were targeting a realtime RT)...


This topic is 3376 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Forum Statistics

    • Total Topics
      628740
    • Total Posts
      2984472
  • Similar Content

    • By alex1997
      I'm looking to render multiple objects (rectangles) with different shaders. So far I've managed to render one rectangle made out of 2 triangles and apply shader to it, but when it comes to render another I get stucked. Searched for documentations or stuffs that could help me, but everything shows how to render only 1 object. Any tips or help is highly appreciated, thanks!
      Here's my code for rendering one object with shader!
       
      #define GLEW_STATIC #include <stdio.h> #include <GL/glew.h> #include <GLFW/glfw3.h> #include "window.h" #define GLSL(src) "#version 330 core\n" #src // #define ASSERT(expression, msg) if(expression) {fprintf(stderr, "Error on line %d: %s\n", __LINE__, msg);return -1;} int main() { // Init GLFW if (glfwInit() != GL_TRUE) { std::cerr << "Failed to initialize GLFW\n" << std::endl; exit(EXIT_FAILURE); } // Create a rendering window with OpenGL 3.2 context glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3); glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2); glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); glfwWindowHint(GLFW_RESIZABLE, GL_FALSE); // assing window pointer GLFWwindow *window = glfwCreateWindow(800, 600, "OpenGL", NULL, NULL); glfwMakeContextCurrent(window); // Init GLEW glewExperimental = GL_TRUE; if (glewInit() != GLEW_OK) { std::cerr << "Failed to initialize GLEW\n" << std::endl; exit(EXIT_FAILURE); } // ----------------------------- RESOURCES ----------------------------- // // create gl data const GLfloat positions[8] = { -0.5f, -0.5f, 0.5f, -0.5f, 0.5f, 0.5f, -0.5f, 0.5f, }; const GLuint elements[6] = { 0, 1, 2, 2, 3, 0 }; // Create Vertex Array Object GLuint vao; glGenVertexArrays(1, &vao); glBindVertexArray(vao); // Create a Vertex Buffer Object and copy the vertex data to it GLuint vbo; glGenBuffers(1, &vbo); glBindBuffer(GL_ARRAY_BUFFER, vbo); glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW); // Specify the layout of the vertex data glEnableVertexAttribArray(0); // layout(location = 0) glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0); // Create a Elements Buffer Object and copy the elements data to it GLuint ebo; glGenBuffers(1, &ebo); glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo); glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(elements), elements, GL_STATIC_DRAW); // Create and compile the vertex shader const GLchar *vertexSource = GLSL( layout(location = 0) in vec2 position; void main() { gl_Position = vec4(position, 0.0, 1.0); } ); GLuint vertexShader = glCreateShader(GL_VERTEX_SHADER); glShaderSource(vertexShader, 1, &vertexSource, NULL); glCompileShader(vertexShader); // Create and compile the fragment shader const char* fragmentSource = GLSL( out vec4 gl_FragColor; uniform vec2 u_resolution; void main() { vec2 pos = gl_FragCoord.xy / u_resolution; gl_FragColor = vec4(1.0); } ); GLuint fragmentShader = glCreateShader(GL_FRAGMENT_SHADER); glShaderSource(fragmentShader, 1, &fragmentSource, NULL); glCompileShader(fragmentShader); // Link the vertex and fragment shader into a shader program GLuint shaderProgram = glCreateProgram(); glAttachShader(shaderProgram, vertexShader); glAttachShader(shaderProgram, fragmentShader); glLinkProgram(shaderProgram); glUseProgram(shaderProgram); // get uniform's id by name and set value GLint uRes = glGetUniformLocation(shaderProgram, "u_Resolution"); glUniform2f(uRes, 800.0f, 600.0f); // ---------------------------- RENDERING ------------------------------ // while(!glfwWindowShouldClose(window)) { // Clear the screen to black glClear(GL_COLOR_BUFFER_BIT); glClearColor(0.0f, 0.5f, 1.0f, 1.0f); // Draw a rectangle made of 2 triangles -> 6 vertices glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, NULL); // Swap buffers and poll window events glfwSwapBuffers(window); glfwPollEvents(); } // ---------------------------- CLEARING ------------------------------ // // Delete allocated resources glDeleteProgram(shaderProgram); 
glDeleteShader(fragmentShader); glDeleteShader(vertexShader); glDeleteBuffers(1, &vbo); glDeleteVertexArrays(1, &vao); return 0; }  
    • By Vortez
      Hi guys, im having a little problem fixing a bug in my program since i multi-threaded it. The app is a little video converter i wrote for fun. To help you understand the problem, ill first explain how the program is made. Im using Delphi to do the GUI/Windows part of the code, then im loading a c++ dll for the video conversion. The problem is not related to the video conversion, but with OpenGL only. The code work like this:

       
      DWORD WINAPI JobThread(void *params) { for each files { ... _ConvertVideo(input_name, output_name); } } void EXP_FUNC _ConvertVideo(char *input_fname, char *output_fname) { // Note that im re-initializing and cleaning up OpenGL each time this function is called... CGLEngine GLEngine; ... // Initialize OpenGL GLEngine.Initialize(render_wnd); GLEngine.CreateTexture(dst_width, dst_height, 4); // decode the video and render the frames... for each frames { ... GLEngine.UpdateTexture(pY, pU, pV); GLEngine.Render(); } cleanup: GLEngine.DeleteTexture(); GLEngine.Shutdown(); // video cleanup code... }  
      With a single thread, everything work fine. The problem arise when im starting the thread for a second time, nothing get rendered, but the encoding work fine. For example, if i start the thread with 3 files to process, all of them render fine, but if i start the thread again (with the same batch of files or not...), OpenGL fail to render anything.
      Im pretty sure it has something to do with the rendering context (or maybe the window DC?). Here a snippet of my OpenGL class:
      bool CGLEngine::Initialize(HWND hWnd) { hDC = GetDC(hWnd); if(!SetupPixelFormatDescriptor(hDC)){ ReleaseDC(hWnd, hDC); return false; } hRC = wglCreateContext(hDC); wglMakeCurrent(hDC, hRC); // more code ... return true; } void CGLEngine::Shutdown() { // some code... if(hRC){wglDeleteContext(hRC);} if(hDC){ReleaseDC(hWnd, hDC);} hDC = hRC = NULL; }  
      The full source code is available here. The most relevant files are:
      -OpenGL class (header / source)
      -Main code (header / source)
       
      Thx in advance if anyone can help me.
    • By DiligentDev
      This article uses material originally posted on Diligent Graphics web site.
      Introduction
      Graphics APIs have come a long way from small set of basic commands allowing limited control of configurable stages of early 3D accelerators to very low-level programming interfaces exposing almost every aspect of the underlying graphics hardware. Next-generation APIs, Direct3D12 by Microsoft and Vulkan by Khronos are relatively new and have only started getting widespread adoption and support from hardware vendors, while Direct3D11 and OpenGL are still considered industry standard. New APIs can provide substantial performance and functional improvements, but may not be supported by older hardware. An application targeting wide range of platforms needs to support Direct3D11 and OpenGL. New APIs will not give any advantage when used with old paradigms. It is totally possible to add Direct3D12 support to an existing renderer by implementing Direct3D11 interface through Direct3D12, but this will give zero benefits. Instead, new approaches and rendering architectures that leverage flexibility provided by the next-generation APIs are expected to be developed.
      There are at least four APIs (Direct3D11, Direct3D12, OpenGL/GLES, Vulkan, plus Apple's Metal for iOS and osX platforms) that a cross-platform 3D application may need to support. Writing separate code paths for all APIs is clearly not an option for any real-world application and the need for a cross-platform graphics abstraction layer is evident. The following is the list of requirements that I believe such layer needs to satisfy:
      Lightweight abstractions: the API should be as close to the underlying native APIs as possible to allow an application leverage all available low-level functionality. In many cases this requirement is difficult to achieve because specific features exposed by different APIs may vary considerably. Low performance overhead: the abstraction layer needs to be efficient from performance point of view. If it introduces considerable amount of overhead, there is no point in using it. Convenience: the API needs to be convenient to use. It needs to assist developers in achieving their goals not limiting their control of the graphics hardware. Multithreading: ability to efficiently parallelize work is in the core of Direct3D12 and Vulkan and one of the main selling points of the new APIs. Support for multithreading in a cross-platform layer is a must. Extensibility: no matter how well the API is designed, it still introduces some level of abstraction. In some cases the most efficient way to implement certain functionality is to directly use native API. The abstraction layer needs to provide seamless interoperability with the underlying native APIs to provide a way for the app to add features that may be missing. Diligent Engine is designed to solve these problems. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common C++ front-end for all supported platforms and provides interoperability with underlying native APIs. It also supports integration with Unity and is designed to be used as graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. Full source code is available for download at GitHub and is free to use.
      Overview
      Diligent Engine API takes some features from Direct3D11 and Direct3D12 as well as introduces new concepts to hide certain platform-specific details and make the system easy to use. It contains the following main components:
      Render device (IRenderDevice  interface) is responsible for creating all other objects (textures, buffers, shaders, pipeline states, etc.).
      Device context (IDeviceContext interface) is the main interface for recording rendering commands. Similar to Direct3D11, there are immediate context and deferred contexts (which in Direct3D11 implementation map directly to the corresponding context types). Immediate context combines command queue and command list recording functionality. It records commands and submits the command list for execution when it contains sufficient number of commands. Deferred contexts are designed to only record command lists that can be submitted for execution through the immediate context.
      An alternative way to design the API would be to expose command queue and command lists directly. This approach however does not map well to Direct3D11 and OpenGL. Besides, some functionality (such as dynamic descriptor allocation) can be much more efficiently implemented when it is known that a command list is recorded by a certain deferred context from some thread.
      The approach taken in the engine does not limit scalability as the application is expected to create one deferred context per thread, and internally every deferred context records a command list in lock-free fashion. At the same time this approach maps well to older APIs.
      In current implementation, only one immediate context that uses default graphics command queue is created. To support multiple GPUs or multiple command queue types (compute, copy, etc.), it is natural to have one immediate contexts per queue. Cross-context synchronization utilities will be necessary.
      Swap Chain (ISwapChain interface). Swap chain interface represents a chain of back buffers and is responsible for showing the final rendered image on the screen.
      Render device, device contexts and swap chain are created during the engine initialization.
      Resources (ITexture and IBuffer interfaces). There are two types of resources - textures and buffers. There are many different texture types (2D textures, 3D textures, texture array, cubmepas, etc.) that can all be represented by ITexture interface.
      Resources Views (ITextureView and IBufferView interfaces). While textures and buffers are mere data containers, texture views and buffer views describe how the data should be interpreted. For instance, a 2D texture can be used as a render target for rendering commands or as a shader resource.
      Pipeline State (IPipelineState interface). GPU pipeline contains many configurable stages (depth-stencil, rasterizer and blend states, different shader stage, etc.). Direct3D11 uses coarse-grain objects to set all stage parameters at once (for instance, a rasterizer object encompasses all rasterizer attributes), while OpenGL contains myriad functions to fine-grain control every individual attribute of every stage. Both methods do not map very well to modern graphics hardware that combines all states into one monolithic state under the hood. Direct3D12 directly exposes pipeline state object in the API, and Diligent Engine uses the same approach.
      Shader Resource Binding (IShaderResourceBinding interface). Shaders are programs that run on the GPU. Shaders may access various resources (textures and buffers), and setting correspondence between shader variables and actual resources is called resource binding. Resource binding implementation varies considerably between different API. Diligent Engine introduces a new object called shader resource binding that encompasses all resources needed by all shaders in a certain pipeline state.
      API Basics
      Creating Resources
      Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. Graphics APIs usually have a native object that represents linear buffer. Diligent Engine uses IBuffer interface as an abstraction for a native buffer. To create a buffer, one needs to populate BufferDesc structure and call IRenderDevice::CreateBuffer() method as in the following example:
      BufferDesc BuffDesc; BufferDesc.Name = "Uniform buffer"; BuffDesc.BindFlags = BIND_UNIFORM_BUFFER; BuffDesc.Usage = USAGE_DYNAMIC; BuffDesc.uiSizeInBytes = sizeof(ShaderConstants); BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE; m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer ); While there is usually just one buffer object, different APIs use very different approaches to represent textures. For instance, in Direct3D11, there are ID3D11Texture1D, ID3D11Texture2D, and ID3D11Texture3D objects. In OpenGL, there is individual object for every texture dimension (1D, 2D, 3D, Cube), which may be a texture array, which may also be multisampled (i.e. GL_TEXTURE_2D_MULTISAMPLE_ARRAY). As a result there are nine different GL texture types that Diligent Engine may create under the hood. In Direct3D12, there is only one resource interface. Diligent Engine hides all these details in ITexture interface. There is only one  IRenderDevice::CreateTexture() method that is capable of creating all texture types. Dimension, format, array size and all other parameters are specified by the members of the TextureDesc structure:
      TextureDesc TexDesc; TexDesc.Name = "My texture 2D"; TexDesc.Type = TEXTURE_TYPE_2D; TexDesc.Width = 1024; TexDesc.Height = 1024; TexDesc.Format = TEX_FORMAT_RGBA8_UNORM; TexDesc.Usage = USAGE_DEFAULT; TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS; TexDesc.Name = "Sample 2D Texture"; m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex ); If native API supports multithreaded resource creation, textures and buffers can be created by multiple threads simultaneously.
      Interoperability with native API provides access to the native buffer/texture objects and also allows creating Diligent Engine objects from native handles. It allows applications seamlessly integrate native API-specific code with Diligent Engine.
      Next-generation APIs allow fine level-control over how resources are allocated. Diligent Engine does not currently expose this functionality, but it can be added by implementing IResourceAllocator interface that encapsulates specifics of resource allocation and providing this interface to CreateBuffer() or CreateTexture() methods. If null is provided, default allocator should be used.
      Initializing the Pipeline State
      As it was mentioned earlier, Diligent Engine follows next-gen APIs to configure the graphics/compute pipeline. One big Pipelines State Object (PSO) encompasses all required states (all shader stages, input layout description, depth stencil, rasterizer and blend state descriptions etc.). This approach maps directly to Direct3D12/Vulkan, but is also beneficial for older APIs as it eliminates pipeline misconfiguration errors. With many individual calls tweaking various GPU pipeline settings it is very easy to forget to set one of the states or assume the stage is already properly configured when in fact it is not. Using pipeline state object helps avoid these problems as all stages are configured at once.
      Creating Shaders
      While in earlier APIs shaders were bound separately, in the next-generation APIs as well as in Diligent Engine shaders are part of the pipeline state object. The biggest challenge when authoring shaders is that Direct3D and OpenGL/Vulkan use different shader languages (while Apple uses yet another language in their Metal API). Maintaining two versions of every shader is not an option for real applications and Diligent Engine implements shader source code converter that allows shaders authored in HLSL to be translated to GLSL. To create a shader, one needs to populate ShaderCreationAttribs structure. SourceLanguage member of this structure tells the system which language the shader is authored in:
      SHADER_SOURCE_LANGUAGE_DEFAULT - The shader source language matches the underlying graphics API: HLSL for Direct3D11/Direct3D12 mode, and GLSL for OpenGL and OpenGLES modes. SHADER_SOURCE_LANGUAGE_HLSL - The shader source is in HLSL. For OpenGL and OpenGLES modes, the source code will be converted to GLSL. SHADER_SOURCE_LANGUAGE_GLSL - The shader source is in GLSL. There is currently no GLSL to HLSL converter, so this value should only be used for OpenGL and OpenGLES modes. There are two ways to provide the shader source code. The first way is to use Source member. The second way is to provide a file path in FilePath member. Since the engine is entirely decoupled from the platform and the host file system is platform-dependent, the structure exposes pShaderSourceStreamFactory member that is intended to provide the engine access to the file system. If FilePath is provided, shader source factory must also be provided. If the shader source contains any #include directives, the source stream factory will also be used to load these files. The engine provides default implementation for every supported platform that should be sufficient in most cases. Custom implementation can be provided when needed.
      When sampling a texture in a shader, the texture sampler was traditionally specified as separate object that was bound to the pipeline at run time or set as part of the texture object itself. However, in most cases it is known beforehand what kind of sampler will be used in the shader. Next-generation APIs expose new type of sampler called static sampler that can be initialized directly in the pipeline state. Diligent Engine exposes this functionality: when creating a shader, textures can be assigned static samplers. If static sampler is assigned, it will always be used instead of the one initialized in the texture shader resource view. To initialize static samplers, prepare an array of StaticSamplerDesc structures and initialize StaticSamplers and NumStaticSamplers members. Static samplers are more efficient and it is highly recommended to use them whenever possible. On older APIs, static samplers are emulated via generic sampler objects.
      The following is an example of shader initialization:
      ShaderCreationAttribs Attrs; Attrs.Desc.Name = "MyPixelShader"; Attrs.FilePath = "MyShaderFile.fx"; Attrs.SearchDirectories = "shaders;shaders\\inc;"; Attrs.EntryPoint = "MyPixelShader"; Attrs.Desc.ShaderType = SHADER_TYPE_PIXEL; Attrs.SourceLanguage = SHADER_SOURCE_LANGUAGE_HLSL; BasicShaderSourceStreamFactory BasicSSSFactory(Attrs.SearchDirectories); Attrs.pShaderSourceStreamFactory = &BasicSSSFactory; ShaderVariableDesc ShaderVars[] = {     {"g_StaticTexture", SHADER_VARIABLE_TYPE_STATIC},     {"g_MutableTexture", SHADER_VARIABLE_TYPE_MUTABLE},     {"g_DynamicTexture", SHADER_VARIABLE_TYPE_DYNAMIC} }; Attrs.Desc.VariableDesc = ShaderVars; Attrs.Desc.NumVariables = _countof(ShaderVars); Attrs.Desc.DefaultVariableType = SHADER_VARIABLE_TYPE_STATIC; StaticSamplerDesc StaticSampler; StaticSampler.Desc.MinFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MagFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MipFilter = FILTER_TYPE_LINEAR; StaticSampler.TextureName = "g_MutableTexture"; Attrs.Desc.NumStaticSamplers = 1; Attrs.Desc.StaticSamplers = &StaticSampler; ShaderMacroHelper Macros; Macros.AddShaderMacro("USE_SHADOWS", 1); Macros.AddShaderMacro("NUM_SHADOW_SAMPLES", 4); Macros.Finalize(); Attrs.Macros = Macros; RefCntAutoPtr<IShader> pShader; m_pDevice->CreateShader( Attrs, &pShader );
      Creating the Pipeline State Object
      After all required shaders are created, the rest of the fields of the PipelineStateDesc structure provide depth-stencil, rasterizer, and blend state descriptions, the number and format of render targets, input layout format, etc. For instance, rasterizer state can be described as follows:
      PipelineStateDesc PSODesc; RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc; RasterizerDesc.FillMode = FILL_MODE_SOLID; RasterizerDesc.CullMode = CULL_MODE_NONE; RasterizerDesc.FrontCounterClockwise = True; RasterizerDesc.ScissorEnable = True; RasterizerDesc.AntialiasedLineEnable = False; Depth-stencil and blend states are defined in a similar fashion.
      Another important thing that pipeline state object encompasses is the input layout description that defines how inputs to the vertex shader, which is the very first shader stage, should be read from the memory. Input layout may define several vertex streams that contain values of different formats and sizes:
      // Define input layout InputLayoutDesc &Layout = PSODesc.GraphicsPipeline.InputLayout; LayoutElement TextLayoutElems[] = {     LayoutElement( 0, 0, 3, VT_FLOAT32, False ),     LayoutElement( 1, 0, 4, VT_UINT8, True ),     LayoutElement( 2, 0, 2, VT_FLOAT32, False ), }; Layout.LayoutElements = TextLayoutElems; Layout.NumElements = _countof( TextLayoutElems ); Finally, pipeline state defines primitive topology type. When all required members are initialized, a pipeline state object can be created by IRenderDevice::CreatePipelineState() method:
// Define shaders and primitive topology
PSODesc.GraphicsPipeline.PrimitiveTopologyType = PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
PSODesc.GraphicsPipeline.pVS = pVertexShader;
PSODesc.GraphicsPipeline.pPS = pPixelShader;
PSODesc.Name = "My pipeline state";
m_pDev->CreatePipelineState(PSODesc, &m_pPSO);

When the PSO object is bound to the pipeline, the engine invokes all API-specific commands to set the states specified by the object. In the case of Direct3D12, this maps directly to setting the D3D12 PSO object. In the case of Direct3D11, this involves setting individual state objects (such as rasterizer and blend states), shaders, input layout, etc. In the case of OpenGL, this requires a number of fine-grain state-tweaking calls. Diligent Engine keeps track of the currently bound states and only calls functions to update the states that have actually changed.
      Binding Shader Resources
Direct3D11 and OpenGL utilize fine-grain resource binding models, where an application binds individual buffers and textures to specific shader or program resource binding slots. Direct3D12 uses a very different approach: resource descriptors are grouped into tables, and an application can bind all resources in a table at once by setting the table in the command list. The resource binding model in Diligent Engine is designed to leverage this new method. It introduces a new object, the shader resource binding, that encapsulates all resource bindings required by all shaders in a given pipeline state. It also introduces a classification of shader variables based on the expected frequency of change, which helps the engine group them into tables under the hood:
Static variables (SHADER_VARIABLE_TYPE_STATIC) are expected to be set only once. They may not be changed after a resource is bound to the variable. Such variables are intended to hold global constants, such as constant buffers with camera or global light attributes.
Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change with per-material frequency. Examples include diffuse textures, normal maps, etc.
Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly.
The shader variable type must be specified during shader creation by populating an array of ShaderVariableDesc structures and initializing the ShaderCreationAttribs::Desc::VariableDesc and ShaderCreationAttribs::Desc::NumVariables members (see the shader creation example above).
      Static variables cannot be changed once a resource is bound to the variable. They are bound directly to the shader object. For instance, a shadow map texture is not expected to change after it is created, so it can be bound directly to the shader:
PixelShader->GetShaderVariable( "g_tex2DShadowMap" )->Set( pShadowMapSRV );

Mutable and dynamic variables are bound via a new Shader Resource Binding object (SRB) that is created by the pipeline state (IPipelineState::CreateShaderResourceBinding()):
m_pPSO->CreateShaderResourceBinding(&m_pSRB);

Note that an SRB is only compatible with the pipeline state it was created from. The SRB object inherits all static bindings from the shaders in the pipeline, but is not allowed to change them.
      Mutable resources can only be set once for every instance of a shader resource binding. Such resources are intended to define specific material properties. For instance, a diffuse texture for a specific material is not expected to change once the material is defined and can be set right after the SRB object has been created:
m_pSRB->GetVariable(SHADER_TYPE_PIXEL, "tex2DDiffuse")->Set(pDiffuseTexSRV);

In some cases it is necessary to bind a new resource to a variable every time a draw command is invoked. Such variables should be labeled as dynamic, which will allow setting them multiple times through the same SRB object:
m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB);

Under the hood, the engine pre-allocates descriptor tables for static and mutable resources when an SRB object is created. Space for dynamic resources is allocated dynamically at run time. Static and mutable resources are thus more efficient and should be used whenever possible.
As you can see, Diligent Engine does not expose the low-level details of how resources are bound to shader variables. One reason for this is that these details differ greatly between APIs. The other reason is that low-level binding methods are extremely error-prone: it is very easy to forget to bind some resource, or to bind an incorrect one, such as a buffer to a variable that is in fact a texture, especially during shader development when everything changes fast. Diligent Engine instead relies on a shader reflection system to automatically query the list of all shader variables. Grouping variables into the three types mentioned above allows the engine to create an optimized layout and do the heavy lifting of matching each resource to its API-specific location, register, or descriptor in a table.
      This post gives more details about the resource binding model in Diligent Engine.
      Setting the Pipeline State and Committing Shader Resources
      Before any draw or compute command can be invoked, the pipeline state needs to be bound to the context:
m_pContext->SetPipelineState(m_pPSO);

Under the hood, the engine sets the internal PSO object in the command list or calls all the required native API functions to properly configure all pipeline stages.
      The next step is to bind all required shader resources to the GPU pipeline, which is accomplished by IDeviceContext::CommitShaderResources() method:
m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES);

The method takes a pointer to the shader resource binding object and makes all resources the object holds available to the shaders. In the case of D3D12, this only requires setting the appropriate descriptor tables in the command list. For older APIs, this typically requires setting all resources individually.
Next-generation APIs require the application to track the state of every resource and explicitly inform the system about all state transitions. For instance, if a texture was previously used as a render target and the next draw command is going to use it as a shader resource, a transition barrier needs to be executed. Diligent Engine does the heavy lifting of state tracking. When CommitShaderResources() is called with the COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES flag, the engine commits resources and transitions them to the correct states at the same time. Note that transitioning resources does introduce some overhead: the engine tracks the state of every resource and will not issue a barrier if the state is already correct, but even checking resource states is an overhead that can sometimes be avoided. The engine therefore provides the IDeviceContext::TransitionShaderResources() method that only transitions resources:
m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);

In some scenarios it is more efficient to transition resources once and then only commit them.
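Below is a minimal sketch of that pattern. It assumes that CommitShaderResources() accepts a zero flags value to skip the transition step; only the two method names come from the text above.

// Transition all resources referenced by the SRB once, e.g. after loading the scene
m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);

// ... later, inside the render loop ...
m_pContext->SetPipelineState(m_pPSO);
// Commit only; no transition flag is passed because the states are already correct
// (passing 0 for the flags is an assumption)
m_pContext->CommitShaderResources(m_pSRB, 0);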
      Invoking Draw Command
The final step is to set the states that are not part of the PSO, such as render targets and vertex and index buffers. Diligent Engine uses a Direct3D11-style API that is translated to the other native API calls under the hood:
ITextureView *pRTVs[] = {m_pRTV};
m_pContext->SetRenderTargets(_countof( pRTVs ), pRTVs, m_pDSV);

// Clear render target and depth buffer
const float zero[4] = {0, 0, 0, 0};
m_pContext->ClearRenderTarget(nullptr, zero);
m_pContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f);

// Set vertex and index buffers
IBuffer *buffer[] = {m_pVertexBuffer};
Uint32 offsets[] = {0};
Uint32 strides[] = {sizeof(MyVertex)};
m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET);
m_pContext->SetIndexBuffer(m_pIndexBuffer, 0);

Different native APIs use different sets of functions to execute draw commands depending on the command details (whether the command is indexed, instanced, or both, which offsets in the source buffers are used, etc.). For instance, there are 5 draw commands in Direct3D11 and more than 9 in OpenGL, with names like glDrawElementsInstancedBaseVertexBaseInstance not uncommon. Diligent Engine hides all these details behind a single IDeviceContext::Draw() method that takes a DrawAttribs structure as an argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, whether the draw call is indexed, instanced, indirect, etc.). For example:
DrawAttribs attrs;
attrs.IsIndexed = true;
attrs.IndexType = VT_UINT16;
attrs.NumIndices = 36;
attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
pContext->Draw(attrs);

For compute commands, there is the IDeviceContext::DispatchCompute() method that takes a DispatchComputeAttribs structure defining the compute grid dimensions, as sketched below.
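The following is only a rough sketch of a compute dispatch: the ThreadGroupCountX/Y/Z member names are assumptions; only the method and structure names come from the text above.

// Dispatch a 16 x 16 grid of thread groups (member names assumed)
DispatchComputeAttribs DispatchAttrs;
DispatchAttrs.ThreadGroupCountX = 16;
DispatchAttrs.ThreadGroupCountY = 16;
DispatchAttrs.ThreadGroupCountZ = 1;
pContext->DispatchCompute(DispatchAttrs);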
      Source Code
Full engine source code is available on GitHub and is free to use. The repository contains two samples, the asteroids performance benchmark, and an example Unity project that uses Diligent Engine in a native plugin.
The AntTweakBar sample is Diligent Engine’s “Hello World” example.

       
The atmospheric scattering sample is a more advanced example. It demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to multiple render targets, using compute shaders and unordered access views, etc.

The asteroids performance benchmark is based on this demo developed by Intel. It renders 50,000 unique textured asteroids and allows comparing the performance of the Direct3D11 and Direct3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures.

      Finally, there is an example project that shows how Diligent Engine can be integrated with Unity.

      Future Work
The engine is under active development. It currently supports the Windows desktop, Universal Windows, and Android platforms. The Direct3D11, Direct3D12, and OpenGL/GLES backends are now feature-complete. A Vulkan backend is coming next, and support for more platforms is planned.
    • By michaeldodis
I've started building a small library that can render a pie menu GUI in legacy OpenGL, and I'm planning to add some traditional elements of course.
Its interface is similar to something you'd see in IMGUI. It's written in C.
      Early version of the library
I'd really love to hear anyone's thoughts on this. Any suggestions on what features you'd want to see in a library like this?
      Thanks in advance!
    • By Michael Aganier
      I have this 2D game which currently eats up to 200k draw calls per frame. The performance is acceptable, but I want a lot more than that. I need to batch my sprite drawing, but I'm not sure what's the best way in OpenGL 3.3 (to keep compatibility with older machines).
Each individual sprite moves independently almost every frame, and there is a variety of textures and animations. What's the fastest way to render a lot of dynamic sprites? Should I map all my data to the GPU and update it all the time? Should I set up my data in RAM and send it to the GPU all at once? Should I use one draw call per sprite and let the matrices apply the transformations, or should I compute the transformations into a world VBO on the CPU so that they can be rendered with a single draw call?