Jump to content

  • Log In with Google      Sign In   
  • Create Account

Banner advertising on our site currently available from just $5!

1. Learn about the promo. 2. Sign up for GDNet+. 3. Set up your advert!

Vilem Otte

Member Since 11 May 2006
Offline Last Active Yesterday, 04:16 AM

#5198242 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 14 December 2014 - 10:40 PM

Alright, for some reason I woke up very early in the morning... so let's get started:


For each pixel you want: c=1/n sum_0^n x_n



  • c is the resulting color of the pixel
  • n is the number of samples rendered
  • x_n is the sample

The most obvious and easiest way to perform progressive rendering is to use a buffer where you accumulate samples, like:

Buffer accumulator;
int samples;


    samples = 0;

    for each pixel (i)
        accumulator[i] += ComputeSample(i);

    for each pixel (i)
        resultPixel[i] = accumulator[i] / (float)samples;

Now, this approach works (note, the shown code is written like pseudo-code, you can do a ton of optimizations in there using parallelization and moving that ugly slow division prior to for loop and use multiplication inside).


Each time you move something in your scene (or your camera has moved) you have to call ResetProgressiveRendering. There are several problems with this approach:

  1. You have to hold 32-bit floating point buffer for accumulator (or 16-bit floating point, or 16-bit integer ... but you will hit precision problems early ... for 16-bit integer it will be as soon as 256 samples per pixel, for 16-bit fp probably a bit later (it depends!) ... 32-bit fp should be okay)
  2. 32-bit fp RGB buffer means ... width * height * 4 * 3 bytes of data at least, that is 4 times more than for 8-bit integer RGB buffer; it is not such a big deal on PC, but on android devices you don't want to waste most of the memory on output buffer

There is another approach, where you can also always keep the buffer you're already showing and divide at the point where you add new sample into buffer. Time for pseudo-code:

Buffer result;
int samples;


    samples = 0;

    for each pixel (i)
        result[i] = result[i] * samples / (float)(samples + 1) + ComputeSample(i) / (float)(samples + 1);


Note, you must pre-compute samples / (float)(samples + 1) and 1.0f / (float)(samples + 1) ... most compilers are not intelligent enough to pre-compute these parts of the equation and you would be doing a lot more divisions (for 1000x1000 pixels it is 2M divisions per frame (from Intel Optimization Manual on modern Core i7 single fp division eats 24 clocks, multiplication 1 or 2 clocks (can't remember correctly) ... this means that by not pre-computing these you do at least 22 more clocks 2 milion times per frame ... thats 44M clocks wasted -> just to give you little insight into the optimization - and why it heavily matters here)


This solution has an advantage, if you do correct casting, result can be 8-bit integer RGB buffer - that means less memory, less fillrate used and that means more speed!


Of course it also has disadvantages:

  1. If you're not careful, you will lose precision (very quickly), otherwise you will jus lose it (definitely a lot faster than in the previous case for 32-bit fp solution). Although, if your path tracing algorithm is good enough (F.e. 4 samples per pixel at time, doing bi-directional path tracing ,etc.) it may be enough to combine just several frames of progressive resulting in smooth image (with smooth caustics!) ... quite good, don't you think?
  2. As the buffer where you sum samples is your result buffer, it may cause some troubles with tone-mapping or such (in case you combine your samples wrongly - because those inside result buffer will be already after tone mapping (in case of tone mapping)) ... or you can use F.e. 16-bit fp RGB buffer instead, and do tone mapping after everything (which will work well too!).

It is up to you which one will you select, honestly on Android I'd prefer going the way that uses the least memory and the least fillrate (and the least processing power), as there isn't that much computing power as on the PC. Note, inside NTrace project we have a path tracer (actually there is more path tracing algorithms, they use different sampling strategy, etc.) that uses 2nd solution on PCs and it works well (if I remember correctly we got quite good results with 8-bit integer targets there too!).

#5198225 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 14 December 2014 - 07:55 PM

What you generally want is a buffer where you accumulate samples and divide by number of sample count. I will elaborate more tomorrow as it is 3am here atm and there would be tons of mistakes from my side. wink.png

#5197702 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 11 December 2014 - 07:04 PM

I'm still at work today (I've got a demo day tomorrow with one application, so I will be most likely sleeping for just few hours). But as we're running tests that take very long time, so...


Why you see black/colored pixels


The hint you want had already been mentioned:

You cast a ray from camera origin to sphere - you compute your hitpoint as "HitPoint = Origin + Distance * Direction", which will most likely get you INSIDE the sphere (due to precision of floating point numbers).


Now, a quick hack is using something like "HitPoint = Origin + Distance * 0.99 * Direction", this isn't a real solution, but this hack works 99% of time and for start it is absolutely valid to use it. Technically you would like the constant to be relative to your distance (floats are quite precise between 0.0 and 1.0, less precise in thousands, and even less precise when you get to millions, billions and up ... the higher your value (in absolute value) is, the worse the problem is, and the constant offset also needs to be higher).


For a quick view into precision problem - go check this site https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2


E.g. modify your hitpoint calculation as:

HitPoint = ray.origin.add(ray.direction.multiply(distance * 0.99f));

This way you should see something. (It won't be perfect and lots of rays miss, you will need better sampling than just randomly shoot rays - once you got this running, I will describe few ways how to improve it).

#5197109 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 09 December 2014 - 02:22 AM

Your random number generator and logic behind it is wrong. Sadly I have to go off from PC in few minutes so I won't be able to give full answer and some background why is that (I will do that later).


As for now - your random direction should be normalized (unless you normalize it in ray constructor - which I don't think you do). There are few rules about the "random" direction and generally random number generation behind - you should use different random number generators for different surfaces. As of now, go with:

// Initialization of RNG, try to do this only once for whole random direction generator object - SEED is some number (F.e. 123456 ... or current timestamp, or anything :) )
Random rng = new Random(SEED);


// Your random direction into sphere generation code

Vector3D v = new Vector3D();
v.x = rng().NextDouble() * 2.0 - 1.0;
v.y = rng().NextDouble() * 2.0 - 1.0;
v.z = rng().NextDouble() * 2.0 - 1.0;


// Your random direction into hemisphere
Vector3D v = GenerateRandomDirectionIntoSphere();
if (v.dot(normal) < 0.0)
    // Where negate does return new Vector3D(-v.x, -v.y, -v.z);
    v = v.negate();

The process of "how" you generate random numbers determines how good is your path tracer (I will elaborate later ;) ).

#5196637 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 06 December 2014 - 10:52 AM



Your ray generation code is a nonsense... for F.e. x = 0, y = 0 you create a vector (0, 0, 0) and normalize it - all of its components are divided by sqrt(0^2 + 0^2 + 0^2) ... afaik sqrt(0) = 0 (or technically +0 or -0, both are correct answers) ... nothing changes the fact that you're dividing by 0. (Fyi. in mathematics sqrt(0) depends on convention - there are multiple definitions - one of the conventions also specifies that 0^2 is undefined, thus also sqrt(0) is undefined).


You want your rays to go from left to right, where x=0 will be in the middle column of pixels, and y=0 in middle row of pixels - also your z determines your field of view angle. Technically you want something like this:

float xdir = (x / (float)width) * 2.0f - 1.0f; // Keep x direction in interval <-1; 1>
float ydir = ((y / (float)height) * 2.0f - 1.0f) * aspect; // Keep y direction in interval <-1; 1>, aspect is (height / width) in your case
float zdir = 1.0f / (float)tan(fov); // Where fov represents field-of-view angle (in radians)
Ray ray = new Ray(camera, new Vector3D(xdir, ydir, zdir).normalize());

I hope xdir and ydir makes sense (their computation), zdir can be calculated using trigonometry.


This will generate correctly your ray directions, assuming your intersection code is correct - you should see a sphere.

#5195808 Writing my own pathtracer on mobile device

Posted by Vilem Otte on 01 December 2014 - 07:40 PM

Welcome to the forums.


I'm glad there is another fan of ray tracing and another brave one to try writing them. Although for the beginning, allow me to share few things I've learned (mostly the hard way).


First of all, you need to know at least some math (3D math -> vectors, solving equations, etc.). I would recommend you to try naive ray tracing first (just primary rays, then try whitted ray tracing) - and next you can dive into more complex stuff. Be sure to check out few ways how to optimize your renderer (BVHs, KDTrees, Grids, Nested Grids, etc.), and don't forget also to optimize your intersection routines (you will need them precise and fast for path tracing). Once you've got this, you got basics to jump into wonderful world of global illumination.


Android is indeed an interesting plaform, I'm though not sure how fast you will get without native code (Java is slow - but for learning, why not ... you can also always move to native code later - and optimize from there).


Anyways, I wish you best of luck in wonderful world of infinitely bouncing rays, feel free to ask any further question, I'll try to answer... to your questions so far:


1.) Currently I'm working mostly with high performance ray tracing (where we decided to use just single type of primitive - triangle). Although I've worked with CPU-based general ray tracers, the approach worked like this:


There was a Primitive class - it had several virtual functions (including the one for Ray-Primitive test, and also GetAABB (Axis-aligned bounding box) which was used when building KD-Tree for the scene). Triangle, Sphere, Plane, etc. all these inherited Primitive class and implemented those functions. This approach has some overhead (V-tables), but it is the most clean way to allow multiple primitives inside your scene.


2.) Yes - in my current ray tracing project I have "Math" part of project, which contains "Numeric" folder - where you have all numeric classes needed -> float4 (4 component vector), mat4 (4x4 matrix), quat (quaternion - to represent spatial rotations).


Apart from that I also have "Geometric" folder - where I do have all primitives (Ray, Plane, Axis-Aligned Bounding Box, Sphere, Triangle, etc. - note. even though I use only triangles inside the scene it is important to note, that the others are also used inside the project).


Even though it is not absolutely necessary to distinguish the Math and Geometric structures, I recommend to do so (as your ray tracer will grow it can become bloody mess and then you will need refactoring).


3. and 4.) For naive ray tracing this is correct - like:

for (y = 0 ... height)
    for (x = 0 ... width)
        shadePixel(x, y);

Now, in naive (very simple ray tracer) your shadePixel function will look like:

shadePixel(x, y)
    d = INFINITY;
    id = -1;
    hit = false;
    for (i ... numOfObjects)
        if (id' = intersect(object[i], ray[x, y], &d'))
            if (d' < d & d' > 0)
                d = d'
                id = id'
                hit = true
    if (hit == true)
        pixel[x, y] = object[id].color

This is very basic shadePixel function - it guarantees that you always write out color of closest hit along the ray (that is non-negative).


I recommend to do 5. and 6.) once you have at least basic functionality like the above written. Hint - you can use recursion inside shadePixel, you just have to calculate secondary ray direction and origin (origin is a hitpoint, direction depends on what material have you hit).


If you update the progress (and possibly show results and some parts of code - I can give you further hints ;).

#5192613 Good C (not C++) math library?

Posted by Vilem Otte on 13 November 2014 - 03:18 AM

#Nairou C11 won't be in MSVC for quite long time (As Ravyne said there isn't a solid C99 compatibility as for now). Also I don't think Microsoft sees C compiler as their main focus (they focus on the C# mainly), and I don't really believe the situation will get better in next years.


I would second SIMDx86, although it has been 2 years since the last time I've used it - it is now marked as "inactive" on SF (there is SSE, SSE3 and Altivec implementation as for now - no AVX).

#5190071 Tube AreaLight - Falloff problem

Posted by Vilem Otte on 29 October 2014 - 08:34 PM



Important note - I'm trying to describe area lighting a bit, I couldn't test your code at this point. Anyways maybe you find my notes useful:


What we have - a tube light defined by 2 points (L0 and L1) and radius ® ... (for the simplicity let us assume that it forms a capsule shape). What we need to calculate for each point we're shading (P):

  • Closest point on tube (this can be used for attenuation)

Our 2 points define a line. Let us define that line as L0 + t * (L1 - L0) ... and we're searching for 't'. This can be done simply using formula derived here -> http://mathworld.wolfram.com/Point-LineDistance3-Dimensional.html - e.g.: t = dot(L1 - P, L2 - L1) / dot(L1 - L0, L1 - L0) - now we clamp t between 0 and 1 (e.g. if it is smaller than 0, then it equals 0 ... if it is larger than 1, then it equals 1). Thus we got our closest point on line... (lets note it as Cl)


Closest point on capsule (Cc) thus equals Cc = Cl + normalize(P - Cl) * r; (NOTE: for actual distance it is enough to use d = length(P - Cl) - r).


From here we can correctly compute attenuation.

  • Calculate diffuse light

Now this is a challenge. What we actually need to compute right now is how large solid angle is occupied by the capsule in hemisphere above given point (in direction of its normal).


Note: Your approach in code (using 2 endpoints to light it) is just wrong - you are not lighting using area light, but just using 2 points.


Generally solving solid angle (which is possible for simple analytic shapes like capsule is), would give us how much of whole capsule is visible at given point, from here we can calculate diffuse light. I won't write full process of integration here (I've quickly searched net whether someone actually computed equation for solid angle of capsule and I couldn't find any results) - it would take some time with fully running brain (it is generally not a good idea to do this kind of computations at 3am), so I will try during tomorrow.


The problem can be also solved using Monte-Carlo approach - using some good random number generator generate N points over the capsule (Note: that they need to be uniformly distributed over its surface - this is also a challenge, especially for capsule!), and use these points to calculate diffuse light.


  • Calculate specular light

We can't really take approach of monte carlo here (actually we can, but N would really need to be large number - and thus it would be slow), because seeing specular of 16 point lights instead of reflected capsule wouldn't look really convincing (using 10000 on the other hand, for good sized capsule, would look convincing though - but that is insane approach for realtime graphics).


We can do better, all we need is to derive ray-capsule intersection test (actually we're fine with ray-sphere and ray-cylinder tests - as every capsule can be composed of these ... Note: we could also use these 3 to simplify integration inprevious step). For each shading pixel we generate reflection vector and test that one against capsule (which is reflected (aka specular) light).


Now this will work for non-rough surfaces (wet surfaces), for rough surfaces we could use this approach - we generate reflected ray and compute minimal distance between capsule and reflected ray (note, it can be simplified to minimal distance between 2 lines), based upon this distance and radius of our light we can create quite good looking smooth specular (that won't "sparkle", compared to direct ray-capsule intersection).

#5161607 float or double depth buffer?

Posted by Vilem Otte on 19 June 2014 - 07:19 PM

OK, I've tried the different versions (lowest time - no triangle on screen, highest time - closeup view, triangles all over the screen):

tie50.exe -> from 22ms to 44ms

tie51.exe -> from 19ms to 31ms

tie53.exe -> from 22ms to 43ms
tie53.exe -> from 19ms to 38ms

#5161178 float or double depth buffer?

Posted by Vilem Otte on 17 June 2014 - 05:58 PM

I've tried it - I can confirm that no malware or old school virus has been installed (or it overcame my AV here). Regarding speed, I got 22ms to 44ms on Haswell-based Core i3 (in laptop). So it was quite fast...


What was visible though, your depth test isn't precise, the parts on hull flickered, dunno why (maybe it was just on my machine, or it's just setup of projection matrix that has intentionally set up near and far plane to test the precision).

#5160806 Build a render farm using DirectX and radiosity?

Posted by Vilem Otte on 16 June 2014 - 03:44 AM

On Windows you don't physically need a monitor hooked up, but in order to use D3D the video card needs to be set up as a display device with a compatible WDDM display driver. The same goes for CUDA, and I would assume OpenGL/OpenCL (I don't have experience with those so I can't tell you for sure).


Actually with OpenCL, the situation is a little bit more "complicated". You can use different device types to execute your OpenCL kernels - as for now, there are 3 supported:

CL_DEVICE_TYPE_CPU - OpenCL will use host processor for computation. This option works fine in terminal without starting X (in linux), although you would need CPU-based renderfarm for this (of course with enough CPUs with lots of cores you can beat GPUs) - but it will most likely be more expensive.

CL_DEVICE_TYPE_GPU - OpenCL will use GPU device to perform the computations. Each vendor implemennts his own OpenCL driver differently:

  • AMD implementation allows running OpenCL applications without running actual xserver (I think you need root privileges to run the app like that - but the last time I tried this was like year ago, things might have changed since then - back then the feature was called "console mode" in clinfo ... since Catalyst 13.4 as far as I remember)
  • NVidia, I haven't digged into this one. I know it is possible for CUDA, so I bet you can do the same with their OpenCL.

CL_DEVICE_TYPE_ACCELERATOR - Special OpenCL accelerators (IBM CELL Blade or Xeon Phi), I would like to get my hands on Xeon Phi and try it out. But well, it's very expensive and also hard to get (I mean like compare it to any current GPU in terms of "what you get for your money")

#5160715 Build a render farm using DirectX and radiosity?

Posted by Vilem Otte on 15 June 2014 - 04:36 PM

Uhm... just to clarify - MJP most likely used ray traced radiosity (thats why Optix). I believe there are implementations of fast rasterizers for CUDA too (and also for OpenCL - if you prefer it) - in case you wouldn't like to use D3D in the end.


The thing is, if you'd go with CUDA/OpenCL instead of D3D (or alternative) - you will most likely end with better system (allowing you to do more ... with OpenCL you will have even more advantages, like you could run it on various systems) ... unless you dive into compute shaders (but still I'd recommend going with CUDA/OpenCL if that is the case).



but it's difficult to find CUDA guys here.

I have to disagree with this ... ph34r.png

#5153516 Graphics baseline for a good-looking PC game?

Posted by Vilem Otte on 14 May 2014 - 03:10 AM

Technically taken, it is possible to use tessellation along with F.e. QDM or POM or relief mapping (anyone actually tried it? If so, could you share a screenshot?) ... but once you can tessellate to the level where your triangles are at size of pixel (which isn't a problem on modern GPUs), why would you need & want to do some offset mapping - it won't help the model to look better at that point.

#5130820 how alive is oop?

Posted by Vilem Otte on 12 February 2014 - 09:17 AM

I thought of posting the "Still alive" song here, but the people could down vote me. Anyways it is very popular, although multi paradigm is getting even more popular (especially with C++ (since C++11) and C# (there are of course others - like named Scala).


I always stand on the side - use those weapons that fits the enemy -> e.g. use OOP on the place, where OOP is strong. OOP is weak in some areas too, so why not use Functional, Procedural, or any other that fits this problem ~ e.g. don't try to force OOP on places where it doesn't fit well.

#5128088 How to building sah bvh ?

Posted by Vilem Otte on 01 February 2014 - 09:42 PM

I think he already replied you with that information...


From what I know, Lauterbach's LBVH construction was implemented on GPU and shown some good results (although compared to F.e. split-bvh they were a lot worse), there is also hierarchical LBVH - aka HLBVH, that was also built on GPU and this one yields better results than standard LBVH.


Google some HLBVH papers, there are more of them.wink.png