Writing my own path tracer on a mobile device


Before I move on to anti-aliasing, I want to figure out how continuous refresh is done. Currently I just render the image once, and I want it to keep refining by increasing the rays per pixel. Right now I render each pixel completely, for the whole ray-per-pixel count, before moving to the next one. But for a continuous refresh I would have to render the whole image at one ray per pixel first, then at two, and so on, so the pixel-after-pixel method does not work here. Any hints on how I can implement progressive path tracing?


What you generally want is a buffer where you accumulate samples and divide by the sample count. I will elaborate more tomorrow, as it is 3 am here at the moment and there would be tons of mistakes from my side.


Alright, for some reason I woke up very early in the morning... so let's get started:

For each pixel you want:

\[ c = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Where:

  • c is the resulting color of the pixel
  • n is the number of samples rendered
  • x_i is the i-th sample

The most obvious and easiest way to perform progressive rendering is to use a buffer where you accumulate samples, like:


Buffer accumulator;   // running sum of all samples, per pixel
int samples;          // number of samples accumulated so far

...

RestartProgressiveRendering()
{
    accumulator.clear();
    samples = 0;
}

RenderOneStep()
{
    for each pixel (i)
    {
        // add one new sample on top of the previous ones
        accumulator[i] += ComputeSample(i);
    }
    samples++;
}

DisplayResult()
{
    for each pixel (i)
    {
        // displayed value = sum of samples / sample count
        resultPixel[i] = accumulator[i] / (float)samples;
    }
}

Now, this approach works. (Note that the code shown is pseudo-code; you can do a ton of optimizations in there, e.g. parallelizing it and moving that ugly slow division out of the loop, using a multiplication inside instead.)
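As a concrete illustration of hoisting the division, here is a minimal Java sketch of DisplayResult (assuming accumulator and resultPixel are float arrays with one entry per pixel, following the names in the pseudo-code above):

public void DisplayResult() {
    // One division per frame instead of one per pixel.
    float invSamples = 1.0f / (float) samples;
    for (int i = 0; i < accumulator.length; i++) {
        resultPixel[i] = accumulator[i] * invSamples;
    }
}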

Each time something in your scene moves (or your camera moves) you have to call RestartProgressiveRendering. There are several problems with this approach:

  1. You have to hold a 32-bit floating-point buffer for the accumulator (or 16-bit floating point, or 16-bit integer, but you will hit precision problems early: for a 16-bit integer as soon as 256 samples per pixel, for 16-bit fp probably a bit later (it depends!); 32-bit fp should be okay).
  2. A 32-bit fp RGB buffer means width * height * 4 * 3 bytes of data at least, which is 4 times more than an 8-bit integer RGB buffer. That is not such a big deal on a PC, but on Android devices you don't want to waste most of your memory on the output buffer.
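To put numbers on the second point: at 1280x720, a 32-bit fp RGB accumulator takes 1280 * 720 * 12 bytes, roughly 11 MB, versus roughly 2.8 MB for an 8-bit integer RGB buffer.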

There is another approach, where you keep only the buffer you are already showing and do the division at the point where you add a new sample into the buffer. Time for pseudo-code:


Buffer result;   // the displayed buffer doubles as the running average
int samples;

...

RestartProgressiveRendering()
{
    result.clear();
    samples = 0;
}

RenderOneStep()
{
    for each pixel (i)
    {
        // re-weight the previous average and blend in the new sample
        result[i] = result[i] * samples / (float)(samples + 1) + ComputeSample(i) / (float)(samples + 1);
    }
    samples++;
}

DisplayResult()
{
    result.show();   // the buffer already holds the averaged image
}

Note: you must pre-compute samples / (float)(samples + 1) and 1.0f / (float)(samples + 1). Most compilers are not intelligent enough to pre-compute these parts of the equation, so you would be doing a lot more divisions: for 1000x1000 pixels that is 2M divisions per frame. From the Intel Optimization Manual, on a modern Core i7 a single fp division takes about 24 clocks, while a multiplication takes 1 or 2 (I can't remember exactly). This means that by not pre-computing these you spend at least 22 extra clocks, 2 million times per frame; that is roughly 44M clocks wasted. Just to give you a little insight into why this optimization matters so much here.
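To make that concrete, a Java sketch of RenderOneStep with both weights hoisted out of the loop (names follow the pseudo-code above; for brevity, result is treated as a float array with one value per pixel and ComputeSample(i) is assumed to return a float):

public void RenderOneStep() {
    // Two divisions per frame instead of two per pixel.
    float oldWeight = samples / (float) (samples + 1);
    float newWeight = 1.0f / (float) (samples + 1);
    for (int i = 0; i < result.length; i++) {
        result[i] = result[i] * oldWeight + ComputeSample(i) * newWeight;
    }
    samples++;
}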

This solution has an advantage: if you do the casting correctly, the result can be an 8-bit integer RGB buffer. That means less memory and less fillrate used, and that means more speed!

Of course it also has disadvantages:

  1. If you're not careful, you will lose precision very quickly; even if you are careful, you will still lose it (definitely a lot faster than with the 32-bit fp solution above). Although, if your path tracing algorithm is good enough (e.g. taking 4 samples per pixel at a time, doing bi-directional path tracing, etc.), combining just a few progressive frames may be enough for a smooth image (with smooth caustics!). Quite good, don't you think?
  2. As the buffer where you sum samples is also your result buffer, it may cause trouble with tone mapping and the like (if you combine your samples wrongly, because the values in the result buffer will already have been tone-mapped). Alternatively, you can use e.g. a 16-bit fp RGB buffer instead and do the tone mapping after everything, which will work well too.

It is up to you which one you select; honestly, on Android I'd prefer the way that uses the least memory, the least fillrate and the least processing power, as there isn't as much computing power there as on a PC. Note that inside the NTrace project we have a path tracer (actually several path tracing algorithms with different sampling strategies, etc.) that uses the second solution on PCs, and it works well (if I remember correctly, we got quite good results with 8-bit integer targets there too!).


Thank you very much for the explanations, I really appreciate your help.

I implemented your first version like this:


    public void RestartProgressiveRendering(){
        this.accumulator = new float[this.canvas.getWidth() * this.canvas.getHeight()];
        this.samples = 0;
    }

    public void RenderOneStep(){
        int width = this.canvas.getWidth();
        int height = this.canvas.getHeight();
        int i = 0;
        for(int x = 0; x < width; x++){
            for(int y = 0; y < height; y++){
                CColor sample = ComputeSample(x,y,width,height);
                accumulator[i] = Color.rgb((int)sample.r, (int)sample.g, (int)sample.b);
                i++;
            }
        }

        samples++;
    }

    public Canvas DisplayResult(){
        int width = this.canvas.getWidth();
        int height = this.canvas.getHeight();

        Paint p = new Paint();
        int i = 0;
        for(int x = 0; x < width; x++){
            for(int y = 0; y < height; y++){
                int value = (int)(accumulator[i] / (float)samples);
                p.setColor(value);
                this.canvas.drawPoint(x,y,p);
                i++;
            }
        }

        return this.canvas;
    }

    private CColor ComputeSample(int x, int y, int width, int height){
        float fov = 160.0f * (float)Math.PI / 180.0f;
        float zdir = 1.0f / (float)Math.tan(fov);
        float aspect = (float)height / (float)width;

        float xdir = (x / (float) width) * 2.0f - 1.0f;
        float ydir = ((y / (float) height) * 2.0f - 1.0f) * aspect;

        Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();
        Ray ray = new Ray(Camera, direction);

        return Trace(ray, 1);
    }

It works well. Now I have to look at how I can move the computation out into a second thread, independent of the main UI thread.
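A minimal sketch of that idea (not from the thread: running is a hypothetical volatile flag and renderView stands for the View being drawn into):

Thread renderThread = new Thread(new Runnable() {
    @Override
    public void run() {
        while (running) {
            RenderOneStep();              // the heavy work happens off the UI thread
            renderView.postInvalidate();  // schedule a redraw on the UI thread
        }
    }
});
renderThread.start();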

Uhm, I'm not sure why you don't get exceptions with that code (you should get them; a few things are done wrong, and you won't see any progressive refinement at all, AFAIK). Time to refresh my Java memory:


private float[] accumulator;
private int samples;

// You need 1 float value per channel!
public void RestartProgressiveRendering() {
    this.accumulator = new float[this.canvas.getWidth() * this.canvas.getHeight() * 3];
    this.samples = 0;
}

public void RenderOneStep() {
    int width = this.canvas.getWidth();
    int height = this.canvas.getHeight();
    int i = 0;
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            CColor sample = ComputeSample(x, y, width, height);
            // The accumulator sums all samples since the last RestartProgressiveRendering,
            // so you have to either use +=, or write it like this
            accumulator[i * 3 + 0] = accumulator[i * 3 + 0] + sample.r;
            accumulator[i * 3 + 1] = accumulator[i * 3 + 1] + sample.g;
            accumulator[i * 3 + 2] = accumulator[i * 3 + 2] + sample.b;
            i++;
        }
    }

    samples++;
}

public Canvas DisplayResult() {
    int width = this.canvas.getWidth();
    int height = this.canvas.getHeight();
    Paint p = new Paint();
    int i = 0;
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            // Get each single channel from the accumulator and divide by the sample count
            p.setColor(Color.rgb((int)(accumulator[i * 3 + 0] / (float)samples),
                (int)(accumulator[i * 3 + 1] / (float)samples),
                (int)(accumulator[i * 3 + 2] / (float)samples)));
            this.canvas.drawPoint(x, y, p);
            i++;
        }
    }

    ...
}

Just few notes on optimization:

  • In the RestartProgressiveRendering method, you shouldn't re-allocate the buffer (even though allocations sometimes seem cheaper in Java virtual machines than in compiled languages, an allocation isn't as cheap as zeroing out the buffer). The (re)allocation should go into a separate method, called e.g. Resize, which you call when the user rotates the screen, for instance; see the sketch after this list.
  • Using a Canvas for this is, in my opinion, not wise performance-wise. Technically you want to use an OpenGL ES PBO (available in ES 3.0); for older versions there is GL_OES_EGL_image, which works almost the same (for transferring data from CPU memory to GPU memory, and is thus better for rendering).
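A sketch of the split suggested in the first note (Resize is the hypothetical method mentioned there; java.util.Arrays.fill does the zeroing):

import java.util.Arrays;

public void Resize(int width, int height) {
    this.accumulator = new float[width * height * 3];  // reallocate only when the size changes
    this.samples = 0;
}

public void RestartProgressiveRendering() {
    Arrays.fill(this.accumulator, 0.0f);  // just zero the existing buffer
    this.samples = 0;
}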


Uhm, I'm not sure why you don't get exceptions with that code (you should get them; a few things are done wrong, and you won't see any progressive refinement at all, AFAIK)

It worked okay but the colors were of course wrong:

[Screenshot: the image rendered by my original code, with wrong colors]

With your method:

[Screenshot: the image rendered with the corrected accumulation]

Thank you so much for helping me!!!

Congratulations, you've just built your first path tracer that can converge. It still has some serious problems (apart from problems with handling large-scale scenes); once you overcome these, it will start becoming useful.

  1. Convergence in directly lit areas takes ages, and convergence in shadowed areas does too.
  2. The way you generate rays when you hit a surface is... well, naive is the best word for it (we call techniques or algorithms naive when they produce a correct result and are simple, but are definitely not efficient).
  3. Allowing for different shapes and/or materials.
  4. Allowing for TONs of shapes and materials in the scene.

Okay, brace for information!

1. Explicit path tracing

The general problem of path tracing is this: if the path we're following doesn't hit the light, the contribution of this path is zero. The smaller your lights get, the lower the probability that we hit them (and for perfect point lights the probability of hitting the light is exactly zero!).

But don't worry, we don't need to write black-magic code to get fast path tracers for direct lighting. The actual idea is quite simple: if at each step of the path we can connect to (any) light, the path will have some contribution to the result (i.e. we separate direct lighting from indirect lighting). I did the full derivation in my Bachelor thesis a few years ago (see chapter 3.2 of http://is.muni.cz/th/396530/fi_b/Bachelor.pdf; you don't need to remember it, though... shameless self-promotion). The whole idea works like this: at each step of the path we sample one (random) light and compute the direct illumination from it (note that we need to cast a shadow ray to determine visibility between the current hitpoint on the path and a random point on that light), then continue along the path. If we then hit the light accidentally, we have to discard that value!

This way it is quite easy to get very good-looking images within a few samples per pixel. There will still be problems (caustics still take quite long to converge, a dark room standing next to a lit room, etc.); these also have solutions (bi-directional path tracing).

Alright, enough talking; time for some code:


float4 trace(Ray r, Scene s) {
    // Calculate the next step on the path
    RayResult res = s.findClosestIntersection(r);
    // Back off slightly along the ray to avoid self-intersecting the hit surface
    float4 hitPoint = r.origin + r.direction * (res.distance - EPS);
    float3 normal = s.objects[res.hitID].getNormal(res.barycentric);
    float3 color = s.objects[res.hitID].getColor();

    float4 result = float4(0.0, 0.0, 0.0, 0.0);

    // Explicit step: sample one random light directly (skipped if we are standing on a light)
    if (s.objects[res.hitID].isLight() == false) {
        int lightID = s.getRandomLight();
        float4 randomLightPoint = s.objects[lightID].getRandomPosition();
        Ray shadow = new Ray(hitPoint, (randomLightPoint - hitPoint).normalize());
        RayResult shadowResult = s.findClosestIntersection(shadow);
        if (s.objects[shadowResult.hitID].isLight() == true) {
            // 'attenuation' stands for the falloff term of the sampled light
            result = dot(normal, shadow.getDirection()) * attenuation * color * s.objects[shadowResult.hitID].getColor();
        }
    }

    // Next step; beware, you have to weight correctly depending on your PDF!
    // (Real code also needs a termination criterion, e.g. a depth limit or Russian roulette.)
    return result + weight * color * trace(r.generateNextPathStep(), s);
}

2. Sampling....

Assuming you are still reading... there is a way to heavily improve areas that receive only indirect illumination. Now a little theory. The generator I described earlier for your ray directions works, but it is bad. You are currently working with perfectly diffuse materials only; as soon as you add reflective ones, you will have to change your ray generator to handle them. The current generator would technically work for reflected rays too, but it would have to (randomly) pick exactly the direction of the perfect reflection of the incident ray (and the probability of doing that is equal to 0). That is why we use a different ray generator for reflected rays.

What am I getting at? All your samples should be weighted by the cosine of the angle between the ray and the surface normal (to keep energy conservation and, of course, to be physically correct). Now, as each sample is multiplied by the cosine of that angle, rays at grazing angles (almost parallel to the surface they are hitting) have very, very low weight; their contribution is almost zero. Technically, we want to generate more rays in directions similar to the normal and fewer rays in directions close to perpendicular to it.

For a further description you can also look at my thesis (chapter 3.3). There is also code provided for a fast uniform random direction generator, and for a cosine-weighted generator (better for diffuse surfaces). Of course, for each shading model there is a different "ideal" ray generator. This technique is called importance sampling (you compute mainly those samples that are important).
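A minimal Java sketch of a cosine-weighted direction generator in the spirit of the technique described above (not the thesis code; it assumes Vector3D has cross, scale and add helpers, which the snippets in this thread do not show):

// Cosine-weighted hemisphere sample around 'normal'; the PDF is cos(theta) / PI,
// which cancels the cosine factor in the diffuse estimator.
private Vector3D cosineSampleHemisphere(Vector3D normal) {
    float u1 = (float) Math.random();
    float u2 = (float) Math.random();
    float r = (float) Math.sqrt(u1);            // radius on the unit disk
    float phi = (float) (2.0 * Math.PI * u2);   // angle on the unit disk
    float x = r * (float) Math.cos(phi);
    float y = r * (float) Math.sin(phi);
    float z = (float) Math.sqrt(Math.max(0.0f, 1.0f - u1)); // lift the disk point onto the hemisphere
    // Build an orthonormal basis around the normal (the helper axis must not be parallel to it).
    Vector3D helper = Math.abs(normal.x) > 0.1f ? new Vector3D(0, 1, 0) : new Vector3D(1, 0, 0);
    Vector3D tangent = helper.cross(normal).normalize();
    Vector3D bitangent = normal.cross(tangent);
    return tangent.scale(x).add(bitangent.scale(y)).add(normal.scale(z));
}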

3. Shapes & Materials

This is the part you will most likely reach very soon, but I'd advise implementing the previous two first: 1. will let you see the resulting image very quickly, which speeds up debugging of shapes and materials; 2. (if implemented correctly and in a clever way) will allow you to use different sampling methods for different materials, which is very, very important for any useful path tracer.

4. Acceleration Structures

Once you have 1, 2 and 3, it is time to jump into large scenes. I won't dive into acceleration structures now, but to give you a little preview:

[Image: Crytek Sponza rendered by the author's path tracer]

I rendered this Crytek Sponza image just now on an NVIDIA GeForce 720M mobile GPU (which is low end), at full laptop display resolution (1366x768, minus a bit for the window borders), at a frame rate of 12 fps. On a desktop Radeon R7 it runs at real-time frame rates (I didn't benchmark it, though). That is why acceleration structures are so important.

Note that there is some aliasing, because no AA filter and no texture filter were running for that image (and yes, filtering decreases performance), and no packet tracing or other kinds of hack are used. The transparent materials are the root of all evil (I have to terminate the ray and create a new one). The acceleration structure is a SplitBVH, the ray tracing runs in OpenCL, and the resulting buffer is displayed using OpenGL and a PBO.


Thank you very much for your detailed explanations, Vilem Otte, I really appreciate your help.

My next step, before moving on to anything else, is to implement anti-aliasing. I read up on it, and in my ComputeSample() function I now do this:


        CColor color = new CColor(0.0f, 0.0f, 0.0f);

        for(int row = 0; row < 2; row++){
            for(int col = 0; col < 2; col++){
                Camera.x = Camera.x + (col + 0.5f) / 2.0f;
                Camera.y = Camera.y + (row + 0.5f) / 2.0f;
                Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();
                Ray ray = new Ray(Camera, direction);
                color.add(Trace(ray,1));
            }
        }

        return color.divide(2.0f);

So now I shoot 4 rays per pixel and average the resulting color. But now my image turns completely black. Why is that?

Edit:

Ah, now I tried it this way:


        CColor color = new CColor(0.0f, 0.0f, 0.0f);

        for(int row = 0; row < 2; row++){
            for(int col = 0; col < 2; col++){
                Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();
                Ray ray = new Ray(Camera, direction);
                color.add(Trace(ray,1));
            }
        }

        return color.divide(4.0f);

I think the edges are smoother now, but everything is in black and white.

Why your image turned black (and black and white, respectively):

The CColor class uses 1 byte to store each color channel (0-255); you are summing 4 values in the 0-255 range, so you get an overflow! You have to take care when working with data types: you have to sum the colors in, for example, a 32-bit value per channel. E.g.:


int r = 0;
int g = 0;
int b = 0;

for(int row = 0; row < 2; row++) {
    for(int col = 0; col < 2; col++){
        Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();
        Ray ray = new Ray(Camera, direction);
        CColor c = Trace(ray,1);
        r = r + (int)c.r;
        g = g + (int)c.g;
        b = b + (int)c.b;
    }
}

r = r / 4;
g = g / 4;
b = b / 4;

return new CColor(r, g, b);

You can also use floats instead of ints (technically even shorts, i.e. 16-bit integers, will work). That is why you saw the issue you saw.

Regarding CColor: it stores the color value in one integer, with 1 byte for red, 1 byte for green, 1 byte for blue and 1 byte for alpha. Technically you can initialize it with floats, but it will still store each channel in just 1 byte. This is why I'd recommend creating your own class for holding colors (in our software we use "float4", 4 floats packed in a class). Why? Simply because we need more range than 1 byte per channel (and using a CColor-like type for computations may be confusing; you can use it for output, but not for computations).
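A minimal sketch of such a float-based color class (illustrative names, not from the thread; channels are kept in the 0..1 range and packed to 8 bits only at display time):

public final class Float4 {
    public float r, g, b, a;

    public Float4(float r, float g, float b, float a) {
        this.r = r; this.g = g; this.b = b; this.a = a;
    }

    public Float4 add(Float4 o) { return new Float4(r + o.r, g + o.g, b + o.b, a + o.a); }

    public Float4 scale(float s) { return new Float4(r * s, g * s, b * s, a * s); }

    // Clamp and pack to 8 bits per channel only when displaying.
    public int toArgb() {
        int ir = Math.min(255, Math.max(0, Math.round(r * 255.0f)));
        int ig = Math.min(255, Math.max(0, Math.round(g * 255.0f)));
        int ib = Math.min(255, Math.max(0, Math.round(b * 255.0f)));
        int ia = Math.min(255, Math.max(0, Math.round(a * 255.0f)));
        return (ia << 24) | (ir << 16) | (ig << 8) | ib;
    }
}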


Okay, now it works: I did something wrong in the CColor.divide function. But my CColor class does use floats as RGB values! Still, if I use this method:


      CColor color = new CColor(0.0f, 0.0f, 0.0f);

        Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();

        for(int row = 0; row < 2; row++){
            for(int col = 0; col < 2; col++){
                Camera.x = Camera.x + (col + 0.5f) / 2.0f;
                Camera.y = Camera.y + (row + 0.5f) / 2.0f;
                Ray ray = new Ray(Camera, direction);
                color.add(Trace(ray,1));
            }
        }

        return color.divide(4.0f);

the result is still black.
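For reference, a minimal sketch of 2x2 supersampling that keeps the camera fixed and jitters the sub-pixel position inside ComputeSample instead (variable names follow the earlier ComputeSample listing; this is an illustration, not the thread's resolution of the problem):

CColor color = new CColor(0.0f, 0.0f, 0.0f);

for (int row = 0; row < 2; row++) {
    for (int col = 0; col < 2; col++) {
        // Offset the sample position within the pixel; the camera itself does not move.
        float sx = x + (col + 0.5f) / 2.0f;
        float sy = y + (row + 0.5f) / 2.0f;
        float xdir = (sx / (float) width) * 2.0f - 1.0f;
        float ydir = ((sy / (float) height) * 2.0f - 1.0f) * aspect;
        Vector3D direction = new Vector3D(xdir, ydir, zdir).normalize();
        color.add(Trace(new Ray(Camera, direction), 1));
    }
}

return color.divide(4.0f);  // average the 4 sub-pixel samples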

This topic is closed to new replies.
