# DX12 My Software Triangle Rasterizer

This topic is 4040 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi all! Last weekend I took the time to code a class which can rasterize triangles basing my work on some code I could find on the net, particularly this one over at devmaster. Over the course of this week I extended it and now I would say it is a complete solution. Features:
• The rasterizer can support an arbitrary number of render targets (you will most likely use two, a color buffer and a depth buffer)
• The rasterization is completely decoupled from the actual shading of the visible pixels. You can configure the rasterizer with a pixel shader which does the actual work of computing and assigning a color value.
• It does only use integer math because i intend to use it on the GP2X which does not have an FPU.
• The rasterizer is tile based. Currently it uses blocks of 8x8 pixels.
• It interpolates an arbitrary number of integer varyings across the triangle (so you can use fixed point here). This is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.
• It supports a clipping rectangle.
• It provides a means for the pixel shader to compute the derivative of the interpolated varyings. This is needed for example to compute the texture mimmap level from the texture coordinates.
• It allows for an early depth test. For example the shader could store the minimum depth value for each 8x8 block and than discad a whole block if the minimum expected depth value for this block is greater than the one stored.
• The source code is actually quite short ~600 lines with a lot of comments.
The only problem I can see right now is with small or large but thin triangles. Because the rasterizer is tile based it must at least scan a whole 8x8 block and test each pixel if it is inside the triangle or not. Large triangles are handled quite efficiently since for the inner part only the corners are tested for inout. What do you think about this. How big a performance problem might this be when targeting the GP2X? I include the code here for everyoune to look at. I would be very thankful for any input of possible improvements.
Header file:
/*

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3
#define RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3

class Rasterizer {
public:
// some constants
static const int MAX_RENDER_TARGETS = 2;
static const int MAX_VARYING = 8;
static const int BLOCK_SIZE = 8;

public:
// Type definitions
struct Vertex {
int x, y; // in 28.4 fixed point
int z; // range from 0 to 0x7fffffff
int w; // in 16.16 fixed point
int varyings[MAX_VARYING];
};

// holds the pointers to each render target.
// These will be passed to the fragment shader which then can write to
// the pointed to location. Note: Only the first n pointers will be
// valid where n is the current number of render targets
struct BufferPointers {
void* ptr[MAX_RENDER_TARGETS];
};

// This is the data the fragment shader gets
struct FragmentData {
int z;
int varying[MAX_VARYING];
};

typedef FragmentData PixelBlock[BLOCK_SIZE][BLOCK_SIZE] ;

class RenderTarget {
public:
virtual int width() = 0;
virtual int height() = 0;
virtual int stride() = 0;
virtual void *buffer_pointer() = 0;
virtual int element_size() = 0;
virtual void clear(int x, int y, int w, int h) = 0;
};

public:
// This provides a means for an early depth test.
// x and y are the coordinates of the upper left corner of the current block.
// If the shader somewhere stores the minimum z of each block that value
// can be compared to the parameter z.
// returns false when the depth test failed. In this case the whole block
// can be culled.
virtual bool early_depth_test(int x, int y, int z) { return true; }

// This notifies the shader of any render target clears.
// This is meant to be used in conjunction with the early depth test to update
// any buffers used
virtual void clear(int target, int x, int y, int w, int h) {}

// To compute the mipmap level of detail one needs the derivativs in x and y of
// the texture coordinates. These can be computed from the values in the pixel
// block since all the fragment values have alredy been computed for this block
// when this is called
virtual void prepare_for_block(int x, int y, PixelBlock b) {}

// This tells the rasterizer how many varyings this fragment shader needs
virtual int  varying_count() = 0;

// This is called once for each visible fragment inside the triangle
// x and y are the coordinates within the block [0, BLOCK_SIZE[
// the pixel block is indexed with p[y][x] !!!
virtual void shade(const BufferPointers&, const PixelBlock& b, int x, int y) = 0;
};

private:
// Variables
struct RenderTargetParams {
int count;
int minwidth, minheight;

// cache these params to avoid too
// many virtual function calls
int stride[MAX_RENDER_TARGETS];
int element_size[MAX_RENDER_TARGETS];
} rendertarget_params_;

RenderTarget *rendertargets_[MAX_RENDER_TARGETS];

struct {
int x0, y0, x1, y1;
} clip_rect_;

private:
bool setup_valid();

public:
// constructor
Rasterizer();

public:
// main interface

// Set the render targets.
// This resets the clipping rectangle
void rendertargets(int n, RenderTarget* rt[]);

void clear();
void clear(int target);

void clip_rect(int x, int y, int w, int h);

// The triangle must be counter clockwise in screen space in order to be
// drawn.
void draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3);
};

#endif


Implementation file:
/*

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "rasterizer.h"

#include <cmath>
#include <cassert>
#include <algorithm>

#ifndef _MSC_VER
#include <stdint.h>
#else
#include "stdint.h"
#endif

////////////////////////////////////////////////////////////////////////////////
// utility functions

namespace {
inline int min(int a, int b, int c)
{
return std::min(std::min(a,b), c);
}

inline int max(int a, int b, int c)
{
return std::max(std::max(a,b), c);
}

inline void compute_plane(
int v0x, int v0y,
int v1x, int v1y,
int v2x, int v2y,
int z0, int z1, int z2, int64_t plane[4])
{
const int px = v1x - v0x;
const int py = v1y - v0y;
const int pz = z1 - z0;

const int qx = v2x - v0x;
const int qy = v2y - v0y;
const int qz = z2 - z0;

/* Crossproduct "(a,b,c):= dv1 x dv2" is orthogonal to plane. */
const int64_t a = (int64_t)py * qz - (int64_t)pz * qy;
const int64_t b = (int64_t)pz * qx - (int64_t)px * qz;
const int64_t c = (int64_t)px * qy - (int64_t)py * qx;
/* Point on the plane = "r*(a,b,c) + w", with fixed "r" depending
on the distance of plane from origin and arbitrary "w" parallel
to the plane. */
/* The scalar product "(r*(a,b,c)+w)*(a,b,c)" is "r*(a^2+b^2+c^2)",
which is equal to "-d" below. */
const int64_t d = -(a * v0x + b * v0y + c * z0);

plane[0] = a;
plane[1] = b;
plane[2] = c;
plane[3] = d;
}

inline int solve_plane(int x, int y, const int64_t plane[4])
{
assert(plane[2] != 0);
return (int)((plane[3] + plane[0] * x + plane[1] * y) / -plane[2]);
}

template <int denominator>
inline void floor_divmod(int numerator, int &floor, int &mod)
{
assert(denominator > 0);
if(numerator >= 0) {
// positive case, C is okay
floor = numerator / denominator;
mod = numerator % denominator;
} else {
// Numerator is negative, do the right thing
floor = -((-numerator) / denominator);
mod = (-numerator) % denominator;
if(mod) {
// there is a remainder
floor--; mod = denominator - mod;
}
}
}

// Fixed point division
template <int p>
inline int32_t fixdiv(int32_t a, int32_t b)
{
#if 0
return (int32_t)((((int64_t)a) << p) / b);
#else
// The following produces the same results as the above but gcc 4.0.3
// generates fewer instructions (at least on the ARM processor).
union {
int64_t a;
struct {
int32_t l;
int32_t h;
};
} x;

x.l = a << p;
x.h = a >> (sizeof(int32_t) * 8 - p);
return (int32_t)(x.a / b);
#endif
}

// Perform a fixed point multiplication using a 64-bit intermediate result to
// prevent overflow problems.
template <int p>
inline int32_t fixmul(int32_t a, int32_t b)
{
return (int32_t)(((int64_t)a * b) >> p);
}
} // end anonymous namespace

////////////////////////////////////////////////////////////////////////////////

Rasterizer::Rasterizer() :
{
rendertarget_params_.count = 0;
}

bool Rasterizer::setup_valid()
{
return rendertarget_params_.count >= 1 &&
}

void Rasterizer::clear()
{
for (int i = 0; i < rendertarget_params_.count; ++i)
clear(i);
}

void Rasterizer::clear(int target)
{
assert(target <= rendertarget_params_.count);
rendertargets_[target]->clear(0, 0,
rendertarget_params_.minwidth, rendertarget_params_.minheight);

// notify shader about clear (might want to update internal data structutes)
rendertarget_params_.minwidth, rendertarget_params_.minheight);
}

void Rasterizer::clip_rect(int x, int y, int w, int h)
{
if (rendertarget_params_.count == 0) return;

clip_rect_.x0 = std::max(0, x);
clip_rect_.y0 = std::max(0, y);
clip_rect_.x1 = std::min(x + w, rendertarget_params_.minwidth);
clip_rect_.y1 = std::min(y + h, rendertarget_params_.minheight);
}

////////////////////////////////////////////////////////////////////////////////
// main interface

void Rasterizer::rendertargets(int n, RenderTarget* rt[])
{
assert(n <= MAX_RENDER_TARGETS);
RenderTargetParams &rtp = rendertarget_params_;
rtp.count = n;

if (n == 0) return;

rtp.minwidth = rt[0]->width();
rtp.minheight = rt[0]->height();
for (int i = 0; i < n; ++i) {
rendertargets_ = rt;
rtp.minwidth = std::min(rtp.minwidth, rt->width());
rtp.minheight = std::min(rtp.minheight, rt->height());

// cache these to avoid too many virtual function calls later
rtp.element_size = rt->element_size();
rtp.stride = rt->stride();
}

clip_rect_.x0 = 0;
clip_rect_.y0 = 0;
clip_rect_.x1 = rtp.minwidth;
clip_rect_.y1 = rtp.minheight;
}

{
assert(fs != 0);
}

void Rasterizer::draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3)
{
if (!setup_valid()) return;

int64_t zPlane[4];
int64_t wPlane[4];
int64_t vPlane[MAX_VARYING][4];
compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, v1.z, v2.z, v3.z, zPlane);
compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, // interpolate 1/w across triangle
fixdiv<16>(1 << 16, v1.w),
fixdiv<16>(1 << 16, v2.w),
fixdiv<16>(1 << 16, v3.w),
wPlane);
for (int i = 0; i < varying_count; ++i)
compute_plane(
v1.x, v1.y, v2.x, v2.y, v3.x, v3.y,
fixdiv<16>(v1.varyings, v1.w),
fixdiv<16>(v2.varyings, v2.w),
fixdiv<16>(v3.varyings, v3.w),
vPlane
);

// Deltas
const int DX12 = v1.x - v2.x;
const int DX23 = v2.x - v3.x;
const int DX31 = v3.x - v1.x;

const int DY12 = v1.y - v2.y;
const int DY23 = v2.y - v3.y;
const int DY31 = v3.y - v1.y;

// Fixed-point deltas
const int FDX12 = DX12 << 4;
const int FDX23 = DX23 << 4;
const int FDX31 = DX31 << 4;

const int FDY12 = DY12 << 4;
const int FDY23 = DY23 << 4;
const int FDY31 = DY31 << 4;

// Bounding rectangle
int minx = (min(v1.x, v2.x, v3.x) + 0xF) >> 4;
int maxx = (max(v1.x, v2.x, v3.x) + 0xF) >> 4;
int miny = (min(v1.y, v2.y, v3.y) + 0xF) >> 4;
int maxy = (max(v1.y, v2.y, v3.y) + 0xF) >> 4;

// consider clipping rectangle
minx = std::max(minx, clip_rect_.x0);
maxx = std::min(maxx, clip_rect_.x1);
miny = std::max(miny, clip_rect_.y0);
maxy = std::min(maxy, clip_rect_.y1);

// Start in corner of 8x8 block
minx &= ~(BLOCK_SIZE - 1);
miny &= ~(BLOCK_SIZE - 1);

BufferPointers buffers;

for (int i = 0; i < rendertarget_params_.count; ++i)
buffers.ptr = (char*)rendertargets_->buffer_pointer() + miny * rendertargets_->stride();

// Half-edge constants
int C1 = DY12 * v1.x - DX12 * v1.y;
int C2 = DY23 * v2.x - DX23 * v2.y;
int C3 = DY31 * v3.x - DX31 * v3.y;

// Correct for fill convention
if(DY12 < 0 || (DY12 == 0 && DX12 > 0)) C1++;
if(DY23 < 0 || (DY23 == 0 && DX23 > 0)) C2++;
if(DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++;

// Loop through blocks
for(int y = miny; y < maxy; y += BLOCK_SIZE)
{
for(int x = minx; x < maxx; x += BLOCK_SIZE)
{
// Corners of block
int x0 = x << 4;
int x1 = (x + BLOCK_SIZE - 1) << 4;
int y0 = y << 4;
int y1 = (y + BLOCK_SIZE - 1) << 4;

// Evaluate half-space functions
bool a00 = C1 + DX12 * y0 - DY12 * x0 > 0;
bool a10 = C1 + DX12 * y0 - DY12 * x1 > 0;
bool a01 = C1 + DX12 * y1 - DY12 * x0 > 0;
bool a11 = C1 + DX12 * y1 - DY12 * x1 > 0;
int a = (a00 << 0) | (a10 << 1) | (a01 << 2) | (a11 << 3);

bool b00 = C2 + DX23 * y0 - DY23 * x0 > 0;
bool b10 = C2 + DX23 * y0 - DY23 * x1 > 0;
bool b01 = C2 + DX23 * y1 - DY23 * x0 > 0;
bool b11 = C2 + DX23 * y1 - DY23 * x1 > 0;
int b = (b00 << 0) | (b10 << 1) | (b01 << 2) | (b11 << 3);

bool c00 = C3 + DX31 * y0 - DY31 * x0 > 0;
bool c10 = C3 + DX31 * y0 - DY31 * x1 > 0;
bool c01 = C3 + DX31 * y1 - DY31 * x0 > 0;
bool c11 = C3 + DX31 * y1 - DY31 * x1 > 0;
int c = (c00 << 0) | (c10 << 1) | (c01 << 2) | (c11 << 3);

// Skip block when outside an edge
if(a == 0x0 || b == 0x0 || c == 0x0) continue;

#define CLIP_TEST(X, Y) 				((X) >= clip_rect_.x0 && (X) < clip_rect_.x1 && 				(Y) >= clip_rect_.y0 && (Y) < clip_rect_.y1)

// test for the clipping rectangle
bool clip00 = CLIP_TEST(x, y);
bool clip10 = CLIP_TEST(x + 7, y);
bool clip01 = CLIP_TEST(x, y + 7);
bool clip11 = CLIP_TEST(x + 7, y + 7);

// skip block if all is clippled
if (!clip00 && !clip10 && !clip01 && !clip11) continue;
bool clip_all_in = clip00 && clip10 && clip01 && clip11;

//! compute attribute interpolants at corners
FragmentData f00;
FragmentData f10;
FragmentData f01;
FragmentData f11;

int xx1 = (x + BLOCK_SIZE) << 4;
int yy1 = (y + BLOCK_SIZE) << 4;
f00.z = solve_plane(x0, y0, zPlane);
f10.z = solve_plane(xx1, y0, zPlane);
f01.z = solve_plane(x0, yy1, zPlane);
f11.z = solve_plane(xx1, yy1, zPlane);

if (!fragment_shader_->early_depth_test(x, y, std::min(std::min(std::min(f00.z, f10.z), f01.z), f11.z)))
continue;

int w00 = fixdiv<16>(1 << 16, solve_plane(x0, y0, wPlane));
int w10 = fixdiv<16>(1 << 16, solve_plane(xx1, y0, wPlane));
int w01 = fixdiv<16>(1 << 16, solve_plane(x0, yy1, wPlane));
int w11 = fixdiv<16>(1 << 16, solve_plane(xx1, yy1, wPlane));
for (int i = 0; i < varying_count; ++i) {
f00.varying = fixmul<16>(solve_plane(x0, y0, vPlane), w00);
f10.varying = fixmul<16>(solve_plane(xx1, y0, vPlane), w10);
f01.varying = fixmul<16>(solve_plane(x0, yy1, vPlane), w01);
f11.varying = fixmul<16>(solve_plane(xx1, yy1, vPlane), w11);
}

//! compute attribute step y left and right
struct varying_step_t {
struct step_info_t {
int step;
int rem;
int error_term;
step_info_t():error_term(0){}

int dostep() {
int r = step;
error_term += rem;
if (error_term >= BLOCK_SIZE) {
error_term -= BLOCK_SIZE;
r++;
}
return r;
}
};

step_info_t z;
step_info_t varying[MAX_VARYING];

varying_step_t(FragmentData& p1, FragmentData& p2, int vc)
{
floor_divmod<BLOCK_SIZE>(p2.z - p1.z, z.step, z.rem);
for (int i = 0; i < vc; ++i) {
floor_divmod<BLOCK_SIZE>(p2.varying - p1.varying, varying.step, varying.rem);
}
}
};

varying_step_t step_left(f00, f01, varying_count);
varying_step_t step_right(f10, f11, varying_count);

BufferPointers block_buffers = buffers;

#define RENDER_TARGET_LOOP 				for (int i = 0; i < rendertarget_params_.count; ++i)

#define STEP_POINTERS_BY_ELEMENTSIZE(VAR, FACTOR) 				{ 					RENDER_TARGET_LOOP 					(char*&)VAR.ptr += FACTOR * rendertarget_params_.element_size; 				}

#define	STEP_POINTERS_BY_STRIDE(VAR) 				{ 					RENDER_TARGET_LOOP 					(char*&)VAR.ptr += rendertarget_params_.stride; 				}

#define STEP_FRAGMENTDATA(FDVAR, STEPVAR) 				{ 					FDVAR.z += STEPVAR.z.dostep(); 					for (int i = 0; i < varying_count; ++i) 						FDVAR.varying += STEPVAR.varying.dostep(); 				}

// only copy the neccessary varyings
#define EFFICIENT_COPY(SRC, DST) 				{ 					DST.z = SRC.z; 					for (int i = 0; i < varying_count; ++i) 						DST.varying = SRC.varying; 				}

#define BLOCK_BEGIN 				fragment_shader_->prepare_for_block(x, y, pixel_block); 				for (int iy = 0; iy < BLOCK_SIZE; ++iy) { 					BufferPointers inner = block_buffers; 					STEP_POINTERS_BY_ELEMENTSIZE(inner, x); 					for (int ix = 0; ix < BLOCK_SIZE; ++ix) {

#define BLOCK_END 						STEP_POINTERS_BY_ELEMENTSIZE(inner, 1); 					} 					STEP_POINTERS_BY_STRIDE(block_buffers); 				}

PixelBlock pixel_block;

bool skip_flag[BLOCK_SIZE][BLOCK_SIZE];
memset(skip_flag, 0, sizeof(skip_flag));

if (!clip_all_in) {
for (int iy = 0; iy < BLOCK_SIZE; ++iy)
for (int ix = 0; ix < BLOCK_SIZE; ++ix)
if (!CLIP_TEST(ix + x, iy + y))
skip_flag[iy][ix] = true;
}

// Accept whole block when totally covered
if(a == 0xF && b == 0xF && c == 0xF)
{
// first compute all fragment data
for(int iy = 0; iy < BLOCK_SIZE; iy++)
{
//! compute attribute step x for this scanline
varying_step_t stepx(f00, f10, varying_count);
FragmentData fragment_data = f00;

for(int ix = 0; ix < BLOCK_SIZE; ix++)
{
EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);
STEP_FRAGMENTDATA(fragment_data, stepx);
}

//! step left and right attrib y
STEP_FRAGMENTDATA(f00, step_left);
STEP_FRAGMENTDATA(f10, step_right);
}

//! fragment_shader_block (can now use derivatives of attributes)
if (clip_all_in) {
BLOCK_BEGIN
BLOCK_END
} else {
BLOCK_BEGIN
if (!skip_flag[iy][ix])
BLOCK_END
}
}
else // Partially covered block
{
int CY1 = C1 + DX12 * y0 - DY12 * x0;
int CY2 = C2 + DX23 * y0 - DY23 * x0;
int CY3 = C3 + DX31 * y0 - DY31 * x0;

for(int iy = 0; iy < BLOCK_SIZE; iy++)
{
int CX1 = CY1;
int CX2 = CY2;
int CX3 = CY3;

//! compute attribute step x for this scanline
varying_step_t stepx(f00, f10, varying_count);

FragmentData fragment_data = f00;

for(int ix = 0; ix < BLOCK_SIZE; ix++)
{
if(!(CX1 > 0 && CX2 > 0 && CX3 > 0))
skip_flag[iy][ix] = true;

// we still need to do this since the fragment shader might want
// to compute the derivative of attibutes
EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);

CX1 -= FDY12;
CX2 -= FDY23;
CX3 -= FDY31;

STEP_FRAGMENTDATA(fragment_data, stepx);
}

CY1 += FDX12;
CY2 += FDX23;
CY3 += FDX31;

//! step left and right attrib y
STEP_FRAGMENTDATA(f00, step_left);
STEP_FRAGMENTDATA(f10, step_right);
}

//! fragment_shader_block (can now use derivatives of attributes)
BLOCK_BEGIN
if (!skip_flag[iy][ix])
BLOCK_END
}
}

for (int i = 0; i < rendertarget_params_.count; ++i)
(char*&)buffers.ptr += BLOCK_SIZE * rendertargets_->stride();

}
}


Test program:
/*

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "SDL.h"
#include "rasterizer.h"

#include <cmath>
#include <algorithm>

class SDL_SurfaceRenderTarget : public Rasterizer::RenderTarget {
SDL_Surface *surface;
public:
SDL_SurfaceRenderTarget(SDL_Surface *s):surface(s) {}
virtual int width() { return surface->w; }
virtual int height() { return surface->h; }
virtual int stride() { return surface->pitch; }
virtual int element_size() { return sizeof(int); }
virtual void* buffer_pointer() { return surface->pixels; }
virtual void clear(int x, int y, int w, int h) {}
};

public:
virtual int varying_count() { return 3; }
virtual void shade(const Rasterizer::BufferPointers& ptrs, const Rasterizer::PixelBlock& block, int x, int y)
{
unsigned int* color_buffer = (unsigned int*)ptrs.ptr[0];

// unfortunaltely at the corners of the triangle we can get negative
// values for the interpolants -> std::max
int r = std::max(0, block[y][x].varying[0]);
int g = std::max(0, block[y][x].varying[1]);
int b = std::max(0, block[y][x].varying[2]);
int color = r << 16 | g << 8 | b;
*color_buffer = color;
}
};

int main(int ac, char *av[])
{
SDL_Init(SDL_INIT_VIDEO);
SDL_Surface *screen = SDL_SetVideoMode(320, 240, 32, SDL_SWSURFACE);

Rasterizer r;
SDL_SurfaceRenderTarget color_target(screen);

Rasterizer::RenderTarget* rendertargets[] = { &color_target};
r.rendertargets(1, rendertargets);
r.clip_rect(45, 70, 100, 100);

Rasterizer::Vertex v[3];

v[0].x = (int)(120.0f * 16.0f);
v[0].y = (int)(50.0f * 16.0f);
v[0].z = 0;
v[0].w = 1 << 16;
v[0].varyings[0] = 255;
v[0].varyings[1] = 0;
v[0].varyings[2] = 0;

v[1].x = (int)(20.0f  * 16.0f);
v[1].y = (int)(100.0f * 16.0f);
v[1].z = 0x7fffffff;
v[1].w = 1 << 16;
v[1].varyings[0] = 0;
v[1].varyings[1] = 255;
v[1].varyings[2] = 0;

v[2].x = (int)(150.0f * 16.0f);
v[2].y = (int)(220.0f * 16.0f);
v[2].z = 0x7fffffff >> 1;
v[2].w = 1 << 16;
v[2].varyings[0] = 0;
v[2].varyings[1] = 0;
v[2].varyings[2] = 255;

SDL_Rect rect;
rect.x = 45;
rect.y = 70;
rect.w = 100;
rect.h = 100;
SDL_FillRect(screen, &rect, 0xffffffff);

r.draw_triangle(v[0], v[1], v[2]);

SDL_Flip(screen);

SDL_Event e;
while (SDL_WaitEvent(&e) && e.type != SDL_QUIT);

SDL_Quit();
return 0;
}


I don't include the stdint.h file. You can get it yourself if needed. OK, i now have a domain www.trenki.net where I uploaded a vastly improved version of the renderer. The above code is obsolete, the features are retained. [Edited by - Trenki on August 22, 2007 5:11:28 AM]

##### Share on other sites
Quote:
 Original post by TrenkiThis is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.

Doesn't this cause errors near the edges of the triangles?

##### Share on other sites
Quote:
Original post by C0D1F1ED
Quote:
 Original post by TrenkiThis is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.

Doesn't this cause errors near the edges of the triangles?

In the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.
But I also looked at the mesa source code and they have the same problem at the edges and AFAIK they are doing the perspective correction per pixel. They have extra code to clamp the colors at the edges of the triangle.

##### Share on other sites
for me the devmaster link doesn't work. could you tell us what article you mean?
thanks

##### Share on other sites
Quote:
 Original post by TrenkiIn the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.

Clamping helps for colors, but texture coordinates are unbounded. So at the edges of triangle you could get unwanted wrapping.
Quote:
 But I also looked at the mesa source code and they have the same problem at the edges and AFAIK they are doing the perspective correction per pixel. They have extra code to clamp the colors at the edges of the triangle.

Interesting. Is that just to prevent the components from becoming negative due to rounding error then?

##### Share on other sites
Quote:
 Original post by Eitschfor me the devmaster link doesn't work. could you tell us what article you mean? thanks

Advanced Rasterization by Nicolas Capens, also known as c0d1f1ed. [wink]

##### Share on other sites
Quote:
Original post by C0D1F1ED
Quote:
 Original post by TrenkiIn the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.

Clamping helps for colors, but texture coordinates are unbounded. So at the edges of triangle you could get unwanted wrapping.

Yes, this might happen. In December I tried to code a triangle rasterizer following Chris Heckers series on perspective texture mapping. There the varyings are interpolated along the edges but I still got the wrapping effect with the texture coordinates. Maybe I did something wrong... In the end I stopped working on it because I didn't like the code at all and it also was quite involved and not as general as the one above.

##### Share on other sites
Quote:
Original post by C0D1F1ED
Quote:
 Original post by Eitschfor me the devmaster link doesn't work. could you tell us what article you mean? thanks

Advanced Rasterization by Nicolas Capens, also known as c0d1f1ed. [wink]
Huh, that's you?

I'm coding up a softrast based on what you wrote -- though I'm also doing a bit of C++ voodoo with it. That way I don't suffer from the heavy penalty of doing multiple virtual calls for every single fragment (which is what the OP is doing), along with a few other niceties.

##### Share on other sites
Quote:
 That way I don't suffer from the heavy penalty of doing multiple virtual calls for every single fragment (which is what the OP is doing), along with a few other niceties.

There should be only one virtual function call per fragment and I don't believe this virtual function call is a heavy penalty. I could be using a function pointer instead but do you really believe that would be faster?

Quote:
 along with a few other niceties.

Could you please elaborate on this and point out what exacly you mean.

[Edited by - Trenki on May 27, 2007 1:06:21 PM]

##### Share on other sites
Have you looked at this paper:
Triangle Scan Conversion using 2D Homogenous Coordinates

It dodges quite nicely many problems related to projection of 3D triangles to 2D, such as, w component being 0. Basicly, you don't need frustum clipping if you use homogenous rasterizer. I definitely recommend you to at least read the paper.

Also one idea worth investigating is using hierarchical z-buffer, which may improve performance especially in case of high depth complexity scenes.

1. 1
Rutin
19
2. 2
3. 3
4. 4
5. 5
frob
12

• 15
• 13
• 9
• 12
• 10
• ### Forum Statistics

• Total Topics
631442
• Total Posts
3000104
×