My Software Triangle Rasterizer

Started by
24 comments, last by Trenki 16 years, 10 months ago
Hi all! Last weekend I took the time to code a class which can rasterize triangles basing my work on some code I could find on the net, particularly this one over at devmaster. Over the course of this week I extended it and now I would say it is a complete solution. Features:
  • The rasterizer can support an arbitrary number of render targets (you will most likely use two, a color buffer and a depth buffer)
  • The rasterization is completely decoupled from the actual shading of the visible pixels. You can configure the rasterizer with a pixel shader which does the actual work of computing and assigning a color value.
  • It does only use integer math because i intend to use it on the GP2X which does not have an FPU.
  • The rasterizer is tile based. Currently it uses blocks of 8x8 pixels.
  • It interpolates an arbitrary number of integer varyings across the triangle (so you can use fixed point here). This is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.
  • It supports a clipping rectangle.
  • It provides a means for the pixel shader to compute the derivative of the interpolated varyings. This is needed for example to compute the texture mimmap level from the texture coordinates.
  • It allows for an early depth test. For example the shader could store the minimum depth value for each 8x8 block and than discad a whole block if the minimum expected depth value for this block is greater than the one stored.
  • The source code is actually quite short ~600 lines with a lot of comments.
The only problem I can see right now is with small or large but thin triangles. Because the rasterizer is tile based it must at least scan a whole 8x8 block and test each pixel if it is inside the triangle or not. Large triangles are handled quite efficiently since for the inner part only the corners are tested for inout. What do you think about this. How big a performance problem might this be when targeting the GP2X? I include the code here for everyoune to look at. I would be very thankful for any input of possible improvements.
Header file:

/*
Copyright (c) 2007, Markus Trenkwalder

All rights reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, 
  this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation 
  and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors 
  may be used to endorse or promote products derived from this software 
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifndef RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3
#define RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3

class Rasterizer {
public:
	// some constants
	static const int MAX_RENDER_TARGETS = 2;
	static const int MAX_VARYING = 8;
	static const int BLOCK_SIZE = 8;	

public:
	// Type definitions
	struct Vertex {
		int x, y; // in 28.4 fixed point
		int z; // range from 0 to 0x7fffffff
		int w; // in 16.16 fixed point
		int varyings[MAX_VARYING];
	};

	// holds the pointers to each render target. 
	// These will be passed to the fragment shader which then can write to
	// the pointed to location. Note: Only the first n pointers will be
	// valid where n is the current number of render targets
	struct BufferPointers {
		void* ptr[MAX_RENDER_TARGETS];
	};

	// This is the data the fragment shader gets
	struct FragmentData {
		int z;
		int varying[MAX_VARYING];
	};

	typedef FragmentData PixelBlock[BLOCK_SIZE][BLOCK_SIZE] ;

	class RenderTarget {
	public:
		virtual int width() = 0;
		virtual int height() = 0;
		virtual int stride() = 0;
		virtual void *buffer_pointer() = 0;
		virtual int element_size() = 0;
		virtual void clear(int x, int y, int w, int h) = 0;
	};

	class FragmentShader {
	public:
		// This provides a means for an early depth test.
		// x and y are the coordinates of the upper left corner of the current block.
		// If the shader somewhere stores the minimum z of each block that value
		// can be compared to the parameter z.
		// returns false when the depth test failed. In this case the whole block 
		// can be culled.
		virtual bool early_depth_test(int x, int y, int z) { return true; }

		// This notifies the shader of any render target clears.
		// This is meant to be used in conjunction with the early depth test to update
		// any buffers used
		virtual void clear(int target, int x, int y, int w, int h) {}

		// To compute the mipmap level of detail one needs the derivativs in x and y of 
		// the texture coordinates. These can be computed from the values in the pixel
		// block since all the fragment values have alredy been computed for this block
		// when this is called
		virtual void prepare_for_block(int x, int y, PixelBlock b) {}

		// This tells the rasterizer how many varyings this fragment shader needs
		virtual int  varying_count() = 0;

		// This is called once for each visible fragment inside the triangle
		// x and y are the coordinates within the block [0, BLOCK_SIZE[
		// the pixel block is indexed with p[y][x] !!!
		virtual void shade(const BufferPointers&, const PixelBlock& b, int x, int y) = 0;
	};

private:
	// Variables
	struct RenderTargetParams {
		int count;
		int minwidth, minheight;

		// cache these params to avoid too 
		// many virtual function calls
		int stride[MAX_RENDER_TARGETS];
		int element_size[MAX_RENDER_TARGETS];
	} rendertarget_params_;

	RenderTarget *rendertargets_[MAX_RENDER_TARGETS];
	FragmentShader *fragment_shader_;

	struct {
		int x0, y0, x1, y1;
	} clip_rect_;

private:
	bool setup_valid();

public:
	// constructor
	Rasterizer();

public:
	// main interface

	// Set the render targets.
	// This resets the clipping rectangle
	void rendertargets(int n, RenderTarget* rt[]);

	// set the fragment shader
	void fragment_shader(FragmentShader *fs);

	void clear();
	void clear(int target);

	void clip_rect(int x, int y, int w, int h);

	// The triangle must be counter clockwise in screen space in order to be
	// drawn.
	void draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3);
};

#endif




Implementation file:

/*
Copyright (c) 2007, Markus Trenkwalder

All rights reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, 
  this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation 
  and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors 
  may be used to endorse or promote products derived from this software 
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "rasterizer.h"

#include <cmath>
#include <cassert>
#include <algorithm>

#ifndef _MSC_VER
#include <stdint.h>
#else
#include "stdint.h"
#endif

////////////////////////////////////////////////////////////////////////////////
// utility functions

namespace {
	inline int min(int a, int b, int c)
	{
		return std::min(std::min(a,b), c);
	}

	inline int max(int a, int b, int c) 
	{
		return std::max(std::max(a,b), c);
	}

	inline void compute_plane(
		int v0x, int v0y,
		int v1x, int v1y,
		int v2x, int v2y,
		int z0, int z1, int z2, int64_t plane[4])
	{
	   const int px = v1x - v0x;
	   const int py = v1y - v0y;
	   const int pz = z1 - z0;

	   const int qx = v2x - v0x;
	   const int qy = v2y - v0y;
	   const int qz = z2 - z0;

	   /* Crossproduct "(a,b,c):= dv1 x dv2" is orthogonal to plane. */
	   const int64_t a = (int64_t)py * qz - (int64_t)pz * qy;
	   const int64_t b = (int64_t)pz * qx - (int64_t)px * qz;
	   const int64_t c = (int64_t)px * qy - (int64_t)py * qx;
	   /* Point on the plane = "r*(a,b,c) + w", with fixed "r" depending
		  on the distance of plane from origin and arbitrary "w" parallel
		  to the plane. */
	   /* The scalar product "(r*(a,b,c)+w)*(a,b,c)" is "r*(a^2+b^2+c^2)",
		  which is equal to "-d" below. */
	   const int64_t d = -(a * v0x + b * v0y + c * z0);

	   plane[0] = a;
	   plane[1] = b;
	   plane[2] = c;
	   plane[3] = d;
	}

	inline int solve_plane(int x, int y, const int64_t plane[4])
	{
	   assert(plane[2] != 0);
	   return (int)((plane[3] + plane[0] * x + plane[1] * y) / -plane[2]);
	}

	template <int denominator>
	inline void floor_divmod(int numerator, int &floor, int &mod)
	{
		assert(denominator > 0);
		if(numerator >= 0) {
			// positive case, C is okay
			floor = numerator / denominator;
			mod = numerator % denominator;
		} else {
			// Numerator is negative, do the right thing
			floor = -((-numerator) / denominator);
			mod = (-numerator) % denominator;
			if(mod) {
				// there is a remainder
				floor--; mod = denominator - mod;
			}
		}
	}

	// Fixed point division
	template <int p>
	inline int32_t fixdiv(int32_t a, int32_t b)
	{
	#if 0
		return (int32_t)((((int64_t)a) << p) / b);
	#else	
		// The following produces the same results as the above but gcc 4.0.3 
		// generates fewer instructions (at least on the ARM processor).
		union {
			int64_t a;
			struct {
				int32_t l;
				int32_t h;
			};
		} x;
		
		x.l = a << p;
		x.h = a >> (sizeof(int32_t) * 8 - p);
		return (int32_t)(x.a / b);
	#endif
	}

	// Perform a fixed point multiplication using a 64-bit intermediate result to
	// prevent overflow problems.
	template <int p>
	inline int32_t fixmul(int32_t a, int32_t b)
	{
		return (int32_t)(((int64_t)a * b) >> p);
	}
} // end anonymous namespace

////////////////////////////////////////////////////////////////////////////////

Rasterizer::Rasterizer() :
	fragment_shader_(0)	
{
	rendertarget_params_.count = 0;
}

bool Rasterizer::setup_valid()
{
	return rendertarget_params_.count >= 1 &&
		fragment_shader_ != 0;
}

void Rasterizer::clear()
{
	for (int i = 0; i < rendertarget_params_.count; ++i) 
		clear(i);
}

void Rasterizer::clear(int target)
{
	assert(target <= rendertarget_params_.count);
	rendertargets_[target]->clear(0, 0, 
		rendertarget_params_.minwidth, rendertarget_params_.minheight);

	// notify shader about clear (might want to update internal data structutes)
	if (fragment_shader_)
		fragment_shader_->clear(target, 0, 0, 
		rendertarget_params_.minwidth, rendertarget_params_.minheight);
}

void Rasterizer::clip_rect(int x, int y, int w, int h)
{
	if (rendertarget_params_.count == 0) return;

	clip_rect_.x0 = std::max(0, x);
	clip_rect_.y0 = std::max(0, y);
	clip_rect_.x1 = std::min(x + w, rendertarget_params_.minwidth);
	clip_rect_.y1 = std::min(y + h, rendertarget_params_.minheight);
}

////////////////////////////////////////////////////////////////////////////////
// main interface

void Rasterizer::rendertargets(int n, RenderTarget* rt[])
{
	assert(n <= MAX_RENDER_TARGETS);
	RenderTargetParams &rtp = rendertarget_params_;
	rtp.count = n;

	if (n == 0) return;

	rtp.minwidth = rt[0]->width();
	rtp.minheight = rt[0]->height();
	for (int i = 0; i < n; ++i) {
		rendertargets_ = rt;
		rtp.minwidth = std::min(rtp.minwidth, rt->width());
		rtp.minheight = std::min(rtp.minheight, rt->height());

		// cache these to avoid too many virtual function calls later
		rtp.element_size = rt->element_size();
		rtp.stride = rt->stride();
	}

	clip_rect_.x0 = 0;
	clip_rect_.y0 = 0;
	clip_rect_.x1 = rtp.minwidth;
	clip_rect_.y1 = rtp.minheight;
}

void Rasterizer::fragment_shader(FragmentShader *fs)
{
	assert(fs != 0);
	fragment_shader_ = fs;
}

void Rasterizer::draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3)
{
	if (!setup_valid()) return;

	int64_t zPlane[4];
	int64_t wPlane[4];
	int64_t vPlane[MAX_VARYING][4];
	compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, v1.z, v2.z, v3.z, zPlane);
	compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, // interpolate 1/w across triangle
		fixdiv<16>(1 << 16, v1.w),
		fixdiv<16>(1 << 16, v2.w),
		fixdiv<16>(1 << 16, v3.w),
		wPlane);
	int varying_count = fragment_shader_->varying_count();
	for (int i = 0; i < varying_count; ++i)
		compute_plane(
			v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, 
			fixdiv<16>(v1.varyings, v1.w), 
			fixdiv<16>(v2.varyings, v2.w),
			fixdiv<16>(v3.varyings, v3.w),
			vPlane
		);

	// Deltas
	const int DX12 = v1.x - v2.x;
	const int DX23 = v2.x - v3.x;
	const int DX31 = v3.x - v1.x;

	const int DY12 = v1.y - v2.y;
	const int DY23 = v2.y - v3.y;
	const int DY31 = v3.y - v1.y;

	// Fixed-point deltas
	const int FDX12 = DX12 << 4;
	const int FDX23 = DX23 << 4;
	const int FDX31 = DX31 << 4;

	const int FDY12 = DY12 << 4;
	const int FDY23 = DY23 << 4;
	const int FDY31 = DY31 << 4;

	// Bounding rectangle
	int minx = (min(v1.x, v2.x, v3.x) + 0xF) >> 4;
	int maxx = (max(v1.x, v2.x, v3.x) + 0xF) >> 4;
	int miny = (min(v1.y, v2.y, v3.y) + 0xF) >> 4;
	int maxy = (max(v1.y, v2.y, v3.y) + 0xF) >> 4;

	// consider clipping rectangle
	minx = std::max(minx, clip_rect_.x0);
	maxx = std::min(maxx, clip_rect_.x1);
	miny = std::max(miny, clip_rect_.y0);
	maxy = std::min(maxy, clip_rect_.y1);

	// Start in corner of 8x8 block
	minx &= ~(BLOCK_SIZE - 1);
	miny &= ~(BLOCK_SIZE - 1);

	BufferPointers buffers;

	for (int i = 0; i < rendertarget_params_.count; ++i)
		buffers.ptr = (char*)rendertargets_->buffer_pointer() + miny * rendertargets_->stride();
	
	// Half-edge constants
	int C1 = DY12 * v1.x - DX12 * v1.y;
	int C2 = DY23 * v2.x - DX23 * v2.y;
	int C3 = DY31 * v3.x - DX31 * v3.y;

	// Correct for fill convention
	if(DY12 < 0 || (DY12 == 0 && DX12 > 0)) C1++;
	if(DY23 < 0 || (DY23 == 0 && DX23 > 0)) C2++;
	if(DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++;

	// Loop through blocks
	for(int y = miny; y < maxy; y += BLOCK_SIZE)
	{
		for(int x = minx; x < maxx; x += BLOCK_SIZE)
		{
			// Corners of block
			int x0 = x << 4;
			int x1 = (x + BLOCK_SIZE - 1) << 4;
			int y0 = y << 4;
			int y1 = (y + BLOCK_SIZE - 1) << 4;

			// Evaluate half-space functions
			bool a00 = C1 + DX12 * y0 - DY12 * x0 > 0;
			bool a10 = C1 + DX12 * y0 - DY12 * x1 > 0;
			bool a01 = C1 + DX12 * y1 - DY12 * x0 > 0;
			bool a11 = C1 + DX12 * y1 - DY12 * x1 > 0;
			int a = (a00 << 0) | (a10 << 1) | (a01 << 2) | (a11 << 3);
    
			bool b00 = C2 + DX23 * y0 - DY23 * x0 > 0;
			bool b10 = C2 + DX23 * y0 - DY23 * x1 > 0;
			bool b01 = C2 + DX23 * y1 - DY23 * x0 > 0;
			bool b11 = C2 + DX23 * y1 - DY23 * x1 > 0;
			int b = (b00 << 0) | (b10 << 1) | (b01 << 2) | (b11 << 3);
    
			bool c00 = C3 + DX31 * y0 - DY31 * x0 > 0;
			bool c10 = C3 + DX31 * y0 - DY31 * x1 > 0;
			bool c01 = C3 + DX31 * y1 - DY31 * x0 > 0;
			bool c11 = C3 + DX31 * y1 - DY31 * x1 > 0;
			int c = (c00 << 0) | (c10 << 1) | (c01 << 2) | (c11 << 3);

			// Skip block when outside an edge
			if(a == 0x0 || b == 0x0 || c == 0x0) continue;

			#define CLIP_TEST(X, Y) 				((X) >= clip_rect_.x0 && (X) < clip_rect_.x1 && 				(Y) >= clip_rect_.y0 && (Y) < clip_rect_.y1)

			// test for the clipping rectangle
			bool clip00 = CLIP_TEST(x, y);
			bool clip10 = CLIP_TEST(x + 7, y);
			bool clip01 = CLIP_TEST(x, y + 7);
			bool clip11 = CLIP_TEST(x + 7, y + 7);

			// skip block if all is clippled
			if (!clip00 && !clip10 && !clip01 && !clip11) continue;
			bool clip_all_in = clip00 && clip10 && clip01 && clip11;
			
			//! compute attribute interpolants at corners
			FragmentData f00; 
			FragmentData f10; 
			FragmentData f01; 
			FragmentData f11; 

			int xx1 = (x + BLOCK_SIZE) << 4;
			int yy1 = (y + BLOCK_SIZE) << 4;
			f00.z = solve_plane(x0, y0, zPlane);			
			f10.z = solve_plane(xx1, y0, zPlane);
			f01.z = solve_plane(x0, yy1, zPlane);
			f11.z = solve_plane(xx1, yy1, zPlane);
			
			if (!fragment_shader_->early_depth_test(x, y, std::min(std::min(std::min(f00.z, f10.z), f01.z), f11.z)))
				continue;

			int w00 = fixdiv<16>(1 << 16, solve_plane(x0, y0, wPlane));
			int w10 = fixdiv<16>(1 << 16, solve_plane(xx1, y0, wPlane));
			int w01 = fixdiv<16>(1 << 16, solve_plane(x0, yy1, wPlane));
			int w11 = fixdiv<16>(1 << 16, solve_plane(xx1, yy1, wPlane));
			for (int i = 0; i < varying_count; ++i) {
				f00.varying = fixmul<16>(solve_plane(x0, y0, vPlane), w00);
				f10.varying = fixmul<16>(solve_plane(xx1, y0, vPlane), w10);
				f01.varying = fixmul<16>(solve_plane(x0, yy1, vPlane), w01);
				f11.varying = fixmul<16>(solve_plane(xx1, yy1, vPlane), w11);
			}

			//! compute attribute step y left and right
			struct varying_step_t {
				struct step_info_t {
					int step;
					int rem;
					int error_term;
					step_info_t():error_term(0){}

					int dostep() {
						int r = step;
						error_term += rem;
						if (error_term >= BLOCK_SIZE) { 
							error_term -= BLOCK_SIZE;
							r++;
						}
						return r;
					}
				};

				step_info_t z;
				step_info_t varying[MAX_VARYING];

				varying_step_t(FragmentData& p1, FragmentData& p2, int vc)
				{
					floor_divmod<BLOCK_SIZE>(p2.z - p1.z, z.step, z.rem);
					for (int i = 0; i < vc; ++i) {
						floor_divmod<BLOCK_SIZE>(p2.varying - p1.varying, varying.step, varying.rem);
					}
				}
			};
			
			varying_step_t step_left(f00, f01, varying_count);
			varying_step_t step_right(f10, f11, varying_count);			

			BufferPointers block_buffers = buffers;

			#define RENDER_TARGET_LOOP 				for (int i = 0; i < rendertarget_params_.count; ++i)

			#define STEP_POINTERS_BY_ELEMENTSIZE(VAR, FACTOR) 				{ 					RENDER_TARGET_LOOP 					(char*&)VAR.ptr += FACTOR * rendertarget_params_.element_size; 				}
				
			#define	STEP_POINTERS_BY_STRIDE(VAR) 				{ 					RENDER_TARGET_LOOP 					(char*&)VAR.ptr += rendertarget_params_.stride; 				}

			#define STEP_FRAGMENTDATA(FDVAR, STEPVAR) 				{ 					FDVAR.z += STEPVAR.z.dostep(); 					for (int i = 0; i < varying_count; ++i) 						FDVAR.varying += STEPVAR.varying.dostep(); 				}
			
			// only copy the neccessary varyings
			#define EFFICIENT_COPY(SRC, DST) 				{ 					DST.z = SRC.z; 					for (int i = 0; i < varying_count; ++i) 						DST.varying = SRC.varying; 				}
			
			#define BLOCK_BEGIN 				fragment_shader_->prepare_for_block(x, y, pixel_block); 				for (int iy = 0; iy < BLOCK_SIZE; ++iy) { 					BufferPointers inner = block_buffers; 					STEP_POINTERS_BY_ELEMENTSIZE(inner, x); 					for (int ix = 0; ix < BLOCK_SIZE; ++ix) {

			#define BLOCK_END 						STEP_POINTERS_BY_ELEMENTSIZE(inner, 1); 					} 					STEP_POINTERS_BY_STRIDE(block_buffers); 				}

			PixelBlock pixel_block;
			
			bool skip_flag[BLOCK_SIZE][BLOCK_SIZE];
			memset(skip_flag, 0, sizeof(skip_flag));

			if (!clip_all_in) {
				for (int iy = 0; iy < BLOCK_SIZE; ++iy)
				for (int ix = 0; ix < BLOCK_SIZE; ++ix)
					if (!CLIP_TEST(ix + x, iy + y))
						skip_flag[iy][ix] = true;
			}

			// Accept whole block when totally covered
			if(a == 0xF && b == 0xF && c == 0xF)
			{
				// first compute all fragment data
				for(int iy = 0; iy < BLOCK_SIZE; iy++)
				{
					//! compute attribute step x for this scanline
					varying_step_t stepx(f00, f10, varying_count);
					FragmentData fragment_data = f00;

					for(int ix = 0; ix < BLOCK_SIZE; ix++)
					{
						EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);
						STEP_FRAGMENTDATA(fragment_data, stepx);
					}

					//! step left and right attrib y
					STEP_FRAGMENTDATA(f00, step_left);
					STEP_FRAGMENTDATA(f10, step_right);
				}

				//! fragment_shader_block (can now use derivatives of attributes)
				if (clip_all_in) {
					BLOCK_BEGIN
						fragment_shader_->shade(inner, pixel_block, ix, iy);
					BLOCK_END
				} else {
					BLOCK_BEGIN
						if (!skip_flag[iy][ix]) 
							fragment_shader_->shade(inner, pixel_block, ix, iy);
					BLOCK_END
				}
			}
			else // Partially covered block
			{
				int CY1 = C1 + DX12 * y0 - DY12 * x0;
				int CY2 = C2 + DX23 * y0 - DY23 * x0;
				int CY3 = C3 + DX31 * y0 - DY31 * x0;

				for(int iy = 0; iy < BLOCK_SIZE; iy++)
				{
					int CX1 = CY1;
					int CX2 = CY2;
					int CX3 = CY3;
					
					//! compute attribute step x for this scanline
					varying_step_t stepx(f00, f10, varying_count);
					
					FragmentData fragment_data = f00;

					for(int ix = 0; ix < BLOCK_SIZE; ix++)
					{
						if(!(CX1 > 0 && CX2 > 0 && CX3 > 0))
							skip_flag[iy][ix] = true;

						// we still need to do this since the fragment shader might want
						// to compute the derivative of attibutes
						EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);

						CX1 -= FDY12;
						CX2 -= FDY23;
						CX3 -= FDY31;

						STEP_FRAGMENTDATA(fragment_data, stepx);
					}
					
					CY1 += FDX12;
					CY2 += FDX23;
					CY3 += FDX31;

					//! step left and right attrib y
					STEP_FRAGMENTDATA(f00, step_left);
					STEP_FRAGMENTDATA(f10, step_right);
				}
				
				//! fragment_shader_block (can now use derivatives of attributes)
				BLOCK_BEGIN
					if (!skip_flag[iy][ix])
						fragment_shader_->shade(inner, pixel_block, ix, iy);
				BLOCK_END
			}
		}

		for (int i = 0; i < rendertarget_params_.count; ++i)
			(char*&)buffers.ptr += BLOCK_SIZE * rendertargets_->stride();

	}
}




Test program:

/*
Copyright (c) 2007, Markus Trenkwalder

All rights reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, 
  this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation 
  and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors 
  may be used to endorse or promote products derived from this software 
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "SDL.h"
#include "rasterizer.h"

#include <cmath>
#include <algorithm>

class SDL_SurfaceRenderTarget : public Rasterizer::RenderTarget {
	SDL_Surface *surface;
public:
	SDL_SurfaceRenderTarget(SDL_Surface *s):surface(s) {}
	virtual int width() { return surface->w; }
	virtual int height() { return surface->h; }
	virtual int stride() { return surface->pitch; }
	virtual int element_size() { return sizeof(int); }
	virtual void* buffer_pointer() { return surface->pixels; }
	virtual void clear(int x, int y, int w, int h) {}
};

class TestFragmentShader : public Rasterizer::FragmentShader {
public:
	virtual int varying_count() { return 3; }
	virtual void shade(const Rasterizer::BufferPointers& ptrs, const Rasterizer::PixelBlock& block, int x, int y)
	{
		unsigned int* color_buffer = (unsigned int*)ptrs.ptr[0];

		// unfortunaltely at the corners of the triangle we can get negative
		// values for the interpolants -> std::max
		int r = std::max(0, block[y][x].varying[0]);
		int g = std::max(0, block[y][x].varying[1]);
		int b = std::max(0, block[y][x].varying[2]);
		int color = r << 16 | g << 8 | b;
		*color_buffer = color;
	}
};

int main(int ac, char *av[])
{
	SDL_Init(SDL_INIT_VIDEO);
	SDL_Surface *screen = SDL_SetVideoMode(320, 240, 32, SDL_SWSURFACE);

	Rasterizer r;
	SDL_SurfaceRenderTarget color_target(screen);
	TestFragmentShader fragment_shader;

	Rasterizer::RenderTarget* rendertargets[] = { &color_target};
	r.rendertargets(1, rendertargets);
	r.fragment_shader(&fragment_shader);
	r.clip_rect(45, 70, 100, 100);

	Rasterizer::Vertex v[3];

	v[0].x = (int)(120.0f * 16.0f);
	v[0].y = (int)(50.0f * 16.0f);
	v[0].z = 0;
	v[0].w = 1 << 16;
	v[0].varyings[0] = 255;
	v[0].varyings[1] = 0;
	v[0].varyings[2] = 0;

	v[1].x = (int)(20.0f  * 16.0f);
	v[1].y = (int)(100.0f * 16.0f);
	v[1].z = 0x7fffffff;
	v[1].w = 1 << 16;
	v[1].varyings[0] = 0;
	v[1].varyings[1] = 255;
	v[1].varyings[2] = 0;

	v[2].x = (int)(150.0f * 16.0f);
	v[2].y = (int)(220.0f * 16.0f);
	v[2].z = 0x7fffffff >> 1;
	v[2].w = 1 << 16;
	v[2].varyings[0] = 0;
	v[2].varyings[1] = 0;
	v[2].varyings[2] = 255;

	SDL_Rect rect;
	rect.x = 45;
	rect.y = 70;
	rect.w = 100;
	rect.h = 100;
	SDL_FillRect(screen, &rect, 0xffffffff);

	r.draw_triangle(v[0], v[1], v[2]);

	SDL_Flip(screen);

	SDL_Event e;
	while (SDL_WaitEvent(&e) && e.type != SDL_QUIT);

	SDL_Quit();
	return 0;
}




I don't include the stdint.h file. You can get it yourself if needed. OK, i now have a domain www.trenki.net where I uploaded a vastly improved version of the renderer. The above code is obsolete, the features are retained. [Edited by - Trenki on August 22, 2007 5:11:28 AM]
Advertisement
Quote:Original post by Trenki
This is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.

Doesn't this cause errors near the edges of the triangles?
Quote:Original post by C0D1F1ED
Quote:Original post by Trenki
This is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide.

Doesn't this cause errors near the edges of the triangles?


In the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.
But I also looked at the mesa source code and they have the same problem at the edges and AFAIK they are doing the perspective correction per pixel. They have extra code to clamp the colors at the edges of the triangle.
for me the devmaster link doesn't work. could you tell us what article you mean?
thanks
Quote:Original post by Trenki
In the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.

Clamping helps for colors, but texture coordinates are unbounded. So at the edges of triangle you could get unwanted wrapping.
Quote:But I also looked at the mesa source code and they have the same problem at the edges and AFAIK they are doing the perspective correction per pixel. They have extra code to clamp the colors at the edges of the triangle.

Interesting. Is that just to prevent the components from becoming negative due to rounding error then?
Quote:Original post by Eitsch
for me the devmaster link doesn't work. could you tell us what article you mean?
thanks

Advanced Rasterization by Nicolas Capens, also known as c0d1f1ed. [wink]
Quote:Original post by C0D1F1ED
Quote:Original post by Trenki
In the test program I interpolate three colors over the triangle. In the fragment shader it could happen that one of the interpolated rgb components became nagative at the edges of the triangle so I had to clamp them to zero.

Clamping helps for colors, but texture coordinates are unbounded. So at the edges of triangle you could get unwanted wrapping.


Yes, this might happen. In December I tried to code a triangle rasterizer following Chris Heckers series on perspective texture mapping. There the varyings are interpolated along the edges but I still got the wrapping effect with the texture coordinates. Maybe I did something wrong... In the end I stopped working on it because I didn't like the code at all and it also was quite involved and not as general as the one above.

Quote:Original post by C0D1F1ED
Quote:Original post by Eitsch
for me the devmaster link doesn't work. could you tell us what article you mean?
thanks

Advanced Rasterization by Nicolas Capens, also known as c0d1f1ed. [wink]
Huh, that's you?

I'm coding up a softrast based on what you wrote -- though I'm also doing a bit of C++ voodoo with it. That way I don't suffer from the heavy penalty of doing multiple virtual calls for every single fragment (which is what the OP is doing), along with a few other niceties.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
Quote:That way I don't suffer from the heavy penalty of doing multiple virtual calls for every single fragment (which is what the OP is doing), along with a few other niceties.


There should be only one virtual function call per fragment and I don't believe this virtual function call is a heavy penalty. I could be using a function pointer instead but do you really believe that would be faster?


Quote:along with a few other niceties.

Could you please elaborate on this and point out what exacly you mean.

[Edited by - Trenki on May 27, 2007 1:06:21 PM]
Have you looked at this paper:
Triangle Scan Conversion using 2D Homogenous Coordinates

It dodges quite nicely many problems related to projection of 3D triangles to 2D, such as, w component being 0. Basicly, you don't need frustum clipping if you use homogenous rasterizer. I definitely recommend you to at least read the paper.

Also one idea worth investigating is using hierarchical z-buffer, which may improve performance especially in case of high depth complexity scenes.

This topic is closed to new replies.

Advertisement