Software Rasterizer

Hello everyone,
I'm having a bit of trouble tracking down a bug.
I'm getting weird edges at the end of a triangle, but only when using SSE; the output is just fine with the scalar version.
Here is an image of the problem:
And here is the code. It basically performs a point-in-triangle test based on fgiesen's blog series:

for (p.y = iMinY; p.y <= iMaxY; p.y += SEdge::s_iYStepSize, iIdx += iDepthBufferWidth)
{
	// Edge function values for the first 4-pixel block of this row.
	Simd::I32x4 i4w0 = i4W0Row;
	Simd::I32x4 i4w1 = i4W1Row;
	Simd::I32x4 i4w2 = i4W2Row;
	I32 iXIdx = iIdx;

	for (p.x = iMinX; p.x <= iMaxX; p.x += SEdge::s_iXStepSize,
		iXIdx += SEdge::s_iXStepSize,
		// one step to the right.
		i4w0 = Simd::AddPackedI32(i4w0, e12.i4StepX),
		i4w1 = Simd::AddPackedI32(i4w1, e20.i4StepX),
		i4w2 = Simd::AddPackedI32(i4w2, e01.i4StepX))
	{
		// Lanes where 0 < (w0 | w1 | w2): no edge value is negative (and not all are zero).
		const Simd::I32x4 i4Mask = _mm_cmplt_epi32(zero, Simd::OrPackedI32(i4w0, Simd::OrPackedI32(i4w1, i4w2)));
		if (_mm_test_all_zeros(i4Mask, i4Mask))
			continue; // no pixel of this block is covered

		// Interpolated depth: z0 + w1 * z1 + w2 * z2.
		Simd::F32x4 depth = z0;
		depth = Simd::AddPacked(depth, Simd::MulPacked(_mm_cvtepi32_ps(i4w1), z1));
		depth = Simd::AddPacked(depth, Simd::MulPacked(_mm_cvtepi32_ps(i4w2), z2));

		// Combine the coverage and depth-test masks, then blend new and previous depth.
		Simd::F32x4 prevDepth = _mm_load_ps(&pRenderTarget[iXIdx]);
		Simd::F32x4 depthMask = _mm_cmplt_ps(depth, prevDepth);
		Simd::I32x4 finalMask = _mm_and_si128(i4Mask, _mm_castps_si128(depthMask));
		depth = Simd::BlendPacked(prevDepth, depth, _mm_castsi128_ps(finalMask));

		Simd::StorePacked(&pRenderTarget[iXIdx], depth);
	}

	// one step down.
	i4W0Row = Simd::AddPackedI32(i4W0Row, e12.i4StepY);
	i4W1Row = Simd::AddPackedI32(i4W1Row, e20.i4StepY);
	i4W2Row = Simd::AddPackedI32(i4W2Row, e01.i4StepY);
}
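Since the scalar path works and the SSE path does not, one way to narrow it down is to check each SIMD lane against the scalar test for the same pixel. Below is a minimal SSE2 sketch of such a check, assuming a 4x1 pixel block as in the loop above and integer vertex coordinates. Orient2D, EdgeValues4, CoverageMask4 and CountLaneMismatches are hypothetical names, not part of the poster's Simd wrapper. It builds the per-lane x offsets with _mm_setr_epi32 and tests (w0 | w1 | w2) >= 0 as in the fgiesen series. This differs from the `0 < (w0 | w1 | w2)` test in the loop only where all three edge values are zero.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

using std::int32_t;

// Edge function from the fgiesen series: positive when a, b, c are in
// counter-clockwise order (y-up), zero when c lies on the line a->b.
static int32_t Orient2D(int32_t ax, int32_t ay, int32_t bx, int32_t by,
                        int32_t cx, int32_t cy)
{
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

// Hypothetical helper: edge a->b evaluated at pixels (x + i, y), i = 0..3.
// _mm_setr_epi32 takes lanes in memory order, so lane i holds pixel x + i.
static __m128i EdgeValues4(int32_t ax, int32_t ay, int32_t bx, int32_t by,
                           int32_t x, int32_t y)
{
    const int32_t w = Orient2D(ax, ay, bx, by, x, y);
    const int32_t stepX = ay - by; // change in w per pixel to the right
    return _mm_add_epi32(_mm_set1_epi32(w),
                         _mm_setr_epi32(0, stepX, 2 * stepX, 3 * stepX));
}

// Lane i is all ones when pixel (x + i, y) is inside triangle v,
// using the test (w0 | w1 | w2) >= 0, written as (w0 | w1 | w2) > -1.
static __m128i CoverageMask4(const int32_t v[3][2], int32_t x, int32_t y)
{
    const __m128i w0 = EdgeValues4(v[1][0], v[1][1], v[2][0], v[2][1], x, y); // e12
    const __m128i w1 = EdgeValues4(v[2][0], v[2][1], v[0][0], v[0][1], x, y); // e20
    const __m128i w2 = EdgeValues4(v[0][0], v[0][1], v[1][0], v[1][1], x, y); // e01
    const __m128i orW = _mm_or_si128(w0, _mm_or_si128(w1, w2));
    return _mm_cmpgt_epi32(orW, _mm_set1_epi32(-1));
}

// Debug check: counts lanes whose SIMD coverage disagrees with the scalar
// test for the same pixel; 0 means every lane lines up with its pixel.
static int CountLaneMismatches(const int32_t v[3][2], int32_t x, int32_t y)
{
    alignas(16) int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), CoverageMask4(v, x, y));

    int mismatches = 0;
    for (int i = 0; i < 4; ++i)
    {
        const int32_t px = x + i;
        const int32_t w0 = Orient2D(v[1][0], v[1][1], v[2][0], v[2][1], px, y);
        const int32_t w1 = Orient2D(v[2][0], v[2][1], v[0][0], v[0][1], px, y);
        const int32_t w2 = Orient2D(v[0][0], v[0][1], v[1][0], v[1][1], px, y);
        const bool scalarInside = (w0 | w1 | w2) >= 0;
        const bool simdInside = lanes[i] != 0;
        if (scalarInside != simdInside)
            ++mismatches;
    }
    return mismatches;
}
```

Running the same comparison against the real per-lane setup (i4W0Row and the i4StepX values), on blocks along the edge that shows the artifact, could indicate whether the problem lies in the lane setup or later in the depth/store path.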

SIMD swizzle? xyzw <-> wzyx?

I don't think so; the store implicitly changes the order. I've tried it, and it didn't work.
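On the lane-order question, for reference: SSE loads and stores do not reorder lanes. _mm_store_si128 writes lane 0 to the lowest address. A reversed (wzyx) order usually comes from _mm_set_epi32, which takes its arguments from the highest lane down, whereas _mm_setr_epi32 takes them in memory order. Whether that applies here is an assumption, since the code that builds i4W0Row and the i4StepX values is not shown. Below is a small SSE2 sketch; Lane, OffsetsSet and OffsetsSetr are illustrative names only.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

using std::int32_t;

// Stores v to memory and returns lane i; lane 0 is the lowest address.
// _mm_store_si128 (like _mm_load_si128) copies lanes in order, no reordering.
static int32_t Lane(__m128i v, int i)
{
    alignas(16) int32_t out[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(out), v);
    return out[i];
}

// _mm_set_epi32(e3, e2, e1, e0): the LAST argument goes to lane 0.
static __m128i OffsetsSet() { return _mm_set_epi32(0, 1, 2, 3); }   // lanes 0..3 = 3, 2, 1, 0

// _mm_setr_epi32(e0, e1, e2, e3): the FIRST argument goes to lane 0.
static __m128i OffsetsSetr() { return _mm_setr_epi32(0, 1, 2, 3); } // lanes 0..3 = 0, 1, 2, 3
```

If the per-pixel x offsets for the edge setup were built with _mm_set_epi32(0, 1, 2, 3), lane 0 would be evaluated at x + 3 while still being stored to pixel x. That would only show up where coverage changes within a 4-pixel block, that is, along the triangle's edges.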



Actually, that does solve those edges, but now it introduces artifacts in the interior.


