
Ren Aissanz

Clamping values using inline __asm and SSE2


OK, I've gotten to the point where I've processed the data and have a result that is four single-precision floating-point values in an XMM register (XMM0 in this case). I want to clamp the values in that register to the range [0.0f, 255.0f]. I'm drawing a complete blank, so here's the code:
	;	build anti-log table for uchars (256 entries, 4 per iteration)
	mov		ecx,	UCHAR_MAX + 1							;	256 entries
	shr		ecx,	2										;	ecx / 4 = 64 iterations
	mov		edi,	[AntiLogScaleLUT]						;	LUT pointer (edi, not ebp: ebp is the frame pointer)
	movaps	XMM3,	[increment]								;	loop values n + [0..3], starting at {0, 1, 2, 3}
	movaps	XMM7,	[zNearMinusZFar]						;	assumed constant: 4 x (zNear - zFar)
	movaps	XMM5,	[zNearTimesZFar]						;	4 x (zNear * zFar)
	loop_lut:												;	loop for both ushort and uchar
		;	uchar processing
		movaps	XMM0,	XMM3								;	copy loop values (XMM3 is preserved)
		divps	XMM0,	[ucharRange]						;	x = (n + [0..3]) / 255.0
		mulps	XMM0,	XMM7								;	x = zNearMinusZFar * x
		addps	XMM0,	[zFarVal]							;	x = x + zFar
		divps	XMM5,	XMM0								;	y = zNearTimesZFar / x
		movaps	XMM0,	XMM5								;	x = y
		movaps	XMM5,	[zNearTimesZFar]					;	reset zNearTimesZFar
		subps	XMM0,	[zNearVal]							;	x = x - zNear
		divps	XMM0,	[zFarMinusZNear]					;	x = x / zFarMinusZNear
		mulps	XMM0,	[ucharRange]						;	x = x * 255.0
		;	need to insert clamping code here!
		cvtps2dq	XMM1,	XMM0							;	convert to signed 32-bit ints (round to nearest by default)
		movdqa		[edi],	XMM1							;	store 4 ints to the LUT (must be 16-byte aligned)
		add			edi,	16								;	advance the LUT pointer by 4 ints
		addps		XMM3,	[countBy]						;	increment the loop values (n + i)
		dec		ecx											;	decrement LCV
	jnz	loop_lut											;	jump while ecx != 0
	;	no emms needed: emms only resets MMX state, and this loop uses XMM registers only

You can see the point where I get confused. I've tried a few masking operations, but everything seems to end up 0 or 255. If anyone can lend a hand, I'd greatly appreciate it. (PS. Feel free to use the rest of the code in your own apps if it's something you can use.) -- Ren
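For reference, a scalar C++ sketch of what the loop computes per table entry may help when checking the SSE version. It assumes zNear and zFar are the near and far plane distances behind the constants zNearVal, zFarVal, zNearTimesZFar and zFarMinusZNear, that the table has 256 entries, and that it holds 32-bit ints (as the 16-byte store of four cvtps2dq results suggests); buildAntiLogLUT is a hypothetical name. The clamp sits where the asm comment asks for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar reference for the SSE loop: fills a 256-entry table.
// Assumes 0 < zNear < zFar, as for a perspective depth range.
void buildAntiLogLUT(float zNear, float zFar, std::int32_t lut[256]) {
    assert(0.0f < zNear && zNear < zFar);
    for (int n = 0; n < 256; ++n) {
        float x = n / 255.0f;                              // x = n / 255.0
        x = (zNear - zFar) * x + zFar;                     // x = zNearMinusZFar * x + zFar
        float y = (zNear * zFar) / x;                      // y = zNearTimesZFar / x
        float v = (y - zNear) / (zFar - zNear) * 255.0f;   // rescale to 0..255
        v = std::min(std::max(v, 0.0f), 255.0f);           // the clamp (max with 0, then min with 255)
        lut[n] = static_cast<std::int32_t>(std::lrint(v)); // round to nearest, like cvtps2dq
    }
}
```

For 0 < zNear < zFar the formula itself stays within [0, 255], so in this scalar version the clamp mainly guards against rounding at the ends of the table.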

So I take it you're trying to do the equivalent of this in asm? This does it with no branches; check out what it compiles to.

// Branch-free clamp: each comparison yields 0 or 1, and negating it gives
// a mask of all zero bits (0) or all one bits (-1) that selects a value.
inline int clipMinMax(int value, const int minVal = 0, const int maxVal = 255) {
    int bLess = value < minVal, bGreater = value > maxVal;
    return (maxVal & -bGreater) |
           (((minVal & -bLess) | (value & -!bLess)) & -!bGreater);
}
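Why clipMinMax works: a comparison yields 0 or 1, and negating that gives 0 or -1, which in two's complement is a mask of all zero bits or all one bits. ANDing with the mask keeps or discards a value, and OR merges the two candidates. The sketch below isolates that select step; maskFrom, selectIf and clampBySelect are hypothetical helper names, and clampBySelect assumes minVal <= maxVal.

```cpp
#include <cassert>

// -(int)cond is 0 when cond is false and -1 (all bits set) when it is true.
inline int maskFrom(bool cond) { return -static_cast<int>(cond); }

// Returns a when cond is true, b otherwise, without a branch.
inline int selectIf(bool cond, int a, int b) {
    int mask = maskFrom(cond);
    return (a & mask) | (b & ~mask);
}

// Clamp built from two selects; same result as clipMinMax when minVal <= maxVal.
inline int clampBySelect(int value, int minVal = 0, int maxVal = 255) {
    assert(minVal <= maxVal);
    int raised = selectIf(value < minVal, minVal, value);  // lift values below the range
    return selectIf(raised > maxVal, maxVal, raised);      // cap values above the range
}
```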

OK, I understand what you're trying to do here, but it doesn't cover building the logarithmic-to-linear lookup table. If I inline that function, I lose the SIMD pipeline, correct?

PS. A bit off topic, but that inline function actually takes more operations than a simple comparison.
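The same masking idea does carry over to SSE, so the clamp need not leave the XMM registers: the packed compare instructions set every bit of a lane where the test is true, which plays the role of -(int)bLess in clipMinMax. Below is a sketch with SSE intrinsics from <xmmintrin.h> (so it assumes an x86 compiler); clampBySelectPS and lane are hypothetical names.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Per-lane select-based clamp of four floats, the SSE analogue of clipMinMax.
inline __m128 clampBySelectPS(__m128 v, __m128 lo, __m128 hi) {
    __m128 below = _mm_cmplt_ps(v, lo);                             // all ones where v < lo
    v = _mm_or_ps(_mm_and_ps(below, lo), _mm_andnot_ps(below, v));  // pick lo or v
    __m128 above = _mm_cmpgt_ps(v, hi);                             // all ones where v > hi
    return _mm_or_ps(_mm_and_ps(above, hi), _mm_andnot_ps(above, v)); // pick hi or v
}

// Reads lane i (0..3) of v, for printing or checking results.
inline float lane(__m128 v, int i) {
    assert(0 <= i && i < 4);
    float out[4];
    _mm_storeu_ps(out, v);  // unaligned store, so out needs no special alignment
    return out[i];
}
```

That is two compares plus six logic operations per clamp; SSE also has packed min/max instructions (maxps/minps) that cover this case more cheaply.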


Guest Anonymous Poster
err... how bout?

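// global goodness ;)
// 16-byte alignment matters: maxps/minps with a memory operand fault on
// unaligned data, and #pragma pack only affects struct member layout,
// so align the arrays explicitly (MSVC syntax, to match the __asm blocks).

__declspec(align(16)) static const float clamp_min[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
__declspec(align(16)) static const float clamp_max[4] = { 255.0f, 255.0f, 255.0f, 255.0f };

// let's clamp xmm0 (inside the __asm block)

maxps xmm0, clamp_min   ; per lane: xmm0 = max(xmm0, 0.0f)
minps xmm0, clamp_max   ; per lane: xmm0 = min(xmm0, 255.0f)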
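For anyone doing this from C++ rather than inline asm (MSVC doesn't accept __asm blocks when targeting x64), the same clamp maps onto SSE intrinsics. This sketch also mirrors the end of the LUT loop (cvtps2dq plus the aligned 16-byte store); clampConvertStore is a hypothetical name, and it assumes an x86 compiler providing <emmintrin.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (includes the SSE ones)

// Clamps four floats to [0, 255], converts them to 32-bit ints and stores them.
inline void clampConvertStore(__m128 x, std::int32_t* dst) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);  // movdqa needs 16-byte alignment
    x = _mm_max_ps(x, _mm_setzero_ps());                      // maxps xmm0, clamp_min
    x = _mm_min_ps(x, _mm_set1_ps(255.0f));                   // minps xmm0, clamp_max
    __m128i n = _mm_cvtps_epi32(x);                           // cvtps2dq (round to nearest by default)
    _mm_store_si128(reinterpret_cast<__m128i*>(dst), n);      // movdqa [dst], xmm
}
```

With this operand order a NaN input comes out as 0, because maxps returns its second (source) operand when either input is NaN.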



Cheers yourself! A simple solution! Here I am messing with greater-than/less-than comparisons, masking, etc., and you cut through my Gordian knot with precision! Thanks so much for your help! Ren

