
### #Actual wodinoneeye

Posted 12 October 2013 - 02:36 AM

"I wrote the shifting logic that way because I've been burnt by a micro-coded shift-by-variable instruction, where shifting by a constant takes 1 cycle, but shifting by a variable breaks down into a little for loop that shifts by 1 n times, over n*k cycles... I can't actually remember whether this is the case on modern PC CPUs or not."

But why bulk up the code with unsigned int(4), unsigned int(8), ... when plain 4, 8, ... is all you need? The shift count only needs to be an int, so you could unclutter it at least that much.

Also, since you own this function, couldn't you shorten the variable names? It's fairly obvious what it does, it's quite repetitive, and a simple comment would tell any less knowledgeable reader what is going on.

Also, what are the guts of quantize4(), and could you embed it even further to chop out extraneous operations?

Maybe a table lookup for a non-linear conversion function with a byte domain (feeding the float-to-int calculation directly in as the subscript)?
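The table idea could look roughly like the sketch below. The real quantize4() is not shown in the thread, so `quantize4Ref` is a made-up stand-in that rounds an 8-bit alpha to the nearest 4-bit level; `quantize4Table` and `quantize4Lookup` are likewise hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for quantize4(): maps an 8-bit alpha (0..255)
// to the nearest 4-bit level (0..15). The thread's real version may differ.
constexpr unsigned quantize4Ref(unsigned a8)
{
    return (a8 * 15u + 127u) / 255u;
}

// Builds the 256-entry table once, on first use.
inline const std::array<std::uint8_t, 256>& quantize4Table()
{
    static const std::array<std::uint8_t, 256> table = [] {
        std::array<std::uint8_t, 256> t{};
        for (unsigned i = 0; i < 256; ++i)
            t[i] = static_cast<std::uint8_t>(quantize4Ref(i));
        return t;
    }();
    return table;
}

// Float alpha in [0, 1] -> 4-bit level, with the float-to-int result
// used directly as the table subscript.
inline unsigned quantize4Lookup(float alpha)
{
    int index = int(alpha * 255.0f + 0.5f);
    assert(index >= 0 && index <= 255);  // caller must keep alpha in [0, 1]
    return quantize4Table()[index];
}
```

A table only pays off if the conversion is more expensive than a cached byte load, so for a simple linear quantizer like this stand-in it may well be slower; it would need measuring.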

Or mutate the inline quantize4() function so the * 255.0 + .5 is embedded inside it, have it return an unsigned int, and embed the whole call right into the shift sequence. You would probably need to speed-test and compare to see whether any such condensing makes a difference beyond looking cleaner (getting rid of the intermediate variables...).
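A minimal sketch of that fused version, assuming alpha is already clamped to [0, 1]. `quantizeAlpha4` and `packAlpha8` are hypothetical names, and the rounding formula is only a guess at what quantize4() does:

```cpp
#include <cassert>

// Hypothetical fused quantizer: does the * 255 + .5 conversion itself and
// returns an unsigned 4-bit level (0..15), so the caller can shift it directly.
inline unsigned quantizeAlpha4(float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    unsigned a8 = unsigned(alpha * 255.0f + 0.5f);  // 0..255
    return (a8 * 15u + 127u) / 255u;                // assumed rounding rule
}

// Packs eight alphas into one 32-bit word, lowest nibble first,
// with constant shift amounts and no intermediate variables.
inline unsigned packAlpha8(const float* alpha)
{
    return (quantizeAlpha4(alpha[0]) << 0)  |
           (quantizeAlpha4(alpha[1]) << 4)  |
           (quantizeAlpha4(alpha[2]) << 8)  |
           (quantizeAlpha4(alpha[3]) << 12) |
           (quantizeAlpha4(alpha[4]) << 16) |
           (quantizeAlpha4(alpha[5]) << 20) |
           (quantizeAlpha4(alpha[6]) << 24) |
           (quantizeAlpha4(alpha[7]) << 28);
}
```

An optimizing compiler will quite possibly generate the same code for this and for the version with named temporaries, which is why timing both is the only way to know.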

Heh, you could also reuse the qA0-qA7 variables from the first set (0-7) for the second set (8-15).

Something less bulky, like:

DXT3AlphaBlock compressDXT3Alpha(vec4 colors[16])
{
    unsigned int    qA0, qA1, qA2, qA3, qA4, qA5, qA6, qA7;
    DXT3AlphaBlock  dxt3Alpha;

    // quantize alphas 0-7 to 4 bits each
    qA0 = quantize4(int(colors[0].w * 255.0 + .5));
    qA1 = quantize4(int(colors[1].w * 255.0 + .5));
    qA2 = quantize4(int(colors[2].w * 255.0 + .5));
    qA3 = quantize4(int(colors[3].w * 255.0 + .5));
    qA4 = quantize4(int(colors[4].w * 255.0 + .5));
    qA5 = quantize4(int(colors[5].w * 255.0 + .5));
    qA6 = quantize4(int(colors[6].w * 255.0 + .5));
    qA7 = quantize4(int(colors[7].w * 255.0 + .5));

    // pack them into the first 32-bit word, lowest nibble first
    dxt3Alpha.alphas[0] =
        qA0 << 0  |
        qA1 << 4  |
        qA2 << 8  |
        qA3 << 12 |
        qA4 << 16 |
        qA5 << 20 |
        qA6 << 24 |
        qA7 << 28;

    // reuse the same variables for alphas 8-15
    qA0 = quantize4(int(colors[8].w * 255.0 + .5));
    qA1 = quantize4(int(colors[9].w * 255.0 + .5));
    qA2 = quantize4(int(colors[10].w * 255.0 + .5));
    qA3 = quantize4(int(colors[11].w * 255.0 + .5));
    qA4 = quantize4(int(colors[12].w * 255.0 + .5));
    qA5 = quantize4(int(colors[13].w * 255.0 + .5));
    qA6 = quantize4(int(colors[14].w * 255.0 + .5));
    qA7 = quantize4(int(colors[15].w * 255.0 + .5));

    dxt3Alpha.alphas[1] =
        qA0 << 0  |
        qA1 << 4  |
        qA2 << 8  |
        qA3 << 12 |
        qA4 << 16 |
        qA5 << 20 |
        qA6 << 24 |
        qA7 << 28;

    return dxt3Alpha;
}



This is a function that looks like it will be crunching a lot of bulk data for texture conversion, so this kind of optimization (and similar ones) could add up in the actual programs.

---

DXT3AlphaBlock compressDXT3Alpha(vec4 colors[16])
{
    DXT3AlphaBlock  dxt3Alpha;

    dxt3Alpha.alphas[0] =
        quantize4(int(colors[0].w * 255.0 + .5)) << 0  |
        quantize4(int(colors[1].w * 255.0 + .5)) << 4  |
        quantize4(int(colors[2].w * 255.0 + .5)) << 8  |
        quantize4(int(colors[3].w * 255.0 + .5)) << 12 |
        quantize4(int(colors[4].w * 255.0 + .5)) << 16 |
        quantize4(int(colors[5].w * 255.0 + .5)) << 20 |
        quantize4(int(colors[6].w * 255.0 + .5)) << 24 |
        quantize4(int(colors[7].w * 255.0 + .5)) << 28;

    dxt3Alpha.alphas[1] =
        quantize4(int(colors[8].w * 255.0 + .5)) << 0  |
        quantize4(int(colors[9].w * 255.0 + .5)) << 4  |
        quantize4(int(colors[10].w * 255.0 + .5)) << 8  |
        quantize4(int(colors[11].w * 255.0 + .5)) << 12 |
        quantize4(int(colors[12].w * 255.0 + .5)) << 16 |
        quantize4(int(colors[13].w * 255.0 + .5)) << 20 |
        quantize4(int(colors[14].w * 255.0 + .5)) << 24 |
        quantize4(int(colors[15].w * 255.0 + .5)) << 28;

    return dxt3Alpha;
}


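A wrapper like ((unsigned int) quantize4(...)) is possibly needed if the shift is wonky, for example if quantize4() returns a signed int: a value of 8 or more shifted left by 28 does not fit in a signed 32-bit int.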
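To illustrate the cast, here is a small sketch that assumes a quantizer returning a signed int. `quantize4Signed` and `topNibble` are invented for the example; the thread does not say what quantize4() actually returns.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical quantizer that returns a signed int in 0..15.
inline int quantize4Signed(int a8)
{
    return (a8 * 15 + 127) / 255;
}

// Places the quantized value in the top nibble of a 32-bit word.
inline std::uint32_t topNibble(int a8)
{
    int q = quantize4Signed(a8);
    assert(q >= 0 && q <= 15);
    // Cast first, so the shift is done in unsigned 32-bit arithmetic;
    // (q << 28) on a signed int would overflow for q >= 8.
    return static_cast<std::uint32_t>(q) << 28;
}
```

If quantize4() already returns an unsigned type, the wrapper is unnecessary.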

Pointer math on the colors[].w float pointer, with ptr += 4 per element, to eliminate the array index multiply?
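One way to sketch the pointer-walking idea is below. It steps a `const vec4*` rather than a raw `float*` advanced by 4, which gives the same 16-byte stride without relying on struct layout. The `vec4` struct, `quantizeAlpha4`, and `packAlphaRow` are all assumptions made for the example. Note that it uses a variable shift count in a loop, which is exactly what the quoted author was avoiding, and that with constant indices such as colors[3].w the compiler most likely folds the offset at compile time anyway, so any gain would have to be measured.

```cpp
#include <cassert>
#include <cstdint>

struct vec4 { float x, y, z, w; };  // hypothetical layout: four packed floats

// Hypothetical fused quantizer: alpha in [0, 1] -> 4-bit level (0..15).
inline unsigned quantizeAlpha4(float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    unsigned a8 = unsigned(alpha * 255.0f + 0.5f);
    return (a8 * 15u + 127u) / 255u;
}

// Packs the .w of eight consecutive vec4s into one 32-bit word,
// advancing the pointer instead of indexing the array.
inline std::uint32_t packAlphaRow(const vec4* p)
{
    std::uint32_t bits = 0;
    for (unsigned shift = 0; shift < 32; shift += 4, ++p)
        bits |= quantizeAlpha4(p->w) << shift;
    return bits;
}
```

Calling packAlphaRow(colors) and packAlphaRow(colors + 8) would then produce the two words of the alpha block.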

etc....

### #4 wodinoneeye

Posted 12 October 2013 - 02:26 AM

### #3 wodinoneeye

Posted 12 October 2013 - 02:23 AM

### #2 wodinoneeye

Posted 12 October 2013 - 02:13 AM

### #1 wodinoneeye

Posted 12 October 2013 - 01:41 AM
