
#Actualcr88192

Posted 09 February 2013 - 11:45 AM

For those interested in fast CPU side texture compression, say for streaming JPG->DXT etc., check out the code and article here: http://software.intel.com/en-us/vcsource/samples/dxt-compression
 
Some searching should also turn up a host of other approaches. I wouldn't recommend this as a general way to load textures: in most situations the better quality/performance trade-off is simply loading pre-compressed textures from a packed archive (Zip etc.). But if you have huge amounts of detailed textures that you need to stream, this is a good approach.

felt curious, tested my own "DXT1F" encoder (with MSVC):

I am getting 112 Mp/s (megapixels per second) when compiled with optimizations turned on ("/O2"), and 42 Mp/s in a debug build ("/Z7").

granted, this is single-threaded scalar code.
and it also assumes only 2 colors.
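
A minimal sketch of what a "2 colors only" DXT1 block encoder could look like, for readers who want something concrete. This is not the DXT1F code itself: the function name `encode_block_2color`, the cheap luma weights (R + 2G + B), and the choice of the brightest and darkest pixels as endpoints are all assumptions made for illustration. Each pixel is mapped to index 0 or 1 only, so the interpolated DXT1 colors are never used and the endpoint ordering does not matter.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into a 16-bit RGB565 value. */
static uint16_t pack565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Encode one 4x4 block of RGBA pixels into an 8-byte DXT1 block.
 * 'stride' is the byte distance between rows of the source image.
 * Hypothetical helper: only the two endpoint colors are ever selected. */
void encode_block_2color(const uint8_t *rgba, int stride, uint8_t out[8])
{
    int luma[16];
    int lo = 0, hi = 0;
    uint32_t bits = 0;

    assert(stride >= 16); /* a row must hold at least 4 RGBA pixels */

    /* Find the darkest and brightest pixel using a cheap luma estimate. */
    for (int i = 0; i < 16; i++) {
        const uint8_t *p = rgba + (i >> 2) * stride + (i & 3) * 4;
        luma[i] = p[0] + 2 * p[1] + p[2];
        if (luma[i] < luma[lo]) lo = i;
        if (luma[i] > luma[hi]) hi = i;
    }

    const uint8_t *ph = rgba + (hi >> 2) * stride + (hi & 3) * 4;
    const uint8_t *pl = rgba + (lo >> 2) * stride + (lo & 3) * 4;
    uint16_t c0 = pack565(ph[0], ph[1], ph[2]); /* bright endpoint */
    uint16_t c1 = pack565(pl[0], pl[1], pl[2]); /* dark endpoint */

    /* 2 bits per pixel, pixel 0 in the lowest bits: 0 = c0, 1 = c1. */
    int mid = (luma[lo] + luma[hi]) / 2;
    for (int i = 0; i < 16; i++) {
        if (luma[i] <= mid)
            bits |= 1u << (2 * i);
    }

    /* DXT1 layout: color0, color1 (little-endian), then 32 index bits. */
    out[0] = (uint8_t)(c0 & 0xFF);
    out[1] = (uint8_t)(c0 >> 8);
    out[2] = (uint8_t)(c1 & 0xFF);
    out[3] = (uint8_t)(c1 >> 8);
    out[4] = (uint8_t)(bits & 0xFF);
    out[5] = (uint8_t)((bits >> 8) & 0xFF);
    out[6] = (uint8_t)((bits >> 16) & 0xFF);
    out[7] = (uint8_t)((bits >> 24) & 0xFF);
}
```

Skipping the two interpolated colors costs quality on smooth gradients, but the inner loop reduces to a min/max search plus one compare per pixel.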


the version written to shim into the JPEG decoder is, measured by itself, actually pulling off 314 Mp/s (180 Mp/s debug).
still single-threaded scalar code.

the main difference is that it works on planar YUV input and assumes 4:2:0 subsampling, which requires less math than RGBA: we only need the min and max Y per block, and U and V are each a simple average.
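
A rough sketch of the planar YUV idea, under stated assumptions: the function name `encode_block_yuv420`, the parameter layout, and the integer approximation of the full-range JPEG (JFIF) YCbCr-to-RGB conversion are all illustrative, not taken from the pastebin code. With 4:2:0 input, a 4x4 block of Y samples is covered by a 2x2 patch of U and of V samples, so the block needs only a min/max scan over Y and one average per chroma plane.

```c
#include <stdint.h>

static uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one full-range YCbCr sample to RGB565 (integer approximation). */
static uint16_t yuv_to_565(int y, int u, int v)
{
    int r = y + (359 * (v - 128)) / 256;
    int g = y - (88 * (u - 128) + 183 * (v - 128)) / 256;
    int b = y + (454 * (u - 128)) / 256;
    return (uint16_t)(((clamp8(r) >> 3) << 11) |
                      ((clamp8(g) >> 2) << 5) |
                      (clamp8(b) >> 3));
}

/* Encode one 4x4 luma block plus its 2x2 chroma samples as a DXT1 block.
 * yp/up/vp point at the block's first sample in each plane (hypothetical API). */
void encode_block_yuv420(const uint8_t *yp, int ystride,
                         const uint8_t *up, const uint8_t *vp, int cstride,
                         uint8_t out[8])
{
    int ymin = 255, ymax = 0;
    uint32_t bits = 0;

    /* Only the luma plane needs a per-pixel scan. */
    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            int y = yp[row * ystride + col];
            if (y < ymin) ymin = y;
            if (y > ymax) ymax = y;
        }
    }

    /* Chroma: simple rounded average of the 2x2 samples covering the block. */
    int u = (up[0] + up[1] + up[cstride] + up[cstride + 1] + 2) / 4;
    int v = (vp[0] + vp[1] + vp[cstride] + vp[cstride + 1] + 2) / 4;

    uint16_t c0 = yuv_to_565(ymax, u, v); /* bright endpoint */
    uint16_t c1 = yuv_to_565(ymin, u, v); /* dark endpoint */

    /* 2 bits per pixel, pixel 0 in the lowest bits: 0 = c0, 1 = c1. */
    int mid = (ymin + ymax) / 2;
    for (int i = 0; i < 16; i++) {
        if (yp[(i >> 2) * ystride + (i & 3)] <= mid)
            bits |= 1u << (2 * i);
    }

    out[0] = (uint8_t)(c0 & 0xFF);
    out[1] = (uint8_t)(c0 >> 8);
    out[2] = (uint8_t)(c1 & 0xFF);
    out[3] = (uint8_t)(c1 >> 8);
    out[4] = (uint8_t)(bits & 0xFF);
    out[5] = (uint8_t)((bits >> 8) & 0xFF);
    out[6] = (uint8_t)((bits >> 16) & 0xFF);
    out[7] = (uint8_t)((bits >> 24) & 0xFF);
}
```

Compared with an RGBA encoder, the color conversion runs only twice per block (once per endpoint) instead of once per pixel.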

(EDIT/ADD: after changing it to take on more of the work, basically working on raw IDCT block output rather than precooked output, it runs at 170 Mp/s optimized).

not entirely sure what a SIMD or multithreaded version would do here.

granted, this version would be used on the front-end of a JPEG decoder, which would make it slower.

(EDIT/ADD:
http://pastebin.com/emDK9jwc
http://pastebin.com/EyEY5W9P
)

CPU speed (my case) = 2.8 GHz.

or such...
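
To put the throughput figures in context, they can be converted to an approximate cycles-per-pixel budget. The helper below is a hypothetical illustration (the name `cycles_per_pixel` is not from the post) and assumes the quoted 2.8 GHz is the sustained clock during the benchmark.

```c
/* Approximate CPU cycles spent per pixel.
 * clock_ghz: CPU clock in GHz; mpixels_per_sec: throughput in Mp/s. */
double cycles_per_pixel(double clock_ghz, double mpixels_per_sec)
{
    /* GHz * 1000 gives millions of cycles per second. */
    return (clock_ghz * 1000.0) / mpixels_per_sec;
}
```

At 2.8 GHz, 112 Mp/s works out to 25 cycles per pixel, 314 Mp/s to about 8.9, and 170 Mp/s to about 16.5.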
