

cr88192

Posted 10 February 2013 - 08:13 PM


What seemed weird to me was that, after decompressing it, people wanted to use additional CPU time to recompress it as DXT just to speed up the trip from main memory to GPU memory, which is much faster than the HDD anyway.

This may still be a valid reason.

Although you typically have about 8 GiB/s of bandwidth over PCIe, several hundred megabytes is still a non-negligible share of it. If you have nothing else to transfer, that's no problem, but if you do have other things to transfer, it may be.

Transfers also have a fixed overhead and cause a complete GPU stall on typical present-day consumer hardware (the driver will either let the GPU render or do a PCIe transfer, not both at the same time), so transferring one frame at a time is not an efficient solution. Transferring many frames at a time gives much better parallelism; however, it takes prohibitive amounts of GPU memory. DXT compression alleviates that.


Pretty much.

Nothing here says this is actually efficient (traditional animated textures are still generally a better solution in most cases; my engine supports both).
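
For a rough sense of the numbers behind the bandwidth argument, here is a back-of-the-envelope calculation; the 1024x1024 frame size is just an assumed example, and the ~8 GiB/s figure is taken from the quote above at face value:

#include <stdio.h>

int main(void)
{
    /* assumed numbers: ~8 GiB/s of PCIe bandwidth, 1024x1024 video frames */
    const double pcie_bytes_per_sec = 8.0 * 1024 * 1024 * 1024;
    const int w = 1024, h = 1024;

    double rgba_frame = (double)w * h * 4;   /* RGBA8: 4 bytes per pixel   */
    double dxt5_frame = (double)w * h;       /* DXT5:  1 byte per pixel    */
    double dxt1_frame = (double)w * h / 2;   /* DXT1:  0.5 bytes per pixel */

    printf("RGBA8: %.2f MiB/frame, ~%.0f frames/s of raw bus bandwidth\n",
           rgba_frame / (1 << 20), pcie_bytes_per_sec / rgba_frame);
    printf("DXT5:  %.2f MiB/frame, ~%.0f frames/s\n",
           dxt5_frame / (1 << 20), pcie_bytes_per_sec / dxt5_frame);
    printf("DXT1:  %.2f MiB/frame, ~%.0f frames/s\n",
           dxt1_frame / (1 << 20), pcie_bytes_per_sec / dxt1_frame);
    return 0;
}

The point is just the 4x-8x difference in upload size per frame; actual throughput is lower once the per-transfer overhead mentioned above is factored in.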


Another motivation is something I had already suspected (and have partly confirmed in benchmarks): the direct YUV-to-DXTn route is apparently slightly faster than converting all the way to raw RGB first.

I still need more code to confirm that everything is actually working, probably followed by more fine-tuning.
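
For reference, a minimal sketch of what the direct route can look like for the color part: only the block's two endpoints get a full YUV->RGB conversion, and each pixel is just ranked against the block's luma range. This is not my engine's actual code; the function names, the full-range BT.601 constants, and the single averaged chroma sample per block are all assumptions for illustration.

#include <stdint.h>

static int clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* JPEG-style full-range BT.601 YCbCr -> packed RGB565 (coefficients scaled by 65536) */
static uint16_t yuv_to_565(int y, int cb, int cr)
{
    int r = clamp255(y + ((91881 * (cr - 128)) >> 16));
    int g = clamp255(y - ((22554 * (cb - 128) + 46802 * (cr - 128)) >> 16));
    int b = clamp255(y + ((116130 * (cb - 128)) >> 16));
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* y[16]: 4x4 luma block (row-major); cb, cr: chroma averaged over the block.
 * dst: 8-byte DXT1 block (two RGB565 endpoints + 16 two-bit indices). */
void yuv_block_to_dxt1(const uint8_t y[16], int cb, int cr, uint8_t dst[8])
{
    int i, ymin = 255, ymax = 0;
    for (i = 0; i < 16; i++) {
        if (y[i] < ymin) ymin = y[i];
        if (y[i] > ymax) ymax = y[i];
    }

    /* only two YUV->RGB conversions per block, instead of sixteen */
    uint16_t c0 = yuv_to_565(ymax, cb, cr);   /* bright endpoint */
    uint16_t c1 = yuv_to_565(ymin, cb, cr);   /* dark endpoint   */
    int flip = 0;
    if (c0 < c1) { uint16_t t = c0; c0 = c1; c1 = t; flip = 1; }

    uint32_t idx = 0;
    int range = ymax - ymin;
    for (i = 0; i < 16; i++) {
        int sel = 0;
        if (range > 0 && c0 != c1) {
            int t6 = 6 * (y[i] - ymin) / range;               /* 0..6 */
            sel = (t6 >= 5) ? 0 : (t6 < 1) ? 1 : (t6 >= 3) ? 2 : 3;
            if (flip) sel ^= 1;   /* endpoint order was swapped above */
        }
        idx |= (uint32_t)sel << (2 * i);
    }

    dst[0] = (uint8_t)c0; dst[1] = (uint8_t)(c0 >> 8);
    dst[2] = (uint8_t)c1; dst[3] = (uint8_t)(c1 >> 8);
    dst[4] = (uint8_t)idx;         dst[5] = (uint8_t)(idx >> 8);
    dst[6] = (uint8_t)(idx >> 16); dst[7] = (uint8_t)(idx >> 24);
}

The saving is that per-pixel work reduces to a compare and a shift, and the YUV->RGB math runs twice per 16 pixels rather than sixteen times.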

(EDIT/ADD: sadly, it turns out my JPEG decoder isn't quite as fast as I remembered, oh well...).


(EDIT/ADD 2: to clarify the above, the current JPEG->DXT5 transcoding route manages only about 38 Mp/s optimized ("/O2"), vs. ~20 Mp/s in debug builds, whereas I had remembered things being faster. Granted, I am getting tempted to use SIMD intrinsics for a few things...)

Note that the current RGBA video frames contain both an RGB (YUV) image and an embedded alpha image (monochrome, also encoded as a JPEG); both layers are decoded/transcoded and then recombined into a composite DXT5 image.
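
Roughly, the recombination amounts to building the 8-byte DXT5 alpha block from the mono alpha layer and appending the already-built 8-byte color block. A minimal sketch, again with made-up names rather than the engine's actual code:

#include <stdint.h>
#include <string.h>

/* a[16]: 4x4 alpha block from the mono layer; color: 8-byte color block
 * (e.g. from the YUV transcode above); dst: 16-byte DXT5 block. */
void combine_alpha_color_dxt5(const uint8_t a[16], const uint8_t color[8],
                              uint8_t dst[16])
{
    int i, amin = 255, amax = 0;
    for (i = 0; i < 16; i++) {
        if (a[i] < amin) amin = a[i];
        if (a[i] > amax) amax = a[i];
    }

    dst[0] = (uint8_t)amax;   /* alpha0 > alpha1 selects the 8-value ramp */
    dst[1] = (uint8_t)amin;

    uint64_t idx = 0;
    for (i = 0; i < 16; i++) {
        int sel = 0;
        if (amax > amin) {
            /* nearest of the 8 ramp values between amin and amax */
            int t = (7 * (a[i] - amin) + (amax - amin) / 2) / (amax - amin);
            sel = (t == 7) ? 0 : (t == 0) ? 1 : 8 - t;
        }
        idx |= (uint64_t)sel << (3 * i);
    }
    for (i = 0; i < 6; i++)   /* 48 bits of 3-bit alpha indices */
        dst[2 + i] = (uint8_t)(idx >> (8 * i));

    memcpy(dst + 8, color, 8);   /* color half of the block is unchanged */
}

Since the two halves of a DXT5 block are independent, the alpha layer and the color layer can be transcoded separately and only meet at this packing step.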


While looking around online, I did run across this article, though:
http://www.nvidia.com/object/real-time-ycocg-dxt-compression.html
Nifty idea, but granted, this would need special shaders.
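
For reference, roughly what that special shader would have to do per texel, transcribed here as plain C (the channel layout and constants follow my reading of the linked article, with Co/Cg in RG, a scale selector in B, and luma in alpha, and may not match it exactly):

/* decode one texel of a YCoCg-in-DXT5 texture back to RGB (all values in 0..1) */
typedef struct { float r, g, b, a; } rgba_f;

rgba_f ycocg_dxt5_to_rgb(rgba_f texel)
{
    float y     = texel.a;                                    /* luma            */
    float scale = 1.0f / ((255.0f / 8.0f) * texel.b + 1.0f);  /* per-block scale */
    float co    = (texel.r - 0.5f) * scale;
    float cg    = (texel.g - 0.5f) * scale;

    rgba_f out;
    out.r = y + co - cg;   /* YCoCg -> RGB */
    out.g = y + cg;
    out.b = y - co - cg;
    out.a = 1.0f;
    return out;
}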

