[Release] DXTn Compression Algorithm (Revisited)

Some may remember my previous release of a DXTn algorithm from here.
It was then pointed out that I had some work to do to improve the results and remove some artifacts.

I took a break from it for a long while in order to work on other areas of my engine, but I came back to it a few weeks ago and ironed out all of the bugs.

My MSE is now quite close to that of ATI’s The Compressonator, and my article explains why it is not possible to get the same MSE or lower.
Then again, MSE is not the end-all-be-all measure of image quality, and in most cases the differences in my results are for the better.


Most of the changes I made were to the way I compare the quality of sets of colors; the rest were just minor tweaks. The original concept of using 2 layers of linear regression is solid; it just needs to be implemented with care.
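To give a rough idea of the kind of endpoint fitting involved, below is a minimal sketch of a least-squares/principal-axis line fit through a block’s colors, with the extreme projections taken as candidate endpoints. The names (FitEndpoints, Rgb) and the single-pass fit are illustrative only, not the exact two-layer formulation from the article.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Rgb { float r, g, b; };

// Fit a line through the 16 colors of a 4x4 block (principal-axis / least-squares)
// and return the extreme projections as candidate endpoints.
std::array<Rgb, 2> FitEndpoints( const std::array<Rgb, 16> &block ) {
    // Mean color of the block.
    Rgb mean = { 0.0f, 0.0f, 0.0f };
    for ( const Rgb &p : block ) { mean.r += p.r; mean.g += p.g; mean.b += p.b; }
    mean.r /= 16.0f; mean.g /= 16.0f; mean.b /= 16.0f;

    // Covariance of the colors about the mean.
    float cov[3][3] = {};
    for ( const Rgb &p : block ) {
        const float d[3] = { p.r - mean.r, p.g - mean.g, p.b - mean.b };
        for ( int i = 0; i < 3; ++i )
            for ( int j = 0; j < 3; ++j )
                cov[i][j] += d[i] * d[j];
    }

    // Estimate the dominant axis with a few power iterations.
    float axis[3] = { 1.0f, 1.0f, 1.0f };
    for ( int it = 0; it < 8; ++it ) {
        float next[3] = {};
        for ( int i = 0; i < 3; ++i )
            for ( int j = 0; j < 3; ++j )
                next[i] += cov[i][j] * axis[j];
        const float len = std::sqrt( next[0] * next[0] + next[1] * next[1] + next[2] * next[2] );
        if ( len < 1e-8f ) { break; }        // Flat block: every projection is 0 anyway.
        for ( int i = 0; i < 3; ++i ) { axis[i] = next[i] / len; }
    }

    // Project every texel onto the axis; the extremes become the endpoints.
    float tMin = 1e30f, tMax = -1e30f;
    for ( const Rgb &p : block ) {
        const float t = (p.r - mean.r) * axis[0] + (p.g - mean.g) * axis[1] + (p.b - mean.b) * axis[2];
        tMin = std::min( tMin, t );
        tMax = std::max( tMax, t );
    }
    const Rgb c0 = { mean.r + axis[0] * tMin, mean.g + axis[1] * tMin, mean.b + axis[2] * tMin };
    const Rgb c1 = { mean.r + axis[0] * tMax, mean.g + axis[1] * tMax, mean.b + axis[2] * tMax };
    return { c0, c1 };
}
```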

I will release an open-source tool at some point in the future. I have no doubts that others could build upon this concept and make further improvements.

My new article explains the algorithm more clearly. Enjoy.
http://lspiroengine.com/?p=312


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Wow, great work, it's even quite hard to tell the original and compressed one apart at a distance.


Good work, great to see you stick with it and improve your algorithm.

The end notes about the ATI/nVidia decompression routines being slightly different are also an interesting consideration...
Thank you both.

I remember a topic from a few months ago in which I was accused of wasting my time reinventing wheels. It did take more time than I expected but it was certainly rewarding, and I was able to uncover some important facts that could be beneficial to a lot of people/companies when considering their image quality.

Mr. Gotanda was also not aware of the differences in how NVIDIA decodes their images, and by my calculations this special decoding method was very likely to have been invented specifically for the PlayStation 3. Had my company been aware of this difference before, the quality of their PlayStation 3 textures could have been improved.

The ATI results differ by more than 1 value in many places too, which suggests more than just truncation. I wish they would publish their decoding method. It’s not like it will help NVIDIA or anything, but it would help developers striving for better image quality.


If the ATI decompression method were exposed, then a “perfect” tool could be made to tailor to each of their decompression methods individually.
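For reference, here is a small sketch of the “ideal” BC1/DXT1 palette decode that these vendor-specific decoders diverge from. The 1/3-2/3 interpolation and the bit-replication expansion below follow the common reference convention; the rounding shown is one possible choice, not any particular vendor’s hardware.

```cpp
#include <array>
#include <cstdint>

struct Rgb8 { uint8_t r, g, b; };

// Expand a 5:6:5 endpoint to 8:8:8 by bit replication (one common convention;
// vendors may differ even at this step).
static Rgb8 Expand565( uint16_t c ) {
    const uint8_t r5 = (c >> 11) & 0x1F, g6 = (c >> 5) & 0x3F, b5 = c & 0x1F;
    return { uint8_t( (r5 << 3) | (r5 >> 2) ),
             uint8_t( (g6 << 2) | (g6 >> 4) ),
             uint8_t( (b5 << 3) | (b5 >> 2) ) };
}

// Build the 4-color palette for a BC1 block in the 4-color (opaque) mode.
std::array<Rgb8, 4> Bc1Palette( uint16_t c0, uint16_t c1 ) {
    const Rgb8 e0 = Expand565( c0 ), e1 = Expand565( c1 );
    std::array<Rgb8, 4> pal;
    pal[0] = e0;
    pal[1] = e1;
    pal[2] = { uint8_t( (2 * e0.r + e1.r + 1) / 3 ),    // 2/3 c0 + 1/3 c1.
               uint8_t( (2 * e0.g + e1.g + 1) / 3 ),
               uint8_t( (2 * e0.b + e1.b + 1) / 3 ) };
    pal[3] = { uint8_t( (e0.r + 2 * e1.r + 1) / 3 ),    // 1/3 c0 + 2/3 c1.
               uint8_t( (e0.g + 2 * e1.g + 1) / 3 ),
               uint8_t( (e0.b + 2 * e1.b + 1) / 3 ) };
    return pal;
}
```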


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


I found it quite interesting that different GPUs use different percentages for the interpolated colors; I would never have guessed that.

I'm curious whether it could be reasonably improved by simply embedding different "color pairs" for each block in a texture, rather than necessarily generating a unique texture for each hardware configuration, so that the (three?) common PC configurations can be compensated for at load time. I imagine one could even allow the algorithm to generate some unique blocks when it deems that a significant improvement (rather than just another color pair).

Or are the hardware differences so minor in practice that the effect really only shows up in mathematical measurements and not perceptually?


As for AMD decoding, shouldn't it be quite easy to just generate a bunch of "hand-coded" blocks with specific gradients and then look at what AMD outputs (assuming you have an AMD card)? It seems to me there can't be anything really complicated going on behind the scenes that couldn't be "easily" understood with just a bit of testing. Of course there may be differences between different models... EDIT: After looking at the nVidia implementation, I take it back!

EDIT: I couldn't find any actual numbers, but it would be interesting if the percentages weren't symmetrical, as one could then also exploit the order of the two colors as a further optimization.
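As a sketch of the probing idea a couple of paragraphs up: hand-pack BC1 blocks with known endpoints and all four index values, upload them as a texture, render, and read the results back to compare against a reference palette. Only the block packing is shown here; the upload/readback path depends on the API, and the struct layout assumes a little-endian platform.

```cpp
#include <cstdint>

// One BC1 block: two 5:6:5 endpoints followed by 16 two-bit palette indices.
struct Bc1Block {
    uint16_t c0, c1;     // Endpoints; pass c0 > c1 so the four-color (opaque) mode is used.
    uint32_t indices;    // 2 bits per texel, texel (0,0) in the lowest two bits.
};

// Pack a probe block in which row y uses palette index y (0..3), so a single
// block exposes both endpoints and both interpolated colors when rendered.
Bc1Block PackProbeBlock( uint16_t c0, uint16_t c1 ) {
    Bc1Block block = { c0, c1, 0 };
    for ( uint32_t y = 0; y < 4; ++y ) {
        for ( uint32_t x = 0; x < 4; ++x ) {
            block.indices |= y << ((y * 4 + x) * 2);
        }
    }
    return block;
}
```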


The format only allows one set of data per block. The format is simple in order to make fetches faster. The best route is to generate two images, but I am mainly thinking about consoles when I say that. If a PC game has too many textures it may be worth it to take the hit.

As for how noticeable it is, well, I certainly noticed it when I first tried it. Of course I had already been studying that image for a long while gathering all the types of artifacts that needed to be eliminated. But it is also more noticeable on some other images, especially cartoon ones.


That would be a possible way of going about it, but I will save that exercise for the reader.


Before leaving work today I pulled a coworker over to my desk. I said, “This is the original image. Below that there are 2 more images. One was made by my tool and one was made by ATI. Which one do you think looks better?”
He got close and stared for a long while, unable to decide. Finally he pointed at mine (unknowingly) and said it looks better because the “FREE-TO-PLAY PVP MMORPG” looks jaggier in the ATI result.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


Indeed, the DXT format is rather fixed, but it seems to me it would be pretty much trivial to splice together a proper DXT texture at load time. The texture stored on disk could be generated with some "average" decoder in mind, and prepended to it would be a set of alternate color pairs and blocks, used to replace the color pairs/blocks in the "base texture" according to which GPU the host computer uses.

How practical and useful that is, I don't know, but if there are some 3-4 different DXT decoders for PCs, generating and distributing a unique texture for each of them would be quite wasteful if one could instead just replace blocks or color sets so that the final texture produces close to the same results.

Perhaps one could even just generate unique textures for the different target decoders, select one as a primary, XOR all the other textures against it, and apply some really cheap compression. Perfect results for all decoders, and the XORed data should hopefully end up highly compressible.

But of course I realize that it may be of no interest to you ;)
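For what it’s worth, here is a minimal sketch of the load-time splicing idea above, assuming the file stores a base BC1 texture plus a per-decoder list of (block index, endpoint pair) overrides. The record format and names are hypothetical, not any engine’s actual layout.

```cpp
#include <cstdint>
#include <vector>

struct Bc1Block {
    uint16_t c0, c1;     // 5:6:5 endpoints.
    uint32_t indices;    // 2-bit palette indices, 16 texels.
};

// Hypothetical override record: which block to patch and the replacement
// endpoints tuned for one specific decoder.
struct EndpointOverride {
    uint32_t blockIndex;
    uint16_t c0, c1;
};

// Patch the base texture in place for the decoder these overrides were built for.
// The indices are kept as-is; only the color pair is swapped.
void ApplyOverrides( std::vector<Bc1Block> &baseBlocks,
                     const std::vector<EndpointOverride> &overrides ) {
    for ( const EndpointOverride &o : overrides ) {
        if ( o.blockIndex < baseBlocks.size() ) {
            baseBlocks[o.blockIndex].c0 = o.c0;
            baseBlocks[o.blockIndex].c1 = o.c1;
        }
    }
}
```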


It is certainly an interesting idea to patch together the final result before committing it to the hardware. I already use a proprietary image format with the extension LSI, and even DXT data is encoded within it in a smaller form, so, since all code paths have both an encoder and decoder, it is no problem for me to add another layer of code to check which GPU is being used and stitch together the final image.

But unfortunately I came to a dim conclusion.

For it to be worth the extra effort, a large proportion of the image would have to be patched. If a large proportion of the image has to be patched it is easier to just use a second copy.

If fewer areas need to be patched, more code has to be added to determine the cut-off point. But patching fewer areas does not justify the extra cost in implementing and tuning just where that cut-off point should be, which ultimately may be highly subjective and change on a per-image basis.


As for me, I have an entire next-generation game engine to finish, so I don’t really have the time to figure out how ATI decodes DXT images and implement an option to encode for ATI and NVIDIA (as opposed to encoding for “Generic and NVIDIA”).

Also my company wants to use this algorithm for their upcoming games so I need to focus more on making the actual exporter tool now.
When the CEO asked me, “What is your plan for this algorithm? Keep it a secret or open-source it?”, I replied that I had planned to open-source it, but that if he wanted to use it in-house and have me not disclose the algorithm (beyond what I already had), I wouldn’t.
He laughed and said, “No no, I encourage everyone to share their new techniques and technology.”

Pretty nice attitude.

So I will focus on making an actual tool now and leave the enhancements up to those who wish to dig through the code.
If someone finds a way to tweak it to beat ATI’s MSE, I will be perfectly satisfied even if it is not me, as long as it does not become MSE-centric, as ATI seems to have become; they are willing to introduce some fairly horrible artifacts for the sake of lowering that number.
With the proper settings, I can get that image to have an MSE below 4.92 in The Compressonator, but the artifacts are ridiculous and blatantly obvious. I simply can’t understand why a computer thinks we would perceive that image as closer to the original.
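For reference, MSE here is the usual per-channel mean squared error over the whole image, along the lines of the sketch below; the exact formula a given tool reports may differ slightly (for example, averaging channels or including alpha).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-channel MSE between two images stored as tightly packed 8-bit channels
// (for example RGB or RGBA byte streams of identical layout).
double Mse( const std::vector<uint8_t> &original,
            const std::vector<uint8_t> &decoded ) {
    const std::size_t n = std::min( original.size(), decoded.size() );
    double sum = 0.0;
    for ( std::size_t i = 0; i < n; ++i ) {
        const double d = double( original[i] ) - double( decoded[i] );
        sum += d * d;
    }
    return n ? sum / double( n ) : 0.0;
}
```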


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

This is pretty awesome work! How fast is your program? About the same as NVidia's? MSE stats are interesting; it would be cool to see SSIM stats too.
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]
I am interested in this as my hardware is not capable of generating mipmaps from DXTn textures. It seems I have to uncompress DXTn textures, shrink the size and compress them again to make mipmaps, all on the CPU during init.
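That decompress, downsample, recompress pipeline could look roughly like the sketch below; DecodeBc1, BoxFilterHalf, and EncodeBc1 are assumed helper routines to be supplied (any BC1 codec and box filter will do), not any particular library’s API.

```cpp
#include <cstdint>
#include <vector>

struct Image { uint32_t width, height; std::vector<uint8_t> rgba; };

// Assumed helpers to be supplied by the reader.
Image DecodeBc1( const std::vector<uint8_t> &blocks, uint32_t width, uint32_t height );
std::vector<uint8_t> EncodeBc1( const Image &img );
Image BoxFilterHalf( const Image &img );        // Average each 2x2 texel group.

// Build a compressed mip chain from a compressed top level, entirely on the CPU.
std::vector<std::vector<uint8_t>> BuildMipChain( const std::vector<uint8_t> &level0,
                                                 uint32_t width, uint32_t height ) {
    std::vector<std::vector<uint8_t>> chain;
    chain.push_back( level0 );
    Image current = DecodeBc1( level0, width, height );
    // Simplification: stop at 4x4 so every level stays a whole BC1 block.
    while ( current.width > 4 && current.height > 4 ) {
        current = BoxFilterHalf( current );
        chain.push_back( EncodeBc1( current ) );
    }
    return chain;
}
```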

