
Hodgman

Posted 06 February 2013 - 07:24 AM

Feeling like it, I went and wrote a "vaguely fast" DXT1 encoder (DXT1F, where the F means "Fast").

The goal here was mostly to convert reasonably quickly; image quality was not as high a priority.
As-is, I don't know of any good ways to make it notably faster (I know a few possible ways, but they aren't pretty).

Nice work :)

If you still find yourself interested in DXT1F:

* It looks like the code could be ported to work entirely in SSE registers instead of general-purpose integer registers, which might reduce the instruction count considerably.

* You could also let the user spread the work across multiple threads. For example, instead of (or as well as) an API that encodes a whole image at once, you could add two extra parameters: the row to begin working from and the row to end on (both should be multiples of 4, or whatever the block height is). The user could then call that function several times, on different threads, with different start/end rows, so that different rows of blocks are encoded concurrently.
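For the SSE point, here is a rough idea of what that can look like. The sketch computes the per-channel minimum and maximum colour of a 4x4 block, a step many simple DXT1 encoders use to pick endpoints. It is not DXT1F's code: the name `block_minmax_rgba` is made up, and it assumes the block arrives as 16 RGBA8 pixels packed into 64 bytes. Four loads plus `_mm_min_epu8`/`_mm_max_epu8` cover the whole block, then two shuffles fold the four 32-bit lanes together; a scalar loop is kept for non-SSE2 builds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper (not part of DXT1F): per-channel min and max over a
   4x4 block of RGBA8 pixels. px points at 16 pixels (64 bytes), row-major. */
static void block_minmax_rgba(const uint8_t *px, uint8_t mn[4], uint8_t mx[4])
{
    assert(px && mn && mx);
#ifdef __SSE2__
    /* One unaligned 16-byte load per row of 4 pixels. */
    __m128i r0 = _mm_loadu_si128((const __m128i *)(px + 0));
    __m128i r1 = _mm_loadu_si128((const __m128i *)(px + 16));
    __m128i r2 = _mm_loadu_si128((const __m128i *)(px + 32));
    __m128i r3 = _mm_loadu_si128((const __m128i *)(px + 48));
    /* Byte-wise unsigned min/max across rows: 32-bit lane j now holds the
       min/max of pixel column j. */
    __m128i lo = _mm_min_epu8(_mm_min_epu8(r0, r1), _mm_min_epu8(r2, r3));
    __m128i hi = _mm_max_epu8(_mm_max_epu8(r0, r1), _mm_max_epu8(r2, r3));
    /* Fold lanes: swap 64-bit halves, then swap neighbouring 32-bit lanes. */
    lo = _mm_min_epu8(lo, _mm_shuffle_epi32(lo, _MM_SHUFFLE(1, 0, 3, 2)));
    lo = _mm_min_epu8(lo, _mm_shuffle_epi32(lo, _MM_SHUFFLE(2, 3, 0, 1)));
    hi = _mm_max_epu8(hi, _mm_shuffle_epi32(hi, _MM_SHUFFLE(1, 0, 3, 2)));
    hi = _mm_max_epu8(hi, _mm_shuffle_epi32(hi, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Lane 0 holds the result; x86 is little-endian, so the bytes come out
       in R, G, B, A memory order. */
    uint32_t l = (uint32_t)_mm_cvtsi128_si32(lo);
    uint32_t h = (uint32_t)_mm_cvtsi128_si32(hi);
    memcpy(mn, &l, 4);
    memcpy(mx, &h, 4);
#else
    /* Scalar fallback for builds without SSE2. */
    memcpy(mn, px, 4);
    memcpy(mx, px, 4);
    for (int i = 1; i < 16; i++)
        for (int c = 0; c < 4; c++) {
            uint8_t v = px[i * 4 + c];
            if (v < mn[c]) mn[c] = v;
            if (v > mx[c]) mx[c] = v;
        }
#endif
}
```

The whole reduction stays in XMM registers until the final two moves, which is the kind of saving the SSE suggestion is aiming at; how much it helps DXT1F in practice would need measuring.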
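A rough sketch of the row-range idea in C with POSIX threads follows. Everything in it is hypothetical: `dxt1_encode_rows` and `dxt1_encode_parallel` are made-up names, the image is assumed to be tightly packed RGBA8 with width and height multiples of 4, and `encode_block` is a simple bounding-box stand-in rather than DXT1F's own block encoder. Each band of block rows writes a disjoint range of the output, so the threads need no locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in block encoder (bounding-box endpoints, nearest-colour indices
   chosen against the 8-bit endpoints); NOT DXT1F's algorithm. src is the
   block's top-left RGBA8 pixel, stride is the row pitch in bytes. */
static void encode_block(const uint8_t *src, size_t stride, uint8_t out[8])
{
    int mn[3] = {255, 255, 255}, mx[3] = {0, 0, 0}, pal[4][3];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            for (int c = 0; c < 3; c++) {
                int v = src[y * stride + x * 4 + c];
                if (v < mn[c]) mn[c] = v;
                if (v > mx[c]) mx[c] = v;
            }
    /* RGB565 endpoints; c0 >= c1 because mx >= mn in every channel. */
    unsigned c0 = ((mx[0] >> 3) << 11) | ((mx[1] >> 2) << 5) | (mx[2] >> 3);
    unsigned c1 = ((mn[0] >> 3) << 11) | ((mn[1] >> 2) << 5) | (mn[2] >> 3);
    out[0] = (uint8_t)c0; out[1] = (uint8_t)(c0 >> 8);
    out[2] = (uint8_t)c1; out[3] = (uint8_t)(c1 >> 8);
    for (int c = 0; c < 3; c++) { /* max, min, 2/3 and 1/3 blends */
        pal[0][c] = mx[c];
        pal[1][c] = mn[c];
        pal[2][c] = (2 * mx[c] + mn[c]) / 3;
        pal[3][c] = (mx[c] + 2 * mn[c]) / 3;
    }
    for (int y = 0; y < 4; y++) {
        unsigned row = 0;
        for (int x = 0; x < 4; x++) {
            const uint8_t *p = src + y * stride + x * 4;
            int best = 0, best_err = 1 << 30;
            /* If c0 == c1, keep index 0 everywhere: that avoids DXT1's
               3-colour mode, where index 3 means transparent. */
            for (int i = 0; c0 != c1 && i < 4; i++) {
                int err = 0;
                for (int c = 0; c < 3; c++)
                    err += (p[c] - pal[i][c]) * (p[c] - pal[i][c]);
                if (err < best_err) { best_err = err; best = i; }
            }
            row |= (unsigned)best << (2 * x);
        }
        out[4 + y] = (uint8_t)row;
    }
}

/* Hypothetical row-range API: encodes pixel rows [row_begin, row_end).
   Both bounds must be multiples of 4. out holds every block of the image
   in row-major order; only the blocks in this range are written. */
static void dxt1_encode_rows(const uint8_t *rgba, int width, int row_begin,
                             int row_end, uint8_t *out)
{
    size_t stride = (size_t)width * 4, blocks_x = (size_t)width / 4;
    for (int y = row_begin; y < row_end; y += 4)
        for (size_t bx = 0; bx < blocks_x; bx++)
            encode_block(rgba + y * stride + bx * 16, stride,
                         out + ((size_t)y / 4 * blocks_x + bx) * 8);
}

typedef struct { const uint8_t *rgba; int width, row_begin, row_end; uint8_t *out; } Job;

static void *worker(void *arg)
{
    Job *j = arg;
    dxt1_encode_rows(j->rgba, j->width, j->row_begin, j->row_end, j->out);
    return NULL;
}

/* Splits the image into up to 16 bands of whole block rows, one per thread. */
static void dxt1_encode_parallel(const uint8_t *rgba, int width, int height,
                                 uint8_t *out, int nthreads)
{
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    Job jobs[MAX_THREADS];
    int started[MAX_THREADS] = {0};
    int block_rows = height / 4;
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    for (int t = 0; t < nthreads; t++) {
        Job j = {rgba, width, block_rows * t / nthreads * 4,
                 block_rows * (t + 1) / nthreads * 4, out};
        jobs[t] = j;
        /* If a thread cannot be created, encode that band on this thread. */
        if (pthread_create(&tid[t], NULL, worker, &jobs[t]) == 0)
            started[t] = 1;
        else
            worker(&jobs[t]);
    }
    for (int t = 0; t < nthreads; t++)
        if (started[t]) pthread_join(tid[t], NULL);
}
```

Keeping the per-range function public means a caller with its own job system could skip `dxt1_encode_parallel` entirely and schedule the bands however it likes.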

