

Texture formats



#1 Chris_F   Members   -  Reputation: 2637


Posted 07 December 2013 - 11:15 PM

Reading around on the Internet, there are a few tidbits I have come across several times:

1. Modern GPUs don't have hardware support for uncompressed RGB textures. They will be converted to RGBA internally because the GPU doesn't like 3 component textures.

2. The best transfer format to use (especially on Windows?) is GL_BGRA. If you use GL_RGBA, the driver will have to swizzle your texture data when you call glTexImage2D, which slows the upload.

I've seemingly read both of those in countless places, so I decided to verify them using ARB_internalformat_query2. I made the appropriate glGetInternalformativ calls with GL_INTERNALFORMAT_PREFERRED and GL_TEXTURE_IMAGE_FORMAT. What I got back was different from what I expected, considering the things I had read.
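For reference, the queries in question look roughly like this (just a sketch; it assumes a GL 4.3 context or one exposing ARB_internalformat_query2, with headers/loader setup omitted):

GLint preferred = 0, xfer_format = 0;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGB8, GL_INTERNALFORMAT_PREFERRED, 1, &preferred);
glGetInternalformativ(GL_TEXTURE_2D, GL_RGB8, GL_TEXTURE_IMAGE_FORMAT,    1, &xfer_format);
printf("GL_RGB8 -> preferred internal: 0x%04X, preferred transfer format: 0x%04X\n",
       preferred, xfer_format);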

According to glGetInternalformativ the internal format used when you ask for GL_RGB8 is GL_RGB8, the optimum transfer format for GL_RGB8 is GL_RGB and the optimum transfer format for GL_RGBA8 is GL_RGBA. So, is what I read outdated, or is my graphics driver lying to me?

I am using an AMD HD 5850 with the latest drivers.




#2 richardurich   Members   -  Reputation: 1187


Posted 08 December 2013 - 03:03 AM

RGBA, BGRA, or some other 32-bit format will be the internal format instead of RGB (24-bit). Processors generally don't operate on 24 bits at a time, so they'd pick a 32-bit format. But it's still faster to send the data across the incredibly slow bus as RGB and let the incredibly fast GPU convert it to RGBA for you.

 

 

As for RGBA and BGRA, I'd like to think video cards handle both at the same speed by now. I honestly haven't checked though. You might check if the tools you're using really are giving you RGBA, and not BGRA. I kind of thought BGRA was picked in the first place so we wouldn't have to reorder bytes before we DMA it over to the video card. You probably want to stick with the actual memory layout used by your tools so you can memcpy instead of iterating through every pixel and every color component copying them individually.

 

Edit: I realized my first paragraph is very misleading. Bus traffic is only worth worrying about if you actually generate it, and in the normal case with dedicated graphics memory the color data doesn't change after you load it during the loading screen or whatever. Just keep the 24-bit option in your pocket for when you know you'll be passing the data back and forth for any reason, since it could easily be your performance bottleneck (obviously test to confirm you are bandwidth-limited).
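For what it's worth, that just means keeping the client data 24-bit while still asking for a 32-bit internal format, something like this (sketch only; width, height and pixels_rgb24 are placeholders):

glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   /* RGB rows usually aren't 4-byte aligned */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels_rgb24);   /* 24-bit data in, 32-bit storage */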


Edited by richardurich, 08 December 2013 - 04:45 AM.


#3 Hodgman   Moderators   -  Reputation: 37549


Posted 08 December 2013 - 04:47 AM

Regarding RGBA vs BGRA, I would make a strong guess that most modern GPUs would have swizzle functionality built into the texture fetch hardware, so if the data is stored in one order, but the shader wants it in another order, the swizzling would be free. GPUs actually have a bunch of neat functionality like this hidden away, so that they can implement both the GL and D3D APIs, given their differences -- such as GL vs D3D's difference in Z range, or D3D9 vs everything-else's pixel centre coordinates...
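(The closest API-level relative of that fetch-hardware swizzling is the texture swizzle state from GL 3.3 / ARB_texture_swizzle; just as an illustration, not necessarily what the driver does internally:)

GLint swap_red_blue[4] = { GL_BLUE, GL_GREEN, GL_RED, GL_ALPHA };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swap_red_blue);
/* If the texel data was uploaded with red and blue swapped, this swizzle swaps
   them back at sampling time, with no reformatting of the stored data. */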

I would guess that the Windows obsession with BGRA is probably a legacy of their software-rendered desktop manager, which probably chose BGRA ordering arbitrarily and then forced all other software to comply with it.

 

Not sure about your other question. When the driver says that the actual internal format is RGB, maybe it's reporting that because it's actually using "RGBX" or "XRGB" (i.e. RGB with a padding byte), but this format doesn't exist in the GL enumerations?



#4 mhagain   Crossbones+   -  Reputation: 9208


Posted 08 December 2013 - 06:39 AM

The best way as always is to test, and it's simple enough to knock up a quick program to test various combinations and see where the performance is.

 

glTexImage on its own is not good for testing with, because the driver needs to set up a texture, perform various internal checks, allocate storage, etc.  The overhead of that is likely to overwhelm any actual transfer performance.

 

Using glTexSubImage you can more reasonably isolate transfer performance: have your test program initially specify a texture (using glTexImage), then perform a bunch of timed glTexSubImage calls.  By swapping out the parameters you can get a good feel for which combinations are best to use.
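The rough shape of such a test would be something like this (sketch only; W, H, NUM_UPLOADS, pixels and the timer helper are placeholders):

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);   /* allocate storage once */

double t0 = timer_seconds();                                 /* placeholder timer */
for (int i = 0; i < NUM_UPLOADS; ++i)
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                    GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);
glFinish();                                                  /* wait for the transfers */
double elapsed = timer_seconds() - t0;

Swap the format/type pair in the glTexSubImage2D call (GL_RGBA / GL_UNSIGNED_BYTE, and so on) and compare timings.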

 

For OpenGL the internalFormat parameter is the only one that specifies the format of the texture itself; the format and type parameters have absolutely nothing to do with the texture format, and instead describe the data that you're sending in the last parameter of your glTex(Sub)Image call.  This is made clearer if you use the newer glTexStorage API (which only takes internalFormat to describe the texture).
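To make that split concrete, the glTexStorage path (GL 4.2 / ARB_texture_storage) looks like this (sketch; W, H and pixels are placeholders):

glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, W, H);        /* internalFormat: the texture itself */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                GL_BGRA, GL_UNSIGNED_BYTE, pixels);      /* format/type: describe your data only */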

 

So there are 3 factors at work here:

  • The internal format of the texture as it's stored by the GPU/driver.
  • The format of the data that you send when filling the texture.
  • Any conversion steps that the driver needs to do in order to convert the latter to the former.

In theory the most suitable combination to use is one that allows the driver to do the equivalent of a straight memcpy, whereas the worst is one that forces the driver to allocate a new block of temp storage, move the data component-by-component into that new block, fixing it up as it goes, then do its "equivalent of memcpy" thing, and finally release the temp storage.

 

It's a few years since I've written such a program to benchmark transfers, but at the time the combination of format GL_BGRA and type GL_UNSIGNED_INT_8_8_8_8_REV was fastest overall on all hardware.  That didn't mean it was measurably faster on every individual piece of hardware, and when I broke it down by GPU vendor things got more interesting.  AMD was seemingly impervious to changes in these two parameters, so it didn't really matter which you used; you got similar performance with all of them.  NVIDIA was about 6x faster with format GL_BGRA than with GL_RGBA, but it didn't mind so much what you used for type.  Intel absolutely required both format and type to be set as given above; it was about 40x faster if you used them.  With the optimal parameters identified and in place, NVIDIA was overall fastest, then Intel, finally AMD.  What all of this underlines is the danger of testing a single vendor's hardware in isolation; you really do have to test on, and balance things out between, all vendors.

 

Regarding data volume versus data conversion overheads: at the time I tested, data conversion was by far the largest bottleneck, so it was a more than fair tradeoff to accept the extra 8 unused bits per texel in exchange for a faster transfer.  That may have changed on more recent hardware, but I don't have up-to-date figures.


Edited by mhagain, 08 December 2013 - 06:43 AM.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#5 BGB   Crossbones+   -  Reputation: 1554


Posted 08 December 2013 - 02:16 PM

Regarding RGBA vs BGRA, I would make a strong guess that most modern GPUs would have swizzle functionality built into the texture fetch hardware, so if the data is stored in one order, but the shader wants it in another order, the swizzling would be free. GPUs actually have a bunch of neat functionality like this hidden away, so that they can implement both the GL and D3D APIs, given their differences -- such as GL vs D3D's difference in Z range, or D3D9 vs everything-else's pixel centre coordinates...

I would guess that the Windows obsession with BGRA is probably a legacy of their software-rendered desktop manager, which probably chose BGRA ordering arbitrarily and then forced all other software to comply with it.

 

Not sure about your other question. When the driver says that the actual internal format is RGB, maybe it's reporting that because it's actually using "RGBX" or "XRGB" (i.e. RGB with a padding byte), but this format doesn't exist in the GL enumerations?

 

I decided to omit a more detailed description, but IIRC this (BGR / BGRA / BGRX) was basically what graphics hardware generally used.

Windows likely followed suit mostly because this was what would be cheapest to draw into the graphics hardware's frame-buffer (no need to swap components, ...).

 

also, I suspect it's related to what happens if you put RGB in a hex number:

0xRRGGBB

then write it to memory in little-endian ordering:

BB GG RR 00

and if the spare byte is used for alpha:

BB GG RR AA
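a quick way to see this (standalone C; little-endian host assumed):

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int rgb = 0x00112233;        /* 0x00RRGGBB: RR=11 GG=22 BB=33 */
    unsigned char b[4];
    memcpy(b, &rgb, 4);                   /* inspect the in-memory byte order */
    printf("%02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
    /* little-endian output: 33 22 11 00, i.e. BB GG RR 00 */
    return 0;
}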

 

 

on newer hardware, I don't really know.

but, it appears that not much has changed here.


Edited by BGB, 08 December 2013 - 02:29 PM.


#6 mark ds   Members   -  Reputation: 1653


Posted 08 December 2013 - 02:31 PM

Although a little out of date, the following may be of interest - it shows the internal formats used by NVidia on a range of different hardware.

 

https://developer.nvidia.com/content/nvidia-opengl-texture-formats



#7 Chris_F   Members   -  Reputation: 2637


Posted 08 December 2013 - 04:59 PM


RGBA, BGRA, or some other 32-bit format will be the internal format instead of RGB (24-bit). Processors generally don't operate on 24 bits at a time, so they'd pick a 32-bit format. But it's still faster to send the data across the incredibly slow bus as RGB and let the incredibly fast GPU convert it to RGBA for you.

 

GL_INTERNALFORMAT_PREFERRED is supposed to give you the internal format that the driver is actually going to use. So if GL_INTERNALFORMAT_PREFERRED returns GL_RGB then the driver is saying that is how it plans on storing the data internally. So either newer Radeon cards have hardware support for 24-bit texture formats, or AMD's implementation of ARB_internalformat_query2 has its pants on fire.

 


Not sure about your other question. When the driver says that the actual internal format is RGB, maybe it's reporting that because it's actually using "RGBX" or "XRGB" (i.e. RGB with a padding byte), but this format doesn't exist in the GL enumerations?

 

I have no clue. That thought had crossed my mind too, but it is only speculation. Is there some way of actually uploading an RGB texture and then seeing definitively how much GPU memory is being taken up by it?

 

For OpenGL the internalFormat parameter is the only one that specifies the format of the texture itself; the format and type parameters have absolutely nothing to do with the texture format, and instead describe the data that you're sending in the last parameter of your glTex(Sub)Image call.  This is made clearer if you use the newer glTexStorage API (which only takes internalFormat to describe the texture).

 

So there are 3 factors at work here:

  • The internal format of the texture as it's stored by the GPU/driver.
  • The format of the data that you send when filling the texture.
  • Any conversion steps that the driver needs to do in order to convert the latter to the former.

 

This is what confuses me. If the internal format is going to be RGBA, then why would a transfer format of BGRA ever be faster than RGBA? Another source of confusion for me can be found here. Under "Texture only" it says that RGB8 is a "required format" that an OpenGL implementation must support. Is the word "support" very loose? I.e., your hardware might only support 32-bit RGBA floating-point textures, but the driver converts RGB8 textures to that, so it counts as supported.

 

I guess I'm going to have to do testing as you said, but ideally this kind of testing would not be necessary at all, assuming that ARB_internalformat_query2 is present and gives good information. I was under the impression that was the whole point of this extension.


Edited by Chris_F, 08 December 2013 - 05:01 PM.


#8 Chris_F   Members   -  Reputation: 2637


Posted 08 December 2013 - 05:35 PM

Also, what is the difference between using GL_UNSIGNED_INT_8_8_8_8(_REV) and GL_UNSIGNED_BYTE?



#9 Brother Bob   Moderators   -  Reputation: 9238


Posted 08 December 2013 - 05:48 PM

Also, what is the difference between using GL_UNSIGNED_INT_8_8_8_8(_REV) and GL_UNSIGNED_BYTE?

The packed format types store the color components of a pixel within a larger data type, while the non-packed format types store the color components linearly in memory.

 

For example, with UNSIGNED_INT_8_8_8_8, the data type of a pixel is assumed to be a GLuint, and the first color component is stored in bits 24 to 31, the second in bits 16 to 23, and so on. The actual physical order in memory depends on the endianness of a GLuint, but you always know that the first component is the upper 8 bits of a GLuint. On the other hand, UNSIGNED_BYTE means that each color component is 8 bits and the color components are stored in consecutive memory locations.

 

The difference between packed and non-packed types is basically that packed types ensure you can read the color components out of the value in an endian-safe way (but the physical memory layout is endian-dependent), while non-packed formats ensure the color components are stored in a specific byte order in memory (but reading/writing a pixel as a single whole value is endian-dependent).

 

edit: And the _REV variants just reverse the order of the bit ranges, so that bits 0 to 7 are the first component, bits 8 to 15 the second component, and so on.
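A small standalone illustration of the difference (not GL code, just the byte/bit layouts; the comments assume a little-endian host):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Packed: one 32-bit value per pixel. With UNSIGNED_INT_8_8_8_8 the first
       component sits in the most significant 8 bits, so extracting it with a
       shift works the same on any endianness.                               */
    unsigned int packed = 0x11223344u;
    unsigned int first  = (packed >> 24) & 0xFFu;           /* always 0x11 */

    /* Non-packed UNSIGNED_BYTE: components occupy consecutive bytes, so the
       byte order in memory is fixed, but reading them back as one 32-bit
       value is endian-dependent (0x44332211 on a little-endian machine).    */
    unsigned char bytes[4] = { 0x11, 0x22, 0x33, 0x44 };
    unsigned int  as_uint;
    memcpy(&as_uint, bytes, sizeof as_uint);

    printf("packed first component: 0x%02X, bytes read as uint: 0x%08X\n",
           first, as_uint);
    return 0;
}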


Edited by Brother Bob, 08 December 2013 - 05:49 PM.


#10 richardurich   Members   -  Reputation: 1187


Posted 08 December 2013 - 06:06 PM

Here is the spec information that explains why ARB_internalformat_query2 exists, what it does, and such. I'm not sure where you heard the preferred formats are one-to-one mappings of internal formats actually used, but I've never heard that before.

 

And support for RGB can definitely be implemented internally as RGBA or BGRA or even GBAR if some hardware manufacturer really wanted to go crazy. We've already had video cards that did implement 3-component color data internally as 4-component. I assume the newer cards are more flexible as a result of their support for general-purpose computations.



#11 mhagain   Crossbones+   -  Reputation: 9208


Posted 08 December 2013 - 06:23 PM

GL_INTERNALFORMAT_PREFERRED is supposed to give you the internal format that the driver is actually going to use. So if GL_INTERNALFORMAT_PREFERRED returns GL_RGB then the driver is saying that is how it plans on storing the data internally. So either newer Radeon cards have hardware support for 24-bit texture formats, or AMD's implementation of ARB_internalformat_query2 has its pants on fire.

 

The specification is actually a lot looser than that, and internal formats just specify the minimum that the driver will give you; the driver is allowed to give you more.  Section 8.5.1 of the core GL 4.3 spec clarifies that this is even true of the required sized internal formats.

 

It's not surprising that you get GL_RGB8 as the preferred internal format here: the behaviour of these internal formats is to read the r, g and b components of a texture during sampling but always return 1 for alpha, and the preferred internal format must match that behaviour.  If it gave you an RGBA internal format instead, and you actually used it, the behaviour during sampling could change (if, say, your source data had anything other than 255 in the alpha byte).

 

I think you're viewing all of this as if it were a lower-level description of what actually happens in hardware, whereas it's not.  We're still at a quite high-level abstraction here, and GL isn't specifying anything to do with hardware; it's specifying what the implementation does.

 

Think of this as being somewhat similar to malloc in C; if you need - say - 64 bytes allocated, malloc can satisfy that request by allocating exactly 64 bytes.  Or it can also satisfy it by allocating 128 bytes if that's friendlier for your hardware (perhaps by aligning the allocation to a hypothetical 128 byte wide cache line).


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#12 Chris_F   Members   -  Reputation: 2637


Posted 08 December 2013 - 06:54 PM


I think you're viewing all of this as if it were a lower-level description of what actually happens in hardware, whereas it's not. We're still at a quite high-level abstraction here, and GL isn't specifying anything to do with hardware; it's specifying what the implementation does.

 

Well that's a crying shame because having a way to query the implementation about the real details would actually be invaluable. I would love for the driver to be able to tell me what format it really wants through the API, that way I can convert it to that during program installation and rest easy at night knowing that the driver isn't going to be doing anything absurd behind my back every time I load a texture.



#13 Chris_F   Members   -  Reputation: 2637


Posted 09 December 2013 - 12:24 AM

OK, so I'm still uncertain of something. I tried GL_INTERNALFORMAT_PREFERRED again, this time with GL_COMPRESSED_RGB8_ETC2. I am 100% certain that this card doesn't have hardware support for ETC2 textures, which means the driver has to be converting them to uncompressed or maybe re-compressing them as something else like S3TC/BPTC. Despite that, I'm getting back GL_COMPRESSED_RGB8_ETC2 as the preferred internal format.
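The query in question was simply (sketch):

GLint preferred = 0;
glGetInternalformativ(GL_TEXTURE_2D, GL_COMPRESSED_RGB8_ETC2,
                      GL_INTERNALFORMAT_PREFERRED, 1, &preferred);
/* On this card/driver, preferred comes back as GL_COMPRESSED_RGB8_ETC2. */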

 

Surely this isn't correct behavior, otherwise what good is GL_INTERNALFORMAT_PREFERRED at all?



#14 richardurich   Members   -  Reputation: 1187


Posted 09 December 2013 - 05:26 AM

Without ARB_internalformat_query2, drivers had no way to give you useful information. Even if they do not always provide useful hints to you, the fact that sometimes they can is incredibly useful. It's especially beneficial for integrated graphics solutions where you can still take huge performance hits by using a suboptimal format.

 

When the driver has to convert a compressed format on a more recent GPU, it may be using the same computational resources you use for shaders to do so rather than using dedicated hardware or using the CPU before sending it across the bus. I have absolutely no idea what your 5850 is doing with ETC2 specifically, but it is a DX11 card and so it does have pretty good computational abilities. Did you test ETC2 and find it is being decompressed by the x86 CPU before being sent to the video card? If that's happening, I agree it is weird for AMD to consider ETC2 a preferred format.



#15 Chris_F   Members   -  Reputation: 2637


Posted 09 December 2013 - 04:12 PM

Without ARB_internalformat_query2, drivers had no way to give you useful information. Even if they do not always provide useful hints to you, the fact that sometimes they can is incredibly useful. It's especially beneficial for integrated graphics solutions where you can still take huge performance hits by using a suboptimal format.

 

When the driver has to convert a compressed format on a more recent GPU, it may be using the same computational resources you use for shaders to do so rather than using dedicated hardware or using the CPU before sending it across the bus. I have absolutely no idea what your 5850 is doing with ETC2 specifically, but it is a DX11 card and so it does have pretty good computational abilities. Did you test ETC2 and find it is being decompressed by the x86 CPU before being sent to the video card? If that's happening, I agree it is weird for AMD to consider ETC2 a preferred format.

 

I have not yet tested it, but I don't see why it should be considered a preferred format even if it is using GPU compute to do the conversion. Either way it is suboptimal, and absolutely pointless. Compressing to ETC2 only to convert to uncompressed means you save no space in graphics memory and take an unnecessary hit in visual quality. Re-compressing to another compressed format means you lose even more quality.


Edited by Chris_F, 09 December 2013 - 04:13 PM.


#16 richardurich   Members   -  Reputation: 1187


Posted 09 December 2013 - 08:22 PM

I have not yet tested it, but I don't see why it should be considered a preferred format even if it is using GPU compute to do the conversion. Either way it is suboptimal, and absolutely pointless. Compressing to ETC2 only to convert to uncompressed means you save no space in graphics memory and take an unnecessary hit in visual quality. Re-compressing to another compressed format means you lose even more quality.

 

 

Compression that is decoded on the card still provides a ton of benefits. It reduces the size of data transferred across the bus (a massive performance bottleneck). It reduces the size of your distributables.

 

Your query is basically saying "I have ETC2 assets, how do you want me to send them to you Mr. video card driver?" The driver doesn't know if you're going to be bottlenecked by bus bandwidth or not, so it wouldn't be right to tell you to send the data uncompressed. The driver also doesn't know if you're willing to trade even more loss of quality for performance, so it can't recommend a different compressed format. A new compressed format might even result in larger data sizes, going back to the bus bandwidth issue. The query most definitely cannot assume you still have access to the original art assets and can convert to a different compressed format from the original source, then change all necessary code and recompile to use a new format. So it tells you to use the compressed format you're already using.

 

What problem are you trying to solve anyways? From what you've said, I can't really tell what help to provide. It's obvious you're upset the query isn't giving you what you wanted, but people here can't really help you figure out how to get the information you want if you aren't even saying what it is you want.



#17 Chris_F   Members   -  Reputation: 2637


Posted 09 December 2013 - 08:54 PM

 

I have not yet tested it, but I don't see why it should be considered a preferred format even if it is using GPU compute to do the conversion. Either way it is suboptimal, and absolutely pointless. Compressing to ETC2 only to convert to uncompressed means you save no space in graphics memory and take an unnecessary hit in visual quality. Re-compressing to another compressed format means you lose even more quality.

 

 

Compression that is decoded on the card still provides a ton of benefits. It reduces the size of data transferred across the bus (a massive performance bottleneck). It reduces the size of your distributables.

 

Your query is basically saying "I have ETC2 assets, how do you want me to send them to you Mr. video card driver?" The driver doesn't know if you're going to be bottlenecked by bus bandwidth or not, so it wouldn't be right to tell you to send the data uncompressed. The driver also doesn't know if you're willing to trade even more loss of quality for performance, so it can't recommend a different compressed format. A new compressed format might even result in larger data sizes, going back to the bus bandwidth issue. The query most definitely cannot assume you still have access to the original art assets and can convert to a different compressed format from the original source, then change all necessary code and recompile to use a new format. So it tells you to use the compressed format you're already using.

 

What problem are you trying to solve anyways? From what you've said, I can't really tell what help to provide. It's obvious you're upset the query isn't giving you what you wanted, but people here can't really help you figure out how to get the information you want if you aren't even saying what it is you want.

 

 

Well, basically what I'd like to do is compress all of my assets using a lossy codec similar to JPEG for distribution. During installation I would like to find out which texture formats are actually supported by the hardware (the ones that will give the best performance and quality) and then do a one-time conversion (e.g. it discovers support for ETC2, so it encodes JPEG -> ETC2; if the card doesn't support ETC2, then maybe it discovers support for RGTC and does JPEG -> RGTC instead). What I don't want to do is JPEG -> ETC2 thinking the GPU supports it natively, and then end up having the driver do ETC2 -> RGTC silently. If that is the case then it would be better for me to do JPEG -> RGTC from the start.
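In rough C, what I was hoping the install step could look like (sketch only; transcode_assets and its TARGET_* constants are made-up names for my own tooling, and this assumes GL_INTERNALFORMAT_PREFERRED actually means what I wanted it to mean):

GLint preferred = 0;
glGetInternalformativ(GL_TEXTURE_2D, GL_COMPRESSED_RGB8_ETC2,
                      GL_INTERNALFORMAT_PREFERRED, 1, &preferred);

if (preferred == GL_COMPRESSED_RGB8_ETC2)
    transcode_assets(TARGET_ETC2);   /* hypothetical helper: JPEG -> ETC2 */
else
    transcode_assets(TARGET_RGTC);   /* hypothetical helper: JPEG -> RGTC */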


Edited by Chris_F, 09 December 2013 - 09:01 PM.


#18 richardurich   Members   -  Reputation: 1187


Posted 10 December 2013 - 02:54 AM

You can use glHint on GL_TEXTURE_COMPRESSION_HINT, let the GPU pick the texture compression by using GL_COMPRESSED_RGB/A, then check what format was used.

 

If there's a more straightforward way, hopefully someone else can chime in.
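In code, that suggestion is roughly (sketch; width, height and pixels are placeholders):

glHint(GL_TEXTURE_COMPRESSION_HINT, GL_NICEST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);

GLint is_compressed = GL_FALSE, chosen_format = 0;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED, &is_compressed);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &chosen_format);
/* chosen_format should name the specific compressed format the driver picked. */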



#19 Chris_F   Members   -  Reputation: 2637


Posted 14 December 2013 - 02:08 PM

You can use glHint on GL_TEXTURE_COMPRESSION_HINT, let the GPU pick the texture compression by using GL_COMPRESSED_RGB/A, then check what format was used.

 

If there's a more straightforward way, hopefully someone else can chime in.

 

Ok, that didn't do much good. I used GL_COMPRESSED_SRGB_ALPHA with glTexImage2D and glGetTexLevelParameteriv is telling me the internalformat is GL_COMPRESSED_SRGB_ALPHA.



#20 Chris_F   Members   -  Reputation: 2637


Posted 14 December 2013 - 10:49 PM

Is this yet another bug in AMD's drivers? I quote the GL 4.3 spec:

 


Generic compressed internal formats are never used directly as the internal formats of texture images. If internalformat is one of the six generic compressed internal formats, its value is replaced by the symbolic constant for a specific compressed internal format of the GL's choosing with the same base internal format. If no specific compressed format is available, internalformat is instead replaced by the corresponding base internal format. If internalformat is given as or mapped to a specific compressed internal format, but the GL can not support images compressed in the chosen internal format for any reason (e.g., the compression format might not support 3D textures), internalformat is replaced by the corresponding base internal format and the texture image will not be compressed by the GL.








