
Carmack's Virtualized Textures (QuakeCon 2007)


Well, I started a topic about this a year or two back, but at that point there was next to no information to go on. John has given us a bit more information about the technique, and since it's one that I'm very curious about, I'd like to resurrect the subject.

Most of you have probably heard about idTech5, id Software's new engine which will be powering their next game, Rage. The primary new feature that they've been showcasing is "Virtualized Textures", which is basically a paging system for textures that determines which textures (and which mipmap levels) are required to render the current frame and loads only those into memory. In "non-geek" terms, this essentially means that you can have infinite texture detail with no performance hit. The only previous algorithm that I'm aware of that attempts something like this is clipmapping, but it is designed to work only with a perturbed plane (like a heightmap), whereas this method apparently works on any surface.

So how do you think it's being accomplished? During Carmack's QuakeCon '07 keynote (Keynote, Q&A) he mentioned a few things that should help determine how:

1) It's being done with DirectX 9 level hardware (OpenGL on the PC), so no DX10-only techniques are needed.

2) Carmack said that the engine wasn't going to be ported to the Wii, because it wasn't designed for the hardware. He also said, however, that the memory/processing requirements were pretty modest. That would imply that the approach is reasonably shader dependent. (Fairly obvious, but made more so by the Wii being fixed function only.)

3) One of the more interesting bits of the QA session was when he mentioned that theoretically every scene could be rendered with only 3 draw calls, and that the only reason it wasn't was for culling granularity. This was possible, he said, because (among other things) the virtualized texture system naturally created a texture atlas. So essentially it would seem that he is allocating one or two large textures and manually loading texture portions into different segments of it.

4) Combining the two above items, the obvious approach would be that while the mesh stores the "standard" texture coordinates, the shaders that it is run through do a lookup into the texture atlas and modify the coordinates to point to the appropriate sub-map. Pretty logical.

Everything above makes sense to me, but there's one big missing piece: how to determine which textures, and which mip levels of those textures, to load? My best guess is that you could do a pre-pass of the scene using a specially color-coded texture (which is sampled normally) where each mip level uses a different color. You could then use the color that's actually rendered to determine which mip level to read in. This approach wouldn't work directly unless every mesh used the same-sized texture, though, which would defeat the purpose. Also, you would need to combine it with another pass that told you which textures are actually needed for the scene, and that seems like a lot of pre-pass for what is supposed to be a low-impact technique.

So what are your thoughts? Anyone see something I don't?
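Just to make that guess concrete, here's a rough CPU-side sketch (plain C++; every name and the pixel encoding are invented by me for illustration, definitely not anything id has confirmed) of how you might analyze such a feedback pre-pass once it has been rendered to a small off-screen target and read back:

// Hypothetical feedback-pass analysis: the pre-pass shader writes
// (textureId, mipLevel) into an RGBA8 target; after readback we collect
// the set of (texture, mip) pairs the frame actually needs.
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct FeedbackPixel {
    uint8_t textureIdLow;   // low 8 bits of the texture/page id
    uint8_t textureIdHigh;  // high 8 bits of the texture/page id
    uint8_t mipLevel;       // mip level the hardware would have sampled
    uint8_t valid;          // 255 if a virtual-textured surface covered this pixel
};

// Build the request list for the streamer from the read-back feedback buffer.
std::set<std::pair<uint16_t, uint8_t>> CollectRequests(const std::vector<FeedbackPixel>& buffer)
{
    std::set<std::pair<uint16_t, uint8_t>> requests;  // (textureId, mipLevel)
    for (const FeedbackPixel& p : buffer) {
        if (p.valid != 255)
            continue;
        uint16_t textureId = static_cast<uint16_t>(p.textureIdLow) |
                             (static_cast<uint16_t>(p.textureIdHigh) << 8);
        requests.insert({textureId, p.mipLevel});
    }
    return requests;
}

The GPU side would just write the texture ID and computed mip level per pixel; the expensive part is the readback plus this analysis, which is exactly why I'm not sure it's what they actually do.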

Hm... good question. From the sounds of it, it would seem like he only ever caches the appropriate mip level for any given texture anyway. If that's the case, and you weren't letting the GPU do the mip generation, wouldn't that potentially eliminate most border artifacts? If not, I don't suppose it would be too much hassle to simply surround everything with a 1-pixel-wide border.
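Something like this is what I mean by the border, assuming the page cache copies texel data around on the CPU before uploading (all names here are made up; it's just a sketch):

// Hypothetical gutter-filling sketch: copy a pageSize x pageSize block plus a
// one-texel border from the source mip image into a (pageSize+2)^2 block
// before uploading it to the physical atlas. Border texels are clamped to the
// source image, so bilinear filtering at page edges never reads garbage.
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint32_t> ExtractPageWithBorder(const std::vector<uint32_t>& srcTexels,
                                            int srcWidth, int srcHeight,
                                            int pageX, int pageY, int pageSize)
{
    const int padded = pageSize + 2;
    std::vector<uint32_t> page(padded * padded);
    for (int y = 0; y < padded; ++y) {
        for (int x = 0; x < padded; ++x) {
            // -1 and pageSize map to the neighbouring texels (or the clamped edge).
            int sx = std::clamp(pageX * pageSize + x - 1, 0, srcWidth - 1);
            int sy = std::clamp(pageY * pageSize + y - 1, 0, srcHeight - 1);
            page[y * padded + x] = srcTexels[sy * srcWidth + sx];
        }
    }
    return page;
}

With the one-texel gutter in place, bilinear filtering can safely sample half a texel past the page edge without bleeding in data from an unrelated page.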

Quote:
One of the more interesting bits of the QA session was when he mentioned that theoretically every scene could be rendered with only 3 draw calls, and that the only reason it wasn't was for culling granularity. This was possible, he said, because (among other things) the virtualized texture system naturally created a texture atlas. So essentially it would seem that he is allocating one or two large textures and manually loading texture portions into different segments of it.

I don't see how these mega-textures can result in every scene getting rendered in only 3 draw calls. Texture switches are not the only batch-breaking state changes, and while creating texture atlases can result in fewer state switches, it cannot remove them entirely and cut the number of draw calls down to 3. Some of those state changes are not even texture-related. What about different vertex layouts, shaders, shader parameters, or just about every other state change?

Quote:

One of the more interesting bits of the QA session was when he mentioned that theoretically every scene could be rendered with only 3 draw calls...

I'd say only 3 draw calls might be very theoretical, but possible if you use uber shaders (much like the mega texture, where many shaders are contained in one), not that many different parameters, hardware instancing, the same vertex layout, etc.

In practice, as you said, there would be more state changes and thus more draw calls.

Quote:
Original post by Ashkan
I don't see how these mega-textures can result in every scene getting rendered in only 3 draw calls...

I think he's only talking about the static environment here, i.e. the racing track in the demo. That might be possible to draw with 3 draw calls: it's a textured terrain, so one shader, one vertex layout, and one huge texture. Note that the MegaTexture has all kinds of effects baked in.
Then, for actors etc. you'd need some extra draw calls, since you need a different shader, vertex formats that support skinning, etc.

Quote:
Original post by Lord_Evil
I'd say only 3 draw calls might be very theoretical but possible if you use uber shaders (much like the mega texture where many shaders are contained in one), not that many different parameters, hardware instancing, the same vertex layout etc.

In practice, as you said, there would be more state changes and thus more draw calls.


Actually, from what he said in the keynotes, it sounds like you're pretty dead on.

I'll attempt to transcribe a bit:

Quote:
John Carmack (Quakecon '07) - 24:20 in the Q&A vid linked above
"It turns out that the entire world, just about, could be drawn with three draw calls in this: One for land surfaces, one for non-land opaque surfaces, and one for non-land translucent surfaces. Almost everything end up being done like that.

One of the interesting things is that in addition to virtualizing the textures to let you have kind of unlimited resources, it's also the ultimate infinite atlasing system where, since everything is using the same virtual texture page pool, they can all use the same material. And you can wind up taking, you know, in this room you would have the podiums, the floors, the stands, the lights. All of these things wind up, while they're separate models and you can move them around and position them separately, but when you're ready to "go fast" they all get built into the same model. So you've got one draw call because they all use the same virtual materials on there.

The only reason you end up breaking it up is for culling granularity. You really literally could have three calls that draw just about everything except for a few glowing things and the sky and the few things that are not conventional surfaces in there. But you still want to break it up so you can do culling on visibility and frustum culling and things like that."


I'm guessing that when he says "the lights" as part of the models, he's referring to the physical light casings, not lights in the illuminating, shadow casting sense :)

I still don't get one thing: does he need to create a new texture (atlas) every frame and then send it to the GPU? That would consume some time.

Also, wouldn't it be better to, for example, divide the level using some grid, have a unique texture and mesh for each grid cell, and then simply load/free these resources as the player moves? Hmm, I'm not so sure it would work well, because I think Oblivion used such an approach and the landscape that was far away used low-res textures and looked blurred (http://oblivion.bonusweb.cz/obrazek.html?obrazek=oblivion20030604.jpg). It consumed a lot of memory too. What do you guys think?

Quote:
Original post by MassacrerAL
I still don't get one thing: does he need to create a new texture (atlas) every frame and then send it to the GPU? That would consume some time.

I think you highly overestimate how much data would have to be sent and underestimate how fast AGP 8x/PCIe bandwidth is. Even on a PCI board, if you had a 1024x768 screen and only diffuse textures, you'd only need about 16MB of memory, and enough bandwidth is provided that you could do an entire refresh of the atlas in less than a sixtieth of a second.
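Back-of-envelope, with an assumed roughly 5x factor for page granularity and streaming headroom (my numbers, not Carmack's):

// Rough estimate only; the overhead factor is an assumption.
// At 1024x768 you need roughly one unique texel per screen pixel at the
// right mip level; caching whole pages (plus headroom for camera motion)
// multiplies that a few times over.
#include <cstdio>

int main()
{
    const long long screenTexels = 1024LL * 768LL;            // ~1 texel per pixel
    const long long bytesPerTexel = 4;                         // uncompressed RGBA8 diffuse
    const long long exactNeed = screenTexels * bytesPerTexel;  // ~3 MB
    const long long withPageOverhead = exactNeed * 5;          // assumed ~5x overhead
    std::printf("exact: %lld MB, cached: %lld MB\n",
                exactNeed >> 20, withPageOverhead >> 20);      // ~3 MB vs ~15 MB
    return 0;
}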
Quote:
Also, wouldn't it be better to, for example, divide the level using some grid, have a unique texture and mesh for each grid cell, and then simply load/free these resources as the player moves? Hmm, I'm not so sure it would work well, because I think Oblivion used such an approach and the landscape that was far away used low-res textures and looked blurred (http://oblivion.bonusweb.cz/obrazek.html?obrazek=oblivion20030604.jpg). It consumed a lot of memory too. What do you guys think?

The virtual texturing of idTech5, if it is what I think it is, is much more interesting than that, since it's able to take into account partial or entire occlusion of geometry, and can determine much more precisely which mips of which parts of the texture are needed. With a proper virtual texturing setup like what I think idTech5 has, you could have nigh-infinite resolution textures with multiple channels (i.e. diffuse, normal, specular coefficient, specular exponent, fresnel coefficient/exponent, and hell, probably even scattering coefficient/exponent) on a 64MB video card while still having space left over to do some shadow mapping, store a low-res (e.g. 640x480 or 800x600?) backbuffer with some AA, and other miscellaneous stuff. Like he mentions in the keynote, it really is appalling that GPU makers are even thinking about going beyond 512MB when maybe half of that is really required for 1080p resolutions.

Quote:
Everything above makes sense to me, but there's one big missing piece: how to determine which textures, and which mip levels of those textures, to load? My best guess is that you could do a pre-pass of the scene using a specially color-coded texture (which is sampled normally) where each mip level uses a different color. You could then use the color that's actually rendered to determine which mip level to read in. This approach wouldn't work directly unless every mesh used the same-sized texture, though, which would defeat the purpose. Also, you would need to combine it with another pass that told you which textures are actually needed for the scene, and that seems like a lot of pre-pass for what is supposed to be a low-impact technique.

I have a pretty good idea of what the answer to that question is, but I haven't had time to make a demo that uses it to fully convince myself that it actually works (fwiw, the idea would probably only need DX7-level tech). The idea that you've suggested probably wouldn't work directly though, since you still need to read those results back and analyze them somehow.

Quote:
Original post by Cypher19

I have a pretty good idea of what the answer to that question is, but I haven't had time to make a demo that uses it to fully convince myself that it actually works (fwiw, the idea would probably only need DX7-level tech).


Really? DX7? I'd be very interested in hearing more about that idea, whether or not it actually works :)

And yes, I know that you would have to read the results of my idea somehow. I meant to imply that it could be rendered to a texture then read back out, which isn't terribly uncommon, but which also isn't terribly fast. :P I really don't think that's it.

As long as the pixel shaders don't change the scene (such as depth info), you could easily make a 'software renderer' that could quickly evaluate what texture levels and resolutions are needed for each visible part of a model. It wouldn't even need to actually render anything - it just needs to do some of the calculations to determine where the polygons lie in screenspace and which mipmap should be applied. You also don't need to use a high-poly version of the model - the lowest poly version you have will probably suffice.

This avoids pretty much all the problems with things like GPU RAM being slow for the CPU to read from, etc.

On the other hand, if you were going to create a texture atlas manually each frame, you'd need either a perfectly accurate 'software renderer' or a _MUCH_ larger texture than the screen. Realistically, somewhere in the middle (rather accurate statistics from the software pass, a somewhat larger texture) is probably what you'd need to use.
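For what it's worth, the per-triangle part of such a 'software renderer' boils down to the standard UV-derivative mip formula. A rough C++ sketch (simplified to per-triangle rather than per-pixel; the names are mine, not anything id has described):

// Estimate which mip level the hardware would pick for a triangle, from its
// screen-space positions and UVs, using the usual max-derivative rule:
//   lod = log2(max(|d(uv*texSize)/dx|, |d(uv*texSize)/dy|))
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

float EstimateMipLevel(const Vec2 screen[3], const Vec2 uv[3],
                       float texWidth, float texHeight)
{
    // Solve for the affine mapping screen -> texel coordinates over the triangle.
    float ax = screen[1].x - screen[0].x, ay = screen[1].y - screen[0].y;
    float bx = screen[2].x - screen[0].x, by = screen[2].y - screen[0].y;
    float det = ax * by - ay * bx;
    if (std::fabs(det) < 1e-6f)
        return 1000.0f;  // degenerate on screen: any coarse mip will do
    float du1 = (uv[1].x - uv[0].x) * texWidth,  dv1 = (uv[1].y - uv[0].y) * texHeight;
    float du2 = (uv[2].x - uv[0].x) * texWidth,  dv2 = (uv[2].y - uv[0].y) * texHeight;
    // Texel-space derivatives with respect to screen x and y.
    float dudx = (du1 * by - du2 * ay) / det, dvdx = (dv1 * by - dv2 * ay) / det;
    float dudy = (du2 * ax - du1 * bx) / det, dvdy = (dv2 * ax - dv1 * bx) / det;
    float len = std::sqrt(std::max(dudx * dudx + dvdx * dvdx,
                                   dudy * dudy + dvdy * dvdy));
    return std::max(0.0f, std::log2(len));  // can't go finer than mip 0
}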

Quote:
Original post by Extrarius
As long as the pixel shaders don't change the scene (such as depth info), you could easily make a 'software renderer' that could quickly evaluate what texture levels and resolutions are needed for each visible part of a model. It wouldn't even need to actually render anything - it just needs to do some of the calculations to determine where the polygons lie in screenspace and which mipmap should be applied. You also don't need to use a high-poly version of the model - the lowest poly version you have will probably suffice.

That's actually an idea I was entertaining for quite a while to determine visibility of each texture 'page', but I abandoned it for two reasons:
1) It'd probably just be too damn slow, because you'd have to render the scene at full resolution. If your final screen is 1920x1200, you can't have a 640x480 soft-rast going on, because if you missed a small bit of a page you'd probably end up with some popping as a polygon is revealed (e.g. if you have a simple 6-wall room with a pillar in the middle, and the camera is circle-strafing around the pillar, then as the parts of the room behind the pillar are revealed you'll see accesses into parts of the virtual texture that you haven't allocated/uploaded yet, because the softrast thought that area was occluded by the pillar).
2) Your results probably wouldn't match what the hardware/driver is going to render, so you might have some over/underestimation of what's needed going on. Overestimation isn't that bad; so what, you load in a higher-res mip a bit early. Underestimation IS, though, because once the softrast finally decides that a higher-res mip than what's loaded is necessary, while the video card is already magnifying that lower-res mip in the texture pool, you'll consistently get pop-in as your camera moves around the world or triangles face the camera more and more.

By the way, one thing I should note, and which I find to be a more interesting problem to solve than determining page visibility, is page _arrangement_. One thing I tried, since it is the most obvious solution, is to just say "okay, I've got this 32x32 page of a texture, I'll dump it somewhere in the virtual texture pool, and my meshes will use an indirection texture to translate the coordinates to the page I want to look up". However, that only works with point filtering. Once you start including linear filtering, yeah, you get some colour bleeding. Upgrade that to anisotropic filtering, and things just go to hell. Because the hardware expects texture coordinates to be continuous, so that it knows what the aniso lookup pattern will be, the rendering results will be completely wrong when your texture coordinates are non-continuous. The hardware will instead stretch the aniso filtering pattern all the way across the virtual texture pool, between the page you're trying to look up and the page that (in terms of the final render) is adjacent to the one you're looking up. This thread from a while ago was part of my attempt to figure out what was going on: http://www.gamedev.net/community/forums/topic.asp?topic_id=393047 . These images demonstrate what was going on:

The hand-made unarranged texture I was using to test with: (inline Photobucket image; link not preserved)
The artifact:
http://img.photobucket.com/albums/v627/Cypher19/artifactpage.jpg


The desired image (this was taken with point filtering):
http://img.photobucket.com/albums/v627/Cypher19/bestpointpage.jpg
Visualization of how the GPU is filtering those things:
http://img.photobucket.com/albums/v627/Cypher19/relthamexplan.jpg
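For reference, and ignoring the filtering problems shown in those images, the indirection lookup I described boils down to something like this (written here as CPU-side C++ for clarity; the page-table layout is just my own sketch, not id's scheme):

// CPU-side illustration of the indirection lookup a fragment shader would do.
// The indirection table maps a virtual page coordinate to where that page
// (possibly at a coarser mip) currently lives in the physical page pool.
#include <cmath>
#include <cstdint>

struct PageEntry {
    uint16_t poolX, poolY;   // page location in the physical pool, in pages
    uint8_t  mip;            // mip level actually resident for this region
};

struct VirtualTextureInfo {
    int virtualSizePages;    // virtual texture width/height in pages, at mip 0
    int pageSize;            // texels per page side in the physical pool
    int poolSizeTexels;      // physical pool width/height in texels
    const PageEntry* pageTable;  // one entry per mip-0 virtual page
};

// Translate a virtual UV in [0,1) into a physical-pool UV.
void TranslateUV(const VirtualTextureInfo& vt, float u, float v,
                 float& outU, float& outV)
{
    int pageX = static_cast<int>(u * vt.virtualSizePages);
    int pageY = static_cast<int>(v * vt.virtualSizePages);
    const PageEntry& e = vt.pageTable[pageY * vt.virtualSizePages + pageX];

    // Fraction within the page that is actually resident (at mip e.mip).
    float scale = static_cast<float>(vt.virtualSizePages >> e.mip);
    float fracU = u * scale - std::floor(u * scale);
    float fracV = v * scale - std::floor(v * scale);

    outU = (e.poolX * vt.pageSize + fracU * vt.pageSize) / vt.poolSizeTexels;
    outV = (e.poolY * vt.pageSize + fracV * vt.pageSize) / vt.poolSizeTexels;
}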

[Edited by - Cypher19 on August 15, 2007 3:59:10 PM]

To be honest, I just don't see how it's possible to render all the static/opaque geometry (culling issues excepted) in one draw call.

I understand the theory, parsing the scene before render time, extracting which textures are in use, their priorities, their mipmap levels, etc., but from that information, where do you go?

To be able to render with only one draw call, you must allocate all your textures in a big virtual one. Whatever you do, you are still limited by the hardware/driver restrictions. Let's take an example: your max texture resolution is 2048x2048 and you have 16 TMUs, so you've got enough "virtual" texture space for 2048x2048x16 pixels before being forced to switch textures (hence a new draw call).

Since Carmack specifically says that you could render the scene in one draw call, unless I'm missing something (like being able to switch a texture in the middle of a draw call), you're limited to 16 2048^2 textures. That's around 64 MB of video memory.

What if I use 300 MB of data in the frame? How do they get allocated into 16 2048^2 textures?

How do you handle compressed textures?

How do you handle texture tiling? Or has it become obsolete with MegaTextures, in the way that all textures are "unique", even if you don't want them to be? Or do you leave it to the shaders to do their own tiling? If so, that means the engine can't run without pixel shaders, so I don't see how it could be implemented on DX7-level hardware or hardware without ps 2.0.

How do you offset the texture coordinates from the original texture to the virtual texture? You'd have to store a matrix for each object, meaning a matrix switch (-> a new draw call), or you'd have to use an ID per vertex, upload the matrices as shader constants and offset the tex coords in the shader. Then you need shader hardware, plus you are limited by the number of constants available.

Maybe part of the answer is allocating the textures in a virtual 3D texture instead, but you still need to update the texture coordinates of each mesh to sample the correct layer of the volumetric texture.

Food for thought..

Y.

Quote:
Original post by Ysaneya

What if I use 300 MB of data in the frame? How do they get allocated into 16 2048^2 textures?

You only send down the fractions of each mip level that you need, basically.

Quote:
How do you handle compressed textures?

I'm guessing you mean DXTn stuff. If it's possible to send down DXTn stuff in chunks to the GPU, then I don't see how that would be an issue. For other compressed stuff, just get a high-quality JPEG or a PNG, decompress on the CPU, and then load that uncompressed data to the GPU as necessary. (This reminds me: earlier I mentioned "a 64 MB card would be able to handle blah blah blah"; that was in terms of uncompressed data.)

Quote:
How do you handle texture tiling? Or has it become obsolete with MegaTextures, in the way that all textures are "unique", even if you don't want them to be? Or do you leave it to the shaders to do their own tiling? If so, that means the engine can't run without pixel shaders, so I don't see how it could be implemented on DX7-level hardware or hardware without ps 2.0.

It's become obsolete. Infinite texture memory literally means that; go ahead and tile the texture in photoshop or using the content creation tools or whatever.

Quote:
Maybe a part of the answer is allocating the textures in a virtual 3D texture instead, but you still need to update the texture coordinates of each mesh to sample the correct layer in the volumetric texture.

Even if you could, how would you handle filtering between page edges?

The way I understood it, the megatexture is an all-encompassing texture atlas which contains a unique texture for every surface element in the world map. This way you can have scene painters working in the world like Carmack described. He also hinted that the current id technology doesn't enable the artists to sculpt the map the same way they're able to paint the environment as they run around the scene. I took this as an indication that during painting world geometry is static and thus the texture atlas arrangement may remain static while the artists paint texels.

If, during painting, the scene geometry needs to be modified (likely introducing new triangles) you'd have to re-arrange the texture atlas. I mean, in a system like this, you'd probably want to be able to give the system a texture budget, say 100K x 100K texels, along with the world geometry and then have the system make the best possible use of the available megatexture space (maximum texels per primitive). After that you'd give the result to the artists and let them paint.

It's clear that the whole megatexture won't fit into video memory, so he must be maintaining a "working set" in video memory, and I suppose this could be a local texture atlas which contains a superset of the textures for the visible primitives. I suppose you could maintain this working set by exploiting temporal coherence between frames and only update it as new textures (primitives) become visible.

Carmack also mentioned something about a test scene with a huge 100K x 100K megatexture. That pretty much means he's paging texture data in from disk, which would make a three-level paging scheme: disk <-> main memory <-> video memory.
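A working set like that could be as simple as an LRU page cache per level of the hierarchy. A minimal sketch, with all the names and the PageId packing assumed by me:

// Hypothetical video-memory working set as an LRU page cache: each frame the
// pages requested by the visibility/feedback pass are touched, and when the
// physical pool is full the least-recently-used page is evicted. The
// disk -> main memory level would work the same way one step up.
#include <cstdint>
#include <list>
#include <unordered_map>

using PageId = uint64_t;  // packs (texture, mip, x, y) however you like

class PageCache {
public:
    explicit PageCache(size_t capacity) : capacity_(capacity) {}

    // Returns true if the page was already resident; otherwise the caller
    // must upload it (possibly after the eviction reported via 'evicted').
    bool Touch(PageId id, PageId* evicted, bool* didEvict)
    {
        *didEvict = false;
        auto it = lookup_.find(id);
        if (it != lookup_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            return true;
        }
        if (lru_.size() == capacity_) {
            *evicted = lru_.back();                       // evict the LRU page
            *didEvict = true;
            lookup_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(id);
        lookup_[id] = lru_.begin();
        return false;
    }

private:
    size_t capacity_;
    std::list<PageId> lru_;                               // front = most recent
    std::unordered_map<PageId, std::list<PageId>::iterator> lookup_;
};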

I wouldn't know about the specifics of an actual implementation as far as GPU programming goes, though.

Anyway, interesting stuff indeed.

-- Jani

Quote:
Original post by Ysaneya
To be honest, I just don't see how it's possible to render all the static/opaque geometry (culling issues excepted) in one draw call.

I understand the theory, parsing the scene before render time, extracting which textures are in use, their priorities, their mipmap levels, etc., but from that information, where do you go?

To be able to render with only one draw call, you must allocate all your textures in a big virtual one. Whatever you do you are still limited by the hardware/driver restrictions. Let's take an example: your max texture resolution is 2048x2048, and you have 16 TMUs, so you've got enough "virtual" texture space for 2048x2048x16 pixels, before being forced to switch the textures (hence a new draw call).

Since Carmack specifically says that you could render the scene in one draw call, unless I'm missing something (like being able to switch a texture in the middle of a draw call), you're limited to 16 2048^2 textures. That's around 64 MB of video memory.

The whole scene/level (static part) has one huge texture applied. I believe the demo level he has shown uses a 128000x128000 texture. I am not sure how he unwraps all the data, but it is all spatially coherent, so the terrain and whatever is on top of it at some spot is stored very close together in the MegaTexture.

That huge texture obviously doesn't fit in RAM, so he pages it in when needed. He could be using a 2048x2048 texture for the highest detail, 0 to 10m from the viewer, then another 2048x2048 for 10-50 meters, and maybe a third and fourth for 50+ meters. The fragment shader then uses the 3 or 4 diffuse textures (he would also need the normal map and maybe some other textures), morphs the coords as needed and blends between them. Check out clipmaps, very similar.
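If it really is clipmap-like, the ring selection could be as simple as this sketch (the distances and blend band are my guesses, not numbers from id):

// CPU-side stand-in for what the fragment shader would do: pick the detail
// ring by distance from the viewer and compute a blend factor near the ring
// boundary to hide the transition.
#include <algorithm>

struct RingSample {
    int   ring;    // which of the 3-4 cached textures to sample
    float blend;   // 0 = this ring only, 1 = fully the next coarser ring
};

RingSample SelectRing(float distanceMeters)
{
    // Assumed ring boundaries: 0-10 m, 10-50 m, 50-250 m, beyond.
    const float bounds[] = {10.0f, 50.0f, 250.0f};
    const float blendBand = 0.15f;  // blend over the last 15% of each ring
    for (int i = 0; i < 3; ++i) {
        if (distanceMeters < bounds[i]) {
            float start = bounds[i] * (1.0f - blendBand);
            float t = (distanceMeters - start) / (bounds[i] - start);
            return {i, std::clamp(t, 0.0f, 1.0f)};
        }
    }
    return {3, 0.0f};  // outermost ring, no further blending
}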

MegaTexture also stores the normal maps and other data, I believe some physics parameters...

Quote:
How do you handle compressed textures?

MegaTexture is compressed, a lot. It's all one texture in the end; it doesn't matter how individual textures are handled before the level is compiled.

Quote:
How do you handle texture tiling? Or has it become obsolete with MegaTextures, in the way that all textures are "unique", even if you don't want them to be? Or do you leave it to the shaders to do their own tiling?

Tiling gets compiled into the one huge MegaTexture.

I don't know if this is covered in the keynote (haven't watched it yet), but check out the idTech5 walkthrough videos, especially the Tools video.

Part 1
Part 2
Tools

From everything I've read (all I could find on MegaTextures) and watched (Carmack's speeches), the ground is made up of a massive texture, 128,000x128,000 pixels (of course the artists decide what size is needed; it can be smaller, but I'm not sure if this is the max size), and it is one continuous texture on the hard drive that is paged in as needed. Also, each object in the world is made up of a MegaTexture, such as the characters in the game, but I'm quite positive those are a lot smaller textures (guess here), say 2048x2048.

It also seems from what he has said that the ultra-high-resolution texture is only used, say, right under the player, while lower quality is used for things further away, and quality seems to be degraded on the fly as you get further and further out (automatic mip-mapping). Since this works on the PS3, which has less memory than the 360, that means the high-resolution maps will likely be lower resolution than on, say, the 360, which might be lower than on the PC. It seems like they're calculated in real time.

I believe the texture is created based on all the textures determined to be needed. These are then put into a large block of memory and uploaded. This is updated in real time for the sub-parts that change from frame to frame, and the changes are uploaded to the GPU (likely not the whole texture, just the sub-sections). This massive texture is made up of static geometry's texture data and dynamic data (say, a person in the game). Data is uploaded to the shader telling it where its texture(s) are in the larger texture.

It's a really interesting concept. The tools video from QuakeCon showed a lot of information. For example, the texture could be tiled but each pixel is still saved individually; if you wanted to stamp out the ground with set tiles you could then custom-paint between them, etc. They also had brushes like in MS Paint, you could resize them, etc.

I'm actually working on something similar, and have been for a few months, but I'm in the process of putting our house on the market so I have so little spare time it's not even funny. I had a demo with an 8GB "generated" texture I was streaming from, but the HD it was on died a very painful death and I did not, like an idiot, have that project backed up. I was just texturing static terrain and it looked pretty darn good and worked very well. Took me about a week to implement after I spent a week reading everything I could find where Carmack discussed it.

I do want to know what type of compression is used; from 80GB down to 2 DVDs is pretty impressive. I was just using raw data myself.

Edit:

@nts -- That rocks, same basic information typed around the same basic time. When I hit reply I didn't see your posting. Awesome links!

[Edited by - Mike2343 on August 15, 2007 7:17:24 PM]


I find this all very confusing. I share Ysaneya's questions and I'm not even sure I understand the basic concept, much less how the implementation should work. Going from this part of the transcription, I got a dim idea of how it works in my mind. Could any of you verify if I'm on the right track?

Quote:
And you can wind up taking, you know, in this room you would have the podiums, the floors, the stands, the lights. All of these things wind up, while they're separate models and you can move them around and position them separately, but when you're ready to "go fast" they all get built into the same model. So you've got one draw call because they all use the same virtual materials on there.


It sounds like it's mainly meant to work for relatively static objects. At some point during build time, or sparsely at runtime when a scene has been loaded or altered, all static objects are accumulated into one big batch buffer. To make this work for multiple objects/textures, in essence a UVW map is generated for this entire buffer, unwrapping the per-object textures into the scene-wide 'megatexture'. Is that the basic idea of this technique?

Remigius: Yeah, based on the quote, it does seem like the modellers first sculpt the static geometry part of the world, then some kind of processing step is run over the model which constructs this huge texture atlas (the mega texture) containing a texture for each triangle. As long as the geometry remains unaltered, so does the organization of the texture atlas. After this processing, the artists can start painting the world, in other words, updating the texels in the atlas.

The texture atlas can be organized in such a way that the textures of triangles spatially close to each other are close to each other in the texture atlas as well. I suppose you could take advantage of this during rendering when trying to figure out which (sub)textures of the atlas to keep in the current working set.

-- Jani

I feel like the quote I posted earlier was slightly misleading: I view it as Carmack describing a nice side effect of the texturing system, not a core component of it.

It's worth noting that it's been mentioned that this texturing technique applies to both static and animated meshes. In one of the tech demo videos they point out that the store owner guy with the big hat uses a 2k*2k texture just for his face. Carmack mentions that this may be a little wasteful, but also says that it really doesn't matter because it doesn't affect the game performance at all. We're also told that the artists can create those meshes however they want, using pretty much any tools they want (I would imagine they have something like a Collada importer), which tends to imply that the textures for most models are not done in a pre-built atlas. (The artists probably wouldn't have been too happy with that!)

Now that's just for individual meshes, the landscape may be a special case, but it does highlight the fact that they must be building the atlas on the fly and, subsequently, determining texture visibility and detail level on the fly. Those two elements are what makes the system so impressive in my view.

@born49: Thank you for posting that paper! Very interesting indeed, and quite probably related to what the idTech5 is doing.

Quote:
It sounds like it's mainly meant to work for relatively static objects. At some point during build time, or sparsely at runtime when a scene has been loaded or altered, all static objects are accumulated into one big batch buffer. To make this work for multiple objects/textures, in essence a UVW map is generated for this entire buffer, unwrapping the per-object textures into the scene-wide 'megatexture'. Is that the basic idea of this technique?


No. The basic idea is that it's one massive texture. Think of one massive JPEG or PNG file (with layers like PSP files, but they get baked in, with a history it seems, but this is more file format than technique so we won't get into that). This is for ANY object in the world. It can be any size (I do believe he mentioned 128,000 x 128,000 being the max side, but I'm sure that's just a "set" limit). Like he said, the shop vendor's face is like 2048x2048. It sounds like the engine/tools modify this size in real time to whatever the hardware can handle. It's to give the artists total freedom: not having to worry about anything but making the map/level/area look amazing. You can modify the geometry at any point and the texture just gets remapped, from the sounds of it.

He said, depending on the resolution, it could take up to 5 minutes to save artist changes for the area of the texture they've "checked out". Since he said several artists can work on the map at a time, it's likely they check out sections and can only modify those. It would make sense, and it's what I was doing before I got distracted.

Since all 4 systems, Win32, 360, PS3 and MacOS, share the same data set (from the same server), I believe the engine changes the level of detail in real time to suit each system's specifications. So since the PS3 technically has less memory available to each game than, say, the 360 (the PS3 OS uses more memory), it likely has a lower level of detail on the textures than the 360/PC/Mac.

Mr. Carmack has said several times that the system was very easy to implement and was not all that complex. The hardest part, he said, was writing the shader, and even that didn't take that long.

How I did it was to fill the texture units with the largest texture atlases I could and update them as rarely as possible. Full replacements were even rarer; I mostly updated sub-sections as needed (see the sketch below). I also tried to keep static and nearby objects in the same atlas(es): say the terrain took up 2 texture units and the second one was only half used, I'd put static objects' textures in there too.
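The sub-section updates I mean are just plain glTexSubImage2D calls into an atlas that's already allocated; roughly like this (a simplified sketch of my own approach, RGBA8 assumed):

// Upload one page's worth of texels into a sub-rectangle of an already
// allocated atlas texture instead of respecifying the whole atlas.
// Uses only standard OpenGL calls (glBindTexture / glTexSubImage2D).
#include <GL/gl.h>
#include <cstdint>

void UploadPageToAtlas(GLuint atlasTexture,
                       int destX, int destY,          // texel offset in the atlas
                       int pageSize,                  // page width/height in texels
                       const uint8_t* rgbaTexels)     // pageSize*pageSize*4 bytes
{
    glBindTexture(GL_TEXTURE_2D, atlasTexture);
    // Rows of the source data are tightly packed.
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexSubImage2D(GL_TEXTURE_2D, 0,                 // mip level 0 of the atlas
                    destX, destY, pageSize, pageSize,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgbaTexels);
}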

But as someone else said, and to make sure people understand: ALL objects can have a mega texture applied to them. There is no limit to how many textures are used or how large they are. The engine manages all of those details, leaving the artists to do what they do best.

Hope that helps.

Quote:
Original post by Mike2343
...

Hope that helps.


Sorry, it doesn't. I'm probably just dumb, but most of Carmack's talks, or the treatises on them I've read, haven't been too clear to me. Judging by the confusion in this thread though, I'm not the only one who isn't instantly enlightened by his talks [smile]


Quote:

No. The basic idea is that it's one massive texture. Think of one massive JPEG or PNG file (with layers like PSP files, but they get baked in, with a history it seems, but this is more file format than technique so we won't get into that). This is for ANY object in the world.


Ok... then how does this work toward drawing the scene in 3 draw calls? From my limited understanding, this sounds more like the original MegaTexture idea, which seems to be 'just' a part of the virtualized textures scheme. As far as I get it, virtualized texturing is a fancy paging system to actually put all this data to good use.

I wish Carmack would provide sample implementations instead of making us all guess at what he meant [wink]



  • Similar Content

    • By Vortez
      Hi guys, im having a little problem fixing a bug in my program since i multi-threaded it. The app is a little video converter i wrote for fun. To help you understand the problem, ill first explain how the program is made. Im using Delphi to do the GUI/Windows part of the code, then im loading a c++ dll for the video conversion. The problem is not related to the video conversion, but with OpenGL only. The code work like this:

       
      DWORD WINAPI JobThread(void *params) { for each files { ... _ConvertVideo(input_name, output_name); } } void EXP_FUNC _ConvertVideo(char *input_fname, char *output_fname) { // Note that im re-initializing and cleaning up OpenGL each time this function is called... CGLEngine GLEngine; ... // Initialize OpenGL GLEngine.Initialize(render_wnd); GLEngine.CreateTexture(dst_width, dst_height, 4); // decode the video and render the frames... for each frames { ... GLEngine.UpdateTexture(pY, pU, pV); GLEngine.Render(); } cleanup: GLEngine.DeleteTexture(); GLEngine.Shutdown(); // video cleanup code... }  
      With a single thread, everything work fine. The problem arise when im starting the thread for a second time, nothing get rendered, but the encoding work fine. For example, if i start the thread with 3 files to process, all of them render fine, but if i start the thread again (with the same batch of files or not...), OpenGL fail to render anything.
      Im pretty sure it has something to do with the rendering context (or maybe the window DC?). Here a snippet of my OpenGL class:
      bool CGLEngine::Initialize(HWND hWnd) { hDC = GetDC(hWnd); if(!SetupPixelFormatDescriptor(hDC)){ ReleaseDC(hWnd, hDC); return false; } hRC = wglCreateContext(hDC); wglMakeCurrent(hDC, hRC); // more code ... return true; } void CGLEngine::Shutdown() { // some code... if(hRC){wglDeleteContext(hRC);} if(hDC){ReleaseDC(hWnd, hDC);} hDC = hRC = NULL; }  
      The full source code is available here. The most relevant files are:
      -OpenGL class (header / source)
      -Main code (header / source)
       
      Thx in advance if anyone can help me.
    • By DiligentDev
      This article uses material originally posted on Diligent Graphics web site.
      Introduction
      Graphics APIs have come a long way from small set of basic commands allowing limited control of configurable stages of early 3D accelerators to very low-level programming interfaces exposing almost every aspect of the underlying graphics hardware. Next-generation APIs, Direct3D12 by Microsoft and Vulkan by Khronos are relatively new and have only started getting widespread adoption and support from hardware vendors, while Direct3D11 and OpenGL are still considered industry standard. New APIs can provide substantial performance and functional improvements, but may not be supported by older hardware. An application targeting wide range of platforms needs to support Direct3D11 and OpenGL. New APIs will not give any advantage when used with old paradigms. It is totally possible to add Direct3D12 support to an existing renderer by implementing Direct3D11 interface through Direct3D12, but this will give zero benefits. Instead, new approaches and rendering architectures that leverage flexibility provided by the next-generation APIs are expected to be developed.
      There are at least four APIs (Direct3D11, Direct3D12, OpenGL/GLES, Vulkan, plus Apple's Metal for iOS and osX platforms) that a cross-platform 3D application may need to support. Writing separate code paths for all APIs is clearly not an option for any real-world application and the need for a cross-platform graphics abstraction layer is evident. The following is the list of requirements that I believe such layer needs to satisfy:
      Lightweight abstractions: the API should be as close to the underlying native APIs as possible to allow an application leverage all available low-level functionality. In many cases this requirement is difficult to achieve because specific features exposed by different APIs may vary considerably. Low performance overhead: the abstraction layer needs to be efficient from performance point of view. If it introduces considerable amount of overhead, there is no point in using it. Convenience: the API needs to be convenient to use. It needs to assist developers in achieving their goals not limiting their control of the graphics hardware. Multithreading: ability to efficiently parallelize work is in the core of Direct3D12 and Vulkan and one of the main selling points of the new APIs. Support for multithreading in a cross-platform layer is a must. Extensibility: no matter how well the API is designed, it still introduces some level of abstraction. In some cases the most efficient way to implement certain functionality is to directly use native API. The abstraction layer needs to provide seamless interoperability with the underlying native APIs to provide a way for the app to add features that may be missing. Diligent Engine is designed to solve these problems. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common C++ front-end for all supported platforms and provides interoperability with underlying native APIs. It also supports integration with Unity and is designed to be used as graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. Full source code is available for download at GitHub and is free to use.
      Overview
      Diligent Engine API takes some features from Direct3D11 and Direct3D12 as well as introduces new concepts to hide certain platform-specific details and make the system easy to use. It contains the following main components:
      Render device (IRenderDevice  interface) is responsible for creating all other objects (textures, buffers, shaders, pipeline states, etc.).
      Device context (IDeviceContext interface) is the main interface for recording rendering commands. Similar to Direct3D11, there are immediate context and deferred contexts (which in Direct3D11 implementation map directly to the corresponding context types). Immediate context combines command queue and command list recording functionality. It records commands and submits the command list for execution when it contains sufficient number of commands. Deferred contexts are designed to only record command lists that can be submitted for execution through the immediate context.
      An alternative way to design the API would be to expose command queue and command lists directly. This approach however does not map well to Direct3D11 and OpenGL. Besides, some functionality (such as dynamic descriptor allocation) can be much more efficiently implemented when it is known that a command list is recorded by a certain deferred context from some thread.
      The approach taken in the engine does not limit scalability as the application is expected to create one deferred context per thread, and internally every deferred context records a command list in lock-free fashion. At the same time this approach maps well to older APIs.
      In current implementation, only one immediate context that uses default graphics command queue is created. To support multiple GPUs or multiple command queue types (compute, copy, etc.), it is natural to have one immediate contexts per queue. Cross-context synchronization utilities will be necessary.
      Swap Chain (ISwapChain interface). Swap chain interface represents a chain of back buffers and is responsible for showing the final rendered image on the screen.
      Render device, device contexts and swap chain are created during the engine initialization.
      Resources (ITexture and IBuffer interfaces). There are two types of resources - textures and buffers. There are many different texture types (2D textures, 3D textures, texture array, cubmepas, etc.) that can all be represented by ITexture interface.
      Resources Views (ITextureView and IBufferView interfaces). While textures and buffers are mere data containers, texture views and buffer views describe how the data should be interpreted. For instance, a 2D texture can be used as a render target for rendering commands or as a shader resource.
      Pipeline State (IPipelineState interface). GPU pipeline contains many configurable stages (depth-stencil, rasterizer and blend states, different shader stage, etc.). Direct3D11 uses coarse-grain objects to set all stage parameters at once (for instance, a rasterizer object encompasses all rasterizer attributes), while OpenGL contains myriad functions to fine-grain control every individual attribute of every stage. Both methods do not map very well to modern graphics hardware that combines all states into one monolithic state under the hood. Direct3D12 directly exposes pipeline state object in the API, and Diligent Engine uses the same approach.
      Shader Resource Binding (IShaderResourceBinding interface). Shaders are programs that run on the GPU. Shaders may access various resources (textures and buffers), and setting correspondence between shader variables and actual resources is called resource binding. Resource binding implementation varies considerably between different API. Diligent Engine introduces a new object called shader resource binding that encompasses all resources needed by all shaders in a certain pipeline state.
      API Basics
      Creating Resources
      Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. Graphics APIs usually have a native object that represents linear buffer. Diligent Engine uses IBuffer interface as an abstraction for a native buffer. To create a buffer, one needs to populate BufferDesc structure and call IRenderDevice::CreateBuffer() method as in the following example:
      BufferDesc BuffDesc; BufferDesc.Name = "Uniform buffer"; BuffDesc.BindFlags = BIND_UNIFORM_BUFFER; BuffDesc.Usage = USAGE_DYNAMIC; BuffDesc.uiSizeInBytes = sizeof(ShaderConstants); BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE; m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer ); While there is usually just one buffer object, different APIs use very different approaches to represent textures. For instance, in Direct3D11, there are ID3D11Texture1D, ID3D11Texture2D, and ID3D11Texture3D objects. In OpenGL, there is individual object for every texture dimension (1D, 2D, 3D, Cube), which may be a texture array, which may also be multisampled (i.e. GL_TEXTURE_2D_MULTISAMPLE_ARRAY). As a result there are nine different GL texture types that Diligent Engine may create under the hood. In Direct3D12, there is only one resource interface. Diligent Engine hides all these details in ITexture interface. There is only one  IRenderDevice::CreateTexture() method that is capable of creating all texture types. Dimension, format, array size and all other parameters are specified by the members of the TextureDesc structure:
      TextureDesc TexDesc; TexDesc.Name = "My texture 2D"; TexDesc.Type = TEXTURE_TYPE_2D; TexDesc.Width = 1024; TexDesc.Height = 1024; TexDesc.Format = TEX_FORMAT_RGBA8_UNORM; TexDesc.Usage = USAGE_DEFAULT; TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS; TexDesc.Name = "Sample 2D Texture"; m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex ); If native API supports multithreaded resource creation, textures and buffers can be created by multiple threads simultaneously.
      Interoperability with native API provides access to the native buffer/texture objects and also allows creating Diligent Engine objects from native handles. It allows applications seamlessly integrate native API-specific code with Diligent Engine.
      Next-generation APIs allow fine level-control over how resources are allocated. Diligent Engine does not currently expose this functionality, but it can be added by implementing IResourceAllocator interface that encapsulates specifics of resource allocation and providing this interface to CreateBuffer() or CreateTexture() methods. If null is provided, default allocator should be used.
      Initializing the Pipeline State
      As it was mentioned earlier, Diligent Engine follows next-gen APIs to configure the graphics/compute pipeline. One big Pipelines State Object (PSO) encompasses all required states (all shader stages, input layout description, depth stencil, rasterizer and blend state descriptions etc.). This approach maps directly to Direct3D12/Vulkan, but is also beneficial for older APIs as it eliminates pipeline misconfiguration errors. With many individual calls tweaking various GPU pipeline settings it is very easy to forget to set one of the states or assume the stage is already properly configured when in fact it is not. Using pipeline state object helps avoid these problems as all stages are configured at once.
      Creating Shaders
      While in earlier APIs shaders were bound separately, in the next-generation APIs as well as in Diligent Engine shaders are part of the pipeline state object. The biggest challenge when authoring shaders is that Direct3D and OpenGL/Vulkan use different shader languages (while Apple uses yet another language in their Metal API). Maintaining two versions of every shader is not an option for real applications and Diligent Engine implements shader source code converter that allows shaders authored in HLSL to be translated to GLSL. To create a shader, one needs to populate ShaderCreationAttribs structure. SourceLanguage member of this structure tells the system which language the shader is authored in:
      SHADER_SOURCE_LANGUAGE_DEFAULT - The shader source language matches the underlying graphics API: HLSL for Direct3D11/Direct3D12 mode, and GLSL for OpenGL and OpenGLES modes. SHADER_SOURCE_LANGUAGE_HLSL - The shader source is in HLSL. For OpenGL and OpenGLES modes, the source code will be converted to GLSL. SHADER_SOURCE_LANGUAGE_GLSL - The shader source is in GLSL. There is currently no GLSL to HLSL converter, so this value should only be used for OpenGL and OpenGLES modes. There are two ways to provide the shader source code. The first way is to use Source member. The second way is to provide a file path in FilePath member. Since the engine is entirely decoupled from the platform and the host file system is platform-dependent, the structure exposes pShaderSourceStreamFactory member that is intended to provide the engine access to the file system. If FilePath is provided, shader source factory must also be provided. If the shader source contains any #include directives, the source stream factory will also be used to load these files. The engine provides default implementation for every supported platform that should be sufficient in most cases. Custom implementation can be provided when needed.
      When sampling a texture in a shader, the texture sampler was traditionally specified as separate object that was bound to the pipeline at run time or set as part of the texture object itself. However, in most cases it is known beforehand what kind of sampler will be used in the shader. Next-generation APIs expose new type of sampler called static sampler that can be initialized directly in the pipeline state. Diligent Engine exposes this functionality: when creating a shader, textures can be assigned static samplers. If static sampler is assigned, it will always be used instead of the one initialized in the texture shader resource view. To initialize static samplers, prepare an array of StaticSamplerDesc structures and initialize StaticSamplers and NumStaticSamplers members. Static samplers are more efficient and it is highly recommended to use them whenever possible. On older APIs, static samplers are emulated via generic sampler objects.
      The following is an example of shader initialization:
      ShaderCreationAttribs Attrs; Attrs.Desc.Name = "MyPixelShader"; Attrs.FilePath = "MyShaderFile.fx"; Attrs.SearchDirectories = "shaders;shaders\\inc;"; Attrs.EntryPoint = "MyPixelShader"; Attrs.Desc.ShaderType = SHADER_TYPE_PIXEL; Attrs.SourceLanguage = SHADER_SOURCE_LANGUAGE_HLSL; BasicShaderSourceStreamFactory BasicSSSFactory(Attrs.SearchDirectories); Attrs.pShaderSourceStreamFactory = &BasicSSSFactory; ShaderVariableDesc ShaderVars[] = {     {"g_StaticTexture", SHADER_VARIABLE_TYPE_STATIC},     {"g_MutableTexture", SHADER_VARIABLE_TYPE_MUTABLE},     {"g_DynamicTexture", SHADER_VARIABLE_TYPE_DYNAMIC} }; Attrs.Desc.VariableDesc = ShaderVars; Attrs.Desc.NumVariables = _countof(ShaderVars); Attrs.Desc.DefaultVariableType = SHADER_VARIABLE_TYPE_STATIC; StaticSamplerDesc StaticSampler; StaticSampler.Desc.MinFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MagFilter = FILTER_TYPE_LINEAR; StaticSampler.Desc.MipFilter = FILTER_TYPE_LINEAR; StaticSampler.TextureName = "g_MutableTexture"; Attrs.Desc.NumStaticSamplers = 1; Attrs.Desc.StaticSamplers = &StaticSampler; ShaderMacroHelper Macros; Macros.AddShaderMacro("USE_SHADOWS", 1); Macros.AddShaderMacro("NUM_SHADOW_SAMPLES", 4); Macros.Finalize(); Attrs.Macros = Macros; RefCntAutoPtr<IShader> pShader; m_pDevice->CreateShader( Attrs, &pShader );
      Creating the Pipeline State Object
      After all required shaders are created, the rest of the fields of the PipelineStateDesc structure provide depth-stencil, rasterizer, and blend state descriptions, the number and format of render targets, input layout format, etc. For instance, rasterizer state can be described as follows:
      PipelineStateDesc PSODesc; RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc; RasterizerDesc.FillMode = FILL_MODE_SOLID; RasterizerDesc.CullMode = CULL_MODE_NONE; RasterizerDesc.FrontCounterClockwise = True; RasterizerDesc.ScissorEnable = True; RasterizerDesc.AntialiasedLineEnable = False; Depth-stencil and blend states are defined in a similar fashion.
      Another important thing that pipeline state object encompasses is the input layout description that defines how inputs to the vertex shader, which is the very first shader stage, should be read from the memory. Input layout may define several vertex streams that contain values of different formats and sizes:
      // Define input layout InputLayoutDesc &Layout = PSODesc.GraphicsPipeline.InputLayout; LayoutElement TextLayoutElems[] = {     LayoutElement( 0, 0, 3, VT_FLOAT32, False ),     LayoutElement( 1, 0, 4, VT_UINT8, True ),     LayoutElement( 2, 0, 2, VT_FLOAT32, False ), }; Layout.LayoutElements = TextLayoutElems; Layout.NumElements = _countof( TextLayoutElems ); Finally, pipeline state defines primitive topology type. When all required members are initialized, a pipeline state object can be created by IRenderDevice::CreatePipelineState() method:
      // Define shader and primitive topology PSODesc.GraphicsPipeline.PrimitiveTopologyType = PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; PSODesc.GraphicsPipeline.pVS = pVertexShader; PSODesc.GraphicsPipeline.pPS = pPixelShader; PSODesc.Name = "My pipeline state"; m_pDev->CreatePipelineState(PSODesc, &m_pPSO); When PSO object is bound to the pipeline, the engine invokes all API-specific commands to set all states specified by the object. In case of Direct3D12 this maps directly to setting the D3D12 PSO object. In case of Direct3D11, this involves setting individual state objects (such as rasterizer and blend states), shaders, input layout etc. In case of OpenGL, this requires a number of fine-grain state tweaking calls. Diligent Engine keeps track of currently bound states and only calls functions to update these states that have actually changed.
      Binding Shader Resources
      Direct3D11 and OpenGL utilize fine-grain resource binding models, where an application binds individual buffers and textures to certain shader or program resource binding slots. Direct3D12 uses a very different approach, where resource descriptors are grouped into tables, and an application can bind all resources in the table at once by setting the table in the command list. Resource binding model in Diligent Engine is designed to leverage this new method. It introduces a new object called shader resource binding that encapsulates all resource bindings required for all shaders in a certain pipeline state. It also introduces the classification of shader variables based on the frequency of expected change that helps the engine group them into tables under the hood:
Static variables (SHADER_VARIABLE_TYPE_STATIC) are expected to be set only once. They are intended to hold global constant buffers, such as camera attributes or global light attributes.

Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change with per-material frequency. Examples include diffuse textures, normal maps, etc.

Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly.

The shader variable type must be specified during shader creation by populating an array of ShaderVariableDesc structures and initializing the ShaderCreationAttribs::Desc::VariableDesc and ShaderCreationAttribs::Desc::NumVariables members (see the example of shader creation above).
      Static variables cannot be changed once a resource is bound to the variable. They are bound directly to the shader object. For instance, a shadow map texture is not expected to change after it is created, so it can be bound directly to the shader:
PixelShader->GetShaderVariable( "g_tex2DShadowMap" )->Set( pShadowMapSRV );

Mutable and dynamic variables are bound via a new Shader Resource Binding object (SRB) that is created by the pipeline state (IPipelineState::CreateShaderResourceBinding()):
m_pPSO->CreateShaderResourceBinding(&m_pSRB);

Note that an SRB is only compatible with the pipeline state it was created from. The SRB object inherits all static bindings from the shaders in the pipeline, but is not allowed to change them.
      Mutable resources can only be set once for every instance of a shader resource binding. Such resources are intended to define specific material properties. For instance, a diffuse texture for a specific material is not expected to change once the material is defined and can be set right after the SRB object has been created:
m_pSRB->GetVariable(SHADER_TYPE_PIXEL, "tex2DDiffuse")->Set(pDiffuseTexSRV);

In some cases it is necessary to bind a new resource to a variable every time a draw command is invoked. Such variables should be labeled as dynamic, which will allow setting them multiple times through the same SRB object:
m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB);

Under the hood, the engine pre-allocates descriptor tables for static and mutable resources when an SRB object is created. Space for dynamic resources is allocated dynamically at run time. Static and mutable resources are thus more efficient and should be used whenever possible.
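To make the distinction concrete, a hypothetical per-frame pattern for a dynamic variable might look like the sketch below; the per-object constant buffer array and the loop are purely illustrative and not part of the engine API:

// Hypothetical usage sketch: a dynamic variable can be re-bound before every draw,
// while static and mutable bindings set earlier stay untouched.
for (Uint32 obj = 0; obj < NumObjects; ++obj)
{
    // pPerObjectCB is an illustrative array of per-object constant buffers
    m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pPerObjectCB[obj]);
    // ... commit shader resources and issue the draw call (see the sections below)
}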
As you can see, Diligent Engine does not expose the low-level details of how resources are bound to shader variables. One reason for this is that these details differ greatly between APIs. The other reason is that using low-level binding methods is extremely error-prone: it is very easy to forget to bind some resource, or to bind an incorrect one (such as a buffer to a variable that is in fact a texture), especially during shader development when everything changes fast. Diligent Engine instead relies on a shader reflection system to automatically query the list of all shader variables. Grouping variables into the three types mentioned above allows the engine to create an optimized layout and do the heavy lifting of matching each resource to its API-specific location, register, or descriptor table entry.
      This post gives more details about the resource binding model in Diligent Engine.
      Setting the Pipeline State and Committing Shader Resources
      Before any draw or compute command can be invoked, the pipeline state needs to be bound to the context:
m_pContext->SetPipelineState(m_pPSO);

Under the hood, the engine sets the internal PSO object in the command list or calls all the required native API functions to properly configure all pipeline stages.
The next step is to bind all required shader resources to the GPU pipeline, which is accomplished by the IDeviceContext::CommitShaderResources() method:
m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES);

The method takes a pointer to the shader resource binding object and makes all resources the object holds available to the shaders. In the case of D3D12, this only requires setting the appropriate descriptor tables in the command list. For older APIs, this typically requires setting all resources individually.
Next-generation APIs require the application to track the state of every resource and explicitly inform the system about all state transitions. For instance, if a texture was previously used as a render target and the next draw command is going to sample it as a shader resource, a transition barrier needs to be executed. Diligent Engine does the heavy lifting of state tracking. When CommitShaderResources() is called with the COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES flag, the engine commits the resources and transitions them to the correct states at the same time. Note that transitioning resources does introduce some overhead: the engine tracks the state of every resource and will not issue a barrier if the state is already correct, but even checking resource states has a cost that can sometimes be avoided. For this purpose the engine provides the IDeviceContext::TransitionShaderResources() method, which only transitions resources:
m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);

In some scenarios it is more efficient to transition resources once and then only commit them.
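A rough sketch of that pattern is shown below; passing 0 as the commit flags to skip the transition step is an assumption here and should be verified against the actual flag definitions:

// Hypothetical pattern: transition resources once up front, then only commit per frame.
m_pContext->SetPipelineState(m_pPSO);
m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);   // transition once
for (Uint32 frame = 0; frame < NumFrames; ++frame)
{
    m_pContext->CommitShaderResources(m_pSRB, 0);         // per frame: commit only (flag value assumed)
    // ... draw calls
}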
      Invoking Draw Command
The final step is to set the states that are not part of the PSO, such as render targets and vertex and index buffers. Diligent Engine uses a Direct3D11-style API that is translated to the other native API calls under the hood:
ITextureView *pRTVs[] = {m_pRTV};
m_pContext->SetRenderTargets(_countof( pRTVs ), pRTVs, m_pDSV);

// Clear render target and depth buffer
const float zero[4] = {0, 0, 0, 0};
m_pContext->ClearRenderTarget(nullptr, zero);
m_pContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f);

// Set vertex and index buffers
IBuffer *buffer[] = {m_pVertexBuffer};
Uint32 offsets[] = {0};
Uint32 strides[] = {sizeof(MyVertex)};
m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET);
m_pContext->SetIndexBuffer(m_pIndexBuffer, 0);

Different native APIs use different sets of functions to execute draw commands depending on the details of the command (whether it is indexed, instanced, or both, what offsets into the source buffers are used, etc.). For instance, there are 5 draw commands in Direct3D11 and more than 9 in OpenGL, with functions like glDrawElementsInstancedBaseVertexBaseInstance not uncommon. Diligent Engine hides all these details behind a single IDeviceContext::Draw() method that takes a DrawAttribs structure as its argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, whether the draw call is indexed, instanced, indirect, etc.). For example:
DrawAttribs attrs;
attrs.IsIndexed = true;
attrs.IndexType = VT_UINT16;
attrs.NumIndices = 36;
attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
pContext->Draw(attrs);

For compute commands, there is the IDeviceContext::DispatchCompute() method, which takes a DispatchComputeAttribs structure that defines the compute grid dimensions.
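As a sketch of what such a dispatch might look like (the ThreadGroupCountX/Y/Z member names, the Width/Height values, and the compute PSO/SRB objects below are illustrative assumptions, not quoted from the engine headers):

// Illustrative compute dispatch covering a Width x Height image with 16x16 thread groups
DispatchComputeAttribs DispatchAttrs;
DispatchAttrs.ThreadGroupCountX = (Width  + 15) / 16;
DispatchAttrs.ThreadGroupCountY = (Height + 15) / 16;
DispatchAttrs.ThreadGroupCountZ = 1;
m_pContext->SetPipelineState(m_pComputePSO);   // a pipeline state created with a compute shader
m_pContext->CommitShaderResources(m_pComputeSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES);
m_pContext->DispatchCompute(DispatchAttrs);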
      Source Code
Full engine source code is available on GitHub and is free to use. The repository contains several samples, including an asteroids performance benchmark and an example Unity project that uses Diligent Engine in a native plugin.
The AntTweakBar sample is Diligent Engine’s “Hello World” example.

       
The atmospheric scattering sample is a more advanced example. It demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to multiple render targets, using compute shaders and unordered access views, etc.

The asteroids performance benchmark is based on this demo developed by Intel. It renders 50,000 unique textured asteroids and allows comparing the performance of the Direct3D11 and Direct3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures.

      Finally, there is an example project that shows how Diligent Engine can be integrated with Unity.

      Future Work
The engine is under active development. It currently supports the Windows desktop, Universal Windows, and Android platforms. The Direct3D11, Direct3D12, and OpenGL/GLES backends are now feature complete. A Vulkan backend is coming next, and support for more platforms is planned.
    • By michaeldodis
I've started building a small library that can render a pie menu GUI in legacy OpenGL; I'm planning to add some traditional elements as well.
Its interface is similar to something you'd see in IMGUI. It's written in C.
      Early version of the library
I'd really love to hear your thoughts on this. Any suggestions on what features you'd want to see in a library like this?
      Thanks in advance!
    • By Michael Aganier
I have this 2D game which currently eats up to 200k draw calls per frame. The performance is acceptable, but I want a lot more than that. I need to batch my sprite drawing, but I'm not sure what the best way is in OpenGL 3.3 (to keep compatibility with older machines).
Each individual sprite moves independently almost every frame, and there is a variety of textures and animations. What's the fastest way to render a lot of dynamic sprites? Should I map all my data to the GPU and update it all the time? Should I set up my data in RAM and send it to the GPU all at once? Should I use one draw call per sprite and let the matrices apply the transformations, or should I compute the transformations into a world VBO on the CPU so that everything can be rendered by a single draw call?
    • By zolgoz
      Hi!

I've recently started with OpenGL and just managed to write my first shader using the Phong model. I want to learn more but I don't know where to begin. Does anyone know of good articles or algorithms to start with?
      Thanks in advance.