About this blog
The story of a 19 year old kid and Direct3D. Bleu, camembert, brie, and swiss wait within!
Entries in this blog
So my GDNet+ subscription will be expiring fairly soon, and I don't think I'll renew immediately. I want to be in a position where I can actually make consistent journal updates with at least SOME content in them.
I have done some more work on my virtual texturing project in the last couple months, which has been pretty fun. I had some issues along the way while (re-)developing some of the base concepts for it. For example, one significant issue was when I was experimenting with the way that texture lookups would be done. At first, I thought it would be feasible to have every texture page be stored in a large texture in a non-continuous manner, and in the pixel shader use an indirection texture to specify what blocks of a texture should be looked up at a certain texture coordinate. Not unlike page lookup tables on a virtual memory system, really. However, I ran into an extremely objectionable artifact, shown below:
At first, I, like just about everyone else I asked on the issue, thought it was related to texture bleeding. However, once I investigated it carefully, this turned out not to be the case at all. Basically, it was due to the hardware doing anisotropic filtering in a manner that it is not at all used to, or designed for (if anyone wants a more detailed explanation, I'll make a subsequent entry on it). I was quite taken aback that I managed to make any kind of artifact that was so close to the hardware, and one that basically no one I knew had seen before or could accurately guess the cause of. I've since "solved" that issue, so I moved onto another one that I had to work out: evaluating the visibility of individual pages on textures.
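For anyone who wants the page-table analogy from earlier in this entry spelled out, here is a minimal CPU-side sketch in C++ of what an indirection lookup does. The names (`PageEntry`, `IndirectionTable`, `virtualToPhysical`) and the uniform page grid are assumptions made for illustration; this is not the actual shader.

```cpp
#include <cstddef>
#include <vector>

// One entry of a hypothetical indirection texture: where a virtual page
// currently lives inside the physical cache texture, in page units.
struct PageEntry {
    int physX;
    int physY;
};

struct IndirectionTable {
    int pagesPerSide;               // virtual texture is pagesPerSide x pagesPerSide pages
    int cachePagesPerSide;          // physical cache is cachePagesPerSide x cachePagesPerSide pages
    std::vector<PageEntry> entries; // row-major, pagesPerSide * pagesPerSide entries
};

struct UV {
    float u;
    float v;
};

// Translate a virtual texture coordinate in [0,1) into a coordinate
// inside the physical cache texture.
inline UV virtualToPhysical(const IndirectionTable& t, UV virt) {
    const float px = virt.u * t.pagesPerSide;
    const float py = virt.v * t.pagesPerSide;
    const int pageX = static_cast<int>(px);
    const int pageY = static_cast<int>(py);
    const float fracX = px - pageX; // position inside the page
    const float fracY = py - pageY;
    const PageEntry& e =
        t.entries[static_cast<std::size_t>(pageY) * t.pagesPerSide + pageX];
    return UV{ (e.physX + fracX) / t.cachePagesPerSide,
               (e.physY + fracY) / t.cachePagesPerSide };
}
```

Note that two neighbouring virtual pages can map to cache pages that are far apart, so the physical coordinate jumps at every page border even though the virtual coordinate is continuous.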
If you look through my journal posts from early in the year, you'll notice that I was initially going to try for a software rasterizer to figure out what texture blocks are necessary for a given frame. Since that time, I've decided to forget it partially due to the undesirably high CPU usage, and more importantly, because there would have likely been objectionable artifacts as the contents of the scene moved around due to the mismatched resolutions of the GPU rendering and the softrast. I did some pondering and have come up with a new, good, solution that I think will work really well. It's also fairly odd too I think, for example, the evaluation requires that all of the objects in the scene have to be rendered using a specific shader, and in the vertex shader, the position of the vertex is determined entirely using the inputted texture coordinates. Like above, if anyone wants more information on how I'm planning/doing this part, I'll be more than willing to make an entry about it.
I do hope that I get this done and working to a reasonable extent fairly soon, as I've been itching to get some other graphics programming done. I haven't been working on other stuff primarily because I have a fairly one-track mind, and I didn't want to get distracted. But once this is done, I really want to do a high end shadow map-based renderer, partially because there's been some work recently that I think is not that great and I want to be able to say "Come on guys, THIS is how shadow maps are done". Of course, idNext is probably going to be shown off at QuakeCon this year, and I certainly hope that that does it instead. I'm getting really tired of game devs, both formal and informal, who seem to so easily tolerate issues like shadow mapping's aliasing, and I hope something is done about it soon.
So, that's basically an update of what I've been doing this last little while. Like I said earlier, I'll probably let my GDNet+ account slide a bit once I get a bit more prolific in my work and have more stuff to show off.
Not much to talk about today. I've got affine texture coordinates working, and I'm definitely sure now that they'll be sufficient for my purposes. Also, I brainstormed a bit on how to manage block requests, and I've got a solution that should work fairly well. This was a fairly unproductive week though, and it's something that I'll try and fix so that I can have more stuff done by my next weekly journal entry.
Anyways, target goal for this week is to get most if not all of the block request system up. At that point, all that'll be left to do is arrangement of blocks, uploading blocks to the video card, setting up and applying texture matrices, and actually making the hardware draw something.
By the way, regarding block arrangement... I decided that for right now if there is an internal hole in a texture* I'll just leave the space allocated for the texture, but it won't be filled. Previously, I had the idea of doing some really aggressive arrangement of blocks so that that occluded part of the texture would not only not appear in the D3DPOOL_DEFAULT texture, but a smaller texture that could fit in would be moved in. I decided to omit the latter point for two reasons:
1) Massive amount of checks for what is probably not going to be that great of a gain
2) If it's an internal hole, chances are it's an occluder that is not going to be there for an extended period of time, and so the blocks bordering that hole will be changing a lot, meaning that there will be a lot of extra thrashing because a separate texture stored in the hole will be moving around accordingly, or have to find some other space.
Note that this is different from just occluding sides of a texture, which I will still do, since that will give a lot more savings, be more consistent, and be simpler to make.
Also this will be just for the pre-shader version of the algorithm. I had a small idea that could be useful for when I upgrade from basically DX7 tech to DX9 (which would actually make the system MORE like virtual memory than it is right now) that would basically make the non-filling of internal holes a non-issue.
*Let's say we have a simple scene. The camera is looking at a wall and there's a character in front of the center of the wall, occluding part of the texture.
So, I'm now mostly done my rewritten scanline rasterizer. There isn't much more that I can really optimize that I can see. Right now it's fairly straightforward and simple; I don't do that many tricks to speed it up. Probably the biggest trick is the one-time calculation of edge data (e.g. dx/dy/dz and such) but even that is fairly basic. For the most part, it was just looking directly at the loops and saying "What's calculated again and again and again". Also, it's still for a single-frame scenario so I haven't done anything to avoid massive amount of clearing every frame, which I'm predicting to be a bit expensive. For that I'll probably add an extra bool to my struct that contains data for each pixel that will just say if the pixel was last drawn on an even or odd frame, unless someone knows some low (or high) level trick to set a continuous block of memory to some value really fast. Oh, and I have to do texcoord data, but I'll address that later in this post (just remembered another thing: hierarchical depth buffer and depth culling at the macro-pixel and triangle level are not in yet either).
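Here is a rough C++ sketch of the "don't clear every frame" idea. One caveat: a single even/odd bit is only safe if every pixel gets touched at least every other frame, so this sketch swaps in a full frame counter per pixel instead, which is an assumption on my part and costs a few more bytes. The names `Pixel`, `DepthImage`, `beginFrame`, `write`, and `read` are all made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel {
    float         depth = 1.0f;
    std::uint32_t stamp = 0;   // number of the frame that last wrote this pixel
};

class DepthImage {
public:
    explicit DepthImage(std::size_t pixelCount) : pixels_(pixelCount) {}

    // "Clears" the image in O(1): pixels from older frames simply stop
    // matching the current frame number. (Counter wrap-around is ignored here.)
    void beginFrame() { ++frame_; }

    // Depth-tested write. Returns true if the pixel was actually written.
    bool write(std::size_t i, float depth) {
        Pixel& p = pixels_[i];
        const bool stale = (p.stamp != frame_);
        if (stale || depth < p.depth) {
            p.depth = depth;
            p.stamp = frame_;
            return true;
        }
        return false;
    }

    // Reads the depth, treating pixels from earlier frames as cleared.
    float read(std::size_t i, float farDepth = 1.0f) const {
        const Pixel& p = pixels_[i];
        return p.stamp == frame_ ? p.depth : farDepth;
    }

private:
    std::vector<Pixel> pixels_;
    std::uint32_t frame_ = 0;
};
```

The trade-off is one extra compare per pixel access in exchange for never touching the whole buffer between frames.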
Anyways, right now I think I'm getting a pretty good speed on it. (note: all times are of release builds) For my test scene I'm getting 4.25ms (235 fps) run time at 640x480, and 0.85ms (1176fps) run time at 320x240. There are a few observations I'd like to make about this performance. First, they're on a slow and old CPU (1GHz Athlon which I'm not entirely sure supports SSE) with undoubtedly slow memory, so on a higher end PC (e.g. my one back in Waterloo) these times would be much lower. Secondly, the test scene is fairly simple. It's only 160 triangles, and while even I'll admit that this is a low number, it's not as low as I predict I'll need for a real scenario. For one, the lower resolution means that you can use a couple lower levels of geometric detail and secondly, the lack of image fidelity required means that, as long as the texture coordinates are properly preserved, you could probably get away with yet another lower level of geom detail. So, instead of rendering 10k poly characters for an upclose shot like you would on the GPU, you could more than easily get away with a
As I mentioned earlier, I do not have texture coordinate stuff in yet either. For this I'm still not entirely sure how I want to go about doing this. My big decision right now is what flavour of mapping do I want to try out: affine texture coords, perspective texture coords, or perspective blocks of affine texture coords (see: Quake). Affine texture coords seems like a really good idea right now, in part due to the fact that, again, I don't need a high quality image. Besides, with affine coords, they're still all correct at the vertices, meaning there'd be only a handful of pixels who are a bit off. I'll likely try that out first and when I get the full algo working and I can see what textures/texture detail it's picking up I might decide to reevaluate it, but I'll be surprised if it doesn't produce adequate results.
The other big issue that I'm concerned with is how to mark what texture blocks are needed, that I've got a few ideas for. But first, to fill you in on what this part will need to do. Basically, while I'm drawing the image, I'll need to count up how many pixels are asking for what blocks of textures, and what detail is required at each block. The reason I need to count up the pixels is so that I can increment and decrement the count on the fly. If I just marked every block as I went, then areas of high occlusion will undoubtedly end up requesting a lot of extraneous texture data. For example, if I write a pixel and the pixel was already written to this frame, I'll figure out what block it'll need, and increment the request count of that block. So I've got a few ideas on how to do this:
1) Have a linked list of block requests that I add to and delete from dynamically. In this case, each block request has an identifier for what texture and texture level it's from, as well as the actual request count. Big disadvantage here is that a lot of news and deletes will have to occur which I know by now will kill performance. If I do this way, I may also try a series of linked lists for each texture that has this data so that I can keep requests grouped together properly.
2) One big array that has the maximum number of block requests possible allocated. So, if I have a main, uh, cache texture where the assembly of texture data is stored that is 2048x2048, and split that into 32x32 blocks, then I'll end up with a total of 64*64=4096 total block requests. Big pro here is that no re/deallocation of memory will occur, however finding the block I need to increment may be difficult to do fast. Right now though, this is my favourite idea and will be the one I do first.
3) An array of block requests for each level of each texture. This method I'm going to avoid quite a bit because I don't think it will scale well to large textures, and having to scan each level of each texture for every request when evaluating what all of the data I'll need is does not sound very fast. I MAY try this if #2 does not work out as well as I would like.
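As a sketch of how idea #2 might look, here is a fixed 4096-slot table in C++ with no allocation after construction. To deal with the "finding the block fast" problem it uses a linear-probing hash on the block's identity, which is purely an assumption for this example, as are the names `BlockKey`, `BlockRequestTable`, and the hash constants.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Identifies one block: which texture, which mip level, which block in that level.
struct BlockKey {
    std::uint16_t texture, level, bx, by;
    bool operator==(const BlockKey& o) const {
        return texture == o.texture && level == o.level && bx == o.bx && by == o.by;
    }
};

struct BlockRequest {
    BlockKey      key{};
    std::uint32_t count = 0;
    bool          used  = false;
};

class BlockRequestTable {
public:
    // 2048x2048 cache split into 32x32 blocks = 64 * 64 slots.
    static constexpr std::size_t kSlots = 4096;

    // A newly written pixel needs this block. Returns false if the table is full.
    bool increment(const BlockKey& k) {
        const std::size_t s = probe(k);
        if (s == kSlots) return false;
        if (!slots_[s].used) { slots_[s].used = true; slots_[s].key = k; }
        ++slots_[s].count;
        return true;
    }

    // A pixel that needed this block has been overwritten by something closer.
    void decrement(const BlockKey& k) {
        const std::size_t s = probe(k);
        if (s != kSlots && slots_[s].used && slots_[s].count > 0) --slots_[s].count;
    }

    std::uint32_t count(const BlockKey& k) const {
        const std::size_t s = probe(k);
        return (s != kSlots && slots_[s].used) ? slots_[s].count : 0;
    }

    void reset() { slots_.fill(BlockRequest{}); } // once per frame

private:
    static std::size_t hash(const BlockKey& k) {
        const std::uint32_t h = k.texture * 73856093u ^ k.level * 19349663u
                              ^ k.bx * 83492791u ^ k.by * 2654435761u;
        return h & (kSlots - 1);
    }

    // Slot holding k, or the first empty slot on its probe path, or kSlots if full.
    std::size_t probe(const BlockKey& k) const {
        const std::size_t h = hash(k);
        for (std::size_t i = 0; i < kSlots; ++i) {
            const std::size_t s = (h + i) & (kSlots - 1);
            if (!slots_[s].used || slots_[s].key == k) return s;
        }
        return kSlots;
    }

    std::array<BlockRequest, kSlots> slots_{};
};
```

Slots are never freed mid-frame (a count can drop to zero but the slot stays claimed), which keeps the probing logic trivial; after the frame, any block with a non-zero count is one that is actually visible.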
So there's where I'm at right now. By next week I hope to have my software rasterizer all fully done and polished up, and to have either a fairly fleshed out design of idea #2 from above written up or a bit of an early implementation.
Nothing to talk about this week. I'm redoing most of my software renderer right now, including but not limited to, switching from halfspace checking to scanline checking. Next week I'll have that finished I think, so I'll share some performance numbers then. In the meantime, here are some completely gratuitous animations of stuff.
Oh, and having to clip triangles before persp transformation? How much does THAT suck, eh?
Two parts to this: Software rasterizer update, and the rant of procedural textures.
The software rasterizing has gone well. I now have hierarchical block culling in. Basically how it works is that first for a 32x32 block it'll figure out if the block is entirely in the triangle, entirely out, or partially in. If it's entirely in, it'll simply fill the block without any comparisons being done. If it's entirely out, it'll simply skip all drawing for that block. If it's partially in, it moves on to do a check for the 16x16 block. This repeats for the 8x8 block, and 4x4 block, and then after that it just checks every pixel (even if I went down to 2x2, that's four checks for four pixels). Here's a visualization of this in action:
The colour indicates at what stage the pixel was filled in. The darker the colour, the lower the resolution of the block. It goes until near-black, which means that that pixel was scanned individually. For larger triangles it's great, but I'm actually more than a little disappointed when the triangles get a bit finer, as a lot of individual pixels get scanned:
I'm not that surprised though, this is something I anticipated. Depending on the speed of everything after all is said and done, I may decide to look into this a bit more later, after I see some realworld results regarding performance and such...
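For reference, the hierarchical in/out/partial test can be sketched in C++ roughly like this. It assumes the triangle is given as three edge functions `a*x + b*y + c` that are non-negative on the inside, and that pixels are sampled at their centres; `Edge`, `classify`, and `fillBlock` are names invented for the example, and the "fill" here just counts covered pixels.

```cpp
#include <array>

// One half-space: a*x + b*y + c >= 0 means "inside" for this edge.
struct Edge {
    float a, b, c;
    float eval(float x, float y) const { return a * x + b * y + c; }
};
using Triangle = std::array<Edge, 3>;

enum class Coverage { Outside, Partial, Inside };

// Classify a size x size block by testing the four corner pixel centres.
// Edge functions are linear, so the corners bound every sample in the block.
inline Coverage classify(const Triangle& tri, int x, int y, int size) {
    const float x0 = x + 0.5f, x1 = x + size - 0.5f;
    const float y0 = y + 0.5f, y1 = y + size - 0.5f;
    bool allInside = true;
    for (const Edge& e : tri) {
        const int inside = (e.eval(x0, y0) >= 0.0f) + (e.eval(x1, y0) >= 0.0f)
                         + (e.eval(x0, y1) >= 0.0f) + (e.eval(x1, y1) >= 0.0f);
        if (inside == 0) return Coverage::Outside; // whole block beyond this edge
        if (inside != 4) allInside = false;
    }
    return allInside ? Coverage::Inside : Coverage::Partial;
}

// Returns how many pixels of the block the triangle covers, descending
// 32 -> 16 -> 8 -> 4 and then testing individual pixels.
inline int fillBlock(const Triangle& tri, int x, int y, int size) {
    const Coverage c = classify(tri, x, y, size);
    if (c == Coverage::Outside) return 0;           // skip the block
    if (c == Coverage::Inside)  return size * size; // fill without per-pixel tests
    if (size <= 4) {
        int covered = 0;
        for (int py = y; py < y + size; ++py)
            for (int px = x; px < x + size; ++px)
                if (classify(tri, px, py, 1) == Coverage::Inside) ++covered;
        return covered;
    }
    const int h = size / 2;
    return fillBlock(tri, x, y, h)     + fillBlock(tri, x + h, y, h)
         + fillBlock(tri, x, y + h, h) + fillBlock(tri, x + h, y + h, h);
}
```

A block that straddles an edge always ends up Partial, which is why thin triangles push so many pixels down to the individual-test level.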
I've also done some other optimizations on the side (e.g. for each block, I'd remultiply the X coord by one of the coefficients for the current triangle many times, so instead I just calculate it once). Even without doing some loose benchmarks, I actually noticed a significant speedup. Right now when I run the program, with it drawing about 180 triangles to a 640x480 memory image 500 times over, it takes about 6 or 7 seconds to execute. At first I was shocked, but then I turned on optimizations, and it runs in about 2 seconds, and that's on a 1GHz Athlon, and without any SSE/SSE2 instructions being used, or special processor code generation; that is, I have a blended CPU target. It currently only outputs an arbitrary value and nothing practical, but that seems pretty swell to me, especially since I still have to do more than a few microoptimizations in the draw code. Next up on my list, I'm going to throw depth buffer calculations back in, as well as hierarchical z-culling.
Now, procedural textures. I think everyone here has seen Ysaneya's fantastic terrain and planets, which are entirely procedural. Also, who doesn't love applying Perlin noise to textures for various things like clouds? With my previous rants on memory virtualization, I've mentioned the fact that we almost have enough AGP bandwidth to send down almost a full screen's worth of simple texture data. That's great and all, but sadly we have barely a fraction of that when moving data from the hard drive to the system memory (or, alternatively, DVD/CD drive to system mem). This is a fairly significant issue, as the user might be moving too fast for the system to keep up, resulting in fairly blurry textures. A good solution is to do compression to shrink the data transfer: DXTn, PNG, JPEG (if you're totally ignorant of image quality), and so on. However, even those can be a bit impractical at times: DXTn and JPEG are a bit lossy, and PNG doesn't allow for random data access. An alternative may be to do procedural textures. Most people are used to procedural textures and data being basically a set of random values, and even the examples I introduced this with revolve around the usage of some form of noise. While they do work pretty well, they're still special case scenarios, and cannot be influenced by artists very easily. Even Ysaneya admits that each planet in his engine may look a bit repetitive, which is easily credited to the fact that an artist can't put a unique touch on each one (although, things like the orbiting space station that he has right now would help). But what if semi procedural textures could be done, generating high frequency data from artist influenced values? For example, some of the hardcore Doom3 modders have been playing with the early-gen MegaTexture stuff for Quake Wars that came out in a recent patch, and I found one tidbit in which an artist could essentially paint roads onto the landscape.
With something like vector or curve drawing in this case, a very small sample of data could be used to generate, in the case of MT, extremely high resolution terrain that is still heavily artist influenced (and could still use noise as a little bit of touchup or something).
So what other applications could semi-procedural textures be used in? Well, at QuakeCon 05, John Carmack proposed the idea of basically rendering a 500 page PDF file at 100 dpi, essentially hundreds of megabytes of data, even though such a PDF could be as small as 10's of megabytes in size. With the theoretically lean memory usage and movement between the CPU and GPU, why not just go and make a random-access PDF generator, and page onto the GPU only that data that's needed, when it's needed? Also, how about animation? Take a Flash movie like The Demented Cartoon Movie, which, depending on your point of view, could be treated as half an hour of 640x480 of 30fps lossless animation packed into less than 4MB of data, as opposed to 45GB of raw video. Another example could be, compositing of complex functions and/or art data. For something like a sidewalk texture, the gaps between blocks could be drawn on, with a little bit of noise paired with an if-else ladder for the cement and stones and small rocks randomly mixed in with the cement. One could easily imagine a lot of different scenarios where custom made semi-procedural textures are used for textures in the game world, as a means of providing special case image data compression with excellent compression ratios.
Well, I've been working mainly on the software rasterizer that I'll need for this project, and that's going well so far. It's the first time I've ever done one, and was going in with about zero experience in that area. However, it's going very well right now. jpetrie has been a marvelous help, as he has basically taught me how software rasterizing works, methods of doing it, some optimizations, and so on. I've also been able to apply a surprising amount of hardware 3D experience to this as well, as I'm using things like index buffers along with the vertex buffer. Sadly, I don't know how fast it's running right now because of the way I'm viewing the image (which is to render it, then create a D3D surface with that data, and then save that data to a file using D3DX functions) and I think fairly soon I'll implement a timer or something so that I can measure its speed.
Right now I've got it so that it'll render the depth of arbitrary triangles (i.e. VBs with associated IBs) to the image. It uses the halfspace method of rasterizing, which jpetrie convinced me to use, because you can do things like skip entire blocks of an image. For awhile I felt that the scan line method would be better, since you don't have to do any checking: using a simple formula you can say exactly where a triangle will start and end on a scan line. However, I required some more convincing, some of which occurred on jpetrie's end, and some on mine. Jpetrie's biggest reason was that things like depth/texcoord derivatives don't have to be calculated as often for a block method. With a scanline renderer, the derivatives have to be recalced on each line, and quite often inside that line, while a halfspace method would let you calculate derivatives less often, on 8x8 sized blocks. The reasons on my end were two-fold. First, with a block method, I could take advantage of early z-cull, similar to what most hardware does nowadays, by storing the greatest depth per 32^2/16^2/8^2 block and culling 1024/256/64 occluded pixels at a time. Secondly, and this is something I realized after finalizing the halfspace vs. scanline decision, I could achieve greater memory coherency with a halfspace system. Instead of storing the array in a scanline fashion (i.e. an array of length 480, each element being an array of length 640) I believe it may be better to store it in a block fashion (i.e. for 640x480, have an array of length 300, each element being an array of length 1024, which internally is treated as a 32x32 block). If I'm working in an 8x8 tile, I'll quite often be going to new scanlines, forcing a lot of data movement to go up and down levels often, possibly causing a significant amount of memory thrashing. Instead, a 32x32 block is moved up in one shot, which can then be used for up to 16 8x8 tiles.
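The block-fashion storage boils down to a different index calculation. Here is a minimal C++ sketch, assuming the image dimensions are multiples of 32 and that blocks are laid out row-major; `kBlock`, `blockCount`, and `tiledIndex` are names made up for the example.

```cpp
#include <cstddef>

constexpr int kBlock = 32; // block edge length in pixels

// Number of 32x32 blocks needed for an image (dimensions assumed multiples of 32).
inline int blockCount(int width, int height) {
    return (width / kBlock) * (height / kBlock);
}

// Index of pixel (x, y) in a buffer stored as [block][32 * 32] instead of [y][x].
inline std::size_t tiledIndex(int x, int y, int width) {
    const int blocksPerRow = width / kBlock;
    const int block  = (y / kBlock) * blocksPerRow + (x / kBlock);
    const int within = (y % kBlock) * kBlock + (x % kBlock);
    return static_cast<std::size_t>(block) * (kBlock * kBlock) + within;
}
```

For 640x480 this gives the 300 blocks of 1024 pixels mentioned above, and every pixel of one 32x32 block sits in a single contiguous run of memory.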
So, my todo list right now is as follows (not necessarily in the order described):
Perblock iteration (I've already got the block storage set up, but I still iterate through pixel by pixel)
Depth checking and early z-cull
Edge caching system (inspired by Quake's softrast, which I glanced over awhile ago. I still have yet to decide how this will be stored though. For those who don't know what I'm referring to, basically it's so that calculations between verts are calculated once)
Timer so that I can figure out how fast/slow my softrast is.
Perspective correct texture coordinates and texcoord derivatives (which will lead into texture page identification)
Oh, and my next entry I think I might do a little rant on procedural textures, and why texture virtualization could benefit enormously from them.
So, I've decided to try out full texture memory virtualization of a scene, as opposed to just the terrain (which was ending up being a lot like clipmaps, really). I figure I'll talk about it a bit, and then discuss how I'm going to go about it. Note that I will not be referring to D3D10 in this discussion, even though I know it has virtual memory. This is from a purely D3D9 perspective. Why bother with older tech, though? Well, D3D9 is going to be around for 4 or 5 years at least, so a lot of research is still really applicable for it, in my opinion. Kind of like how, in some cases, some people are still looking at 1.x shaders, even though DX8 is now more than 5 years old.
First, a look at what we actually use in graphics. As an example, let's have a frame rendered at 1024x768, with mipmapping and aniso filtering on (make it a bad case and say a 2:1 texel to pixel ratio), and 3 textures for each object onscreen (diffuse, normal data, and some specular stuff. Let's say that that data is all 3Bpt). It doesn't matter how hi res any of the source data is. As long as it satisfies those requirements, you only need 13.5MB of data onscreen, and that's a pretty bad case. Oh, and that's without even thinking about data compression.
Let that number sink in a bit:
That's 10.5% of the total texture memory on an average video card with 128MB of memory.
At 60fps, and needing a COMPLETELY new set of data every frame, that's 810MB/s of data transfer, less than 40% of the full AGP8x transfer rate.
And yet, here we are, with cards on the market that have 4 times that data; and PCIe 16x now in existence with 2 times the transfer rate.
Where is this absolutely ridiculous need coming from? Why do we think that regardless of an object's position, orientation, and visibility that we need to send to the video card 1024*1024*4/3*4Bpp = 5.3MB (the 4/3 is due to mipmapping) for a hires texture and just forget about it? It could be 4 pixels wide but we'd still kindly ask the API to use the full 5.3MB of data even though you're using less than a kilobyte of it. And what about future games (e.g. UT2007 and other UE3 titles), where we're going even HIGHER than that, with 2048*2048*4/3*4Bpp = roughly 21MB of data for a SINGLE texture, WHEN WE CANNOT EVEN SEE THAT FULL TEXTURE, EVER?!
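The numbers in this entry are easy to reproduce. Here is a small C++ sketch of the arithmetic, treating 1MB as 1024*1024 bytes; the function names are made up for the example, and the mip chain is summed exactly rather than using the 4/3 approximation.

```cpp
#include <cstdint>

constexpr double kMiB = 1024.0 * 1024.0;

// Texture data needed to cover the screen once: every pixel samples
// texelsPerPixel texels from each of `layers` textures.
inline double visibleTextureMiB(int width, int height, double texelsPerPixel,
                                int layers, int bytesPerTexel) {
    return width * static_cast<double>(height) * texelsPerPixel
         * layers * bytesPerTexel / kMiB;
}

// Size of a square texture plus its full mipmap chain down to 1x1.
inline double mipChainMiB(int size, int bytesPerTexel) {
    std::uint64_t texels = 0;
    for (int s = size; s >= 1; s /= 2)
        texels += static_cast<std::uint64_t>(s) * s;
    return texels * bytesPerTexel / kMiB;
}
```

`visibleTextureMiB(1024, 768, 2.0, 3, 3)` gives 13.5, which is about 10.5% of 128MB and 810MB/s at 60 frames per second. `mipChainMiB(1024, 4)` comes out at about 5.33 and `mipChainMiB(2048, 4)` at about 21.33.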
This is just a huge waste, and the answer to that is memory virtualization. Obviously, we don't have the hardware to do that, so the programmers have to do it manually. Basically this boils down to sending to the video card only the data that we need in a frame, all the way down to 64x64 or 32x32 chunks of a given mipmap.
Which leads into my solution. It's still a high level design, but I see no real problems with this:
Draw the scene using a simple software rasterizer (kind of a slightly more functional version of Yann L's occ query softrast system, I'll explain the motivation later)
Use that data to figure out what tiles of mipmap levels will be needed, and what the arrangement of the primary texture will be (this is probably going to be the hardest part, I'll talk about it later)
Copy the data from system memory to a texture in the SYSTEMMEM pool
Add the necessary dirty rects to the target texture in the DEFAULT pool (which will be 2048x2048 for lower resolutions. One dimension might have to be increased to 4096 for higher res's though)
Call UpdateTexture so that the data is passed to the GPU, and ask the API to generate a few mipmaps from that data (OR, call UpdateSurf to fill in the mipmaps as needed)
Setup a stream of texcoords for the objects so that the texture coordinates go at the right spot on the main texture
Wait for a tumbleweed to pass by, reach for my revolver and call out "DRAW!"
So, motivation for a software rasterizer: Well, like how Yann used one to determine occlusion of whole characters, a softrast could be used to determine occlusion of sub-triangle data. For example, let's take a look at this picture I found on a GIS:
With a softrast, you can say that parts of the books behind the kid's head are not visible, and that the hidden texture data is not needed. Also, consider the stacks of floppies and CDs to his right. Only a sliver of the data on some of those disks or cases is visible, so a softrast could easily tell you exactly what parts are not needed. Even without those considerations though, it can help out in determining what mipmaps of textures are needed based on aniso filtering. Heck, an image like that is almost the poster child (no pun intended) for why you'd want a softrast to help with the paging of only necessary data.
The arrangement of the tiles is going to be a bit hairier though, and as I said will be the hardest part of this. The reasons are two fold:
Continuity will be required between tiles. I can't just slap the tiles around randomly.
Retain data in the same location when possible. Since the continuity might require a texture to end up going into an area that some other texture already occupies, the second or first texture might have to be moved. There are not any practical ways of moving this data, so the texture being moved would have to be totally reloaded. I know I said before that AGP transfer speeds are insane, but that doesn't mean I can just keep wailing on it like crazy.
Lastly, what're the benefits that all of this will bring?
Ability to have hi-res textures everywhere (including ones that may be larger than the hardware could handle, e.g. for terrain)
There's only one texture bound for all of the drawing (in most cases; it could result in two for cases of high resolution plus lack of 4kx4k texture support) resulting in some nice batching. This means that 100% unique textures can be used practically.
A LOT of textures can be used without the video card keeling over in pain, e.g. with the stacks of discs in the above photo.
Relatively light memory usage, so even non hardware enthusiasts with GeForce 6200s or X300s can still see some kickass texturing.
So, that's what I'm working on right now, and I really hope that it turns out nicely.
You can't believe everything that you read, but the right information is just a page away.
(I've got a loose sketch in my head how I'm going to go about my little proof-of-concept demo. This weekend, if I can get a lot of work done on it, I'll unveil what I'm working on [wink])
Well, first off, the introduction to aniso filtering that I promised Yemen. In the past graphics only had textures that were filtered with just nearest point magnification (many pixels for one texel) and minification (many texels for one pixel). For magnification, we can now simply do linear filtering, which smooths out the textures, and makes it look less chunky. For minification we need to use mipmaps (smaller versions of the base texture) to choose when we're going to be using a specific texture. The idea behind mipmaps is that we could have, say, a 256x256 texture dumped on a pixel-sized triangle, and the average of all of those texels would have to be found. Since that's beyond impractical for realtime use, a mipmap chain is created, full of smaller and smaller versions of that texture, all the way down to a 1x1 texture (which, in the case suggested above, is the one that is sampled). By linearly filtering across the mipmap chain, transitions between mipmaps can be quite smooth, which greatly helps reduce shimmering artifacts due to bad minification sampling. For aniso filtering I don't know the exact details, but the end result, basically, is that the highest possible mipmap is chosen for any pixel. This is determined (again, basically) by the angle of the triangle in relation to the camera and the distance of the triangle from the camera, whereas simply doing a linear filter between mipmaps seems to only use distance from the camera to determine what mipmap to use. The result of aniso filtering is that textures end up looking very crisp and aliasing free when minified, which is a far cry compared to point filtering:
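To make the mipmap chain part concrete, here is a small C++ sketch of the usual textbook picture: count the levels in a chain, and pick the two levels to blend from how many texels fall under one pixel. This covers only the plain trilinear case, not what the hardware does for aniso filtering, and the names `mipLevelCount`, `MipBlend`, and `trilinearLevels` are made up for the example.

```cpp
#include <algorithm>
#include <cmath>

// Number of levels in a full mipmap chain for a square texture, down to 1x1.
inline int mipLevelCount(int size) {
    int levels = 1;
    while (size > 1) { size /= 2; ++levels; }
    return levels;
}

struct MipBlend {
    int   lower;  // more detailed level
    int   upper;  // less detailed level
    float weight; // 0 = all lower, 1 = all upper
};

// texelsPerPixel is measured along one axis of the base level.
inline MipBlend trilinearLevels(float texelsPerPixel, int levelCount) {
    float lod = std::log2(std::max(texelsPerPixel, 1.0f)); // magnification clamps to level 0
    lod = std::min(lod, static_cast<float>(levelCount - 1));
    const int lower = static_cast<int>(lod);
    const int upper = std::min(lower + 1, levelCount - 1);
    return MipBlend{ lower, upper, lod - lower };
}
```

A 256x256 texture has 9 levels, and squeezing all 256 texels of a row under a single pixel selects the last one, the 1x1 level.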
(point filtering is on the left, aniso is on the right. This was a 2048x2048 texture rendered to a 300x300 display)
Anyways, now that that's out of the way, I've decided to abandon RFBTexture (in its current form) for awhile, as something far more interesting has caught my attention. I'm not going to talk about now or for awhile, as I would like to surprise my peers [smile].
I've been doing more work/experiments on my RFBTexture experiment recently, although that's hard to do when you practically get about 2-3 hours a day to work on it. Anyways, I've mainly been spending the last week figuring out anisotropic filtering (as well as practicing the pronunciation: an-eye-so-trah-pic), in terms of what results the hardware gives. Sadly, the only practical information I could find on aniso filtering was either from a hardware reviewer saying "it does fun stuff with textures and it's really cool 'cause it's another number you can jack up!" or mostly impractical 6 year old SIGGRAPH classes in OpenGL. So, I dwelled on results for awhile, pondering what it does, what factors affect it and so on, and I think I've got a pretty good handle on it. At least, as good of a handle as I'll need for this experiment.
Basically, from what I can tell, the possible mipmaps required for a given pixel (ignoring rotation of a triangle) are, in the best case, the same as with trilinear filtering. That is, the highest res mipmap required is that which is required if the triangle was facing the viewer, and the lowest res would be if the triangle was edge on to the viewer, regardless of distance from the camera. So, this is a problem that I've got two possible solutions for, which I've described in the paragraph after next. Before I go in, I want to introduce some terminology that I'll be using.
The data that I want to store is essentially 4D; that is, a 2D layout of 2D images. The layout's axes I describe as primaries and secondaries. Essentially, a primary refers to data from the main, uncropped texture, and the nth primary refers to data from the nth level (0-based) of the main texture. A secondary refers to a cropped chunk of the primary, and the nth secondary refers to a portion of data from the nth level (0-based) of the nth primary. So, the 0th secondary uses data directly from the 0th primary, and has the greatest detail of all of the secondaries. Lastly, level (again, 0-based) simply refers to the mipmap level of a secondary. So, the 0th level of the 0th secondary uses data from the 0th primary. Another example is that the 1st level of the 0th secondary uses data from the 1st primary. However, the 0th level of the 1st secondary uses data from the 1st primary as well. The difference is that the 1st level of the 0th secondary is 256x256 whereas the 0th level of the 1st secondary is 512x512, and hence covers a larger area than the 1st level of the 0th secondary.
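Going by the three examples above, the primary that feeds a given level of a given secondary is just the sum of the two indices. A tiny C++ sketch of that reading follows; the function names are made up, and the 512x512 size for level 0 of every secondary is an assumption taken from the sizes quoted in the last sentence.

```cpp
// Which primary (mip level of the full, uncropped texture) supplies the data
// for `level` of the given secondary. Both indices are 0-based.
inline int primaryFor(int secondary, int level) {
    return secondary + level;
}

// Edge length in texels of a secondary's mip level, assuming every
// secondary is baseSize x baseSize at its level 0.
inline int levelSize(int level, int baseSize = 512) {
    return baseSize >> level;
}
```

So level 1 of secondary 0 and level 0 of secondary 1 both read from primary 1, but the first is 256 texels across and the second 512, which is why the latter covers more area.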
Now that that's out of the way, here's my problem:
How to take full advantage of anisotropic filtering while minimizing the amount of data that is stored on the GPU and/or passed over the AGP bus.
Solution 1: Alpha testing, overdraw, and way too much of both.
This method requires only 4 levels on each secondary, and only the top 3 need updates. The four levels consist of the following data (only talking about the 0th secondary in this example, which actually has only 3 levels in total; I'll get to that later):
-0th level has data from the 0th primary, with an alpha value of 1.
-1st level has data from the 1st primary, with an alpha value of 1.
-2nd level is filled with unseen data, with an alpha value of 0.
What happens in this case is that data from the 0th and 1st levels get linearly filtered between each other properly (due to the fact that each aniso tap is essentially a trilinear lookup). However, when data from the 1st and 2nd levels start blending, the 0th level has no more significance on the pixel. As a result, we want to stop drawing on that pixel. However, since the alpha value becomes non-one when the 2nd level starts influencing the image, we can simply cull the pixel by doing an alpha test, where pixels with non-one alpha values are dumped. When we render the next secondary, we only want data where the 1st level (same data as the 2nd level of the 0th secondary, remember) affects the image. So, we want to cull pixels where the 1st level has no effect: this is where only the -1th level and the 3rd level influence the image, so we repeat the same process as the first secondary with those pixels. Obviously though, we don't have a -1th level, so we have the drawing occur in reverse: the geometry is rendered with the last secondary drawn first, so that it replaces values that we don't want, and culls values we don't want affecting the rest of the image. If you need a visual idea of this, here we go:
(image from Tom's Hardware, as a part of their review of the X800 hardware)
Essentially, the 1st secondary covers data where there's any green close to the camera. The 0th secondary covers the first bit of non-coloured texture, and stops when a little bit of green exists.
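For the alpha trick on the 0th secondary, here's a toy Python model of what a single trilinear tap would see. It's only a sketch: the per-level alpha values come from the list above, the LOD is just handed in as a number, and real hardware would blend several taps and compare against an alpha reference value rather than an exact 1.0. The function names are hypothetical.

```python
import math

# Alpha stored in each level of the 0th secondary, as listed above.
LEVEL_ALPHA = [1.0, 1.0, 0.0]


def trilinear_alpha(lod: float) -> float:
    """Alpha a single trilinear tap returns at the given level of detail."""
    last = len(LEVEL_ALPHA) - 1
    lod = min(max(lod, 0.0), float(last))
    lo = int(math.floor(lod))
    hi = min(lo + 1, last)
    t = lod - lo
    # Linear blend between the two neighbouring mip levels.
    return LEVEL_ALPHA[lo] + (LEVEL_ALPHA[hi] - LEVEL_ALPHA[lo]) * t


def passes_alpha_test(lod: float) -> bool:
    """Keep the pixel only while the unseen 2nd level has no influence."""
    return trilinear_alpha(lod) >= 1.0


if __name__ == "__main__":
    for lod in (0.0, 0.5, 1.0, 1.25, 2.0):
        print(lod, trilinear_alpha(lod), passes_alpha_test(lod))
```

Anything with a LOD between 0 and 1 survives, and as soon as the 2nd level starts blending in, the alpha drops below one and the pixel gets dumped for the next secondary to pick up.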
Don't worry if you got lost along the way there, I probably won't be using that method, instead going with:
Solution 2: Just fill up each secondary's mipmap chain, dammit!
Instead of just pussying out with only three relevant levels to each secondary (or two for the 0th secondary) and alpha culling the parts that would dare need other mipmap levels, the full mipmap chain for a secondary is stored and loaded, so that the geometry is drawn exactly as submitted, instead of some pixels being clipped away in the process, and rerendering those areas. Basically, a chunk of geometry is drawn, and the secondary where the 0th level mipmap would've been the primary's visible mipmap in that location in the best case is bound as the active texture. I was reluctant to approach this solution first since I was concerned about the extra updating required for each level of each secondary. However, as I reflected on it more, I can likely guess what mipmaps a piece of geometry will be using and only upload those parts to the GPU by using data such as the angle between the triangle normal and the view direction and the distance together (wait, isn't this getting kind of close to memory virtualization?) instead of just updating all 10 levels of each secondary. That way, only some texture memory is wasted, instead of oodles of fill rate being lost because early z was turned off because alpha testing was turned on.
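As a rough idea of what "guessing the mipmaps from angle and distance" could look like, here's a hedged Python sketch. It is not what the hardware actually does: it assumes a simplified footprint model (face-on texel footprint stretched by 1/cos of the viewing angle) and an LOD rule loosely modelled on the one suggested in the OpenGL anisotropic filtering extension. Every name and parameter here (`estimate_mip_range`, `focal_px`, and so on) is made up for illustration.

```python
import math


def estimate_mip_range(distance, cos_angle, texels_per_unit, focal_px,
                       max_aniso=16, num_levels=10):
    """Guess the (finest, coarsest) mip levels one tap would blend between.

    distance        -- distance from the camera to the surface, world units
    cos_angle       -- cosine of the angle between surface normal and view direction
    texels_per_unit -- texture density on the surface, texels per world unit
    focal_px        -- pixels covered by one world unit at distance 1 (assumed camera model)
    """
    # Texels per pixel if the surface faced the viewer head on.
    base = texels_per_unit * distance / focal_px
    cos_angle = max(cos_angle, 1e-4)  # avoid dividing by zero when edge on
    major = base / cos_angle          # long axis of the pixel footprint
    minor = base                      # short axis of the pixel footprint
    # Aniso spends extra taps along the long axis, up to max_aniso of them.
    ratio = min(major / minor, float(max_aniso))
    lod = math.log2(max(major / ratio, 1.0))  # clamp to level 0 when magnifying
    lod = min(lod, float(num_levels - 1))
    return int(math.floor(lod)), int(math.ceil(lod))


if __name__ == "__main__":
    print(estimate_mip_range(4.0, 1.0, 512.0, 512.0))                # face on
    print(estimate_mip_range(4.0, 0.25, 512.0, 512.0))               # steep angle, 16x aniso
    print(estimate_mip_range(4.0, 0.25, 512.0, 512.0, max_aniso=1))  # plain trilinear
```

With `max_aniso=1` the steep-angle case falls back to the blurrier trilinear level, while with aniso turned up it stays at the same level as the face-on case, which lines up with the "best case is the same as facing the viewer" observation above. Only the levels in the returned range would need uploading for that chunk.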
By the way, one thing that I may want to try out later is permanent decals. If the entire landscape can be theoretically unique the whole way, I wonder how practical it would be to update, say, the data of the primary level when an explosion or something occurs so that the terrain is permanently scarred. In theory, this could be used so that a person could, like, shoot a rocket from the ground on the other end of the map at a mountain far in the distance, then walk all of the way there blasting a ton of other crap in the process, and when he arrived, he'd see the damage he caused. I don't know about you guys, but I think the idea is cool enough to warrant a bit of research once I have RFBTextures fully working.
So, really (freak)in' big textures.
Here's my idea for how to handle RFB textures:
Let's say we have a texture for terrain that is 8kx8k, split into 512x512 textures with mipmaps, plus a 16x16 mipmapped texture for the mips that the 512 ones don't get (reasoning behind this is that the 512 mips will go 512-256-128-64-32-16-8-4-2-1, and at the level where it's at 1, you end up with 256 1x1 textures that still need a couple more mipmaps. The mipmapped 16x16 tex takes care of that). This is all stored in the system pool. Then, in the default pool, we have, say, 24 mipmapped 512x512 textures (i.e. 32MB in size), reserved for general terrain use.
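Just to sanity check those numbers, here's a quick Python sketch of the arithmetic. It assumes 32-bit (4 bytes per texel) uncompressed textures, which the entry doesn't actually state, and the constant names are my own.

```python
MAIN_SIZE = 8192       # side length of the full terrain texture ("8kx8k")
TILE_SIZE = 512        # side length of one tile
CACHE_TILES = 24       # mipmapped tiles kept in the default pool
BYTES_PER_TEXEL = 4    # assumption: 32-bit colour, no compression


def mip_chain_texels(size: int) -> int:
    """Total texels in a square texture plus its full mip chain down to 1x1."""
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total


tiles_per_side = MAIN_SIZE // TILE_SIZE           # tiles along one edge of the terrain
tile_count = tiles_per_side * tiles_per_side      # total tiles in system memory
cache_bytes = CACHE_TILES * mip_chain_texels(TILE_SIZE) * BYTES_PER_TEXEL

if __name__ == "__main__":
    print(tiles_per_side, tile_count)
    print(cache_bytes / 2**20, "MB in the default pool")
```

The 16 tiles per side is also why the leftover mips fit in a 16x16 texture: once every tile is down to 1x1, there's one texel per tile left.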
First, the blocks of terrain nearest the camera are rendered, using the requisite full-res 512x512 blocks that are stored in system mem. How they are used is that a series of UpdateSurface calls are made, the source being the system mem textures, the destination being the first 4 default pool textures (I'll refer to the default pool textures as A, B, C, D, etc.). Basically, every mipmap of A-D is filled using the system mem blocks. Then, four larger chunks (x2) of terrain that are further out are rendered as well. They will use textures E-H. This time, the sources for the UpdateSurface calls that fill E-H are not the highest level mipmaps of the areas they're over. I'm having trouble writing up how this part works, so I'll make a little ASCII diagram:
Here's part of the terrain laid out:
  1 2 3 4
A * * * *
B * * * *
C * * * *
D * * * *
The camera is located between the cells (B|C)(2|3), which is also where the first 4 512x512 blocks were used (i.e. every cell covers 512x512 of the main texture). Anyways, for the next chunks that each cover a 1kx1k area, their bounds are A1 to B2, A3 to B4, C1 to D2 and C3 to D4. Since each chunk can only have 512x512 of the highest level however, the mipmap relationships between the source chunk in system mem and the destination chunk, E (or F or..) can't be 1 to 1. Instead, the 2nd level of a source chunk (256x256) is used to fill up 1/4 of the 1st level of the destination chunk, and when the bottom level of the destination chunk needs to be written to, part of the sub-mipmap texture (the 16x16 mentioned way back) is copied instead. This is repeated until all 24 destination chunks are full. Then, an algorithm that JC describes here is used, to prevent excessive texture thrashing in successive frames:
Basically, when chunks S-X are used, instead of cycling back and rewriting A-D again, S-X get rewritten to again and again until the terrain is done rendering. Then, in the next frame (assuming the camera hasn't moved) A-R don't have to be written to again. If the camera does move though, chunks will obviously have to be replaced, bit by bit. It's possible that either a scrolling system not unlike Dangerous Dave/Commander Keen's could be used (if such a function exists in D3D9..), or in the worst case, chunks get rewritten in a slightly staggered fashion.
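Here's a small Python simulation of that replacement scheme as I've described it above (it's my reading of the idea, not necessarily the exact algorithm from the .plan). The `ChunkCache` class, the `render_frame` helper and the choice of 6 scratch slots (S through X out of 24) are all just for illustration.

```python
class ChunkCache:
    """24 destination slots; once full, only the last few slots get recycled."""

    def __init__(self, slots=24, scratch=6):
        self.slots = [None] * slots           # which source chunk each slot holds
        self.where = {}                       # source chunk id -> slot index
        self.scratch_start = slots - scratch  # first recyclable slot ("S")
        self.next_free = 0
        self.next_scratch = self.scratch_start
        self.uploads = 0                      # stand-in for UpdateSurface traffic

    def request(self, chunk):
        """Return the slot holding `chunk`, uploading it first if needed."""
        if chunk in self.where:
            return self.where[chunk]
        if self.next_free < len(self.slots):
            slot = self.next_free
            self.next_free += 1
        else:
            # Cache is full: recycle only the scratch slots, round-robin.
            slot = self.next_scratch
            self.next_scratch += 1
            if self.next_scratch == len(self.slots):
                self.next_scratch = self.scratch_start
        old = self.slots[slot]
        if old is not None:
            del self.where[old]
        self.slots[slot] = chunk
        self.where[chunk] = slot
        self.uploads += 1
        return slot


def render_frame(cache, chunks):
    """Request every chunk a frame needs; return how many uploads that cost."""
    before = cache.uploads
    for chunk in chunks:
        cache.request(chunk)
    return cache.uploads - before


if __name__ == "__main__":
    cache = ChunkCache()
    frame = list(range(30))  # a frame that needs 30 chunks but has only 24 slots
    print(render_frame(cache, frame), render_frame(cache, frame))
```

In this toy run the first frame costs 30 uploads and an identical second frame costs 12, because the first 18 slots (A-R) never get touched again. A plain cycle-back-to-A scheme would re-upload all 30 every frame.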
Obviously, this doesn't take into account frustum culling, which would complicate the management, but also let it be more aggressive. Plus, one nice thing is that streaming in the RFB texture from the HD could be done relatively easily as the camera moves around I imagine.
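Going back to the mip relationship between source and destination chunks for a second, here's how I'd sketch it in Python. It assumes a destination chunk covers 2^k by 2^k cells (k = 0 for A-D, k = 1 for the 1kx1k chunks), that tiles have 10 levels (512 down to 1), and that levels are counted from 0 here rather than from 1 as in the paragraph above. The function names are hypothetical.

```python
TILE_LEVELS = 10  # a 512x512 tile has levels 0..9 (512 down to 1)


def source_for(dest_level: int, k: int):
    """Where the data for one destination mip level comes from.

    Returns ("tile", level) to copy from each covered cell's own tile,
    or ("submip", level) to copy from the shared 16x16 sub-mipmap texture.
    """
    source_level = dest_level + k
    if source_level < TILE_LEVELS:
        return ("tile", source_level)
    # Past 1x1 per cell: the 16x16 texture's level 0 has one texel per cell.
    return ("submip", source_level - (TILE_LEVELS - 1))


def fraction_filled(k: int) -> float:
    """Share of each destination level that one source cell fills."""
    return 1.0 / (4 ** k)


if __name__ == "__main__":
    for dest_level in (0, 8, 9):
        print(dest_level, source_for(dest_level, 1), fraction_filled(1))
```

For a 1kx1k chunk, the top destination level is stitched together from the 256x256 level of four tiles (a quarter each), and only the very last 1x1 level has to come out of the 16x16 texture.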
Anyways, this is something I shall want to experiment with in the near future. Any thoughts?
So, it turns out I basically won't be able to get my hands on Vista as a D3D10 dev platform, so I'm probably going to just take a break from that stuff.
There are two graphics things on my mind though that I've been thinking of and trying to come up with solutions for: area lighting and memory virtualization (emulation).
For a linear area light, I've got the basic idea down, but in my implementation I've been having problems with the diffuse light integration. Basically, the idea is that since the N dot L term should be normalized, it turns into just the cosine value, so then the formula integrates between two cosine values (i.e. integrate the light from L1 to L2). That is simply a conversion to sin vals, and then taking the difference, which is, from a math pov, fairly trivial (from a shader pov? Nooot as much). The identity cos^2+sin^2=1 is just rearranged to sin=sqrt(1-cos^2). Anyways, I've got that working, but the problem is that just using that, the values have to be added OR subtracted, since the sin value will always be positive (a side effect of the sqrt function). Now, the integration works fine, I've punched in two formulae on my TI-83 that show what the final result of the lighting is, and assuming the add/sub is chosen correctly, it works fantastically. The problem I've been having is that the add/sub ISN'T chosen correctly right now, so that's something I'm looking into. (For those curious, the method I'm using right now is to calculate the reflection of one of the light vectors about the normal, and comparing the results of L1 dot L2 and R1 dot L2)
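To show the add/subtract problem in isolation, here's a 2D Python sketch. It assumes the two light endpoints sit at signed angles theta1 and theta2 from the normal, and it sidesteps my actual sign test by taking a hypothetical `same_side` flag as input, so it only demonstrates why the choice matters, not how to make it.

```python
import math


def integrated_cosine(theta1: float, theta2: float) -> float:
    """Exact integral of cos(theta) from theta1 to theta2 (signed angles, radians)."""
    return math.sin(theta2) - math.sin(theta1)


def integrated_cosine_from_dots(c1: float, c2: float, same_side: bool) -> float:
    """Same integral rebuilt from the two N dot L cosines only.

    sqrt(1 - cos^2) always gives a positive sine, so the caller has to say
    whether both endpoints lie on the same side of the normal.
    """
    s1 = math.sqrt(max(0.0, 1.0 - c1 * c1))
    s2 = math.sqrt(max(0.0, 1.0 - c2 * c2))
    return abs(s2 - s1) if same_side else s1 + s2


if __name__ == "__main__":
    # Endpoints straddling the normal: the sines have to be added.
    a, b = -0.3, 0.5
    print(integrated_cosine(a, b),
          integrated_cosine_from_dots(math.cos(a), math.cos(b), same_side=False))
    # Endpoints on the same side of the normal: the sines have to be subtracted.
    a, b = 0.2, 0.9
    print(integrated_cosine(a, b),
          integrated_cosine_from_dots(math.cos(a), math.cos(b), same_side=True))
```

Flip the flag in either case and the result is visibly wrong, which is exactly the artifact I'm chasing.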
Memory virtualization is the other thing I want to talk about, primarily with respect to John Carmack's memory virtualization antics. I want to finalize my thoughts on this, so I'll probably make an edit later (a few hours from now, during my lunch break). Basically, the idea that I want to experiment with is to, say, for a terrain system, figure out what parts of a really really big texture (i.e. >4kx4k) are needed when drawing a frame by using something like a memory management algo JC proposed several years ago in a .plan update, combined with some other memory address manipulation stuff.
I've decided to pause my game for a while (which is about the same as putting the brakes on a car moving at 1km/h...) due to finals and something better than finding the Holy Grail inside the Ark of the Covenant: Direct3D 10.
Since the docs were released Tuesday, I've been looking them up and down, asking around for clarification on certain things, and so on. Plain and simple, I think D3D10 is absolutely fantastic, and I ask for only one more bit of functionality that I've mentioned a bit: source pixel access in the pixel shader so that we can basically do custom blending, as opposed to using the still-fixed-function OMSetBlendState. Aside from that, I think it's great. It's extremely flexible, slim and powerful. I know that MS will be releasing subversions every now and then after the final release, but I think that even without those, devs would be finding new things to do with D3D10 over 5 years from now.
Anyways, after studying the docs and samples for the last couple days I've got a pretty good handle on D3D10, but there are a couple unadvertised features that really surprised me. Namely, multiple viewports and scissors. Right now I'm getting mixed readings on those, as Redbeard (who is a tester for the Direct3D 10 team) says that only one viewport or scissor can be bound to a single render target, but nothing in the docs suggests that. That is something I'll want to work with to try and figure out what's what.
Also, one other thing I'd like to have expanded on in the docs (even though a good chunk of it is used in the shaders) is the SetRasterizerState/SetBlendState, etc. properties in FX/HLSL. I imagine that one could guess what each possible state for FX properties is though, but it'd still be nice to have that in the docs.
Since I'm going to be working on a not-as-high-end-as-my-current-computer laptop for the next 4 months while on my co-op work term, one thing I want to do is do a lot of D3D10 work using the REF rasterizer in my spare time. I've got a couple tech demos that I want to try out, and was wondering if anyone has any input for them (suggestions, changes, stuff like that).
-A demo doing really souped up shadow mapping using D3D10 features such as a single pass cubemap and depth buffer lookup in a pixel shader. Also, if I have time, work with some dynamic scaling of the shadow map.
-A demo showcasing Bezier surfaces, hopefully with, when needed, a virtually infinite level of detail. I know some of you will mention that hardware vendors only want us spitting out a max of 20 tri-er..primitives in the GS, but that'll be part of the challenge behind the demo.
-A demo showing a game where all of the logic is calculated in the shaders (i.e. only inputs and time increments are sent into the tech logic), possibly like the Geometry Wars clone I wanted to do. The guys on #graphicsdev kind of poo-poo'd the initial idea (I just said "a game" and didn't really specify much about it) but I think I might still give it a try, since I think it'll be a fun experiment. Plus, it could be a good demo showing off the variety of buffer accesses and unlimited shader length stuff in D3D10. If not that, maybe I'll give GPU physics a whirl. At the very least, something that has typically been reserved for CPU only.
In between studying for finals I'm still working on the particle system (and now the game). The particle system doesn't handle multiple textures yet, but got a nice speedup thanks to a suggestion by Reltham to split the primitives into batches of ten thousand or something, instead of spitting out all 50+k in one shot. While making that optimization, I ended up learning quite a bit more on how vertex streams work. Since then I played around with stretching the particles based on velocity a bit, which turned out really nicely (click for executable):
Right now I'm also working on the game logic/sprite system. In the past, I always kept graphics and game logic in the same class, but one thing I want to do for this game is to separate the two as much as possible. So, right now I've got the old particle list, which stores an array of sParticle structs (Position, Velocity, Colour, Type, and a couple other things I'm forgetting) and takes care of all of the drawing based on that. The sprite list stores an array of sprites (note: I may or may not use an array in the final version, I'll get to that later) and performs operations on those. The sprites aren't too complex, with cSprite being a base class for all sprites, and then various types of sprites derived from that class. Anyways, the way I have my sprites talking to their respective particles right now is just storing an address to the position and velocity members of the associated sParticle in the particle list, which is fairly convenient.
Anyways, sprite storage. Right now I'm using an array (newb doing a bit of pre-micro-optimization over here), but I also haven't implemented removal of entities from the array yet so right now I can get away with it reasonably safely. I'll likely move over to a linked list system later because an immense amount of sprite removal will be occurring, which linked lists do fairly well.
Going to real life matters, I recently saw Serenity and holy crap is it badass. Really felt that a large amount of it was good. Also, does anyone else think that Kaylee looks a LOT older in the film than in the show? I know it's been a nice three year gap between production of the show and the movie, but she looks like she aged 10 years in that time.
Final exams have also started for me. I've got my first tomorrow (I'm writing this over a study break) even though the exam season started last Monday. "Thankfully", after tomorrow I'll have another 7 day break. Then I get to have 4 exams in 3 days. Whoopee. As a result, I likely won't do much work on my game.
Since I've started doing a bit more relevant game programming, I might as well give a quick update on this stuff.
For those who don't go to #graphicsdev, I guess I might as well fill you in on what I've been doing since my last journal entry. The majority of the time was spent making a new algorithm that I called "Deferred Antialiasing", which is intended to basically allow for antialiasing in a couple situations where AA was virtually impossible to do, i.e. HDR rendering and/or deferred shading. It works pretty well, but it's also a bit intensive on memory and performance. I also haven't tested it out on a practical modern game application (normal mapping, high frequency detail and so on) but it still works pretty well I think. If you want some details on it I can provide some more.
Lately though I've been working on university stuff, and kicking myself in the face for just dicking around sometimes when I could be doing some game or graphics programming. Personally, this is primarily because the project that I've got my sights aimed for is so enormous that I quickly get scared off.
Recently though, I've managed to be brave and start working on a couple components that I might need, such as a texture manager (well, more like material sorter). Around the same time, one of my friends over at the ICB2 mentioned this game called Geometry Wars that he just downloaded onto his X360. I checked out a couple videos over at IGN, and thought the game looked really cool, fun, and wasn't overly daunting. Here's a screenshot of it in action:
So, right now I'm just making a clone of that to develop the texture manager I'll need for the greater project. As one could immediately guess from the video though, particles is something that I need to do as well. Lots of them.
In the past couple days I've been working on particles, and also learning some low-end D3D stuff like vertex declarations. Right now I've got particles running well. I can do game updates on them, render them properly, stuff like that. However, I really want to match the chaos and particle count of the X360 version of Geometry Wars, so optimization will be a really big issue for this (Reltham has been a huge help in that regard so far). As it is, I can safely do fifty thousand particles on my X800Pro and P4 3.0GHz and get 50fps, but I want to really increase that number as high as I can.
Here's the current setup with the particles running (I dunno how many are in the frame in this example). Basically it renders the particles with a monochrome texture, which is multiplied by some colour value associated with the particle.
By my next entry (Monday or Tuesday) I'll probably have the particle system done, working, and hopefully running faster, or at the very least able to handle a lot more particles.
Just a quickie thought: Would it be feasible to do raytracing in the pixel shader in SM3.0? Like, render a fullscreen quad and basically run a raytracing program on each pixel like one would do on the CPU?
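For reference, this is roughly the kind of per-pixel program I mean, written as a CPU-side Python sketch: one ray per pixel, tested against a single sphere. The scene, the camera at the origin looking down -Z, and the function name are all assumptions for illustration. A pixel shader version would run the same math once per fragment of the fullscreen quad.

```python
import math


def trace_pixel(px, py, width, height, center=(0.0, 0.0, -3.0), radius=1.0):
    """Distance along the pixel's ray to the sphere, or None on a miss."""
    # Build a ray from the origin through the pixel centre on the z = -1 plane.
    x = (2.0 * (px + 0.5) / width - 1.0) * (width / height)
    y = 1.0 - 2.0 * (py + 0.5) / height
    length = math.sqrt(x * x + y * y + 1.0)
    direction = (x / length, y / length, -1.0 / length)
    # Solve |t * direction - center|^2 = radius^2 for the nearest t.
    b = sum(d * c for d, c in zip(direction, center))
    c = sum(v * v for v in center) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = b - math.sqrt(disc)
    return t if t > 0.0 else None


if __name__ == "__main__":
    width, height = 24, 12
    for py in range(height):
        print("".join("#" if trace_pixel(px, py, width, height) is not None else "."
                      for px in range(width)))
```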
I guess it's been awhile since I've updated this, huh...
Okay, well, life update: I've moved into an oncampus university residence for two weeks (I'm in Edmonton, on a co-op work term. Hometown is Calgary, ~300km away) to top off my term, and then it's back to Calgary. I'll finally get around to finishing my paper, no doubt of that. Why you may ask?
THERE'S NO FREAKING INTERNET CONNECTION IN THE DORM!!!
So, I'm currently posting this at work over lunch.
hum, what else...oh, how about some quickie thoughts on Carmack's keynote?
B-O-O-O-R-I-N-G. It was basically an hour long rant on console development, and he didn't even show any shots of iDNext! Come ON! One thing's for sure: I won't be dissecting it ten times over like I did with the QC2004 keynote. That was actually interesting, especially since I absolutely CANNOT imagine him choosing to do 2048x2048 shadow maps. Even he admits that they look OKAY, (with hardware linear filtering...) but the performance at that res is TERRIBLE! Slightly crazy if you ask me...
Also, helped raise a small shitstorm over at Raven when I told Sages, who told the project leader, news of a supposed Quake 4 leak. Thankfully it was fake, although it would've been nice to try it out before release (*wink wink to Sages, Nitro, and rhummer...* :-P ).
And to close, how about some graphics thoughts. I've decided to reevaluate how I'm going to do environment mapping. I'm STILL not sure exactly how I'm going to do it, which isn't helped by the fact that I want some pretty shit-hot results, i.e. almost perfect reflection, so that objects touching the envmapped object have the reflection in the right place. I talked with Drillian a few days ago about this on IRC, specifically thinking about how to get 360d envmapping on an object that has a predominant planar side (for example, one of those huge board room tables with the flat top but round edges). I'm on the edge of giving up, but before doing that I want to see exactly what the results would be on doing simple per-object skewbe maps, and determine if it's satisfactory or not. Right now, I'll probably have some system where I literally combine classic planar reflections (with a bit of warping due to the surface normal) and skewbe mapping, by identifying before rendering a frame what the dominant reflection plane is (either predefined or generated on the fly). A table would likely be predefined, but if you have something like a car, a glancing plane on the car's BB would be used instead, because that'd be hardest for a skewbemap to simulate, plus the Fresnel-modified reflectance would make that plane more obvious, necessitating increased clarity. The only issue I foresee with this is somehow getting skewbemap and planar reflection continuity, so that there isn't some line where the environment lookup suddenly changes. And I do realize that that is far easier said than done :-)
So, there you go. Next time I post I'll probably have the next iteration of my skewbe paper up, and will have cleaned up the source code of my program for the publication. It's not like I'll have anything else to do at home...
(note: I have not researched at all on BSSRDFs, so pardon me if I'm reinventing the wheel to a ridiculous extent)
So, I decided to contemplate some more on translucent surfaces. One thing I've noticed early on in my education on light was how some translucent things can scatter light. For example, let's say you're sitting in a bus (like where I am while writing this entry) and the windows have some spots of dirt, due to a piss-poor cleaning job. One thing that the dirt does is scatter light, in almost a bell curve fashion, as shown in the following image:
(as I said earlier, I'm writing this on a bus, meaning I don't have a mouse. As a result, I'm using one of those old laptop nubs, so pardon the crappier-than-normal quality of the diagram)
Okay, that turned out pretty badly, but I think the idea is there. Basically, it's supposed to be a cosine curve raised to some power, with some constant added to it. Pretty much like your standard diffuse and specular lighting. The "diffuse" I'm actually not 100% sure about; what I observed might have been ambient light. Anyhoo, that adds a couple pieces of data to any texture that might be translucent: a straight transmission coefficient (i.e. alpha), a scatter transmission coefficient, and a scatter transmission power. Also, I might want to throw a refraction offset in there, a 2D screen offset. So, what does that bring the theoretical total of texture channels to (in the..."worst" case):
Albedo - red, green, blue
Normal - X, Y, Z
Specular - coefficient, power (floating point), fresnel coefficient
Environment mapping - Coefficient, fresnel coefficient
Refraction - X and Y screen offset.
Transmission - Straight coefficient, scatter coefficient, scatter power (floating point)...and scatter diffuse would bring it to 17.
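The scatter term itself (the "cosine raised to some power, plus a constant" curve from the diagram) would look something like this Python sketch. The function name and the exact meaning of `cos_angle` (cosine of the angle between the straight-through light direction and the direction to the viewer) are my assumptions here.

```python
def scattered_transmission(cos_angle: float, scatter_coeff: float,
                           scatter_power: float, scatter_diffuse: float = 0.0) -> float:
    """Light scattered toward the viewer by a dirty translucent surface."""
    # Same shape as a specular lobe: a powered cosine on top of a constant floor.
    lobe = max(cos_angle, 0.0) ** scatter_power
    return scatter_diffuse + scatter_coeff * lobe


if __name__ == "__main__":
    for cos_angle in (1.0, 0.9, 0.5, 0.0):
        print(cos_angle, scattered_transmission(cos_angle, 0.6, 8.0, 0.05))
```

Looking straight down the light direction gives the full coefficient plus the constant, and it falls off to just the constant as you move off axis.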
So, any artists and memory whores in the crowd wanna fillet me? A 512x512 17channel texture with 2 FP channels: 4.25MB. Thankfully, there won't be a lot of translucent objects, and that IS a stupidly complex material.
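Here's the channel arithmetic as a quick Python sketch. The 4.25MB figure works out if all 17 channels are stored at 8 bits each; the byte sizes for the floating point channels below are assumptions, since I didn't pin down a format, and with them the total goes up a bit.

```python
# Channel counts per group, from the list above (scatter diffuse included).
CHANNELS = {
    "albedo": 3,
    "normal": 3,
    "specular": 3,
    "environment": 2,
    "refraction": 2,
    "transmission": 4,
}


def texture_bytes(size: int, channels: int, float_channels: int = 0,
                  fixed_bytes: int = 1, float_bytes: int = 4) -> int:
    """Bytes for one square, non-mipmapped texture with the given channel mix."""
    per_texel = (channels - float_channels) * fixed_bytes + float_channels * float_bytes
    return size * size * per_texel


if __name__ == "__main__":
    total = sum(CHANNELS.values())
    print(total, "channels")
    print(texture_bytes(512, total) / 2**20, "MB, all channels 8-bit")
    print(texture_bytes(512, total, 2, float_bytes=2) / 2**20, "MB, two 16-bit float channels")
    print(texture_bytes(512, total, 2, float_bytes=4) / 2**20, "MB, two 32-bit float channels")
```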
But christ, even I don't wanna see the shader that'll use all that data, considering it's also going to be doing shadowing as well, AND lumping together multiple lights.
It'd be a fun looking material though, doncha think?
for the non-graphics guys: I'm currently watching Cool Runnings on the bus. How the hell did the Jamaican fad never catch on?
So right now in my demo I've got some fairly nice perpixel Blinn lighting going on, and some additional effects like Fresnel to modify the specular coefficient. However, as I observe the real world more and more, I've noticed the huge significance environment mapping can have. I recently decided to add dynamic env maps to my todo list, and they were absent from it since the very beginning.
Anyways, here're some thoughts I have on the possible implementations I'm considering:
1) Have per-object env mapping. Obviously, this will produce the best results, but at a fairly high cost. Also, the easiest to do.
2) Have an env map made at the camera's location. A nice approximation, but I think I'll have flags for Renderable objects that can choose between this option or #1.
3) Have an env map solution similar to Half Life 2's, where areas in the environment already have them, and objects inside a certain environment map's bounding box will just use that. Despite the fact that I'm going for fairly high quality reflections, I think HL2's implementation could be fairly good...but not in its current state.
The big flaw behind HL2's system is shown below:
The black square is the env map bounding box, the red circle being some reflective object, the green thing being the camera, the green line being a camera->position vector, the blue arrow being the reflection vector (aside: Just so I can keep reflective consistency, I think I'll use the halfway vector as the reflection vector). What I show in this image is that the right reflection vector is generated, and is then basically placed onto the center of the env map BB (that pink dot), at which point the cubemap is just normally looked up. Obviously, this creates a large inconsistency with the object's position and the environment, and is quite evident in some parts of Half Life 2 (good example: Start up the beginning of the game, and play through until right before you enter Barney's interrogation room. In the hallway immediately prior, watch the reflection on the ground. It's supposed to be primarily the overhanging light, but as you get near the end of the hall, the reflection should instead be the backwall but it's STILL reflecting the light!). When I get around to implementing this, one thing I want to experiment with is to find the proper vector required for that area. That is, take into account the position of the pixel/object inside the env map BB, and find what the correct reflection vector should be.
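One way to find that "proper vector" (a common approach usually called box projection or parallax correction, not something I've implemented or that HL2 does) is to intersect the reflection ray with the env map's bounding box and then look up the cubemap in the direction from the capture point to that hit. Here's a Python sketch, assuming an axis-aligned box, a shaded position inside it, and hypothetical names throughout.

```python
def corrected_lookup_direction(pos, refl, box_min, box_max, probe_pos):
    """Cubemap lookup direction that accounts for where `pos` sits in the box.

    pos       -- position of the pixel/object being shaded (inside the box)
    refl      -- reflection vector at that position
    box_min   -- minimum corner of the env map bounding box
    box_max   -- maximum corner of the env map bounding box
    probe_pos -- point the cubemap was captured from (e.g. the box centre)
    """
    # Distance along the reflection ray to the first box face it exits through.
    t_exit = float("inf")
    for axis in range(3):
        d = refl[axis]
        if d > 0.0:
            t_exit = min(t_exit, (box_max[axis] - pos[axis]) / d)
        elif d < 0.0:
            t_exit = min(t_exit, (box_min[axis] - pos[axis]) / d)
    if t_exit == float("inf"):
        raise ValueError("reflection vector must be non-zero")
    hit = tuple(p + t_exit * r for p, r in zip(pos, refl))
    # Look up the cubemap toward the hit point as seen from the capture point.
    return tuple(h - c for h, c in zip(hit, probe_pos))


if __name__ == "__main__":
    box_min, box_max = (-2.0, -2.0, -2.0), (2.0, 2.0, 2.0)
    centre = (0.0, 0.0, 0.0)
    # Same reflection vector, two different positions inside the box.
    print(corrected_lookup_direction(centre, (0.0, 1.0, 0.0), box_min, box_max, centre))
    print(corrected_lookup_direction((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), box_min, box_max, centre))
```

At the box centre the corrected direction is parallel to the plain reflection vector, but off-centre it tilts toward the part of the wall the ray actually hits, which is what the hallway in HL2 is missing. It only holds up to the extent that the real room is roughly box shaped.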
So, those are three options that I'm considering. One that seems fairly appealing is HL2's method with the changes I suggested above. I would also make the env map fully dynamic, because I want to take advantage of skewbe mapping (it's not just for shadows, it's an extension of EVERY kind of cube map!) and also see how dynamic lighting (possibly dynamic shadows too? Oooh man that'd look cool, even if it would be a slideshow) affects the map as well.
One thing to note about the env map lighting/shadowing: Because I'd use skewbe mapping, just like with omni shadows I could take advantage of a fairly low resolution environment map. Like, omni shadows I feel require a 512x512 bare minimum, but an env map could very likely dip down into the 256x256 range and still look very good, before applying scaling due to reduction of the envmap BB in screenspace.
Greetings. I've just finished a ~85% complete version of my skewbe mapping paper, which includes most of the content, but the writing is still fairly poor. I thought that before making a big release I'd post it here first to get thoughts from you guys on some things in it (aside from improving the language, I'm going to work on that already). Anyways, check it out: http://www.eng.uwaterloo.ca/~dcrooks/SkewbePaperBeta.pdf
I didn't put figures 2 and 3 (referred to in the results section, which is also fairly incomplete) in there since I'm still not entirely sure how to include them or what text to have for them, so here's what I plan on putting in:
(figure 2, compilation of the following images. 2a shows the 'optimal' FOV/focuspoint algo, 2b shows the mediocre one, and 2c is default. )
(figure 3, another compilation, same setup as above)
So, any thoughts?
Hey, I figured I'd throw in an update regarding my skewbe map paper.
Doles helped me get the various demo images I needed for quality comparisons, so I'm grateful for that, and that's done. I've also done the menial tasks regarding naming and such. I plan on having all of the content done tomorrow, so a person can read the paper, hopefully understand most or all of it, and implement it in their own program without my assistance. Improving the wording and grammar will come later, and in the meantime I'll post the content-complete version of the paper here. Once it is fully done and the writing is good, I'll be slapping it onto the more prominent GP&T forum. For the full publish, I think I'll also slap in some cleaned up source code if I have the time.
Also, uavfun helped convince me to move the paper from LaTeX to Word (I'm going with Publisher because image layout is about a bajillion times easier though), although that quickly became a necessity because, as far as I could tell, LaTeX didn't support embedding of images. Anyways, that'll be a pain in the ass to do due to all of the subscripts and superscripts.
Sooo, long story short? All of you guys with shader-based omni shadow maps will be able to soup them up about 36 hours from the time that this was written.
Just got GDNet+, so here's my first journal entry. I'll mainly be posting in here various graphics-related thoughts. I've done very little research on them, mainly just buzzwords, so if I reinvent the wheel a couple times, just let me know. Anyways, right now I'm mainly researching/developing realistic lighting and shadows, but sometimes my mind will wander and I'll dabble in some space partitioning, or some sound design, etc.
Right now I'm working on my skewbe mapping paper, so I'll copy my most recent (read: only) LiveJournal entry about it. This will pretty much only be relevant to #graphicsdev'ers, who know a thing or two about skewbe mapping, and for the rest it'll be a teaser of things to come.
Right now I guess I'll talk about my paper's status. My to do list is copied below:
show image demonstrating necessity of offcenter perspective
properly label matrices (e.g. +X, -X...)
get the two formulas used in the lighting demo
show image demonstrating where an artist could do some face culling
have set of images demonstrating results under different scenarios
For the first image, I still need to plan exactly what the image will be, and the associated text. Right now, it'll probably be like this image:
...that I showed #graphicsdev early on in its development to show how the cubemap will be generated, except with the off-center projection corrections that moopy and redbeard suggested to change it back to a skewbe map (in case you don't remember, it was basically a frustum map for a couple days).
For the second, I'm not sure what labels I should give the matrices. Maybe, say, X_+ or X^+?
The two formulas are going to be a bit tricky, because I think my computer at home got shut off, meaning I can't connect to it, meaning I can't look up the code from here in Edmonton. Although, Pfhor has the first formula I need, and I think I remember the second pretty well...
The second image I've got in my head, I just need to author it and make it look all professional-like.
The set of images I can't do until I get my high-end comp going again.
Conclusion I haven't even thought of.
Acknowledgements I haven't thought of either, but it shouldn't be too hard, right?