cr88192

Member
  • Content Count: 564
  • Joined
  • Last visited

Community Reputation: 1570 Excellent

About cr88192
  • Rank: Advanced Member


  1. cr88192

    Software OpenGL Rasterizer + Quake 3 Arena

    granted, it is looking like this is near the limit of what may be achieved within a reasonable level of time/effort. as-is, it took about 3 weeks of effort to get this far and is around 26 kLOC, an apparent average of about 1.24 kLOC/day, though parts of the code were reused from my 3D engine, so the amount of newly-written code may be lower. I may consider making the source available.

    ADD: http://cr88192.dyndns.org:8080/2014-04-27_bgbrasw_dump0.zip
  2. second test with Quake 3 Arena, this time at 1440x900. differences from before:
     * more performance tweaks / micro-optimization;
     * the rasterizer now supports multiple threads (2 threads were used in this test, with the screen divided in half);
     * inline ASM is used in a few places;
     * ...
     it is still somewhat laggy, but I am not really sure how far software rasterization can be pushed on a generic desktop PC.
     CPU: Phenom II X4 @ 3.4 GHz; RAM: 4x4GB PC3-1066.
     note: it is a fair bit faster at 1024x768 or lower...
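     for illustration, a minimal sketch of the "screen divided in half" idea, giving each worker thread its own horizontal band of the framebuffer (Win32 threads; the names here are invented for the example and are not the actual BGBRASW code):

     #include <windows.h>

     typedef struct {
         int ymin, ymax;        /* scanline range owned by this worker */
     } RasterSlice;

     static DWORD WINAPI RasterWorker(LPVOID arg)
     {
         RasterSlice *slice = (RasterSlice *)arg;
         /* walk the frame's triangle/span lists here, but only emit spans
            whose y falls within [slice->ymin, slice->ymax) */
         (void)slice;
         return 0;
     }

     static void RasterizeFrameTwoThreads(int height)
     {
         RasterSlice top    = { 0, height / 2 };
         RasterSlice bottom = { height / 2, height };
         HANDLE h[2];

         h[0] = CreateThread(NULL, 0, RasterWorker, &top, 0, NULL);
         h[1] = CreateThread(NULL, 0, RasterWorker, &bottom, 0, NULL);

         /* the frame is done once both halves have been rasterized */
         WaitForMultipleObjects(2, h, TRUE, INFINITE);
         CloseHandle(h[0]);
         CloseHandle(h[1]);
     }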
  3. cr88192

    Software OpenGL Rasterizer + Quake 2

    meanwhile, using it for Quake 3 Arena: https://www.youtube.com/watch?v=uZcnL3jwi2c
  4. cr88192

    Software OpenGL Rasterizer + Quake 2

    yeah. there were a few bugs visible in the video that I have since fixed, along with some fiddling to make things a little faster.

    unlike the Quake2 software renderer, this rasterizer currently works in 32-bit color. I had considered 8, 12, or 16-bit color, but decided against it at the time (color fidelity would be lower, though it could give more opportunity to use lookup tables).

    unlike hardware-accelerated OpenGL, it does pretty much everything with nearest filtering. the rasterizer can support bilinear filtering, but it is somewhat more expensive.

    as-is, it doesn't use any SSE or ASM, though these could potentially help.
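    as a rough illustration of why nearest is the cheap default, here is a minimal sketch of a nearest-filter texel fetch (16.16 fixed-point coordinates, power-of-two texture sizes; the names are invented for the example). bilinear would fetch four neighboring texels and blend them, roughly quadrupling the per-pixel memory traffic and arithmetic:

    typedef unsigned int pixel32;   /* 0xAARRGGBB */

    static pixel32 FetchTexelNearest(
        const pixel32 *tex, int tw, int th,   /* texture data and size */
        int s, int t)                         /* 16.16 fixed-point coords */
    {
        /* truncate the fixed-point coordinates and wrap (power-of-two sizes) */
        int x = (s >> 16) & (tw - 1);
        int y = (t >> 16) & (th - 1);
        return tex[y * tw + x];
    }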
  5. cr88192

    Software OpenGL Rasterizer + Quake 2

        ok, tried to add a bit more information below...
  6. basically, lacking much better to do recently, I wrote a basic software rasterizer, put an OpenGL front-end on it, and then proceeded to make Quake 2 work on it. yes, it kind of sucks, but there are limits to what is likely doable on the CPU. also, it is plain C and single-threaded scalar code (no SSE). a rasterizer using multiple threads and/or SSE could potentially do a little more, but I don't know, and I don't expect there to be much practical use for something like this, so alas. writing something like this does make one a little more aware of what is involved in getting from geometry to the final output. otherwise, I may need to try to find something more relevant to do...

     ADD: as-is, it basically mimics an "OpenGL Miniport" DLL, which means it exports the usual GL 1.1 calls, along with some WGL calls and some wrapped GDI calls. this is loaded up by Quake2, which then uses "GetProcAddress" a bunch of times to fetch the various function pointers (a minimal sketch of this loading pattern follows this post). it has to export pretty much all of the 1.1 calls, though a lot of them are basically no-op stubs (they would normally set an error status, but are currently rigged to intentionally crash so the debugger can catch them...).

     as for calls implemented: simple answer: about 1/4 to 1/2 of them.

     functionality implemented by the rasterizer:
     * most stuff related to glBegin/glEnd;
     * things like glTexImage2D, glTexParameter, ...
     * various misc things, like glClear, glClearColor, glDepthRange, ...
     * matrix operations (glPushMatrix, glPopMatrix, ...)
     * ...

     functionality not implemented:
     * texture-coordinate generation stuff;
     * display lists, selection buffers, accumulation buffer, ...
     * pretty much everything else where I was like "what is this and what would it be used for?"
     * it currently doesn't do DrawArrays or DrawElements, but this may change.
     ** these would basically be needed for Quake3 to work, IIRC.
     ** partial provisions have been made, but the logic isn't written yet.

     internally, it implements the actual drawing to the screen via CreateDIBSection and BitBlt and similar. it has a few buffers: for example, a color buffer implemented as an array of 32-bit pixel colors (in 0xAARRGGBB order, AKA BGRA), as well as a Depth+Stencil buffer in Depth24_Stencil8 format (I was originally going to use Depth16 and no stencil, but then realized that space for a stencil buffer could be provided "almost for free").

     at its core, its main operation is "drawing spans", which look something like (parts of this listing were garbled in the post; the middle is an approximate reconstruction):

     void BGBRASW_DrawSpanFlatBasic(
         bgbrasw_pixel *span, int npix, bgbrasw_pixel clr)
     {
         bgbrasw_pixel *ct, *cte;
         ct=span; cte=span+npix;

         /* unrolled: fill 16 pixels per iteration */
         while((ct+16)<=cte)
         {
             *ct++=clr; *ct++=clr; *ct++=clr; *ct++=clr;
             *ct++=clr; *ct++=clr; *ct++=clr; *ct++=clr;
             *ct++=clr; *ct++=clr; *ct++=clr; *ct++=clr;
             *ct++=clr; *ct++=clr; *ct++=clr; *ct++=clr;
         }
         while(ct<cte)
             *ct++=clr;
     }

     /* name and setup approximate (the original was garbled); interpolates a
        color across the span, keeping Alpha+Green and Red+Blue as packed 8.8
        fixed-point pairs */
     void BGBRASW_DrawSpanFlatInterp(
         bgbrasw_pixel *span, int npix,
         bgbrasw_pixel clr0, bgbrasw_pixel clr1)
     {
         bgbrasw_pixel *ct, *cte, clr;
         u32 crag, crrb;
         int cragv, crrbv;

         if(npix<=0)return;

         crag=clr0&0xFF00FF00;            /* A and G, 8.8 fixed point */
         crrb=(clr0&0x00FF00FF)<<8;       /* R and B, 8.8 fixed point */
         cragv=(int)((clr1&0xFF00FF00)-crag)/npix;
         crrbv=(int)(((clr1&0x00FF00FF)<<8)-crrb)/npix;

         ct=span; cte=span+npix;
         while((ct+2)<=cte)
         {
             clr=(crag&0xFF00FF00)|((crrb>>8)&0x00FF00FF);
             crag+=cragv; crrb+=crrbv;
             *ct++=clr;
             clr=(crag&0xFF00FF00)|((crrb>>8)&0x00FF00FF);
             crag+=cragv; crrb+=crrbv;
             *ct++=clr;
         }
         while(ct<cte)
         {
             clr=(crag&0xFF00FF00)|((crrb>>8)&0x00FF00FF);
             crag+=cragv; crrb+=crrbv;
             *ct++=clr;
         }
     }

     ...

     void BGBRASW_DrawSpanTextureInterpTestBlend(
         BGBRASW_TestBlendData *testData,
         bgbrasw_pixel *span, bgbrasw_zbuf *spanz, int npix,
         bgbrasw_pixel *tex, int txs, int tys,
         int st0s, int st0t, int st1s, int st1t,
         bgbrasw_pixel clr0, bgbrasw_pixel clr1,
         bgbrasw_zbuf z0, bgbrasw_zbuf z1)
     {
         BGBRASW_TestBlendFunc_ft testBlend;
         bgbrasw_pixel *ct, *cte;
         bgbrasw_zbuf *ctz, *ctze;
         u32 tz, tzv, tz2;
         int ts, tt, tsv, ttv, tx, ty;
         int cr, crv, crt;
         int cg, cgv, cgt;
         int cb, cbv, cbt;
         int ca, cav, cat;
         int clr, clrt;

         if(npix ...
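     for context, a minimal sketch (invented names, not the actual engine code) of the loading pattern described above: the game loads the miniport DLL and pulls each GL entry point out of it with GetProcAddress, so swapping in a software rasterizer is just a matter of pointing it at a different DLL:

     #include <windows.h>
     #include <GL/gl.h>

     typedef void (APIENTRY *PFN_GLCLEAR)(GLbitfield mask);
     typedef void (APIENTRY *PFN_GLCLEARCOLOR)(GLclampf r, GLclampf g, GLclampf b, GLclampf a);

     static HMODULE          gl_dll;
     static PFN_GLCLEAR      qglClear;
     static PFN_GLCLEARCOLOR qglClearColor;

     static int LoadMiniport(const char *dllname)
     {
         gl_dll = LoadLibraryA(dllname);   /* e.g. the software rasterizer DLL */
         if(!gl_dll)
             return 0;

         qglClear      = (PFN_GLCLEAR)     GetProcAddress(gl_dll, "glClear");
         qglClearColor = (PFN_GLCLEARCOLOR)GetProcAddress(gl_dll, "glClearColor");
         /* ...repeated for the rest of the GL 1.1 and WGL entry points... */

         return qglClear && qglClearColor;
     }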
  7. I had recently fiddled some more with BTIC1C real-time recording, and got some interesting results.

     my main desktop PC mostly holds a solid 29/30 fps for recording:
     * 1680x1050p30 on a 3.4 GHz Phenom II X4 with 16GB PC3-1066 RAM.

     it holds 25-28 fps on my newer laptop:
     * 1440x900p30 on a 2.1 GHz Pentium Dual-Core with 4GB RAM.

     it does a solid 30 fps on my older laptop:
     * 1024x768p30 on a 1.6 GHz Mobile Athlon (single-core) with 1GB RAM.
     ** I thought it was 1.2 GHz; seems I was misremembering.
     ** this kind of kills the comparison though, as this laptop is comparatively too fast for its screen resolution.
     *** to be fair, for this resolution the CPU would have needed to be ~1.2-1.4 GHz.

     I was half-considering testing on an ASUS EEE, but I seem to have misplaced it. going off other calculations, there is a good chance an EEE would be able to record full-screen video at ~20 fps or so (given its clock speed and resolution). or, if I could build for Android, maybe I could test on my tablet or phone (Sony Xperia X8, *).

     *: like the EEE, the question is basically what sort of video encoding I can get out of a ~600 MHz CPU. linear extrapolation implies it should be able to pull around 20 fps at 800x480 and ~30 fps at 640x480. my tablet has HW stats on par with my laptops though, so it may not mean a whole lot (ran 3DMark, got 5225... CPU speed is similar to the old laptop, but graphics and framerates look a lot prettier than on either of my laptops). actually, the Ouya theoretically also has HW stats on par with my laptops (raw clock speed in-between them, but 4 cores and fast RAM).

     ADD: ( testing the desktop PC with 3DMark: Cloud Gate: 10890; Ice Storm: 81583; Fire Strike: 4083 (*). *: updated, didn't crash this time... )

     here is another test involving desktop recording and Minecraft: most of the lag/jerkiness was actually from Minecraft itself, which is basically how it plays on my computer (though I was happy when I got some newer parts, mostly because Minecraft now usually stays above 20).

     otherwise: I recently messed with special 4:2:0 block modes, which can improve image quality but hurt encoder speed (they effectively store color information for each 2x2 pixel sub-block, rather than for the whole 4x4 block, and require more arithmetic in the pixel-to-block transform).

     I did introduce a more limited form of differential color coding, which seems to have actually increased encoder speed for some reason (colors will often be stored as a delta from the prior color rather than as the color itself, *). color prediction has been largely restricted to last-seen-color prediction, mostly because this can be done without needing to retrieve colors from the output blocks (the encoder simply keeps track of the last-seen colors and uses these as the predictors, which is a little cheaper and also less problematic).

     *: as-is, for 23-bit colors, a 15-bit delta color will be used, falling back to explicit colors for large deltas.
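     as a rough sketch of the delta-color idea (*): if each component of the new color is close enough to the previous one, a packed 15-bit delta is emitted; otherwise it falls back to an explicit color. the 5-bits-per-component layout and ranges below are assumptions for illustration, not the actual BTIC1C bitstream:

     #include <stdlib.h>

     /* returns 1 and fills *delta15 if 'cur' can be coded as a delta from 'prev',
        0 if an explicit color must be stored instead */
     static int TryEncodeDeltaColor(unsigned prev, unsigned cur, unsigned *delta15)
     {
         int dr = (int)((cur >> 16) & 0xFF) - (int)((prev >> 16) & 0xFF);
         int dg = (int)((cur >>  8) & 0xFF) - (int)((prev >>  8) & 0xFF);
         int db = (int)( cur        & 0xFF) - (int)( prev        & 0xFF);

         if(abs(dr) > 15 || abs(dg) > 15 || abs(db) > 15)
             return 0;                       /* delta too large: store explicitly */

         /* pack three signed 5-bit deltas (biased by 16) into 15 bits */
         *delta15 = ((unsigned)(dr + 16) << 10) |
                    ((unsigned)(dg + 16) <<  5) |
                     (unsigned)(db + 16);
         return 1;
     }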
  8. yeah...

     looking online recently, Seagate drives seem to have an annual failure rate of nearly 30%, which is pretty solidly bad. apparently the 2TB Seagate Barracuda was one of their extra-bad drives, though.

     I was able to coerce the drive back into working enough to at least try to image its contents onto a new drive (a 3TB WD Caviar Black), which will hopefully work a bit better. I have sort of been sitting around since yesterday waiting for the thing to image, though.

     even at pretty good speeds, it will take a good number of hours. at the speeds I was getting initially, it would have required around a week, but luckily it seems to have sped up.

     it is a bit hit or miss whether the drive wants to work at all. it seemed to (seemingly randomly) decide to start working again enough that I could try to image its contents off (after being very uncooperative/unresponsive), and it hasn't died again yet.

     there was also other odd behavior: when it was unhooked (with only power connected), it seemed to spin up and spin down and do stuff on its own (like it will spin up and sound like it is doing stuff, occasionally make bad-sector noises, ...). this seemed a bit odd, as normally when drives are unhooked they don't do a whole lot (they will spin up, and maybe spin back down again later).

     with my current computer, pretty much every part apart from the case has been replaced since it was originally built (in 2010). though, granted, a lot of the HW I am using isn't particularly high-end.

     the reason I have not used SSDs thus far has mostly been their high cost vs HDDs.
  9. so, what happened recently: one of my HDDs (in this case, a Seagate 2TB drive) decided to die on me mostly without warning.
     * curse of Seagate: damn near every time I have used a Seagate drive, it has ended up dying on me.
     ** sort of like my experience with ASUS motherboards...
     *** I had reliability problems with several ASUS MOBOs, got a Gigabyte MOBO, and everything worked.
     *** maybe not a good sign when the MOBOs come wrapped in cardboard.
     ** actually the MOBO issue is less severe; a crashy OS is less of an issue than lots of lost data.

     luckily for me, it wasn't my main OS drive (which is currently a "WD Caviar Black"). well, originally I was using a "Caviar Green" for my main OS drive, but it was 5400 RPM and often had performance issues (the computer would often lag/stall, becoming somewhat IO-bound). using a (more expensive) 7200 RPM drive made performance a bit better.

     but, yeah, I didn't lose much that was particularly important, as (luckily) I tend to keep multiple copies of a lot of the more important stuff (on different drives, ...). but I still did lose some amount of stuff (like all my 2D character art, random downloaded YouTube videos, and my installations of Steam and VS2013, ...), and may have lost one of my newer fiction stories, ...

     some of this is because, sadly, both drive failures (and the OS going and stupidly turning the filesystem into mincemeat) are not entirely uncommon IME (though the FS-mincemeat issue seemed more common with WinXP and NTFS drives; I have not seen it happen with Win7 thus far, nor had I seen it happen with FAT32 drives).

     in this case, the HDD itself mostly just stopped working, with Windows mostly just giving lots of "drive controller not ready" error messages and otherwise not listing the drive as working. though a few times it had worked sort-of, and (with luck) when I get a new HDD, maybe I can see if I can image the old contents onto a new drive.

     otherwise, I recently added an in-engine profiler (see the sketch after this post). mostly this was because, after the HDD crash and falling back to VS2008 for building my 3D engine (VS2013 had been installed on the crashed HDD), CodeXL stopped being able to effectively profile the thing for some reason or another. since both CodeAnalyst and CodeXL have a lot of "often don't work for crap" issues, I was like "oh, to hell with it" and basically just made a rudimentary profiler which can be run inside the engine. it aggregates samples and tells me which functions use most of the time, which is the main thing anyway (source-level profiling is nice, but would be more involved, and would probably require a proper UI vs just dumping crap to the console).

     I did observe that the majority of the execution time in these tests was going into "NtDelayExecution", which was mostly related to sleeping threads. I made it so the statistics aggregation ignores this function, mostly so that more sane percentages can be given to other functions. beyond this, most of the execution time seems to be going into the OpenGL driver, and into some otherwise unknown machine code (not part of any of the loaded DLLs, nor part of the BSVM JIT / executable heap), which may be part of OpenGL. this becomes more so if the draw distance is increased.

     I did otherwise make some animated clouds and a new sun effect. basically, rather than simply using a static skybox, it now uses a skybox with a sun overlay and some animated clouds overlaid (though with a few unresolved issues, like color blending not working on the clouds for some reason I have yet to figure out).

     the new clouds and some tweaks to the metal biome can be seen here:
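     the aggregation part of such a profiler is pretty simple; roughly something like the following hedged sketch (invented names, not the actual engine code): count hits per function name, skip the known sleep call, and dump percentages to the console:

     #include <stdio.h>
     #include <string.h>

     #define PROF_MAX_FUNCS 1024

     typedef struct { const char *name; int hits; } ProfEntry;

     static ProfEntry prof_tab[PROF_MAX_FUNCS];
     static int prof_nfuncs, prof_total;

     /* called once per sample, with the sample already resolved to a function name */
     static void Prof_AddSample(const char *func)
     {
         int i;
         if(!strcmp(func, "NtDelayExecution"))
             return;                         /* ignore time spent in sleeping threads */
         for(i = 0; i < prof_nfuncs; i++)
             if(!strcmp(prof_tab[i].name, func))
                 { prof_tab[i].hits++; prof_total++; return; }
         if(prof_nfuncs < PROF_MAX_FUNCS)
         {
             prof_tab[prof_nfuncs].name = func;
             prof_tab[prof_nfuncs].hits = 1;
             prof_nfuncs++; prof_total++;
         }
     }

     /* dump percentages to the console (sorting by hit count is left out for brevity) */
     static void Prof_Dump(void)
     {
         int i;
         for(i = 0; i < prof_nfuncs; i++)
             printf("%6.2f%%  %s\n",
                 100.0 * prof_tab[i].hits / prof_total, prof_tab[i].name);
     }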
  10. Added a video: http://www.youtube.com/watch?v=RQUF0NEJAV4

     Used BTIC1C via VirtualDub for desktop capture (with a lot of running around in Minecraft), partly to test its viability for this use case.

     It is basically usable, though at present it seems a little less effective (worse performance, worse compression, and less smooth movement) than when used for in-game video capture (settings: 1680x1050p24).

     though, granted, during gameplay Minecraft was itself lagging to some extent while recording. then again, MovieMaker didn't help (it apparently made the choppiness worse; in retrospect, recording at 30 fps might have been better...).

     still seems basically viable though... it seems pretty competitive with some of my other available options here: it doesn't look too awful, can record at full desktop resolution, and doesn't kill the CPU and cause lots of lag.

     EDIT / ADD: the poor compression was due to having accidentally broken the Deflater's hash function during optimization attempts. this part is fairly important for correct operation... I used a signed "int" where an "unsigned int" was needed, which effectively caused all attempts to perform match lookups to fail (see the sketch below).
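     for illustration, a hedged sketch of the kind of thing that goes wrong here (constants and names invented, not the actual encoder):

     #include <stdint.h>

     #define LZ_HASH_BITS 12

     /* hash the next 3 bytes to pick a match-table slot.  the working value must
        stay unsigned: with a signed 'int', the multiply can overflow into the
        sign bit and the shifted result comes out negative / inconsistent, so the
        table is probed at the wrong slots and no match is ever found. */
     static unsigned LZ_HashBytes(const unsigned char *p)
     {
         uint32_t h = ((uint32_t)p[0] << 16) ^ ((uint32_t)p[1] << 8) ^ p[2];
         return (h * 2654435761u) >> (32 - LZ_HASH_BITS);
     }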
  11. well, status recently: lame...

     I started working on a 2D animation tool, but then ran into UI complexities. UI handling in my 3D engine has become a bit of a tangled mess, and there is no real abstraction over it (most things are, for the most part, handling keyboard events and mouse movements...). there is theoretically support for GUI widgets, but I wrote the code in question 10 years ago, and managed to do it sufficiently badly that doing UI stuff via drawing stuff and raw input handling is actually easier (*1). sometimes I look into trying to clean up the GUI widgets thing, and am like "blarg" and don't make a whole lot of progress; other times, I consider "maybe I will make a new GUI widgets system that *doesn't* totally suck", followed by "but I already have these existing widgets, maybe I can fix them up?", followed by "blarg!". 10 years ago I wrote a few things which were ok, and some other stuff which is just plain nasty, but it has largely become a black box in that I can't really make it not suck, but also often can't easily replace it without breaking other stuff.

     *1: but it is GUI widgets. don't these normally suck?... well, yes, but this one extra sucks, as it was basically based around a stack-based mapping of XHTML forms to C, but without any way to distinguish form instances... so every widget is identified via a global 'id' name, and there may only be a single widget with this name, anywhere. also, no facilities were provided, you know, to update widget contents. it also didn't turn out to really be a sane design for most of the types of stuff I am doing. so, somehow, I managed to make something pretty much less usable and less useful than GTK or GDI... it doesn't help much when looking into it and realizing that there isn't much logic behind it that one wouldn't need to replace anyway (some of the structs are useful, but seemingly that is about it).

     but a 2D animation tool, while it needs a UI, doesn't necessarily need a traditional GUI. "well, there are always modes and keyboard shortcuts!". yes, fair enough, but it doesn't help when one is left with a UI where pretty much everything (in the 3D engine) is one big monolithic UI, and there are few good options for keyboard shortcuts remaining (and "CTRL+F1, CTRL+SHIFT+G" is a bit outside "good" territory). yes, my 3D modeller, game, mapper, ... all use the same keyboard and mouse-handling code, just with a lot of internal flags controlling everything. in earlier forms of my game effort, it actually required doing an elaborate keyboard dance of various shortcuts to get into a mode where the controls would work as expected. I then later made the engine front-end set this up by default and effectively lock the UI configuration (short of a special shortcut to "unlock" the UI).

     theoretically, I have added a solution to partly address this: now you can "ESC,~" (or "ESC,SHIFT+`") into a tabbed selector for "running programs", along with a possible option for a drop-list for launching programs (which would probably work by stuffing commands into the console, probably launching scripts...). clicking on tabs can then be used to switch focus between programs, possibly allowing a cleaner way to handle various use cases ("hell, maybe I could add a text editor and a graphics program and a file manager...", "oh, wait..."). but, on the positive side, this mode effectively bypasses nearly all of the normal user-input handling, allowing each "program" a lot more free rein over the keyboard shortcuts.

     architecturally, it is on par with the console (toggled with "ALT+`", and sort of like a shell, just currently without IO redirection or pipes). but, OTOH, I am not so happy with the present UI situation... a lot of this is, well, horrid...

     thus far, in the 2D animation tool, I can sort of add items, move them around, and step between frames (with the movement being interpolated, ...), so it is a start, but it still falls well short of what would be needed for a usable 2D animation tool (one that is hopefully less effort than the current strategy of doing basic 2D animation via globs of script code...). it will probably need a concept of "scenes", where in each scene it will be possible to add objects and set various keyframes, ... but, at the moment, I am less certain; it all seems like a bit of an undertaking.

     I did at least go and make some improvements to the in-game video recording: switched from RPZA to a BTIC1C subset for recording, which has somewhat better image quality and lower bitrate, in this case using a more speed-oriented encoder (vs the main encoder, which prioritizes size/quality more); it basically holds up pretty well in tests recording at full-screen 1680x1050p24. I also made some tweaks to reduce temporal aliasing issues (mostly related to inter-thread timing issues), and fiddled some with trying to get audio more in sync (recorded video had the audio somewhat out of sync; I sort of fudged them back into alignment by inserting about 400ms of silence at the start of the recording... but this is a crap solution... not sure at present of a good way to automatically adjust for internal A/V latency).

     the BTIC1C variant encoder mostly just uses straight RGB23 blocks (with no quantization stage), and a higher-speed single-pass, single-stop entropy backend (an extended Deflate-based format). this allows faster encoding, albeit with worse compression. the normal Deflate/BTLZH encoder uses a 3-pass encoding strategy: LZ77-encode the data and count up symbol statistics, build and emit Huffman tables, then emit the Huffman-coded LZ data. the current encoder speeds this up slightly by using the prior statistics for building the Huffman table, then doing the LZ77 and Huffman coding at the same time. it also uses another trick: it doesn't actually "search" for matches, it just hashes the data it encounters and sees if the current hash-table entry points to a match (see the sketch after this post). the compression is a little worse, but the advantage is being able to use the entropy backend for real-time encoding (vs the primary encoder, which is a bit slow for real-time).

     the temporal aliasing issue was mostly a problem which resulted in a notable drop in the effective framerate of the recording, as I had found that many of the frames which were captured were being lost and many frames were being duplicated in the output. I ended up making some tweaks to the handling of accumulation timers and similar, and the number of lost and duplicate frames is notably reduced.

     a test from in-game recording: 1680x1050p24 RGB23 uses about 19 Mbps, or about 0.46 bpp. in other tests, this works out to around 6-7 minutes per GB of recording. this is a bit better than the roughly 2 minutes per GB I can get from M-JPEG, and seems to have "mostly similar" video quality (and without the JPEG encoder's limitation of being too slow for recording at higher resolutions, which is part of the reason I had switched over to RPZA to begin with).

     I don't yet have any videos up for the current version. the most recent video I have at the time of this writing is of a version of the new codec prior to addressing a few image-quality issues, the temporal aliasing, or the audio sync issues (so the video is a little laggy and the audio isn't really in sync...).
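     a rough sketch of the "hash and check one candidate" match strategy mentioned above (sizes, hash, and names are illustrative, not the actual BTLZH code): instead of walking a chain of candidate positions, the encoder probes exactly one table entry per position, so each lookup is O(1):

     #include <stdint.h>
     #include <string.h>

     #define HASH_BITS  15
     #define HASH_SIZE  (1 << HASH_BITS)
     #define MIN_MATCH  3
     #define MAX_MATCH  258

     static uint32_t hash_tab[HASH_SIZE];    /* most recent position seen per hash */

     static uint32_t Hash3(const unsigned char *p)
     {
         uint32_t h = ((uint32_t)p[0] << 16) ^ ((uint32_t)p[1] << 8) ^ p[2];
         return (h * 2654435761u) >> (32 - HASH_BITS);
     }

     /* returns the match length (match position in *mpos), or 0 if the single
        probe does not yield a usable match */
     static int FindMatchSingleProbe(
         const unsigned char *buf, uint32_t pos, uint32_t end, uint32_t *mpos)
     {
         uint32_t h, cand;
         int len = 0;

         if(end - pos < MIN_MATCH)
             return 0;

         h = Hash3(buf + pos);
         cand = hash_tab[h];
         hash_tab[h] = pos;                  /* remember this position for later probes */

         if(cand < pos && !memcmp(buf + cand, buf + pos, MIN_MATCH))
         {
             while(len < MAX_MATCH && pos + len < end &&
                   buf[cand + len] == buf[pos + len])
                 len++;
             if(len >= MIN_MATCH) { *mpos = cand; return len; }
         }
         return 0;
     }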
  12. recently threw together this thing: http://cr88192.dyndns.org:8080/wiki/index.php/BTLZA

     what is it?... basically an extended form of Deflate, intended mostly to offer a modest boost in compression with a "modest" impact on decoding speed. in its simplest mode it is basically just Deflate, and is binary compatible; otherwise, the decoder remains backwards compatible with Deflate. its extensions are mostly:
     * bigger maximum match length (64KiB);
     * bigger maximum dictionary size (theoretically 4GB, likely smaller due to implementation limits);
     * optional arithmetic-coded modes.

     the idea was partly to have a compromise between Deflate and LZMA, with the encoder able to make some tradeoffs WRT compression settings (speed vs ratio, ...). the hope is basically to have something which compresses better than Deflate but decodes faster than LZMA.

     the arithmetic coder is currently applied after the Huffman and VLC coding. this speeds things up slightly by reducing the number of bits which have to be fed through the (otherwise slow) arithmetic coder, while still offering some (modest) compression benefit from it. otherwise, the arithmetic coder can be left disabled (and bits are read/written more directly), in which case decoding is somewhat faster (it generally seems to make around a 10-15% size difference, but around a 2x decoding-speed difference).

     ADD: in the tests with video stuff, overall I am getting around a 30% compression increase (vs Deflate).

     what am I using it for? mostly as a Deflate alternative for the BTIC family of video codecs (many of which had used Deflate as their back-end entropy coder); possibly other use cases (compressing voxel region files?...).

     otherwise, I am now much closer to being able to switch BTIC1C over to full RGB colors; most of the relevant logic has been written, so it is mostly down to finishing up and testing it at this point. this should improve image quality at higher quality settings for BC7 and RGBA output (but will have little effect on DXTn output). most of the work here has been on the encoder end, mostly due to the original choice of representation for pixel blocks, and there being almost no abstraction over the block format (it is sad when "move some of this crap into predicate functions and similar" is a big step forward; a lot of this logic is basically decision trees and raw pointer arithmetic and bit-twiddling and similar). yeah, probably not a great implementation strategy in retrospect.

     the current choice of blocks looks basically like:
         AlphaBlock:QWORD
         ColorBlock:QWORD
         ExtColorBlock:QWORD
         MetadataBlock:QWORD

     so, each new encoder-side block is 256 bits, and spreads the color over the primary ColorBlock and ExtColorBlock. in total there are currently about 60 bits for color data, which is currently used to (slightly inefficiently) encode a pair of 24-bit colors (I had thought "maybe I can use the other 32 bits for something else"; may reconsider. I had used a strategy where ExtColorBlock held a delta from the "canonical decoded color"). for 31F colors, I may need to use the block to hold the color points directly:
         ExtColorBlock:
             ColorA:DWORD
             ColorB:DWORD

     I also recently gained some quality improvement, mostly by tweaking the algorithm for choosing color endpoints: rather than using a single gamma function and simply picking the brightest and darkest endpoints, it now uses 4 gamma functions. roughly, by fiddling, I got the best results with a CYGM (Cyan, Yellow, Green, Magenta) based color space, where each gamma function is an impure form of these colors (permutations of 0.5, 0.35, 0.15). the block encoder then chooses the function (and endpoints) which generated the highest contrast (a rough sketch follows this post). this basically improved quality with less impact on encoder speed than some other options (it can still be done in a single pass over the input pixels). it generally improves the quality of sharp color transitions (reducing obvious color bleed), but in these cases does seem to come at the cost of slightly reducing the accuracy of preserved brightness. this change was then also applied to my BC7 encoder and similar, with good effect.
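     a rough sketch of the endpoint-selection idea for a 4x4 block (the weight table below just echoes the "permutations of 0.5/0.35/0.15" idea; the exact weights and color handling are assumptions, not the actual encoder):

     typedef unsigned int pixel32;   /* 0xAARRGGBB */

     /* impure CYGM weightings over (R,G,B); each row is one "gamma function" */
     static const float gamma_w[4][3] = {
         { 0.15f, 0.50f, 0.35f },    /* cyan-ish    */
         { 0.50f, 0.35f, 0.15f },    /* yellow-ish  */
         { 0.35f, 0.50f, 0.15f },    /* green-ish   */
         { 0.50f, 0.15f, 0.35f },    /* magenta-ish */
     };

     /* pick the two color endpoints for a 4x4 block: for each weighting, find the
        darkest and brightest pixel under that weighting, then keep the pair from
        whichever weighting produced the largest spread (highest contrast). */
     static void PickBlockEndpoints(
         const pixel32 pix[16], pixel32 *minclr, pixel32 *maxclr)
     {
         int f, i;
         float bestspread = -1.0f;

         for(f = 0; f < 4; f++)
         {
             float lo = 1e9f, hi = -1e9f;
             int ilo = 0, ihi = 0;
             for(i = 0; i < 16; i++)
             {
                 float r = (float)((pix[i] >> 16) & 0xFF);
                 float g = (float)((pix[i] >>  8) & 0xFF);
                 float b = (float)( pix[i]        & 0xFF);
                 float v = gamma_w[f][0]*r + gamma_w[f][1]*g + gamma_w[f][2]*b;
                 if(v < lo) { lo = v; ilo = i; }
                 if(v > hi) { hi = v; ihi = i; }
             }
             if(hi - lo > bestspread)
             {
                 bestspread = hi - lo;
                 *minclr = pix[ilo];
                 *maxclr = pix[ihi];
             }
         }
     }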
  13. see here: http://cr88192.dyndns.org:8080/wiki/index.php/BGB_Current_Status

     a new version of the engine is available. as well, the image codec library (BGBBTJ) is also available as a stand-alone zip: http://cr88192.dyndns.org:8080/2014-01-05_bgbtech_bgbbtj.zip

     little provision is made for out-of-box use; as in, if anyone wants to compile or mess with it, some hackery will probably be needed. it provides VFW codec drivers for encoding/decoding, with the encoder currently hard-coded to use BTIC1C (with most of the encoding settings also hard-coded). in my tests it can encode videos with VirtualDub, so it works here at least. I may later consider adding a codec-configuration UI or similar, as well as maybe cleaning up a few things (better provisions for handling logging and configuration). this would likely mean either putting the config information in the registry, or putting an INI somewhere (rather than just hard-coding things like where to put the log file and similar).

     I have thus far not really finished the level of 1C encoder modifications needed to effectively support expanded color depths (moving forwards here largely requires some fairly non-trivial rewriting of the encoder, effectively moving the encoder over to a new intermediate block format, ...).

     on another note: I made a recent observation that speech is still intelligible at 8kHz and 1 bit/sample (just with a harsh/buzzy "retro" sound); not sure yet if I will make much use of this. it could mostly be relevant WRT hand-editing sample data as sequences of hex numbers or similar. a high-pass filter is needed though, otherwise there are significant audio problems; in my tests, I had the best results filtering out everything below about 250Hz. potentially, direct 4 bits/sample could also make sense, as it would map 1 sample per hex character.

     example, a simple sine wave:
         89AB CDEE FFEE DCBA 8976 5432 1100 1123 4567
     (36 samples, 0.0045 seconds, 222 Hz)
     as 1bpp:
         FF FF C0 00 0
     or, as 2bpp (compromise):
         AAFF FFFA A550 0000 55
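     a hedged sketch of the 1 bit/sample idea (invented helper, using a textbook one-pole high-pass at roughly 250Hz; not taken from the actual code):

     #include <stddef.h>

     /* high-pass the input (cutting everything below ~250Hz at 8kHz), then keep
        only the sign of each filtered sample, packing 8 samples per byte MSB-first */
     static void EncodeSpeech1Bit(const float *in, unsigned char *outbits, size_t n)
     {
         const float fs = 8000.0f, fc = 250.0f;
         /* one-pole high-pass: y[i] = a*(y[i-1] + x[i] - x[i-1]) */
         const float a = 1.0f / (1.0f + 2.0f * 3.14159265f * fc / fs);
         float xprev = 0, yprev = 0;
         size_t i;

         for(i = 0; i < n; i++)
         {
             float y = a * (yprev + in[i] - xprev);
             xprev = in[i]; yprev = y;

             if(y >= 0) outbits[i >> 3] |=  (unsigned char)(0x80 >> (i & 7));
             else       outbits[i >> 3] &= (unsigned char)~(0x80 >> (i & 7));
         }
     }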
  14. ADD, Comment: just noticed that apparently posts can only contain a single video or something... (trying to add another removes the one that was already there...). here is the second video: http://www.youtube.com/watch?v=-auo8LYEXY4

     ADD 2: I am running into a minor side issue: all the features that keep being added have generally turned the decoder into an evil/awful mess, and even then there are holes; for example, differential colors + higher color depths (or using BC6H or BC7 as the output format) aren't actually implemented. the giant switch for the DXTn decode path has now expanded to nearly 1500 lines. ironically, the BC6H and BC7 paths are actually smaller, mostly because more of the logic was factored out into functions.

     another idle thought is whether the extended 23-bit RGB mode could be used for in-game video capture. this mode would use 5 bpp blocks (vs 4 bpp blocks for the current capture), but could look a little better. (the encoder used for capture is fairly naive, prioritizing fast encoding over good compression.)
  15. well, first off, I recently did a test showing the image quality for BTIC1C: this test was for a video at 1024x1024 with 8.6 Mbps and 0.55 bpp. as noted, the quality degradation is noticeable, but "mostly passable". some amount of it is due largely to the conversion to RGB555, rather than to actual quantization artifacts (partly because video compression and dithering don't really mix well in my tests). however, some quantization artifacts are visible.

     as usual, working spec: http://cr88192.dyndns.org:8080/wiki/index.php/BTIC1C

     other recent changes: I have split BTIC1C and RPZA apart into different codecs, mostly as 1C has diverged sufficiently from RPZA that keeping them as a single codec was becoming problematic.

     BTIC1C now has BC6H and BC7 decode routes, with single-thread decode speeds of around 320-340 Mpix/sec for BC7, and around 400 Mpix/sec for BC6H (the speed difference is mostly due to the lack of an alpha channel in 6H, and the slightly awkward handling of alpha in BC7). as-is, both effectively use a subset of the format (currently Mode 5 for BC7, and Mode 11 for 6H).

     the (theoretical) color depth has been expanded, as it now supports 23-bit RGB and 31-bit RGB. RGB23 will give (approximately) a full 24-bit color depth (mostly for BC7, possibly also usable for RGBA). RGB31 will support HDR (for BC6H), and comes in signed and unsigned variants; as-is, it stores 10 bits per component (as floating point). likewise, the 256-color indexed block modes have been expanded to support 23 and 31 bit RGB colors. these modes are coerced to RGB565 for DXTn decoding, with RGB555 likewise still usable with BC7 and BC6H, ... this means that video intended for one format can still be decoded for another if needed (though videos will still have a "preferred format").

     as-is, it will still require some work on the encoder end to be able to generate output supporting these color depths (likely moving from 128-bit to 256-bit blocks on the encoder end). the current encoder basically uses a hacked form of DXT5 for its intermediate form, where:
         (AlphaA>AlphaB) && (ColorA>ColorB)
     is basically the same as DXT5.
         (AlphaA ...