BGBTech: The Status Update

random: video upsample and enhance...

Posted 24 September 2013

well, this was another experiment... (possibly a waste of time, but oh well...).

I had an idea for a tool to experiment with:
a tool which will take input video, try to up-sample it in a way which "hopefully doesn't look like crap", and save output video.

basically, it just decodes frames from one video, and then resamples and re-encodes them into another video.

the main thing here, granted, is the upsampler.

the algorithm used is fairly specialized, and currently only upsamples by powers of 2 (2x at a time, but it may be applied recursively).

I had originally considered trying a neural net (likely being trained over the video prior to upsampling), but realized I don't have anywhere near enough CPU cycles to make this viable (vs a more conventional strategy).

instead, an algorithm is used which combines bicubic filtering over the "main cross" (basically, a cross-shaped region around the pixels being interpolated) with linear extrapolation from the corners (the extrapolated points become inputs to the bicubic filter). the filter actually over-compensates slightly (by itself it introduces noticeable ringing, by design).

additionally, a very slight blur (~ 0.06) is applied both before and after upsampling. the former helps reduce artifacts in the original image (which would otherwise become noise in the upsampled image; this noise is typically lower-intensity than the major details, so it is removed more easily by a slight blur), and the latter helps smooth out ringing artifacts.

the reason for over-compensating is to try to counteract the initial loss of sharpness from the first-pass blur, albeit at the cost of producing additional ringing artifacts near edges. the secondary blur helps smooth these out, and makes them look more like proper edges.
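to be clear, the actual filter works over the 2D cross plus the corners, but as a rough 1D sketch of the general idea (Catmull-Rom-style midpoint taps with the negative lobes scaled up a bit so it over-compensates, plus a very mild blur; the 1.25 sharpening amount is just an illustrative value):

#include <stdint.h>

static uint8_t clamp255(int v) { return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v; }

/* interpolate the sample halfway between s1 and s2 (s0/s3 are the outer neighbors);
   plain Catmull-Rom midpoint is (-s0 + 9*s1 + 9*s2 - s3)/16, scaling the outer
   taps by 'sharp' > 1.0 deliberately overshoots a bit */
static uint8_t cubic_mid(int s0, int s1, int s2, int s3, float sharp)
{
    float v = (-sharp * s0 + (8.0f + sharp) * s1 +
               (8.0f + sharp) * s2 - sharp * s3) * (1.0f / 16.0f);
    return clamp255((int)(v + 0.5f));
}

/* 2x upsample one row: even outputs copy the source, odd outputs are interpolated */
static void upsample_row_2x(const uint8_t *src, int sw, uint8_t *dst)
{
    for (int x = 0; x < sw; x++) {
        int s0 = src[(x > 0) ? (x - 1) : 0];
        int s1 = src[x];
        int s2 = src[(x + 1 < sw) ? (x + 1) : (sw - 1)];
        int s3 = src[(x + 2 < sw) ? (x + 2) : (sw - 1)];
        dst[2 * x]     = (uint8_t)s1;
        dst[2 * x + 1] = cubic_mid(s0, s1, s2, s3, 1.25f); /* illustrative amount */
    }
}

/* very slight blur: mix roughly 6% of each neighbor into each sample */
static void slight_blur_row(uint8_t *row, int w, float amount /* ~0.06 */)
{
    uint8_t prev = row[0];
    for (int x = 0; x < w; x++) {
        uint8_t next = row[(x + 1 < w) ? (x + 1) : (w - 1)];
        float v = (1.0f - 2.0f * amount) * row[x] + amount * prev + amount * next;
        prev = row[x];
        row[x] = clamp255((int)(v + 0.5f));
    }
}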

in all, it looks "pretty good", or at least gives slightly nicer-looking results than the upsampling provided by VirtualDub (with the files tested, mostly low-res AVIs of various 90s-era TV shows). while VirtualDub provides blur filters, they are too strong: using a blur before upsampling results in an obvious loss of sharpness, and doing a blur afterwards does not reduce the effect of artifacts, which become much more pronounced after upsampling.

however, beyond this, the goodness ends:
the tool is slow (currently slower than real-time), produces overly large files (due mostly to me saving the output as MJPEG with 95% quality), has a fairly limited range of supported formats and codecs (ex: AVI with MJPG/RPZA/XviD/... and PCM, ADPCM, or MP3 audio, ..., *1), and is awkward to use (it is a command-line tool), ...

*1: doesn't currently work with other common formats like MPG, FLV, RM, MOV, ... and only a few formats have encoders (MJPEG, RPZA, ...). I tried initially saving as RPZA, but it made little sense to spend so many cycles upsampling simply to save in a format which doesn't really look very great. (while XviD would be good here, using it isn't without issues..., *2).

granted, there is a possible option of just getting funky with the MJPEG encoding to ensure that the file stays under a certain limit (say, we budget 750MB or 900MB for a 1hr video, with per-frame quality being adjusted to enforce this limit).
for example: 1hr at 24Hz and a 900MB limit means, 86400 frames, and a limit of 10.42 kB/frame.
or, alternatively, a running average could be used to adjust quality settings (to maintain an average bit-rate).
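as a sketch of the running-average idea (names and constants here are made up for illustration): nudge the quality setting after each frame so an exponentially-averaged frame size stays near the per-frame budget (e.g. ~10.4 kB/frame for 900MB at 24Hz over an hour).

typedef struct {
    double avg_bytes;    /* running average of encoded frame size */
    double target_bytes; /* budget per frame, e.g. 900e6 / 86400 */
    int    quality;      /* current JPEG quality, 1..100 */
} RateCtl;

static void ratectl_init(RateCtl *rc, double target_bytes, int start_quality)
{
    rc->avg_bytes    = target_bytes;
    rc->target_bytes = target_bytes;
    rc->quality      = start_quality;
}

/* call after encoding each frame, with the size it actually produced;
   returns the quality to use for the next frame */
static int ratectl_update(RateCtl *rc, int frame_bytes)
{
    rc->avg_bytes = 0.9 * rc->avg_bytes + 0.1 * (double)frame_bytes;
    if (rc->avg_bytes > 1.05 * rc->target_bytes && rc->quality > 30)
        rc->quality--;      /* running hot: drop quality a notch */
    else if (rc->avg_bytes < 0.95 * rc->target_bytes && rc->quality < 95)
        rc->quality++;      /* running under budget: allow more */
    return rc->quality;
}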

*2: patents, dealing with GPL code, or a mandatory VfW dependency.

another limitation is that I don't currently know any good way to counteract the effects of motion compensation.

likewise for trying to counteract VHS artifacts (tracking issues, etc...).

ultimately, lost detail is still lost...

but, even as such, a video might still look better upsampled, say, from 352x240 to 704x480, than if simply stretched out using nearest or linear filtering or similar.

all this would be a bit much work though, as my initial goal was mostly just to make some older shows watchable with more ability to see what was going on (but, then, VirtualDub resampling is still more convenient and works with a wider range of videos...).

then again, for all I know, someone already has a much better tool for this.

or such...

meh, adding Theora support...

the lib (libtheora) is BSD, so no huge issue, but I had to hack on it a bit to make it buildable.
its code is nearly unreadable; I am almost left wondering if I would be better off doing my own implementation, but I will probably just use the lib for now and replace it later if needed.

basically, actual video is a case where one has more need for a proper video codec (unlike animated textures / ...): adding a feature to adaptively adjust quality has shown that keeping 1 hour of MJPG video under 1GB results in awful quality, and passable results seem to require a minimum of around 2GB per hour.

this is mostly because MJPG lacks P-Frames or similar (and there is no way to add them without breaking existing decoders, or at least looking really nasty: namely, every P-Frame comes out as a big grey image).

finding *anything* on this was a pain, as apparently AVI+Theora is very uncommon (I just don't want to mess with Ogg at the moment). but, OTOH, Theora is also one of the few major codecs which isn't "yet another reimplementation or variant of MPEG-4".

left thinking that my "JPEG" library is becoming a bit of a beast, basically handling image and video compression/decompression with a collection of formats and custom implementations of various codecs.
it could be a little better organized, and ideally with less tangling between the AVI logic and the codecs.

elsewhere in the renderer, there is a place where texture-management is mixed up with AVI and codec internals, which may need to be cleaned up eventually, most likely via some sort of proper API for extended components and layers, and by making DXTn a more canonical image representation (likely also with mipmap and cube-map support).

well, and separating layer-management from codec internals would be nice (as-is, it is necessary to fiddle with layers separately for each codec which uses them).

documentation would have been nice...

I eventually ended up digging around in the FFmpeg source trying to answer the question of how exactly their implementation works, which mostly boiled down to the use of 16-bit length fields between the headers. knowing this sort of thing up front could have saved me a good number of hours.
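for reference, roughly what the digging suggested (treat this as my reading of it, not gospel): the three Theora header packets sit back-to-back in the codec extradata, each preceded by a 16-bit length field. byte order is assumed big-endian below; if the sizes come out absurd, try the other order.

#include <stddef.h>
#include <stdint.h>

/* extract up to 3 header packets from the extradata; returns how many were found.
   each packet then goes to th_decode_headerin() in order. */
static int split_theora_headers(const uint8_t *extra, size_t extra_size,
                                const uint8_t *pkt[3], size_t pkt_size[3])
{
    size_t pos = 0;
    int n = 0;
    while (n < 3 && pos + 2 <= extra_size) {
        size_t len = ((size_t)extra[pos] << 8) | extra[pos + 1]; /* 16-bit BE length */
        pos += 2;
        if (pos + len > extra_size)
            break;                  /* malformed, or the wrong byte order */
        pkt[n] = extra + pos;
        pkt_size[n] = len;
        pos += len;
        n++;
    }
    return n;
}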

new experimental video codec, BTIC1C...

Posted 19 September 2013
Tags: video, codec, compression, DXT
Reused text...

Once again, it is a codec intended for quickly decoding into DXT for use in animated textures and video-mapped textures.

It is a direct continuation of my effort to implement Apple Video, but with a slight rename.

I ended up calling it BTIC1C as it is still (despite being based on Apple Video) within the same general "family" as my other BTIC1 formats; otherwise I probably would have needed to call it "BTIC3" or something.

There may also be a Deflate-compressed variant, which will increase compression at some cost in terms of decoding performance.

Test of using codec for video capture...

basically, created a newer video codec experiment I am calling BTIC1C:

it is basically a modified Apple Video / RPZA with a few more features glued on.

the idea is based on the observation that the RPZA and DXT block structures are sufficiently similar that it is possible to convert between them at reasonably high speeds (mostly by feeding bytes through tables and similar).

this is basically how my current implementation works: directly transcoding to/from DXT, and using some of my existing DXT-related logic to handle conversions to/from RGBA space, and also things like lossy block quantization.
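as a rough sketch of the sort of table-driven conversion (not the actual BTIC1C code, and the index conventions are from memory, so worth double-checking): RPZA stores 15-bit RGB555 colors and 2-bit per-pixel indices, DXT1 stores RGB565 endpoints and 2-bit indices, so the colors can be widened with a few shifts and the index bytes pushed through a 256-entry remap table (which here also flips the within-byte pixel order, since RPZA puts the leftmost pixel in the high bits and DXT1 in the low bits).

#include <stdint.h>

/* widen a 15-bit RGB555 color to 16-bit RGB565 */
static uint16_t rgb555_to_565(uint16_t c)
{
    uint16_t r = (c >> 10) & 0x1F, g = (c >> 5) & 0x1F, b = c & 0x1F;
    uint16_t g6 = (uint16_t)((g << 1) | (g >> 4));    /* replicate the high bit */
    return (uint16_t)((r << 11) | (g6 << 5) | b);
}

/* per-pixel index remap, assuming the usual RPZA 4-color ordering
   (0 = colorB, 3 = colorA, 1/2 = blends) and DXT1 4-color mode with
   color0 = colorA, color1 = colorB: RPZA {0,1,2,3} -> DXT1 {1,3,2,0} */
static const uint8_t idx_map[4] = { 1, 3, 2, 0 };

/* remap a whole byte (4 pixels, 2 bits each) in one table lookup */
static uint8_t idx_byte_map[256];

static void init_idx_byte_map(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t o = 0;
        for (int i = 0; i < 4; i++) {
            int v = (b >> ((3 - i) * 2)) & 3;        /* RPZA: leftmost pixel in high bits */
            o |= (uint8_t)(idx_map[v] << (i * 2));   /* DXT1: leftmost pixel in low bits */
        }
        idx_byte_map[b] = o;
    }
}

/* turn one RPZA 4-color block (colorA, colorB, 4 index bytes) into a DXT1 block
   (8 bytes: two little-endian color endpoints, then 4 index bytes) */
static void rpza_block_to_dxt1(uint16_t colorA, uint16_t colorB,
                               const uint8_t idx[4], uint8_t out[8])
{
    uint16_t c0 = rgb555_to_565(colorA);
    uint16_t c1 = rgb555_to_565(colorB);
    /* NOTE: real code must also handle c0 <= c1, which would flip DXT1 into
       3-color+transparent mode (swap the endpoints and remap again). */
    out[0] = (uint8_t)(c0 & 0xFF); out[1] = (uint8_t)(c0 >> 8);
    out[2] = (uint8_t)(c1 & 0xFF); out[3] = (uint8_t)(c1 >> 8);
    for (int i = 0; i < 4; i++)
        out[4 + i] = idx_byte_map[idx[i]];
}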

this then allows the decoder to operate at around 450-480 Mpix/s (for a single-threaded decoder, decoding to DXT). though, granted, the image quality isn't particularly great (slightly worse than DXT at best).

my additions were mostly adding support for alpha transparency, and LZ77 based dictionary compression.

the alpha transparency is basically compatible with existing decoders (seems to be compatible with FFmpeg, which displays an opaque version of the image).

there is support for blended-alpha transparency, but this will not work correctly if decoded by a decoder which expects RPZA (so non-blended transparency is needed).

the LZ77 support is not compatible with existing (RPZA) decoders, but results in a modest increase in compression (in my tests, a 25-50% size reduction with some of the clips tested). this does not add any real cost in terms of decoding speed. the dictionary compression is done in terms of runs of blocks (rather than bytes).
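the actual bitstream details aren't shown here, but the general idea of matching runs of blocks rather than bytes looks something like this (treating each transcoded 8-byte DXT1 block as one symbol; the window size, run limit, and names are just illustrative):

#include <stddef.h>
#include <stdint.h>

#define LZ_MAX_DIST 4096   /* hypothetical window, in blocks */
#define LZ_MAX_RUN   255   /* hypothetical maximum run length */

/* brute-force: find the longest run of blocks starting at 'pos' that repeats
   some earlier position; returns the run length, writes the distance to *dist.
   how the (distance, run) pair gets encoded into the stream is not shown. */
static int find_block_run(const uint64_t *blk, size_t pos, size_t count,
                          size_t *dist)
{
    size_t best_len = 0, best_dist = 0;
    size_t start = (pos > LZ_MAX_DIST) ? (pos - LZ_MAX_DIST) : 0;
    for (size_t cand = start; cand < pos; cand++) {
        size_t len = 0;
        while (pos + len < count && len < LZ_MAX_RUN &&
               blk[cand + len] == blk[pos + len])
            len++;
        if (len > best_len) {
            best_len = len;
            best_dist = pos - cand;
        }
    }
    *dist = best_dist;
    return (int)best_len;
}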

compressed file sizes are "competitive" with M-JPEG, though M-JPEG is generally able to deliver higher image quality (for pretty much anything above around 75% quality) and compresses better on clips which don't play to the codec's strengths (it is most effective at compressing images with little movement and large flat-colored areas).

for clips with mostly static background images and large flat-colored areas, such as "cartoon" graphics, the codec seems particularly effective, and does much better than M-JPEG in terms of size/quality (matching its sizes requires setting the JPEG quality very low, resulting in an awful-looking mess).

additional competitiveness comes from being around 6-7 times faster than my current M-JPEG decoder (which currently does around 70-90 Mpix/s single-threaded when decoding directly to DXT).

compared with a prior format, BTIC1A, this format compresses better but has lower decode speeds.

tests have implied that Huffman compression would likely help with compression (based on a few experiments deflate-compressing the frames), but would also likely hurt decode speeds.

I have not done tests here for multi-threaded decoding.

or such...

Planets, and Apple Video...

Posted 17 September 2013

I have decided to change my voxel engine from being an "infinite" 2D plane, to being a bounded 2D region, which I will assert as being a "planet".

this basically consisted of adding wrap-around, and generating a topographical map for the world.
currently, the world size is fairly small, roughly a space about the size of the surface of Phobos, or otherwise, about 64km around (65536 meters).
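the wrap-around itself is about as simple as it sounds: with the world dimensions a power of two, horizontal coordinates just get masked back into range (sizes here are illustrative, coordinates in whole meters/voxels).

#define WORLD_SIZE_X 65536   /* must be a power of two */
#define WORLD_SIZE_Y 65536

static int wrap_x(int x) { return x & (WORLD_SIZE_X - 1); }
static int wrap_y(int y) { return y & (WORLD_SIZE_Y - 1); }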

an issue with planets:
there is no really obvious "good" way to map a 2D plane onto a sphere.
some people try, but things get complicated.

I took the easy route though, and simply made it wrap around, effectively making the ground-level version of the planet have a torus as its topology.

then, for those who notice that the ground-level view of the planet is geometrically impossible, I can be like "actually, the planet is a Clifford Torus", and the spherical view represents the part of the torus passing through normal 3D space...

kind of a hand-wave, but oh well...

meanwhile, went and threw together an implementation of the Apple Video codec (FOURCC='rpza').


I was thinking "you know, Apple Video and DXT1 are pretty similar...".

basically, they are sufficiently similar to where moderately fast direct conversion between them is possible.
granted, the size/quality tradeoff for this codec isn't particularly good.

while I was implementing support for the codec, I did add alpha-support, so this particular variant supports transparent pixels.

quick test:
I am decoding frames from a test video at around 432 Mpix/s, or a 320x240 test video decoding at around 5700 frames/second. the above is with a vertical flip as a post-step (omitting the flip leaves the decoder running at 470 Mpix/s, and 6130 frames/second).

and then, even as fast as it is going, checking in the profiler reveals a spot which is taking a lot of cycles, causing me to be like "MSVC, FFS!!"

the expressions:
tb[6]=0; tb[7]=0;

compile to:
0xfe08c6a mov edx,00000001h BA 01 00 00 00 0.44537419
0xfe08c6f imul eax,edx,06h 6B C2 06
0xfe08c72 mov [ebp-44h],eax 89 45 BC 0.21256495
0xfe08c75 cmp [ebp-44h],10h 83 7D BC 10
0xfe08c79 jnb $+04h (0xfe08c7d) 73 02 1.38335919
0xfe08c7b jmp $+07h (0xfe08c82) EB 05 0.19569471
0xfe08c7d call $+00018ce1h (0xfe2195e) E8 DC 8C 01 00
0xfe08c82 mov ecx,[ebp-44h] 8B 4D BC
0xfe08c85 mov [ebp+ecx-14h],00h C6 44 0D EC 00 0.25305352
0xfe08c8a mov edx,00000001h BA 01 00 00 00 0.50273299
0xfe08c8f imul eax,edx,07h 6B C2 07
0xfe08c92 mov [ebp-3ch],eax 89 45 C4 0.30703825
0xfe08c95 cmp [ebp-3ch],10h 83 7D C4 10
0xfe08c99 jnb $+04h (0xfe08c9d) 73 02 1.26189351
0xfe08c9b jmp $+07h (0xfe08ca2) EB 05 0.2598016
0xfe08c9d call $+00018cc1h (0xfe2195e) E8 BC 8C 01 00
0xfe08ca2 mov ecx,[ebp-3ch] 8B 4D C4
0xfe08ca5 mov [ebp+ecx-14h],00h C6 44 0D EC 00 0.00337405

which is actually, pretty much, just plain ridiculous...

yes, the above was for debug code...
but, there is presumably some limit to how wacky code built in debug mode should get.


the implementation of Apple Video was extended slightly, with this extended version hereby dubbed BTIC1C.

the primary additions are mostly alpha support and some LZ77 related features.

Bounds-checking? in C? it would appear so?... (sort of...).

Posted 13 September 2013
Tags: bounds checking, c++
well, here is something unexpected.
I have yet to confirm this in general, or do any real extensive testing, but it appears to be the case at least in the cases I have encountered.

(I can't actually seem to find any mention of it existing, like somehow encountering a feature which should not exist?...).

(ADD 4: Turns out it was much more limited, see end of post...).

basically, what happened:
I had looked around on the internet, and recently saw graphs showing that newer versions of MSVC produced code which was notably faster than the version I was using (Platform SDK v6.1 / VS 2008).

I was like, "hell, I think I will go download this, and try it out...".

so, long story short (basically, I went and downloaded and installed a bunch of stuff and ended up needing to reboot the computer several times), I got my stuff to build on the new compiler (Visual Studio Express 2013).

except, in several cases, it crashed...

the thing was not just that it crashed, but how it crashed:
it was via bounds-check exceptions (I forget the name).

not only this, but on the actual lines of code which were writing out of bounds...
and, this occurred in several places and under several different scenarios.

in past compilers, one might still get a crash, but it was usually after-the-fact (when the function returned, or when "free()" was called); this is different.

looking around, I couldn't find much information on this, but did run across this paper (from MS Research):

this implies that this (or maybe something similar) has actually been put into use in compilers deployed "in the wild", and that bounds-checking for C code does now, apparently, actually exist?... (ADD4: No, it does not, I was incorrect here...).

compiler: Visual Studio Express 2013 RC (CL version: 18.00.20827.3);
language: C;
code is compiled with debug settings.

ADD: ok, turns out this is a "Release Candidate" version, still can't find any reference to this feature existing.

I may have to go do some actual testing to confirm that this is actually the case, and/or figure out what else could be going on here... I am confused, like if something like this were added, wouldn't someone write about it somewhere?...

ADD2 (from VS, for one of the cases):
0FD83E3C sub eax,1
0FD83E3F mov dword ptr [ebp-0A4h],eax
0FD83E45 cmp dword ptr [ebp-0A4h],40h
0FD83E4C jae BASM_ParseOpcode+960h (0FD83E50h)
0FD83E4E jmp BASM_ParseOpcode+965h (0FD83E55h)
0FD83E50 call __report_rangecheckfailure (0FD9C978h)
0FD83E55 mov eax,dword ptr [ebp-0A4h]
0FD83E5B mov byte ptr b[eax],0

so, this one is a statically-inserted bounds-check.
(dunno about the others...).

ADD3 (this case has an explanation at least):

ADD4: after more testing, it does not apply to memory allocated via "malloc()", which still does the crash-on-"free()" thing rather than crashing immediately.

the bounds-checking apparently only applies to arrays which the compiler knows the size for, but does not apply to memory accessed via a raw pointer.
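for anyone wanting to poke at it, a minimal test along the lines of what does and does not seem to get checked (going by the observations above, so treat it as a guess; built as C with debug settings):

#include <stdlib.h>

static int get_index(void)   /* opaque enough that the index isn't a constant */
{
    return 64;
}

int main(void)
{
    char b[64];
    char *p = malloc(64);
    if (!p)
        return 1;

    b[get_index()] = 0;   /* local array: trips the inserted range check
                             (__report_rangecheckfailure) right on this line */
    p[get_index()] = 0;   /* malloc'ed memory: the overwrite itself goes uncaught */

    free(p);              /* ...and the heap case only blows up around here */
    return 0;
}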

complexity and effort-budget...

Posted 11 September 2013

idle thoughts here mostly:
project complexity is not nearly so easily estimated as some people seem to think it is;
time budget is, similarly, not nearly so easily estimated.

there is a common tendency to think that the effort investment in a project scales linearly (or exponentially) with the total size of the codebase.
my experiences here seem to imply that this is not the case, but rather that interconnectedness and non-local interactions are a much bigger factor.

like, the more code in more places that one has to interact with (even "trivially") in the course of implementing a feature, the more complex overall the feature is to implement.

similarly, the local complexity of a feature is not a good predictor of the total effort involved in implementing the feature.

like, a few examples, codecs and gameplay features and UIs:

codecs have a fairly high up-front complexity, and so seemingly are an area a person dare not tread in if they hope to get anything else done.

however, there are a few factors which may be overlooked:
the behavior of a codec is typically extremely localized (typically, code outside the codec will have limited interaction with anything contained inside it, so in the ideal case the codec is mostly treated as a "black box");
internally, a codec may be very nicely divided into layers and stages, which means that the various parts can mostly be evaluated (and their behavior verified) mostly in isolation;
additionally, much of the logic can be copy/pasted from one place to another.

like, seriously, most of the core structure for most of my other codecs has been built from combinations of parts derived from several other formats: JPEG, Deflate, and FLAC.
I originally just implemented them (for my own reasons, *1), and nearly everything since then has been generations of copy-paste and incremental mutation.

so, for example, BTAC-2A consisted mostly of copy-pasted logic.

*1: partly due to boredom and tinkering, the specs being available, and my inability to make much at the time that was "clearly better" (say, "what if I have something like Deflate, just with a 4MB sliding window?", which gained surprisingly little; and trying out various designs for possible JPEG-like and PNG-like formats, not getting much clearly better than JPEG and PNG, *2).

*2: the partial exception was some random fiddling years later, where I was able to improve over them, by combining parts from both (resulting in an ill-defined format I called "NBCES"), which in turn contributed to several other (incomplete) graphics codecs, and also BTAC-2A (yes, even though this one is audio, bits don't really care what they "are").
(basically its VLC scheme and table-structures more-or-less come from NBCES, followed by some of my "BTIC" formats).

the inverse case has generally been my experience with adding gameplay, engine, and UI features:
generally, these tend to cross-cut over a lot of different areas;
they tend not to be nearly so well-defined or easily layered.

as a result, these tend to be (per-feature) considerably more work in my experience.
I have yet to find a great way to really make these areas "great", but generally, it seems that keeping things "localized" helps somewhat here;
however, localization favors small code: as code gets bigger, there is more reason to break it down into smaller parts, which in turn goes against localization (once again increasing the complexity of working on it).

another strategy seems to be trying to centralize control, so while the total code isn't necessarily smaller, the logic which directly controls behavior is kept smaller.

like, the ideal seems to be "everything relevant to X is contained within an instance of X", which "just works", without having to worry about the layers of abstraction behind X:
well, it has to be synchronized via delta messages over a socket, has to make use of a 3D model and has tie-ins with the sound-effect mixing, may get the various asset data involved, ...

it is like asking why some random guy in a game can't just suddenly have his outfit change to reflect his team colors and display his respective team emblem, ... like, despite the seemingly trivial nature of the feature, it may involve touching many parts of the game to make it work (assets, rendering, network code, ...).

OTOH, compiler and VM stuff tends to be a compromise: the logic is not nearly so nicely layered as in a codec, but it is much more easily divided into layers than gameplay logic is.

say, for example, the VM is structured something like:
parser -> (AST) -> front-end -> (bytecode)
(bytecode) -> back-end -> JIT -> (ASM)
(ASM) -> assembler -> run-time linker -> (machine-code)

it may well end up being a big ugly piece of machinery, but at the same time, has the "saving grace" that most of the layers can mostly ignore most of the other layers (there is generally less incidence of "cross-cutting").

so, in maybe a controversial way, I suspect tools and infrastructure are actually easier overall, even with as much time-wasting and "reinventing the wheel" as they may seem to involve...

like, maybe the hard thing isn't making all this stuff, but rather making an interesting/playable game out of all this stuff?...

well, granted, my renderer and voxel system have more or less been giving me issues pretty much the whole time they have existed, mostly because the renderer is seemingly nearly always a bottleneck, and the voxel terrain system likes eating lots of RAM, ...

or such...

otherwise: new codec, BTAC 2A

Posted 02 September 2013
Tags: audio, codec, compression, BTAC
Some info available here:


this is basically a codec I threw together over the past ten-some-odd days (on and off), probably a solid 3 or 4 days' worth of implementation effort though.

it was intended to try to have a better size / quality tradeoff than my prior BTAC, but thus far it hasn't quite achieved it, though it has gotten "pretty close to not having worse audio quality".

unlike the prior BTAC, it is a little more flexible.
given its use of Huffman-coding and similar, it is a bit more complicated though.

currently, the encoder is hard-coded to use 132 kbps encoding (basically, as a compromise between 88 and 176 kbps that is similar to 128).

both are designed for random access to blocks from within my mixer (as opposed to stream decoding), so this is their main defining feature (as far as the mixer can tell, it looks pretty similar to the prior BTAC).

working backwards: A 2D platformer game?...

Posted 01 September 2013

(started writing a post, but not sure if this makes much sense as a post...).

currently, I have a 3D game (FPS style), but also a bit of a problem:
it kind of sucks, pretty bad;
I don't think anyone really cares.

I also have a few drawbacks:
not very good at 3D modeling;
I am worse at map-making (hence... why I ended up mostly using voxels...).

but, I can do 2D art "acceptably".

I was recently thinking of another possibility:
I instead do something "simpler" from a content-creation POV, like a platformer.

the core concept would be "basically rip off something similar to the MegaMan games".

the next thought is how to structure the worlds.
I was initially thinking of using single giant PNG images for content creation, but ran into a problem: representing the whole world as a single giant graphics image (with a 1:1 pixel-density mapping to current monitors) would require absurdly large resolutions, and the graphics editor I am mostly using (Paint.NET) doesn't really behave well at these sizes (it lags and stalls and uses GBs of RAM with a 65536 x 8192 image). note that it would probably need to be carved up by a tool prior to in-game use (so that it can be streamed and fit more nicely in RAM).

another strategy would be basically to compromise, and maybe build the world as a collection of 1024x1024, 2048x2048, or 4096x4096 panels. each panel would then represent a given segment of the world, and is also a more manageable resolution.

if using 4096x4096 panels, it would probably still be needed to carve them up, mostly to avoid obvious stalls if demand-loading them.

the drawback: mostly having to deal with multiple panels in the graphics editor.

partly it has to do with world pixel density, as my estimates showed that a 16-meter tall visible area, with a close to 1:1 mapping to screen pixels (and a native resolution of 1680x1050 or 1920x1080), would use around 64 pixels per meter.

alternatives would be targeting 32 or 16 pixels/meter, but with the drawback of a lower relative resolution.

or, alternatively, using tiles instead of panels (but, for this, I would probably need to go write a tile-editor or similar, or use the trick of using pixel-colors for panel types). could consider "frames" to allow for larger items if needed.

if using tiles, 1 meter is most likely, or maybe 4 meters as a compromise between tiles and larger panels.
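as a rough sketch of the panel math (with illustrative numbers): at 64 pixels/meter a 4096x4096 panel covers 64x64 meters of world, so mapping a world position to a panel plus a pixel within it is just a divide and a modulo (assuming non-negative coordinates).

#define PIXELS_PER_METER 64
#define PANEL_SIZE_PX    4096
#define PANEL_SIZE_M     (PANEL_SIZE_PX / PIXELS_PER_METER)   /* 64 meters */

typedef struct { int panel_x, panel_y; int px, py; } PanelPos;

static PanelPos world_to_panel(float wx, float wy)
{
    PanelPos p;
    int ix = (int)(wx * PIXELS_PER_METER);   /* world position in pixels */
    int iy = (int)(wy * PIXELS_PER_METER);
    p.panel_x = ix / PANEL_SIZE_PX;          /* which panel */
    p.panel_y = iy / PANEL_SIZE_PX;
    p.px = ix % PANEL_SIZE_PX;               /* pixel within the panel */
    p.py = iy % PANEL_SIZE_PX;
    return p;
}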

not sure if there are any standard practices here...

as for gameplay/logic, probably:
character walks left and right, can fire a gun, optionally with up/down aiming, and can also jump;
enemies will probably walk back and forth, fire projectiles if the player is seen, or other behaviors, probably with anim-loop driven logic.
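as an entirely hypothetical sketch of that kind of enemy logic (names and values made up):

typedef struct {
    float x, y;          /* position, meters */
    float dir;           /* +1 walking right, -1 walking left */
    float patrol_min, patrol_max;
    float shoot_cooldown;
} Enemy;

static int can_see_player(const Enemy *e, float px, float py)
{
    /* placeholder visibility test: roughly the same height, within 8m,
       and facing the player (a real check would also trace the map) */
    float dx = px - e->x;
    return (py > e->y - 1 && py < e->y + 1) &&
           (dx * e->dir > 0) && (dx * dx < 8.0f * 8.0f);
}

static void enemy_tick(Enemy *e, float px, float py, float dt)
{
    e->x += e->dir * 1.5f * dt;                    /* walk */
    if (e->x < e->patrol_min || e->x > e->patrol_max)
        e->dir = -e->dir;                          /* turn around at the ends */

    e->shoot_cooldown -= dt;
    if (can_see_player(e, px, py) && e->shoot_cooldown <= 0) {
        /* spawn_projectile(e->x, e->y, e->dir);      hypothetical */
        e->shoot_cooldown = 1.0f;
    }
}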

unclear: whether using a client/server architecture makes sense for a platformer, or whether it is more sensible to tie the gameplay directly to the rendering? (does anyone actually do network multiplayer in platformers?...)

more likely I could just start out by having partially disjoint render-entities and game-entities, with the option of switching over to client/server later if needed.

could consider this time writing the game-logic primarily in my script-language.

does still seem a bit like "more of the same...".

can't say if this will go anywhere.