random: video upsample and enhance...
I had an idea for a tool to experiment with:
a tool which takes an input video, tries to up-sample it in a way which "hopefully doesn't look like crap", and saves the output video.
basically, it just decodes frames from one video, and then resamples and re-encodes them into another video.
the main thing here, granted, is the upsampler.
the algorithm used is fairly specialized, and currently only upsamples by powers of 2 (only 2x at a time, but it may be applied recursively).
I had originally considered trying a neural net (likely being trained over the video prior to upsampling), but realized I don't have anywhere near enough CPU cycles to make this viable (vs a more conventional strategy).
instead, an algorithm is used which combines bicubic filtering for the "main cross" (basically, a cross-shaped region over the pixels being interpolated) with linear extrapolation from the corners (these extrapolated points become the ones used by the bicubic filter). the filter actually over-compensates slightly (by itself it introduces noticeable ringing, by design).
additionally, a very slight blur (~ 0.06) is applied both before and after upsampling. the former helps reduce artifacts in the original image (which otherwise turn into noise in the upsampled image; this noise is typically lower-intensity than the major details, so a slight blur removes it more easily), and the latter helps smooth out ringing artifacts.
the reason for over-compensating is to counteract the initial loss of sharpness from the first-pass blur, albeit at the cost of producing additional ringing artifacts near edges. the secondary blur helps smooth these out, and makes them look more like proper edges.
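as a rough 1-D sketch of the above pipeline (the function names, the exact cubic weights, and the overshoot factor are illustrative guesses here, not the tool's actual coefficients):

```python
def slight_blur(s, amount=0.06):
    # 3-tap blur: each neighbor contributes 'amount', the center keeps the rest.
    out = list(s)
    for i in range(1, len(s) - 1):
        out[i] = (1 - 2 * amount) * s[i] + amount * (s[i - 1] + s[i + 1])
    return out

def upsample2x(s, boost=1.25):
    # midpoint cubic interpolation, base weights (-1/16, 9/16, 9/16, -1/16);
    # 'boost' scales the negative lobes to over-compensate slightly
    # (deliberate ringing, later smoothed by the second blur).
    w0 = -boost / 16.0
    w1 = 0.5 - w0          # keep the four weights summing to 1
    out = []
    for i in range(len(s)):
        out.append(s[i])
        a = s[max(i - 1, 0)]
        b = s[i]
        c = s[min(i + 1, len(s) - 1)]
        d = s[min(i + 2, len(s) - 1)]
        out.append(w0 * a + w1 * b + w1 * c + w0 * d)
    return out

def enhance(s):
    # pre-blur, 2x upsample with overshoot, post-blur.
    return slight_blur(upsample2x(slight_blur(s)))
```

on a flat signal this is an identity (the weights sum to 1), while near a step edge the boosted lobes overshoot slightly, which the final blur then rounds off into something edge-like.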
in all, it looks "pretty good", or at least gives slightly nicer-looking results than the upsampling provided by VirtualDub (with the files tested, mostly low-res AVIs of various 90s-era TV shows). while VirtualDub provides blur filters, they are too strong: using a blur before upsampling results in an obvious loss of sharpness, and a blur afterwards does not reduce the artifacts, which become much more pronounced after upsampling.
however, beyond this, the goodness ends:
the tool is slow (currently slower than real-time), produces overly large files (due mostly to me saving the output as MJPEG at 95% quality), has a fairly limited range of supported formats and codecs (ex: AVI with MJPG/RPZA/XviD/... and PCM, ADPCM, or MP3 audio, ..., *1), is awkward to use (it is a command-line tool), ...
*1: it doesn't currently work with other common formats like MPG, FLV, RM, MOV, ..., and only a few formats have encoders (MJPEG, RPZA, ...). I initially tried saving as RPZA, but it made little sense to spend so many cycles upsampling only to save in a format which doesn't look very good. (XviD would be good here, but using it isn't without issues..., *2).
granted, there is the option of just getting funky with the MJPEG encoding to ensure that the file stays under a certain limit (say, budgeting 750MB or 900MB for a 1hr video, with per-frame quality adjusted to enforce this limit).
for example: 1hr at 24Hz with a 900MB limit means 86400 frames, and a limit of about 10.42 kB/frame.
or, alternatively, a running average could be used to adjust quality settings (to maintain an average bit-rate).
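the budget math and the running-average idea could be sketched roughly like this (the function names and the quality-step logic are hypothetical stand-ins, not the tool's actual rate control):

```python
def per_frame_budget(total_mb, hours, fps):
    # bytes available per frame for a given total size budget.
    frames = hours * 3600 * fps
    return (total_mb * 1_000_000) / frames

def adjust_quality(quality, avg_frame_size, budget):
    # nudge the JPEG quality setting up/down so the running-average
    # frame size stays near the per-frame budget.
    if avg_frame_size > budget and quality > 10:
        return quality - 1
    if avg_frame_size < 0.9 * budget and quality < 95:
        return quality + 1
    return quality
```

e.g. per_frame_budget(900, 1, 24) gives roughly 10417 bytes/frame, matching the 10.42 kB/frame figure above.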
*2: patents, dealing with GPL code, or a mandatory VfW dependency.
another limitation is that I don't currently know any good way to counteract the effects of motion compensation.
likewise for trying to counteract VHS artifacts (tracking issues, etc...).
ultimately, lost detail is still lost...
but, even so, a video might still look better upsampled, say, from 352x240 to 704x480, than if simply stretched out using nearest-neighbor or linear filtering or similar.
all this would be a bit much work though, as my initial goal was mostly just to make some older shows watchable with more ability to see what was going on (but, then again, VirtualDub resampling is still more convenient and works with a wider range of videos...).
then again, for all I know, someone already has a much better tool for this.
meh, adding Theora support...
the lib (libtheora) is BSD-licensed, so no huge issue, but I had to hack on it a bit to make it buildable.
its code is nearly unreadable; I am almost left wondering if I would be better off doing my own implementation, but I will probably just use the lib for now and replace it later if needed.
basically, actual video is a case where one really needs a proper video codec (unlike animated textures / ...): adding a feature to adaptively adjust quality has shown that 1 hour of MJPG video kept under 1GB results in awful quality; passable results seem to require a minimum of around 2GB per hour.
this is mostly because MJPG lacks P-frames or similar (and there is no way to add them without breaking existing decoders, or at least looking really nasty; namely, every P-frame comes out as a big grey image).
finding *anything* on this was a pain, as apparently AVI+Theora is very uncommon (I just don't want to mess with Ogg at the moment). but, OTOH, Theora is also one of the few major codecs which isn't "yet another reimplementation or variant of MPEG-4".
left thinking that my "JPEG" library is becoming a bit of a beast, basically handling image and video compression/decompression for a collection of formats, with custom implementations of various codecs.
it could be a little better organized, ideally with less tangling between the AVI logic and the codecs.
elsewhere, in the renderer, there is a place where texture management is mixed up with AVI and codec internals. this may need to be cleaned up eventually, most likely via some sort of proper API for extended components and layers, and by making DXTn a more canonical image representation, likely also with mipmap and cube-map support.
well, and separating layer management from codec internals would be nice (as-is, it is necessary to fiddle with layers specifically for each codec which uses them).
documentation would have been nice...
I eventually ended up digging around in the FFmpeg source trying to answer the question of how exactly their implementation works, which mostly boiled down to the use of 16-bit length fields between headers. knowing these sorts of things up front could have saved me a good number of hours.
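for reference, the layout (as I understand it from the FFmpeg source; this is an assumption based on reading their code, not a formal spec) seems to be the three Theora header packets back to back in the codec extradata, each preceded by a 16-bit big-endian length. a minimal sketch of splitting them apart:

```python
import struct

def split_theora_headers(extradata):
    # walk the extradata, reading a 16-bit big-endian size before each
    # header packet (identification, comment, setup).
    headers, pos = [], 0
    while pos + 2 <= len(extradata):
        (size,) = struct.unpack_from(">H", extradata, pos)
        pos += 2
        headers.append(extradata[pos:pos + size])
        pos += size
    return headers
```

packing would be the reverse: prefix each header with struct.pack(">H", len(header)) and concatenate.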