Why would you have any data in non optimal dimensions, and why would you even bother loading any of those formats? This stuff should be done once, and only once. Write a tool that goes over your assets folder and does any needed conversions.
good loading times are hard.
for example, in my case (on a PC), loading textures is a good part of the startup time, and a lot of this is due to resampling the (generally small) number of non-power-of-2 textures to power-of-2 sizes. this is then followed by the inner loops that do the inverse filtering for PNG files (more so, the cycle-eating evilness of the Paeth filter, which wouldn't be as bad, except that it tends to be one of the most generally effective, and thus most-used, filters in PNG).
I sometimes wish that PNG also had a simpler "A+B-C" linear filter, which can often compete well with Paeth, but, more notably, is much cheaper (it could improve performance by potentially diluting the number of times the encoder picks the Paeth filter). granted, along similar lines, one can also wish that PNG filtered per-block rather than per-scanline, ... but alas. (never mind all the pros and cons of using custom image formats for textures...).
note that even as such, decoding such a texture may still be faster than loading a simpler uncompressed texture would be (such as BMP or TGA), due mostly to the time it requires to read files from the HDD.
luckily, one can usually cache textures.
unluckily, you still have to load them the first time they are seen.
Take all your files, and convert them to proper resolution (better yet, beat your artist with a blunt object until he makes proper power of 2 images. Images look bad when you change their aspect ratio), and then turn them into compressed DDS files. You're writing a game, not an image editor, it doesn't need to know about PNG or TGA or anything else.
Then when your game runs, dump your compressed DDS directly into VRAM. You don't need to check their dimensions. You don't need to uncompress or convert anything. Just dump it directly. DDS also handles mip levels, which covers your next paragraph.
When the game is running, everything should already be ready to go. Nothing should ever have to be converted or processed on the fly. Keep a good copy of everything for editing purposes, but make sure it's all converted when you build your game. Do a simple date comparison: if any source asset is newer than its converted copy, convert it then.
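A minimal sketch of that date comparison, as a build-time step. The folder layout, the extension list, and the function names (`needs_conversion`, `find_stale_assets`) are assumptions for illustration only, and the actual conversion to DDS is left to whatever tool you use:

```python
import os

def needs_conversion(source_path, output_path):
    """True if the converted file is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)

def find_stale_assets(source_dir, output_dir,
                      source_exts=(".png", ".tga"), output_ext=".dds"):
    """Walk source_dir and list (source, output) pairs that need converting.

    The extensions are assumed defaults, not something the build requires.
    """
    stale = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext.lower() not in source_exts:
                continue
            src = os.path.join(root, name)
            # Mirror the source sub-folder structure under output_dir.
            rel = os.path.relpath(root, source_dir)
            dst = os.path.normpath(os.path.join(output_dir, rel, base + output_ext))
            if needs_conversion(src, dst):
                stale.append((src, dst))
    return stale
```

The build script would then hand each returned pair to the converter, so unchanged assets are never touched again.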
power-of-2 is good, but mandating power of 2 isn't always itself optimal from a content-creation POV. better IMO to accept whatever, and resample it as needed, but generally as a matter of policy try to keep everything power-of-2 (to avoid resampling).
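as a rough sketch of the "accept whatever, resample as needed" policy: check whether a texture is already power-of-2, and if not, pick the size to resample to. rounding up (rather than to the nearest power of 2) is an assumption here, as are the function names:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0

def next_power_of_two(n):
    """Smallest power of two that is >= n (assumes rounding up)."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return 1 << (n - 1).bit_length()

def resample_target(width, height):
    """Size to resample to; unchanged if both sides are already power-of-2."""
    return next_power_of_two(width), next_power_of_two(height)

def needs_resample(width, height):
    """Only non-power-of-2 textures pay the resampling cost at load time."""
    return not (is_power_of_two(width) and is_power_of_two(height))
```

a 512x256 texture passes straight through, while something like 300x200 would get resampled to 512x256.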
a lot of the ugliness caused by resampling can be reduced by using bicubic interpolation for the texture resampler, but granted, doing this is slow (somewhat slower than bilinear or nearest interpolation).
the problem with DDS is that it chews through a lot more HDD space than PNG or JPEG, and the potential for faster loading is largely hindered by the additional disk IO required (still better than BMP or TGA, which just waste HDD space). like, some saved clock-cycles don't really offset a slow HDD all that well (as well as still leaving download times longer for end-users, ...).
JPEG tends to give the smallest files, but being lossy tends to result in lower image quality, and (in its common form) it does not support an alpha channel.
PNG is more of a compromise.
granted, I have a custom format (roughly based on JPEG, but with more features), but it has its own drawbacks (mostly that it is non-standard).
the Paeth issue has more to do with decoding PNG images, where essentially one ends up with several conditionals per pixel component (so 3 or 4 times per pixel).
given that this has to be done about 1M times for a 512x512 image (512 x 512 pixels x 4 components), and conditionals are fairly slow, this isn't free (especially since the branch-predictor can't really predict them accurately).
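for reference, here is a sketch of the Paeth predictor as the PNG specification defines it, plus the per-byte loop a decoder runs to undo the filter. this is Python to show the logic only (a real decoder would be C or similar), and `unfilter_paeth_row` is an illustrative name, not taken from any particular library:

```python
def paeth_predictor(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c            # initial linear estimate
    pa = abs(p - a)          # distance of the estimate to each neighbor
    pb = abs(p - b)
    pc = abs(p - c)
    # These data-dependent comparisons run once per pixel component.
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def unfilter_paeth_row(filtered, prev_row, bpp):
    """Undo the Paeth filter for one scanline.

    filtered: filtered bytes of this row (without the filter-type byte)
    prev_row: already reconstructed bytes of the row above
    bpp:      bytes per pixel (e.g. 3 for RGB8, 4 for RGBA8)
    """
    out = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0       # left neighbor
        b = prev_row[i]                           # neighbor above
        c = prev_row[i - bpp] if i >= bpp else 0  # upper-left neighbor
        out[i] = (x + paeth_predictor(a, b, c)) & 0xFF
    return out
```

the three comparisons inside `paeth_predictor` are the conditionals in question: they depend on the pixel data itself, which is what makes them hard to predict.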
a direct linear predictor (a+b-c) usually does almost as well as Paeth, but is cheaper (but, sadly, not supported by PNG).
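a sketch of what such a linear filter could look like, for comparison. to be clear, this is hypothetical: it is not one of PNG's filter types, and the function names are made up here. the point is just that the predictor is a single wrapped expression with no data-dependent comparisons:

```python
def linear_predictor(a, b, c):
    """Hypothetical a+b-c predictor, wrapped to a byte (not part of PNG)."""
    return (a + b - c) & 0xFF

def filter_linear_row(raw, prev_row, bpp):
    """Encoder side: store each byte as its difference from the prediction."""
    out = bytearray(len(raw))
    for i, x in enumerate(raw):
        a = raw[i - bpp] if i >= bpp else 0       # left neighbor
        b = prev_row[i]                           # neighbor above
        c = prev_row[i - bpp] if i >= bpp else 0  # upper-left neighbor
        out[i] = (x - linear_predictor(a, b, c)) & 0xFF
    return out

def unfilter_linear_row(filtered, prev_row, bpp):
    """Decoder side: add the prediction back, using reconstructed neighbors."""
    out = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0
        b = prev_row[i]
        c = prev_row[i - bpp] if i >= bpp else 0
        out[i] = (x + linear_predictor(a, b, c)) & 0xFF
    return out
```

the `i >= bpp` checks only handle the left edge of the row and could be hoisted out of the loop in a real decoder, leaving the per-component work branch-free.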
an encoder which then avoids Paeth ends up having to use a generally much less effective filter (resulting in bigger compressed images).
ironically, this issue is sufficiently bad that it manages to (almost single-handedly) make decoding PNG images currently slower than decoding JPEG images.
but, alas...