Why does performance improve when the image has dimensions of 2 to the n?
It has to do with math and the way graphics texturing hardware works.
Having power-of-two (POT) lengths allows many optimizations because computer hardware is fundamentally binary, so sizes that are powers of two map naturally onto it. When you use non-power-of-two (NPOT) textures, you lose those optimizations.
If you go back a few years, all the major graphics engines required textures to have a POT length on each edge. A texture could be 64x2048, or 256x256, or any other size, as long as both edges were a POT length. There were many reasons for this:

- Mipmap generation is easy: you can generate an image half as big, and another half as big as that, and so on, halving cleanly all the way down to the smallest size needed.
- The mathematical properties are convenient: you can always double one dimension and store all the lower-resolution levels in that space, and any convolution kernel you run on the image is guaranteed to work at every detail level. The filters built into the major graphics APIs are usually box filters; a 4x4 box filter works wonderfully on POT images but produces edge artifacts when the image is not a proper multiple of the filter size. (Some NPOT sizes happen to be proper multiples of the filter, but it is no longer guaranteed in all NPOT cases.)
- Texture-coordinate calculation and texture-wrapping (floor/ceiling) math is much simpler and can be done with bit shifts and masks rather than multiplication and division.
- Memory alignment can be guaranteed.

Etc., etc., etc.
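Two of those properties can be sketched in a few lines (illustrative sizes only, not tied to any particular graphics API):

```python
def mip_chain(size):
    """Repeatedly halve a dimension until it reaches 1, as mipmap generation does."""
    chain = [size]
    while size > 1:
        size //= 2          # a POT size halves evenly at every level
        chain.append(size)
    return chain

# A POT edge halves cleanly all the way down to 1:
print(mip_chain(256))       # [256, 128, 64, 32, 16, 8, 4, 2, 1]
# An NPOT edge hits odd sizes, so each level must round and resample unevenly:
print(mip_chain(100))       # [100, 50, 25, 12, 6, 3, 1]

# Texture wrapping: only when size is a POT does 'coord % size'
# reduce to a cheap bit mask, with no division at all.
size = 256
for coord in (0, 255, 256, 1000):
    assert coord % size == coord & (size - 1)
```

Note how 100 passes through 25, an odd size, so the next level cannot be an exact half; that rounding is where NPOT mip generation gets messy.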
For almost a decade now, the newer graphics APIs have required implementations to support NPOT textures.
Supporting NPOT textures has many indirect effects: mipmap generation must handle edges differently, texture compression may require encoding blank space, wrapping and border functionality gets more complex, LOD filtering gets more complex, and compatibility is lost with various algorithms that are optimized for POT sizes.
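The "encoding blank space" cost is easy to quantify. As a hedged sketch: block compressors such as BC1/DXT1 encode 4x4 texel blocks, so any dimension that is not a multiple of 4 (which every POT size of 4 or more is) must be padded up before encoding:

```python
def padded_blocks(width, height, block=4):
    """Number of 4x4 blocks needed, rounding each edge up to a multiple of 4."""
    bw = (width + block - 1) // block
    bh = (height + block - 1) // block
    return bw * bh

def wasted_texels(width, height, block=4):
    """Texels that are stored but never sampled, purely due to padding."""
    return padded_blocks(width, height, block) * block * block - width * height

print(wasted_texels(256, 256))   # 0    -- POT edges are already multiples of 4
print(wasted_texels(250, 250))   # 1004 -- padding rows/columns are pure overhead
```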
But getting back to your question: it comes down to your specific algorithm and the implementation details of your code.
You would need to share a lot more details. If you are letting the video card do everything and you have a small number of sprites, the card can easily handle a few hundred sprites; even a 4000-pixel-wide background sprite is easy for it. If instead you are running everything on the CPU, then you could easily be thrashing your CPU cache with your graphics data, consuming all your memory bandwidth by moving images back and forth in memory, or using memory access patterns that are cache-unfriendly.
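As a rough illustration of what "cache-unfriendly" means here (a pure-Python sketch; real costs depend on your CPU and image layout): walking a row-major image row by row touches memory sequentially, while walking it column by column jumps `width` elements on every step, so each access can land on a different cache line:

```python
def sum_rows(pixels, width, height):
    """Sequential access: consecutive pixels share cache lines."""
    total = 0
    for y in range(height):
        row_base = y * width
        for x in range(width):
            total += pixels[row_base + x]
    return total

def sum_cols(pixels, width, height):
    """Strided access: each step jumps 'width' elements, defeating the cache."""
    total = 0
    for x in range(width):
        for y in range(height):
            total += pixels[y * width + x]
    return total

width, height = 1024, 1024
pixels = bytearray(width * height)   # one byte per pixel, for simplicity
# Both orders compute the same answer; only the memory traffic differs.
assert sum_rows(pixels, width, height) == sum_cols(pixels, width, height)
```

In compiled code on a large image, the strided version can be many times slower for exactly this reason, which is one way an otherwise-correct CPU blitter ends up mysteriously slow.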