You need to build your rendering, sound, and input hardware around your Arduino, then write the software routines to interface with it and do what you need. Arduinos are neat, but they're not terribly powerful, and unless you have hardware to offload the time-consuming task of generating the video signal, you're only going to have about 3% of the CPU left to actually run the game. Uzebox is quite close to what you'd end up with, using fairly minimal hardware. You can get good results with exceedingly clever programming, but the programming model ends up looking much like an old game console, where graphics are controlled by writing to specific memory locations rather than by discrete blits, and you face similar 'hard' limits on the complexity of your scene. IIRC, the Uzebox has something like 6 CPU cycles per pixel to generate a color, including any reads from memory. Just drawing the background color, (potentially) overlaying a sprite, and resolving the palette color consumes that many cycles.
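To make the "write to memory locations instead of blitting" model concrete, here is a minimal sketch of a tile-mapped display in plain C. It is not Uzebox's actual code; the names (`vram`, `tileset`, `palette`, `render_scanline`) and the tiny sizes are made up for illustration. The game never draws pixels itself. It only stores tile indices into `vram`, and the scanline routine (which on real hardware would run in a timing-critical interrupt) turns those indices into colors.

```c
#include <stdint.h>

/* Hypothetical sizes, kept tiny so the example is easy to follow. */
#define TILE_W    8
#define TILE_H    8
#define MAP_COLS  4
#define MAP_ROWS  2
#define NUM_TILES 2
#define SCREEN_W  (MAP_COLS * TILE_W)

/* 1-bit tile patterns, one byte per tile row (MSB = leftmost pixel).
   On a real AVR these would normally live in flash, not RAM. */
static const uint8_t tileset[NUM_TILES][TILE_H] = {
    {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, /* tile 0: blank */
    {0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF}, /* tile 1: hollow box */
};

/* Two-entry palette: background and foreground color values (arbitrary). */
static const uint8_t palette[2] = {0x00, 0x3F};

/* "Video RAM": the only thing the game writes to. Each entry is a tile
   index and must be smaller than NUM_TILES. */
static uint8_t vram[MAP_ROWS][MAP_COLS];

/* Produce one scanline of palette-resolved colors. Per pixel this does a
   tile lookup, a pattern lookup, and a palette lookup, which is the kind
   of work that has to fit in a handful of CPU cycles on real hardware. */
void render_scanline(unsigned y, uint8_t out[SCREEN_W])
{
    unsigned map_row  = y / TILE_H;
    unsigned tile_row = y % TILE_H;

    for (unsigned x = 0; x < SCREEN_W; x++) {
        uint8_t tile = vram[map_row][x / TILE_W];
        uint8_t bits = tileset[tile][tile_row];
        uint8_t bit  = (bits >> (7 - (x % TILE_W))) & 1u;
        out[x] = palette[bit];
    }
}
```

Putting a box on screen is then just `vram[0][1] = 1;`, with no blit call involved. The flip side is the hard limit mentioned above: anything that can't be expressed as a grid of tiles (plus whatever sprite logic you can squeeze into the per-pixel budget) simply can't be shown.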
You're just not going to get anything approaching SDL out of an Arduino without complex additional hardware -- simple graphics and sound are doable with simple hardware (like R-2R DACs or a video encoder), but it's not going to resemble SDL at all. The practical minimum for anything resembling SDL is probably hardware with fast DMA and enough RAM to support a framebuffer, together with a video encoder IC.
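For a rough sense of why the framebuffer is the sticking point, here is a small back-of-the-envelope calculation in C. The helper names are hypothetical, and the 2048-byte figure assumes an ATmega328P-based board such as the Uno; other boards have different amounts of SRAM.

```c
#include <stdint.h>

/* Assumption: ATmega328P (e.g. Arduino Uno), which has 2 KB of SRAM. */
#define ATMEGA328P_SRAM_BYTES 2048u

/* Bytes needed for a packed framebuffer, rounded up to a whole byte. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bits_per_pixel)
{
    return (width * height * bits_per_pixel + 7u) / 8u;
}

/* Nonzero if the framebuffer alone would fit in the given amount of RAM.
   This ignores the stack and every other variable the program needs. */
int fits_in_ram(uint32_t fb_bytes, uint32_t ram_bytes)
{
    return fb_bytes <= ram_bytes;
}
```

With these, `framebuffer_bytes(320, 240, 8)` comes to 76800 bytes, and even a modest `framebuffer_bytes(160, 120, 2)` is 4800 bytes, so neither fits in 2048 bytes of on-chip SRAM. That is why an SDL-style "draw anywhere into a pixel buffer" model needs external RAM and something other than the CPU to scan it out.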