You're throwing away a lot of advantages of a tile map.
IMHO: I always cringe when I see someone defining a tile class for a simple tile map.
For one, the indices of your tile array are all you need for positions. The tiles themselves don't need any, since you can deduce the position from the indices already.
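
To illustrate, here is a minimal sketch of how a tile's position can be derived from its indices and back again. `TILE_SIZE`, `tile_to_world` and `world_to_tile` are made-up names for this example, and the sketch assumes square tiles with the map origin at world position (0, 0).

```python
TILE_SIZE = 32  # assumed tile edge length in pixels


def tile_to_world(col, row, tile_size=TILE_SIZE):
    """Top-left world position of the tile stored at [row][col]."""
    return col * tile_size, row * tile_size


def world_to_tile(x, y, tile_size=TILE_SIZE):
    """Indices (col, row) of the tile that contains world position (x, y)."""
    # Floor division keeps negative coordinates on the correct side of 0.
    return int(x // tile_size), int(y // tile_size)
```

Since both directions are a single multiplication or division, storing a position inside each tile would only duplicate what the array indices already say.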
Determining the tiles to draw is a simple calculation. With the camera offset and the screen size you can calculate the starting and ending indices of the tiles on the screen.
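
That calculation could look roughly like the following sketch. The function name and parameters are assumptions for illustration: the camera offset is taken to be the world position of the screen's top-left corner, and everything is measured in pixels.

```python
import math


def visible_tile_range(cam_x, cam_y, screen_w, screen_h,
                       tile_size, map_cols, map_rows):
    """Return (first_col, first_row, last_col, last_row), all inclusive.

    The range covers every tile that overlaps the screen rectangle and is
    clamped to the map bounds. If the camera is entirely off the map, the
    last index ends up smaller than the first, so a loop draws nothing.
    """
    first_col = max(0, int(cam_x // tile_size))
    first_row = max(0, int(cam_y // tile_size))
    # ceil(...) - 1 is the last tile whose area starts before the screen ends.
    last_col = min(map_cols - 1, math.ceil((cam_x + screen_w) / tile_size) - 1)
    last_row = min(map_rows - 1, math.ceil((cam_y + screen_h) / tile_size) - 1)
    return first_col, first_row, last_col, last_row
```

Drawing then becomes two nested loops over that range, so the cost depends on the screen size rather than on the size of the map.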
Adding entities to tiles sounds good at first glance, but you already saw some disadvantages yourself. Also, when entities move you'll be constantly updating tile pointers. When an entity is moving between tiles, which tile do you add it to? Which tiles' entities do you check for collision when an entity moves?
I'd put entities in a separate list, optionally a spatial container. This lets you reduce overdraw easily.
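
One possible spatial container is a uniform grid that buckets entities by position. This is only a sketch under some assumptions: `SpatialGrid` and its methods are made-up names, entities are treated as points, and each entity must be hashable.

```python
from collections import defaultdict


class SpatialGrid:
    """Uniform grid that buckets entities by position (illustrative helper)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> entities in that cell
        self.where = {}                # entity -> cell it is currently in

    def _key(self, x, y):
        return int(x // self.cell_size), int(y // self.cell_size)

    def update(self, entity, x, y):
        """Insert or move an entity; buckets only change on a cell crossing."""
        key = self._key(x, y)
        old = self.where.get(entity)
        if old == key:
            return
        if old is not None:
            self.cells[old].discard(entity)
            if not self.cells[old]:
                del self.cells[old]
        self.cells[key].add(entity)
        self.where[entity] = key

    def query(self, x0, y0, x1, y1):
        """Entities in all cells overlapping the rectangle (coarse candidates)."""
        cx0, cy0 = self._key(x0, y0)
        cx1, cy1 = self._key(x1, y1)
        found = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                found |= self.cells.get((cx, cy), set())
        return found
```

Querying with the screen rectangle gives the entities worth drawing, and querying with an entity's bounding box gives collision candidates. The grid's cell size is independent of the tile size, and the tile array itself is never touched when an entity moves.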
What's left for the tiles? Indices to the image. These can be stored directly, so you end up with a much simpler two-dimensional array.
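
As a rough sketch, the whole map can then be a nested list of integers. The tileset layout here is an assumption (a single atlas image with `ATLAS_COLS` tiles per row), and `atlas_rect` and `draw_commands` are hypothetical helpers that stand in for whatever your rendering API expects.

```python
TILE_SIZE = 32   # assumed tile edge length in pixels
ATLAS_COLS = 8   # assumed number of tiles per row in the tileset image

# The whole map: tile_map[row][col] is an index into the tileset.
tile_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 0],
]


def atlas_rect(image_index, atlas_cols=ATLAS_COLS, tile_size=TILE_SIZE):
    """Source rectangle (x, y, w, h) of an image index inside the tileset."""
    col = image_index % atlas_cols
    row = image_index // atlas_cols
    return col * tile_size, row * tile_size, tile_size, tile_size


def draw_commands(tile_map, tile_size=TILE_SIZE):
    """Yield (source_rect, dest_x, dest_y) for every tile in the map."""
    for row, line in enumerate(tile_map):
        for col, image_index in enumerate(line):
            yield atlas_rect(image_index), col * tile_size, row * tile_size
```

A real renderer would loop only over the visible index range and blit each source rectangle to its destination, but the data structure stays this simple.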