How is that going to work when I can move the map off screen?
Because you already need to calculate which portion of the map to display after all that panning (usually called the view transformation), you can apply the inverse calculation to relate a screen position back to the map's reference frame. It's a generic answer, I know, but you haven't provided enough details for a more specific one.
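To make the inverse calculation concrete, here is a minimal sketch assuming a simple 2D view with a pan offset and a uniform zoom. The `View` class and its field names are made up for illustration; your engine's camera will look different, but the forward and inverse mappings should follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical 2D camera: pan offset plus uniform zoom."""
    offset_x: float      # map x-coordinate shown at the screen's left edge
    offset_y: float      # map y-coordinate shown at the screen's top edge
    zoom: float = 1.0    # screen pixels per map unit

    def map_to_screen(self, mx: float, my: float) -> tuple[float, float]:
        # Forward view transformation: translate by the pan, then scale.
        return ((mx - self.offset_x) * self.zoom,
                (my - self.offset_y) * self.zoom)

    def screen_to_map(self, sx: float, sy: float) -> tuple[float, float]:
        # Inverse transformation: undo the scale, then undo the pan.
        return (sx / self.zoom + self.offset_x,
                sy / self.zoom + self.offset_y)
```

A mouse click at screen position `(sx, sy)` would go through `screen_to_map` before being compared against unit positions, so it keeps working no matter how far the map has been panned.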
So far I am thinking the color method is the best and simplest way to go about this... with RGB and 256 values per channel you have what... ~16.7 million colors available for units?
Yes, 24 bits give you 2^24 = 16,777,216 possible values. However, you need an additional rendering pass. You have to ensure pixel-perfect masks, e.g. no anti-aliasing that blends your color masks. You have to ensure that texels with alpha below 0.5 (i.e. more transparent than opaque) are discarded, so that transparent texels do not disturb the color mask. And you need to read back the render target to access it on the CPU side. So, after all, the algorithm is probably still the less sophisticated one, but it does not come for free, and whether testing bounding volumes or looking at pixels performs better in the end would need to be investigated further.
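As a rough CPU-side illustration of the color mask idea, the sketch below packs a unit ID into an RGB triple, writes it into a mask only where the sprite's alpha is at least 0.5, and decodes the ID under a given pixel. All names here (`id_to_rgb`, `rgb_to_id`, `draw_mask`, `pick`) are hypothetical, and the mask is a plain list of lists standing in for the render target you would read back from the GPU; ID 0 is assumed to be reserved for "no unit".

```python
MAX_IDS = 1 << 24  # 2^24 distinct values fit into three 8-bit channels


def id_to_rgb(unit_id: int) -> tuple[int, int, int]:
    """Pack a unit ID into an (r, g, b) triple, 8 bits per channel."""
    if not 0 <= unit_id < MAX_IDS:
        raise ValueError("unit_id does not fit into 24 bits")
    return ((unit_id >> 16) & 0xFF, (unit_id >> 8) & 0xFF, unit_id & 0xFF)


def rgb_to_id(r: int, g: int, b: int) -> int:
    """Recover the unit ID from an (r, g, b) triple."""
    return (r << 16) | (g << 8) | b


def draw_mask(mask, sprite_alpha, x0, y0, unit_id):
    """Write the unit's ID color wherever the sprite is mostly opaque."""
    color = id_to_rgb(unit_id)
    for dy, row in enumerate(sprite_alpha):
        for dx, alpha in enumerate(row):
            x, y = x0 + dx, y0 + dy
            inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
            # Discard texels below 0.5 alpha; no blending, so IDs stay exact.
            if inside and alpha >= 0.5:
                mask[y][x] = color


def pick(mask, x, y) -> int:
    """Return the ID stored under pixel (x, y); 0 means no unit."""
    return rgb_to_id(*mask[y][x])
```

In a real renderer the `draw_mask` step would be the extra pass with a flat-color shader and an alpha test, and `pick` would operate on the pixel data read back from the render target.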