With the current APIs, individual draw calls are expensive because for each one the API and driver potentially change and verify a lot of state. Batching similar draws (say, all the grass tiles or, better yet, all the tiles that come from the same large texture atlas and use the same shader) is one way of reducing the number of draw calls you have to make. The API/driver overhead is then amortized across all the tiles that a naive renderer would draw individually. Realistically, on a modern desktop or laptop with current APIs you get a couple thousand draw calls per frame before your CPU is completely swamped by the overhead.
Consider a game running at just 640x480 with small 16x16-pixel tiles. Drawing one densely populated layer of tiles takes 40 x 30 = 1200 draw calls if you issue them individually. Figure that two more sparsely populated layers for objects and overhead graphics add 50% on top of that, for about 1800. Your 640x480 game has already consumed at least half of the available draw calls per frame. Now draw lots of characters and throw in some particles and UI, and you're probably at or around the comfortable limit if your game does any interesting processing, and you haven't drawn a single off-screen tile or entity. Drawing half a screen extra in every direction doubles both the width and the height of the drawn area, which multiplies the cost by 4 and puts you way over your draw call budget.
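
To make the arithmetic above easy to replay with your own numbers, here is a minimal sketch. The function name `naive_draw_calls` and its parameters are made up for illustration; it simply assumes one draw call per tile, with the sparse layers expressed as a fraction of the dense layer.

```python
import math

def naive_draw_calls(screen_w, screen_h, tile_size,
                     extra_layer_fraction=0.5, margin=0.0):
    """Estimate draw calls for a naive one-call-per-tile renderer.

    extra_layer_fraction: sparse layers, as a fraction of the dense layer
                          (0.5 matches the "add 50%" estimate above).
    margin: extra area drawn on EACH side, as a fraction of the screen
            size (0.5 means half a screen extra in every direction).
    """
    drawn_w = screen_w * (1 + 2 * margin)
    drawn_h = screen_h * (1 + 2 * margin)
    cols = math.ceil(drawn_w / tile_size)
    rows = math.ceil(drawn_h / tile_size)
    dense_layer = cols * rows
    return dense_layer + math.ceil(dense_layer * extra_layer_fraction)

one_layer = naive_draw_calls(640, 480, 16, extra_layer_fraction=0.0)
on_screen = naive_draw_calls(640, 480, 16)
with_margin = naive_draw_calls(640, 480, 16, margin=0.5)
print(one_layer, on_screen, with_margin)
```

Doubling both dimensions of the drawn area is what makes the margin case come out at four times the on-screen count.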
On mobile platforms, which mostly use OpenGL ES, expect to afford only half as many draw calls, or fewer, if you want to stay in budget.
It's mostly batching that matters if you're using a 3D API. Once the GPU gets hold of the draw call it'll chew through clipped pixels like nobody's business, and it'll reject them before running expensive pixel shader code. There's no point sending stuff to the GPU that you can easily tell is not in view, but you don't have to worry about being tile- or pixel-perfect about it. Batching will save you far more.
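
As a rough sketch of "cull coarsely, then batch", the snippet below pads the visible tile range by one tile instead of being pixel-perfect, and groups what is left by (atlas, shader) so that each group could go out as a single draw call. Everything here is an assumption for illustration: the map layout (a grid of `(atlas, shader, atlas_index)` tuples or `None`), integer camera coordinates, and the function names. No real graphics API is called.

```python
from collections import defaultdict

def visible_tile_range(cam_x, cam_y, view_w, view_h, tile_size,
                       map_cols, map_rows):
    """Coarse culling: tile indices overlapping the view, padded by one tile."""
    first_col = max(0, cam_x // tile_size - 1)
    first_row = max(0, cam_y // tile_size - 1)
    last_col = min(map_cols - 1, (cam_x + view_w) // tile_size + 1)
    last_row = min(map_rows - 1, (cam_y + view_h) // tile_size + 1)
    return first_col, first_row, last_col, last_row

def build_batches(tile_map, tile_size, cam_x, cam_y, view_w, view_h):
    """Group roughly-visible tiles by (atlas, shader): one batch per key."""
    map_rows, map_cols = len(tile_map), len(tile_map[0])
    c0, r0, c1, r1 = visible_tile_range(cam_x, cam_y, view_w, view_h,
                                        tile_size, map_cols, map_rows)
    batches = defaultdict(list)
    for row in range(r0, r1 + 1):
        for col in range(c0, c1 + 1):
            tile = tile_map[row][col]
            if tile is None:  # empty cell, nothing to draw
                continue
            atlas, shader, atlas_index = tile
            # Store the screen position and which atlas entry to use.
            batches[(atlas, shader)].append(
                (col * tile_size - cam_x, row * tile_size - cam_y, atlas_index))
    return batches

# A 100x100 map where every tile comes from one atlas and one shader.
tile_map = [[("terrain_atlas", "basic", 0)] * 100 for _ in range(100)]
batches = build_batches(tile_map, 16, cam_x=160, cam_y=160,
                        view_w=640, view_h=480)
print(len(batches), "batch(es) covering",
      sum(len(tiles) for tiles in batches.values()), "tiles")
```

The padded range submits a few tiles that end up clipped, which is fine: the GPU rejects them cheaply, and the CPU-side win comes from issuing one call per batch rather than one per tile.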
As an aside, new-style APIs like Mantle, D3D12, and the console APIs aren't so affected by draw call counts, and they have other features that keep reusable draw commands on the GPU to reduce overhead even further. In a statistical analysis, the D3D12 team showed that nearly all games reuse 90% of their draw commands frame over frame, so my understanding is that these APIs make it possible to just reuse a command with slightly different properties (say, its transform matrix or lighting properties) rather than rebuilding the command, sending it to the GPU, and verifying it each frame.
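
To illustrate the reuse idea only, here is a toy model in plain Python. It is not the Mantle or D3D12 API: `CommandCache` and `render_frame` are hypothetical names, and the "expensive" build step is just a counter standing in for the rebuild-and-verify work described above.

```python
class CommandCache:
    """Toy model: build a draw command once, then reuse it every frame."""

    def __init__(self):
        self.commands = {}
        self.builds = 0  # how many times the (assumed) costly build ran

    def get(self, mesh, shader, texture):
        key = (mesh, shader, texture)
        if key not in self.commands:
            # Stand-in for building and verifying the command once.
            self.builds += 1
            self.commands[key] = {"state": key, "transform": None}
        return self.commands[key]

def render_frame(cache, objects):
    """Reuse cached commands; only the per-frame property changes."""
    submitted = []
    for mesh, shader, texture, transform in objects:
        command = cache.get(mesh, shader, texture)
        command["transform"] = transform  # cheap per-frame update
        submitted.append((command["state"], transform))
    return submitted

cache = CommandCache()
for frame in range(3):
    objects = [
        ("hero_mesh", "lit", "hero_tex", ("translate", frame, 0)),
        ("tree_mesh", "lit", "tree_tex", ("translate", 10, 5)),
    ]
    submitted = render_frame(cache, objects)
print(cache.builds, "commands built across 3 frames")
```

In this model the build cost is paid once per distinct command, and later frames only touch the property that actually changed.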