I will outline a method I used in a similar situation, though in my case:

- The camera/projection was not constrained: it could rotate in all 3 axes.

- The tiles corresponded to a height map terrain.

1. Obtain a world-space representation of the view frustum, either by computing it directly from the view and projection parameters or by extracting it from the combined view-projection matrix.

2. Clip the four frustum edges (the segments that join each near-plane corner to the corresponding far-plane corner) against a plane representing the minimum height of the tiles.

3. Discard the height component of the end points of the clipped segments (i.e. project the end points onto a 2D plane).

4. With the above 2D points you can do the following interesting things:

- Compute the 2D AABB which contains the visible tiles.

- Compute a 2D convex hull containing the visible tiles.

- From the convex hull compute a minimum 2D OBB of the visible tiles.
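
The steps above can be sketched roughly as follows. This is only an illustration, not the code I actually used: it assumes y is the height axis and the tiles lie in the x-z plane, it builds the frustum directly from camera parameters rather than from a matrix, and all function names (`frustum_corners`, `clip_segment_above`, `footprint_points`, `aabb_2d`) are made up for the example.

```python
import math

# Assumed convention: y is height, tiles lie in the x-z plane.

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return scale(a, 1.0 / n)

def frustum_corners(eye, forward, up, fov_y_deg, aspect, near, far):
    """Step 1: world-space corners of the near and far planes, in matching order."""
    f = normalize(forward)
    r = normalize(cross(f, up))   # camera right
    u = cross(r, f)               # camera up, orthogonal to f and r
    t = math.tan(math.radians(fov_y_deg) / 2.0)
    planes = []
    for d in (near, far):
        half_h = d * t
        half_w = half_h * aspect
        c = add(eye, scale(f, d))  # centre of the plane at distance d
        planes.append([add(add(c, scale(r, sx * half_w)), scale(u, sy * half_h))
                       for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))])
    return planes[0], planes[1]

def clip_segment_above(p, q, min_h):
    """Step 2: keep the part of segment p-q with y >= min_h, or None if there is none."""
    dp, dq = p[1] - min_h, q[1] - min_h
    if dp < 0 and dq < 0:
        return None
    if dp >= 0 and dq >= 0:
        return p, q
    t = dp / (dp - dq)            # signs differ here, so dp - dq != 0
    hit = add(p, scale(sub(q, p), t))
    return (p, hit) if dp >= 0 else (hit, q)

def footprint_points(near_corners, far_corners, min_h):
    """Steps 2 and 3: clip the four frustum edges, then drop the height component."""
    pts = []
    for p, q in zip(near_corners, far_corners):
        seg = clip_segment_above(p, q, min_h)
        if seg is not None:
            pts.extend((a[0], a[2]) for a in seg)
    return pts

def aabb_2d(pts):
    """Step 4: 2D AABB as (min_x, min_z, max_x, max_z)."""
    xs = [p[0] for p in pts]
    zs = [p[1] for p in pts]
    return (min(xs), min(zs), max(xs), max(zs))
```

For example, a camera 10 units above a minimum height of 0, looking straight down with a 90 degree vertical field of view and an aspect ratio of 1, gives an AABB of approximately (-10, -10, 10, 10). Note that an edge which never reaches the minimum-height plane (a camera looking towards the horizon) contributes its far-plane end point, so the far distance bounds the result in that case.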

In your case you will only want the 2D AABB, and you will want to round it outward to integer tile extents (floor the minimum corner, ceil the maximum corner) so that you include all the partially visible edge tiles.
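
A minimal sketch of that rounding, assuming square tiles of size `tile_size` laid out from a grid `origin`, with tile `i` covering `[i * tile_size, (i + 1) * tile_size)` along each axis. The function name `tile_range` and its parameters are hypothetical; adapt them to however your tile grid is indexed.

```python
import math

def tile_range(aabb, tile_size=1.0, origin=(0.0, 0.0), grid=None):
    """Round a 2D AABB outward to integer tile extents.

    aabb is (min_x, min_z, max_x, max_z). Returns (ix0, iz0, ix1, iz1),
    where visible tiles satisfy ix0 <= ix < ix1 and iz0 <= iz < iz1.
    If grid=(width, height) is given, the range is clamped to the grid.
    """
    min_x, min_z, max_x, max_z = aabb
    ix0 = math.floor((min_x - origin[0]) / tile_size)
    iz0 = math.floor((min_z - origin[1]) / tile_size)
    ix1 = math.ceil((max_x - origin[0]) / tile_size)
    iz1 = math.ceil((max_z - origin[1]) / tile_size)
    if grid is not None:
        ix0, iz0 = max(ix0, 0), max(iz0, 0)
        ix1, iz1 = min(ix1, grid[0]), min(iz1, grid[1])
    return ix0, iz0, ix1, iz1

# Loop over the tiles that need drawing.
ix0, iz0, ix1, iz1 = tile_range((0.2, 1.7, 3.1, 4.0))
visible = [(ix, iz) for iz in range(iz0, iz1) for ix in range(ix0, ix1)]
```

The end indices are exclusive, so an AABB whose maximum lies exactly on a tile boundary does not pull in the neighbouring tile that it only touches along an edge.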

If the camera is constrained as you have described then the 2D AABB, 2D convex hull and 2D OBB should all be identical.