So if I understand correctly, you're using an orthographic "camera" set above the terrain in worldspace that looks down on the minimap/player. And you move this camera closer to, or further from, the ground to render the minimap content. Currently, your waypoint marker, which sits out in worldspace, is seen by the camera in the render pass of the terrain. If that's incorrect, disregard everything I'm about to say.
Moving the camera closer to the ground changes the world-space area the minimap covers (e.g. at "normal" zoom it encompasses 100x100 world-space units, and at "max" zoom it encompasses 10x10 units). If a waypoint is sitting at (10,10) and the minimap is centered over the origin, then at normal zoom the waypoint's draw position is offset from the minimap's center by 10% of the minimap's width to the right and 10% of its height up. At max zoom that offset becomes 100% in each direction, which puts the waypoint outside the visible area, so you'd have to clamp it to the border or skip drawing it. That's the calculation I'm talking about for determining the waypoint's draw location. Granted, this is slightly more work if you're using a smooth-step zoom function rather than fixed-increment zoom levels.
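To make that calculation concrete, here's a rough sketch in Python (not tied to any engine). The names `MinimapView`, `waypoint_offset`, `is_on_minimap` and `to_pixels` are made up for illustration, and I'm assuming a square minimap whose pixel origin is in the top-left corner with y growing downward.

```python
from dataclasses import dataclass


@dataclass
class MinimapView:
    center_x: float  # world-space point the minimap camera is centered over
    center_y: float
    extent: float    # world-space units covered across the full minimap (e.g. 100 or 10)


def waypoint_offset(view, wx, wy):
    """Offset from the minimap center, as a fraction of the full minimap width/height."""
    return ((wx - view.center_x) / view.extent,
            (wy - view.center_y) / view.extent)


def is_on_minimap(offset):
    # The visible area spans -0.5..+0.5 of the width around the center.
    return abs(offset[0]) <= 0.5 and abs(offset[1]) <= 0.5


def to_pixels(offset, size_px):
    # Assumption: pixel origin is top-left, so "up" in the world is a smaller pixel y.
    return ((0.5 + offset[0]) * size_px, (0.5 - offset[1]) * size_px)


normal = MinimapView(0.0, 0.0, 100.0)
zoomed = MinimapView(0.0, 0.0, 10.0)
offset_normal = waypoint_offset(normal, 10.0, 10.0)  # 10% right, 10% up
offset_zoomed = waypoint_offset(zoomed, 10.0, 10.0)  # 100% right, 100% up (off the map)
```

With a smooth zoom you'd just feed in whatever `extent` the camera currently covers each frame instead of picking from a fixed set.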
As for the rendering scale: obviously the camera moving closer to the terrain (where the waypoint marker sits) will make the marker appear larger. So if you want the waypoint to appear the same size, it can't be rendered with the same view*projection matrix that the minimap camera uses. I'd just make a one-off view*projection matrix that's fixed above the waypoint's object-space coordinates and either set it high enough over the waypoint model that it gives the waypoint the right scale, or scale the waypoint model down for the draw. The translation matrix applied to the waypoint should be calculated so that the waypoint is drawn at the correct position on the minimap render target based on the minimap's zoom level and the minimap camera's position, as mentioned above.
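Here's a minimal sketch of the "scale the waypoint model down and translate it" option, again in plain Python. It assumes the one-off view*projection simply maps the minimap render target to the -1..1 range on both axes, and that matrices are row-major with the translation in the last column; `marker_model_matrix` and `transform_point` are hypothetical helpers, not from any particular engine.

```python
def marker_model_matrix(cam_x, cam_y, extent, wx, wy, marker_size):
    """Build a 4x4 model matrix for the waypoint marker.

    extent:      world-space units the minimap currently covers (changes with zoom)
    marker_size: desired marker scale in the -1..1 minimap space (independent of zoom)
    """
    # The -1..1 range is 2 units wide, so an offset of one full map width equals 2.0.
    tx = 2.0 * (wx - cam_x) / extent
    ty = 2.0 * (wy - cam_y) / extent
    s = marker_size
    return [
        [s,   0.0, 0.0, tx],
        [0.0, s,   0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    # Apply the matrix to a 3D point treated as a column vector (x, y, z, 1).
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))


# Same waypoint at two zoom levels: the position changes, the scale does not.
m_normal = marker_model_matrix(0.0, 0.0, 100.0, 10.0, 10.0, 0.05)
m_closer = marker_model_matrix(0.0, 0.0, 50.0, 10.0, 10.0, 0.05)
corner = transform_point(m_normal, (1.0, 0.0, 0.0))
```

The point is that only the translation depends on the zoom level and the minimap camera's position, so the marker keeps the same size on the render target no matter how far in you zoom.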
I'll come back and edit with a graphic in a minute; I realize it's kind of a visual thing. Also, this is just how I would accomplish it (I had to do something similar for my map editor to draw brush borders and translation grips), but it isn't the only way to do it.