Perhaps not half-way, but you're starting out correctly by converting the screen position at the mouse click to a "ray" in world coordinates.
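For reference, here is a minimal C++ sketch of building that ray. It assumes you already have the inverse of the combined view-projection matrix (with DirectXMath you would typically get it from `XMMatrixInverse`), stored row-major and applied to row vectors (`v * M`), which is DirectXMath's convention. The names `Vec3`, `Ray`, `Mat4`, `transformCoord` and `screenToWorldRay` are hypothetical. It also assumes Direct3D's NDC depth range of 0 (near plane) to 1 (far plane).

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin; Vec3 dir; };

// Hypothetical row-major 4x4 matrix, used with row vectors (v * M).
using Mat4 = std::array<std::array<float, 4>, 4>;

// Transform the point (x, y, z, 1) by m, then divide by w.
Vec3 transformCoord(const Vec3& p, const Mat4& m) {
    float r[4];
    for (int c = 0; c < 4; ++c)
        r[c] = p.x * m[0][c] + p.y * m[1][c] + p.z * m[2][c] + m[3][c];
    return { r[0] / r[3], r[1] / r[3], r[2] / r[3] };
}

// Convert a mouse position (pixels, origin at top-left) to a world-space ray.
// invViewProj is assumed to be inverse(view * projection).
Ray screenToWorldRay(float mouseX, float mouseY, float width, float height,
                     const Mat4& invViewProj) {
    // Pixel -> NDC: x runs -1..1 left to right, y runs -1..1 bottom to top.
    float ndcX = 2.0f * mouseX / width - 1.0f;
    float ndcY = 1.0f - 2.0f * mouseY / height;

    // Unproject the same screen point at the near (z = 0) and far (z = 1) planes.
    Vec3 nearPt = transformCoord({ ndcX, ndcY, 0.0f }, invViewProj);
    Vec3 farPt = transformCoord({ ndcX, ndcY, 1.0f }, invViewProj);

    // The ray starts on the near plane and points toward the far point.
    Vec3 d = { farPt.x - nearPt.x, farPt.y - nearPt.y, farPt.z - nearPt.z };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { nearPt, { d.x / len, d.y / len, d.z / len } };
}
```

DirectXMath's `XMVector3Unproject` performs a similar unprojection if you would rather not write it yourself.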
Assuming what you call "ground" is a mesh of triangles, you're trying to determine the position in that mesh under the mouse cursor. As you seem to understand, that's called "picking." You can search for further information using terms such as "pick triangle," "directx 11 picking," and various combinations of those terms.
From where you are, the next step is to determine which triangles in the ground mesh the ray intersects. That takes quite a bit of processing.
Briefly:
- set a variable (e.g., minDistance) to FLT_MAX.
- for every triangle in the mesh you want to pick, determine if the ray intersects that face (triangle)
- if so, calculate the barycentric coordinates of that intersection.
- from those barycentric coordinates, calculate the world position of the intersection.
- calculate the distance from that point to the ray origin.
- if that distance is less than minDistance:
  - save the (x,y,z) position calculated from the barycentric coords
  - set minDistance to that calculated distance
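The steps above can be sketched in C++ as follows. This is a minimal sketch, not the only way to do it: it uses the Möller–Trumbore ray/triangle test, one common algorithm that yields the barycentric coordinates as a by-product (the steps above don't prescribe a specific test). The types `Vec3` and `Triangle` and the functions `intersectTriangle` and `pickClosest` are hypothetical. The mesh is assumed to be a plain list of world-space triangles rather than DirectX 11 vertex/index buffers.

```cpp
#include <cfloat>
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moller-Trumbore test. On a hit, u and v are the barycentric weights of v1 and
// v2 (the weight of v0 is 1 - u - v) and the function returns true.
bool intersectTriangle(const Vec3& orig, const Vec3& dir, const Triangle& tri,
                       float& u, float& v) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(tri.v1, tri.v0);
    Vec3 e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;          // ray parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, tri.v0);
    u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;      // outside the triangle
    float t = dot(e2, q) * inv;
    return t >= 0.0f;                                // must be in front of the origin
}

// Loop over every triangle, keep the intersection closest to the ray origin.
std::optional<Vec3> pickClosest(const Vec3& orig, const Vec3& dir,
                                const std::vector<Triangle>& mesh, float& minDistance) {
    minDistance = FLT_MAX;
    std::optional<Vec3> picked;
    for (const Triangle& tri : mesh) {
        float u, v;
        if (!intersectTriangle(orig, dir, tri, u, v)) continue;
        // World position from the barycentric coordinates.
        float w = 1.0f - u - v;
        Vec3 hit = { w * tri.v0.x + u * tri.v1.x + v * tri.v2.x,
                     w * tri.v0.y + u * tri.v1.y + v * tri.v2.y,
                     w * tri.v0.z + u * tri.v1.z + v * tri.v2.z };
        Vec3 d = sub(hit, orig);
        float dist = std::sqrt(dot(d, d));
        if (dist < minDistance) {
            minDistance = dist;
            picked = hit;
        }
    }
    return picked;   // empty if the ray hit nothing
}
```

If the ray direction is normalized, the `t` computed inside the test already equals the distance to the hit, so you could return it directly; the sketch keeps the explicit barycentric and distance steps to mirror the list above.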
If any intersection was found, the saved (x,y,z) position is the one you seek. If you need it, minDistance provides the distance from the camera to the picked point.
There are several approaches to speed up the process, particularly in the step determining if the ray intersects the triangle.
However, for starters, I recommend you take a look at a tutorial on picking, such as this link. Once your picking algorithm works correctly, if you need to speed up the process, look at ways to cull triangles before calculating the barycentric coords.
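One common way to cull triangles is to first test the ray against an axis-aligned bounding box (AABB) around the whole mesh, or around smaller chunks of the ground, and skip every triangle in a box the ray misses. Below is a minimal sketch of the standard "slab" ray/AABB test in C++. The names `Vec3`, `AABB` and `rayHitsAABB` are hypothetical, and how you partition the ground into boxes is left to you.

```cpp
#include <algorithm>
#include <cfloat>
#include <utility>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Slab method: returns true if the ray orig + t * dir (t >= 0) touches the box.
bool rayHitsAABB(const Vec3& orig, const Vec3& dir, const AABB& box) {
    float tMin = 0.0f, tMax = FLT_MAX;
    const float o[3] = { orig.x, orig.y, orig.z };
    const float d[3] = { dir.x, dir.y, dir.z };
    const float lo[3] = { box.min.x, box.min.y, box.min.z };
    const float hi[3] = { box.max.x, box.max.y, box.max.z };
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {
            // Ray parallel to this pair of planes: it misses unless the origin is between them.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        // Parameter values where the ray crosses the two planes of this slab.
        float inv = 1.0f / d[i];
        float t0 = (lo[i] - o[i]) * inv;
        float t1 = (hi[i] - o[i]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        // Narrow the interval of t that lies inside all slabs so far.
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;
    }
    return true;
}
```

For example, if the ground is split into tiles, each with its own box, you would run the per-triangle test only on tiles for which `rayHitsAABB` returns true.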