Screenshot 5 looks odd; that does suggest a viewport-type issue.
Are screenshots 1 and 2 correct, or should the quad in screenshot 2 have been the same size but clipped?
How does a user interact with this? How do they zoom? Do they drag the view around to move it, or do they click?
I can't say exactly what the problem is, as it's probably a combination of things spread throughout the code. I would approach it by boiling everything down to three basic values: the zoom level (1x, 2x, etc.) and the x/y world position you want at the centre of the window. Then I'd write a method that takes just those three values plus the display size and builds both a projection matrix and a view matrix from them. Those are probably all you need. You can probably get away without a view matrix, too, because an orthographic projection can absorb the camera translation quite easily, and since you are only working in 2D, using just the projection matrix is probably more efficient.
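To show why the view matrix can be folded away, here is the x/y mapping written out. The symbols are my own for illustration: $s$ is the zoom, $(c_x, c_y)$ is the world point at the centre of the window, and $W \times H$ is the display size in pixels. The visible region then has half-extents $W/(2s)$ and $H/(2s)$, so

$$x_{\text{ndc}} = \frac{2s}{W}\,(x - c_x), \qquad y_{\text{ndc}} = \frac{2s}{H}\,(y - c_y)$$

which is an ordinary off-centre orthographic projection with $l = c_x - \frac{W}{2s}$, $r = c_x + \frac{W}{2s}$, $b = c_y - \frac{H}{2s}$, $t = c_y + \frac{H}{2s}$ (check: $\frac{2}{r-l} = \frac{2s}{W}$ and $-\frac{r+l}{r-l} = -\frac{2s\,c_x}{W}$). As a single matrix acting on column vectors, with depth simply passed through (an assumption that suits flat sprites):

$$P = \begin{pmatrix} \frac{2s}{W} & 0 & 0 & -\frac{2s\,c_x}{W} \\ 0 & \frac{2s}{H} & 0 & -\frac{2s\,c_y}{H} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The last column is exactly the translation a view matrix would otherwise hold. DirectXMath uses row vectors, so it stores the transpose; as far as I know, `XMMatrixOrthographicOffCenterLH` called with the $l, r, b, t$ above should give the same x/y mapping, with its own depth handling.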
If any of those values change (the screen size, the magnification or the position), just call that method again and it will rebuild your projection correctly. I'm not familiar with DirectX, but when your display size changes you may also need to update the viewport or similar.
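A minimal sketch of that "one method, rebuild on any change" idea, in plain Python with no graphics API. `Camera2D`, `rebuild` and the setter names are hypothetical, not from your project, and it assumes a y-up world with the matrix stored row-major for column vectors. Every setter funnels through `rebuild()`, and the camera translation lives in the projection's last column instead of a separate view matrix.

```python
class Camera2D:
    """Hypothetical 2D camera whose only state is zoom, centre and display size.

    Assumes world y points up; projection is row-major for column vectors,
    i.e. ndc = P @ [x, y, z, 1].
    """

    def __init__(self, width, height, zoom=1.0, center_x=0.0, center_y=0.0):
        self.width = width
        self.height = height
        self.zoom = zoom
        self.center_x = center_x
        self.center_y = center_y
        self.rebuild()

    def rebuild(self):
        # Off-centre orthographic projection with the camera translation
        # folded into the last column, so no view matrix is needed.
        sx = 2.0 * self.zoom / self.width
        sy = 2.0 * self.zoom / self.height
        self.projection = [
            [sx, 0.0, 0.0, -sx * self.center_x],
            [0.0, sy, 0.0, -sy * self.center_y],
            [0.0, 0.0, 1.0, 0.0],  # depth passed through unchanged (assumption)
            [0.0, 0.0, 0.0, 1.0],
        ]

    # Every change goes through rebuild(), so the matrix can never go stale.
    def set_zoom(self, zoom):
        self.zoom = zoom
        self.rebuild()

    def set_center(self, x, y):
        self.center_x, self.center_y = x, y
        self.rebuild()

    def resize(self, width, height):
        self.width, self.height = width, height
        self.rebuild()
        # In D3D11 this is also where you would update the viewport
        # (e.g. via ID3D11DeviceContext::RSSetViewports).

    def world_to_ndc(self, x, y):
        # Multiply the homogeneous point by the projection matrix.
        v = (x, y, 0.0, 1.0)
        out = [sum(row[i] * v[i] for i in range(4)) for row in self.projection]
        return out[0], out[1]
```

For example, with an 800x600 display, zoom 2 and centre (100, 50), the visible half-width is 800 / (2 * 2) = 200 world units, so world x = 300 lands on the right edge of the window (NDC x = 1).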
How you update those values depends on how the user interacts, so I can't say much, but you are definitely on the right track converting window-space values to world space, as that makes life much easier.
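Since you're already converting window space to world space, here is a sketch of that conversion expressed with the same three values plus the display size. The function names are hypothetical, and it assumes a top-left pixel origin with y growing downwards and a y-up world; flip the sign on the y terms if your world is y-down.

```python
def screen_to_world(px, py, zoom, center_x, center_y, width, height):
    """Map a window pixel to world coordinates.

    Assumes pixel (0, 0) is the top-left corner with y growing downwards,
    while world y grows upwards.
    """
    wx = center_x + (px - width / 2.0) / zoom
    # Screen y grows downwards and world y upwards, hence the minus sign.
    wy = center_y - (py - height / 2.0) / zoom
    return wx, wy


def world_to_screen(wx, wy, zoom, center_x, center_y, width, height):
    """Inverse of screen_to_world, handy for checking that the two agree."""
    px = width / 2.0 + (wx - center_x) * zoom
    py = height / 2.0 - (wy - center_y) * zoom
    return px, py
```

Because both functions depend only on the zoom, the centre position and the display size, they stay consistent with the projection whenever any of those values change.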
I think your current solution is more complicated than it needs to be, but I certainly don't have as good an overview of your project as you do.