I think your two-villages example is a good one. If the temple you mention is never going to be used anywhere else, even in a different form, go ahead and model it as a one-off and just place it in your world. As for the two similar-looking but different villages, that is exactly one of the advantages of using modular assets.
Poly count really depends on your game, your target audience, and how powerful you expect that audience's computers to be. The actual number of polys is nowhere near as important as the shaders used on those polys. A high-end deferred pipeline/shader like UE4's or Unity's PBR shaders will generally be more expensive than one of Unity's older forward-rendering shaders, even if you use fewer polys with the PBR shaders and more with the forward-rendered ones.
A good habit to build in modelling is to put polys only where you need them, without being stingy either. If you have a nice flat surface, you don't want to put a ton of polys on it if you can fill it with just a few and have it look the same. But you need enough geometry to capture the curvature of your model, and you need extra edge loops anywhere the geometry deforms during animation, like shoulders, knees, elbows, etc. There is no perfect way to do this, as all aspects of gamedev, even in AAA studios, involve making things "close enough, good enough." Modern human models can easily range from 3,000 polys to 30,000 polys or even more. Note that the fewer models you have in your game at any one time, the more polys each of them can have (although that doesn't mean you should, especially if your target audience isn't likely to have computers that can handle it). By the same token, if your game is full of action and hordes of enemies, you are going to need less geometry per enemy.
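To put rough numbers on that last trade-off, here is a back-of-the-envelope sketch in Python. The 300,000-poly frame budget and the `per_model_budget` helper are made up for illustration and are not figures from any engine. It also assumes every model gets an equal share, which a real scene won't do, so treat it as a starting point and profile on your actual target hardware.

```python
def per_model_budget(frame_budget: int, model_count: int) -> int:
    """Split a total per-frame poly budget evenly across the visible models."""
    if model_count <= 0:
        raise ValueError("model_count must be positive")
    return frame_budget // model_count


# Hypothetical total budget for characters in one frame (an assumption,
# not a recommendation for any particular engine or platform).
FRAME_BUDGET = 300_000

for count in (10, 100):
    print(f"{count} models -> {per_model_budget(FRAME_BUDGET, count)} polys each")
```

With that assumed budget, ten characters could have about 30,000 polys each, while a horde of a hundred drops to about 3,000 each, which lines up with the range mentioned above.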