I think you can just use _mm_storeu_ps(out_nValue, vn1) (the destination pointer comes first). You may need to cast out_nValue...
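A minimal sketch of the plain store, assuming vn1 is an __m128i holding four 32-bit ints (store_four_ints and out are hypothetical names). The store intrinsics take the destination pointer first. For an integer register the matching unaligned store is _mm_storeu_si128, and the pointer has to be cast to __m128i*:

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: write four 32-bit ints from vn1 to out[0..3].
   Destination pointer first; out need not be 16-byte aligned. */
static void store_four_ints(int32_t *out, __m128i vn1)
{
    _mm_storeu_si128((__m128i *)out, vn1);
}
```

This writes 16 bytes, so it only fits an output array of 32-bit elements.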
If you were writing out 32-bit ints (where both the high and low halves already contain 16-bit values), that would work. However, I suspect the OP wants to discard the upper 16 bits of each 32-bit result before writing to the array of shorts.....
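For reference, here is a scalar sketch of "discard the upper 16 bits" (narrow_scalar is a hypothetical name). Converting to uint16_t is well defined in C (reduction modulo 2^16), and the resulting bit pattern is what the SIMD versions end up storing into the short array:

```c
#include <stddef.h>
#include <stdint.h>

/* Keep only the low 16 bits of each 32-bit input value. */
static void narrow_scalar(uint16_t *out, const int32_t *in, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (uint16_t)in[i]; /* e.g. 70000 (0x11170) -> 0x1170 */
}
```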
If you have SSSE3 (not just SSE3), you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits of each int result. If you do this on 2 registers at a time (i.e. moving the first register's results to the lower 8 bytes and the second register's results to the upper 8 bytes, with the shuffle masks zeroing every other byte), you can merge them with a bitwise OR (_mm_or_si128) before storing to the output. (3 ops total)
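A sketch of the pshufb approach, assuming a holds results 0..3 and b holds results 4..7 as 32-bit ints (narrow8_ssse3 and the mask names are hypothetical). A mask byte with its high bit set makes _mm_shuffle_epi8 write zero, which is why OR can merge the two halves. The target attribute is a GCC/Clang assumption so the function compiles without -mssse3; the CPU still has to support SSSE3 at run time:

```c
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (also pulls in SSE2) */
#include <stdint.h>

#if defined(__GNUC__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

/* Narrow eight 32-bit ints (a = 0..3, b = 4..7) to eight 16-bit values
   by keeping the low 16 bits of each, then store them to out[0..7]. */
TARGET_SSSE3
static void narrow8_ssse3(int16_t *out, __m128i a, __m128i b)
{
    /* Byte indices of each int's low half (little-endian): 0,1 4,5 8,9 12,13.
       -1 has the high bit set, so pshufb writes a zero byte there. */
    const __m128i to_lo = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                        -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i to_hi = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1,
                                        0, 1, 4, 5, 8, 9, 12, 13);
    __m128i lo = _mm_shuffle_epi8(a, to_lo); /* results 0..3 in bytes 0..7 */
    __m128i hi = _mm_shuffle_epi8(b, to_hi); /* results 4..7 in bytes 8..15 */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(lo, hi));
}
```

The two shuffles plus the OR are the 3 ops; the store is extra.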
With SSE2 it gets a little harder. One way would be to use _mm_shufflelo_epi16 / _mm_shufflehi_epi16 to move the 4 x 16-bit results next to each other within each 64-bit half (i.e. leaving V0, V1, x, x, V2, V3, x, x, where x is a leftover don't-care value, since these shuffles don't zero anything). Do the same for the next 4, and then use _mm_shuffle_ps (with casts!) to pick the wanted 32-bit lanes from both registers into a final register of 8 x 16-bit results. (5 ops) Worst case, there is always _mm_extract_epi16..... ;)
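A sketch of the SSE2-only version, under the same assumption that a holds results 0..3 and b holds results 4..7 (narrow8_sse2 is a hypothetical name). The four 16-bit shuffles and the one _mm_shuffle_ps are the 5 ops; the casts compile to nothing:

```c
#include <emmintrin.h> /* SSE2; includes xmmintrin.h for _MM_SHUFFLE */
#include <stdint.h>

/* Narrow eight 32-bit ints (a = 0..3, b = 4..7) to eight 16-bit values
   by keeping the low 16 bits of each, using only SSE2. */
static void narrow8_sse2(int16_t *out, __m128i a, __m128i b)
{
    /* In each 64-bit half, move words 0 and 2 (the low halves of two ints)
       into word slots 0 and 1. a becomes V0, V1, x, x, V2, V3, x, x. */
    a = _mm_shufflelo_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflelo_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflehi_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));
    /* Take 32-bit lanes 0 and 2 from each register: {a0, a2, b0, b2}. */
    __m128 packed = _mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b),
                                   _MM_SHUFFLE(2, 0, 2, 0));
    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(packed));
}
```

_mm_shuffle_ps only moves bits around, so routing integer data through it via the casts doesn't alter any values.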
I suspect someone will come along with a better suggestion.... ;)