I wrote a software 3D sound engine in college.
Since I was targeting Windows only, I used waveOutOpen (http://msdn.microsoft.com/en-us/library/windows/desktop/dd743866(v=vs.85).aspx) to send the raw PCM data to the speakers.
This is a good starting point: http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3.
Once you have a basic sound coming from the speakers, you can simply keep adding features and build a nice architecture around it.
Mixing sounds is as simple as adding their samples together, resampling is just interpolation, and effects like echo or low/high-pass filters aren't that hard and are fairly well documented.
Great info, I've been meaning to look into this for a while myself.
A question if you come back around -- did you find it difficult to keep the buffer full for gapless playback? Even though audio processing isn't terribly intensive, I've always been concerned about Windows' ability to keep up with the real-time constraints while also responding quickly to sound events. Audio is far more susceptible to tiny gaps in playback -- the ear notices dropouts of just a few milliseconds, while entire video frames can slip by unnoticed. What were your experiences?
In my experience with the waveOutOpen() family of functions, you need a really big buffer to avoid gaps in playback, which makes low-latency audio impossible. The reason is that this API is not callback-based, whereas more advanced APIs like WASAPI on Windows and CoreAudio on OS X let you register a callback that is invoked from the system audio thread whenever output audio is needed. The OS/driver maintains the buffer for you and synchronizes the callback so that there are only a few milliseconds of latency between your code and the hardware.