There are various APIs that simplify playing sounds, but if you want to go *low level*, most platforms offer a scheme like this:
1) The API gives you access to a primary sound output buffer (which may itself be mixed with other audio by the OS, but you don't need to worry about that). For, say, typical 16-bit stereo at 44.1 kHz, this might be a sequence of 'tiles' of memory that are played in order; when playback reaches the end it loops back and starts again from the first tile.
2) Your entire job is then to keep this sequence of tiles filled with audio data to play. Most APIs provide something like a callback to tell you when a tile needs filling, since this can happen at any time, not just when your game thread is active - if you don't fill a tile in time you will get audio corruption and glitches.
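As a rough sketch of that tile scheme (all names here are made up for illustration - a real API such as WASAPI, ALSA, or Core Audio defines its own buffer layout and callback signature):

```python
# A ring of tiles, each holding FRAMES_PER_TILE stereo frames of
# interleaved signed 16-bit samples. The hardware plays tiles in order
# and loops; our job is to refill each tile before it comes around again.

TILE_COUNT = 4          # total tiles in the primary buffer ring
FRAMES_PER_TILE = 1024  # stereo frames per tile
CHANNELS = 2

# Each tile is a flat list of interleaved L/R samples, initially silence.
ring = [[0] * (FRAMES_PER_TILE * CHANNELS) for _ in range(TILE_COUNT)]

def fill_tile(tile, generate_sample):
    """Fill one tile with fresh audio, one stereo frame at a time."""
    for frame in range(FRAMES_PER_TILE):
        left, right = generate_sample()
        tile[frame * CHANNELS] = left
        tile[frame * CHANNELS + 1] = right

def on_tile_consumed(tile_index, generate_sample):
    # In a real API this is the callback fired when the hardware has
    # finished a tile; if it isn't serviced in time, stale data plays
    # and you hear glitches.
    fill_tile(ring[tile_index], generate_sample)
```

For example, `on_tile_consumed(0, lambda: (0, 0))` would refill the first tile with silence.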
What you play into these looping tiles is up to you. It could be prerecorded music, etc. In the usual case of sound effects for a game, you would keep your own list of the sounds currently playing, how far through each one is, and so on, and copy the relevant sound data into the primary buffer when required. You can also add audio effects - reverb, delay, chorus, etc. - depending on your audio chops.
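The "list of currently playing sounds" part might look something like the following sketch, where each voice is just a sound plus a playback position (again, the names are illustrative, not from any real audio API):

```python
# Mix all active voices into one tile of interleaved 16-bit samples.
# Each voice is (sound_data, position); sound_data is a flat list of
# interleaved samples at the same format as the tile.

I16_MIN, I16_MAX = -32768, 32767

def mix_voices_into_tile(tile, voices):
    """Sum every active voice into the tile, advance their positions,
    and drop voices that have finished. Sums are clamped to the 16-bit
    range to avoid wrap-around distortion."""
    for i in range(len(tile)):
        tile[i] = 0  # start from silence
    still_playing = []
    for sound, pos in voices:
        n = min(len(tile), len(sound) - pos)  # samples left to copy
        for i in range(n):
            tile[i] = max(I16_MIN, min(I16_MAX, tile[i] + sound[pos + i]))
        if pos + n < len(sound):
            still_playing.append((sound, pos + n))  # not finished yet
    return still_playing
```

Triggering a new sound is then just appending `(sound_data, 0)` to the voice list; effects like reverb would be applied to the summed tile before it goes out.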
Note that the size of each tile determines the latency: with smaller tiles there is less gap between triggering a sound in your game and it 'appearing' in the sound output, but there will be more calls to the callback and more 'housekeeping' code. Having a larger overall primary buffer means less chance of starving the tiles and causing audio glitches.
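The latency trade-off is simple arithmetic: one tile's worth of frames divided by the sample rate is roughly the minimum delay before a newly triggered sound reaches the output, since it has to wait for the tile currently being written to come around. A quick sketch, assuming 44.1 kHz:

```python
SAMPLE_RATE = 44100  # frames per second

def tile_latency_ms(frames_per_tile):
    """Approximate added latency, in milliseconds, for one tile."""
    return frames_per_tile / SAMPLE_RATE * 1000.0

# A 1024-frame tile adds roughly 23 ms; a 256-frame tile roughly 6 ms,
# at the cost of four times as many callback invocations.
```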
If all this sounds complex, that is what the various sound 'engine' APIs offer: perhaps a bit less control, but they do all this stuff for you. If you understand how audio works, though, it is pretty simple stuff (I personally haven't done 3D sound, Doppler, or a few things like that), and it makes porting to a different platform a doddle.